Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-08 Thread Oleksandr Andrushchenko


On 08.02.22 15:38, Roger Pau Monné wrote:
> On Tue, Feb 08, 2022 at 11:13:41AM +, Oleksandr Andrushchenko wrote:
>>
>> On 08.02.22 12:50, Roger Pau Monné wrote:
>>> On Tue, Feb 08, 2022 at 07:35:34AM +, Oleksandr Andrushchenko wrote:
 5. You name it
 ==

From all the above I would recommend we go with option 2 which seems to 
 reliably
 solve ABBA and does not bring cons of the other approaches.
>>> 6. per-domain rwlock + per-device vpci lock
>>>
>>> Introduce vpci_header_write_lock(start, {end, size}) helper: return
>>> whether a range requires the per-domain lock in write mode. This will
>>> only return true if the range overlaps with the BAR ROM or the command
>>> register.
>>>
>>> In vpci_{read,write}:
>>>
>>> if ( vpci_header_write_lock(...) )
>>>   /* Gain exclusive access to all of the domain pdevs vpci. */
>>>   write_lock(d->vpci);
>>> else
>>> {
>>>   read_lock(d->vpci);
>>>   spin_lock(vpci->lock);
>>> }
>>> ...
>>>
>>> The vpci assign/deassign functions would need to be modified to write
>>> lock the per-domain rwlock. The MSI-X table MMIO handler will also
>>> need to read lock the per domain vpci lock.
>> Ok, so it seems you are in favor of this implementation and I have
>> no objection as well. The only limitation we should be aware of is
>> that once a path has acquired the read lock it is not possible to do
>> any write path operations in there.
>> vpci_process_pending will acquire write lock though as it can
>> lead to vpci_remove_device on its error path.
>>
>> So, I am going to implement pdev->vpci->lock + d->vpci_lock
> I think it's the less uncertain option.
>
> As said, if you want to investigate whether you can successfully move
> the checking into vpci_process_pending that would also be fine with
> me, but I cannot assert it's going to be successful. OTOH I think the
> per-domain rwlock + per-device spinlock seems quite likely to solve
> our issues.
Ok, then I'll go with per-domain rwlock + per-device spinlock
and write lock in vpci_write for cmd + ROM. Of course other
places such as vpci_remove_device and vpci_process_pending
will use write lock
>
> Thanks, Roger.
>
Thank you,
Oleksandr

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-08 Thread Roger Pau Monné
On Tue, Feb 08, 2022 at 11:13:41AM +, Oleksandr Andrushchenko wrote:
> 
> 
> On 08.02.22 12:50, Roger Pau Monné wrote:
> > On Tue, Feb 08, 2022 at 07:35:34AM +, Oleksandr Andrushchenko wrote:
> >> 5. You name it
> >> ==
> >>
> >>   From all the above I would recommend we go with option 2 which seems to 
> >> reliably
> >> solve ABBA and does not bring cons of the other approaches.
> > 6. per-domain rwlock + per-device vpci lock
> >
> > Introduce vpci_header_write_lock(start, {end, size}) helper: return
> > whether a range requires the per-domain lock in write mode. This will
> > only return true if the range overlaps with the BAR ROM or the command
> > register.
> >
> > In vpci_{read,write}:
> >
> > if ( vpci_header_write_lock(...) )
> >  /* Gain exclusive access to all of the domain pdevs vpci. */
> >  write_lock(d->vpci);
> > else
> > {
> >  read_lock(d->vpci);
> >  spin_lock(vpci->lock);
> > }
> > ...
> >
> > The vpci assign/deassign functions would need to be modified to write
> > lock the per-domain rwlock. The MSI-X table MMIO handler will also
> > need to read lock the per domain vpci lock.
> Ok, so it seems you are in favor of this implementation and I have
> no objection as well. The only limitation we should be aware of is
> that once a path has acquired the read lock it is not possible to do
> any write path operations in there.
> vpci_process_pending will acquire write lock though as it can
> lead to vpci_remove_device on its error path.
> 
> So, I am going to implement pdev->vpci->lock + d->vpci_lock

I think it's the less uncertain option.

As said, if you want to investigate whether you can successfully move
the checking into vpci_process_pending that would also be fine with
me, but I cannot assert it's going to be successful. OTOH I think the
per-domain rwlock + per-device spinlock seems quite likely to solve
our issues.

Thanks, Roger.



Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-08 Thread Oleksandr Andrushchenko


On 08.02.22 12:50, Roger Pau Monné wrote:
> On Tue, Feb 08, 2022 at 07:35:34AM +, Oleksandr Andrushchenko wrote:
>>
>> On 07.02.22 18:44, Oleksandr Andrushchenko wrote:
>>> On 07.02.22 18:37, Jan Beulich wrote:
 On 07.02.2022 17:21, Oleksandr Andrushchenko wrote:
> On 07.02.22 18:15, Jan Beulich wrote:
>> On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
>>> On 07.02.22 17:26, Jan Beulich wrote:
 1b. Make vpci_write use write lock for writes to command register and 
 BARs
 only; keep using the read lock for all other writes.
>>> I am not quite sure how to do that. Do you mean something like:
>>> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>>      uint32_t data)
>>> [snip]
>>>      list_for_each_entry ( r, >vpci->handlers, node )
>>> {
>>> [snip]
>>>      if ( r->needs_write_lock)
>>>          write_lock(d->vpci_lock)
>>>      else
>>>          read_lock(d->vpci_lock)
>>> 
>>>
>>> And provide rw as an argument to:
>>>
>>> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>>>    vpci_write_t *write_handler, unsigned int 
>>> offset,
>>>    unsigned int size, void *data, --->>> bool 
>>> write_path <<<-)
>>>
>>> Is this what you mean?
>> This sounds overly complicated. You can derive locally in vpci_write(),
>> from just its "reg" and "size" parameters, whether the lock needs taking
>> in write mode.
> Yes, I started writing a reply with that. So, the summary (ROM
> position depends on header type):
> if ( (reg == PCI_COMMAND) || (reg == ROM) )
> {
>     read PCI_COMMAND and see if memory or IO decoding are enabled.
>     if ( enabled )
>         write_lock(d->vpci_lock)
>     else
>         read_lock(d->vpci_lock)
> }
 Hmm, yes, you can actually get away without using "size", since both
 command register and ROM BAR are 32-bit aligned registers, and 64-bit
 accesses get split in vpci_ecam_write().
>>> But, OS may want reading a single byte of ROM BAR, so I think
>>> I'll need to check if reg+size fall into PCI_COMAND and ROM BAR
>>> ranges
 For the command register the memory- / IO-decoding-enabled check may
 end up a little more complicated, as the value to be written also
 matters. Maybe read the command register only for the ROM BAR write,
 using the write lock uniformly for all command register writes?
>>> Sounds good for the start.
>>> Another concern is that if we go with a read_lock and then in the
>>> underlying code we disable memory decoding and try doing
>>> something and calling cmd_write handler for any reason then
>>>
>>> I mean that the check in the vpci_write is somewhat we can tolerate,
>>> but then it is must be considered that no code in the read path
>>> is allowed to perform write path functions. Which brings a pretty
>>> valid use-case: say in read mode we detect an unrecoverable error
>>> and need to remove the device:
>>> vpci_process_pending -> ERROR -> vpci_remove_device or similar.
>>>
>>> What do we do then? It is all going to be fragile...
>> I have tried to summarize the options we have wrt locking
>> and would love to hear from @Roger and @Jan.
>>
>> In every variant there is a task of dealing with the overlap
>> detection in modify_bars, so this is the only place as of now
>> which needs special treatment.
>>
>> Existing limitations: there is no way to upgrade a read lock to a write
>> lock, so paths which may require write lock protection need to use
>> write lock from the very beginning. Workarounds can be applied.
>>
>> 1. Per-domain rw lock, aka d->vpci_lock
>> ==
>> Note: with per-domain rw lock it is possible to do without introducing
>> per-device locks, so pdev->vpci->lock can be removed and no pdev->vpci_lock
>> should be required.
> Er, no, I think you still need a per-device lock unless you intent to
> take the per-domain rwlock in write mode every time you modify data
> in vpci.
This is exactly the assumption stated below. I am trying to discuss
all the possible options, so this one is also listed
>   I still think you need pdev->vpci->lock. It's possible this
> approach doesn't require moving the lock outside of the vpci struct.
>
>> This is only going to work in case if vpci_write always takes the write lock
>> and vpci_read takes a read lock and no path in vpci_read is allowed to
>> perform write path operations.
> I think that's likely too strong?
>
> You could get away with both vpci_{read,write} only taking the read
> lock and use a per-device vpci lock?
But as discussed before:
- if pdev->vpci_lock is used this still leads to ABBA
- we should know about if to take the write lock beforehand
>
> Otherwise you are likely to 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-08 Thread Roger Pau Monné
On Tue, Feb 08, 2022 at 07:35:34AM +, Oleksandr Andrushchenko wrote:
> 
> 
> On 07.02.22 18:44, Oleksandr Andrushchenko wrote:
> >
> > On 07.02.22 18:37, Jan Beulich wrote:
> >> On 07.02.2022 17:21, Oleksandr Andrushchenko wrote:
> >>> On 07.02.22 18:15, Jan Beulich wrote:
>  On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
> > On 07.02.22 17:26, Jan Beulich wrote:
> >> 1b. Make vpci_write use write lock for writes to command register and 
> >> BARs
> >> only; keep using the read lock for all other writes.
> > I am not quite sure how to do that. Do you mean something like:
> > void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
> >     uint32_t data)
> > [snip]
> >     list_for_each_entry ( r, >vpci->handlers, node )
> > {
> > [snip]
> >     if ( r->needs_write_lock)
> >         write_lock(d->vpci_lock)
> >     else
> >         read_lock(d->vpci_lock)
> > 
> >
> > And provide rw as an argument to:
> >
> > int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
> >   vpci_write_t *write_handler, unsigned int 
> > offset,
> >   unsigned int size, void *data, --->>> bool 
> > write_path <<<-)
> >
> > Is this what you mean?
>  This sounds overly complicated. You can derive locally in vpci_write(),
>  from just its "reg" and "size" parameters, whether the lock needs taking
>  in write mode.
> >>> Yes, I started writing a reply with that. So, the summary (ROM
> >>> position depends on header type):
> >>> if ( (reg == PCI_COMMAND) || (reg == ROM) )
> >>> {
> >>>    read PCI_COMMAND and see if memory or IO decoding are enabled.
> >>>    if ( enabled )
> >>>        write_lock(d->vpci_lock)
> >>>    else
> >>>        read_lock(d->vpci_lock)
> >>> }
> >> Hmm, yes, you can actually get away without using "size", since both
> >> command register and ROM BAR are 32-bit aligned registers, and 64-bit
> >> accesses get split in vpci_ecam_write().
> > But, OS may want reading a single byte of ROM BAR, so I think
> > I'll need to check if reg+size fall into PCI_COMAND and ROM BAR
> > ranges
> >> For the command register the memory- / IO-decoding-enabled check may
> >> end up a little more complicated, as the value to be written also
> >> matters. Maybe read the command register only for the ROM BAR write,
> >> using the write lock uniformly for all command register writes?
> > Sounds good for the start.
> > Another concern is that if we go with a read_lock and then in the
> > underlying code we disable memory decoding and try doing
> > something and calling cmd_write handler for any reason then
> >
> > I mean that the check in the vpci_write is somewhat we can tolerate,
> > but then it is must be considered that no code in the read path
> > is allowed to perform write path functions. Which brings a pretty
> > valid use-case: say in read mode we detect an unrecoverable error
> > and need to remove the device:
> > vpci_process_pending -> ERROR -> vpci_remove_device or similar.
> >
> > What do we do then? It is all going to be fragile...
> I have tried to summarize the options we have wrt locking
> and would love to hear from @Roger and @Jan.
> 
> In every variant there is a task of dealing with the overlap
> detection in modify_bars, so this is the only place as of now
> which needs special treatment.
> 
> Existing limitations: there is no way to upgrade a read lock to a write
> lock, so paths which may require write lock protection need to use
> write lock from the very beginning. Workarounds can be applied.
> 
> 1. Per-domain rw lock, aka d->vpci_lock
> ==
> Note: with per-domain rw lock it is possible to do without introducing
> per-device locks, so pdev->vpci->lock can be removed and no pdev->vpci_lock
> should be required.

Er, no, I think you still need a per-device lock unless you intent to
take the per-domain rwlock in write mode every time you modify data
in vpci. I still think you need pdev->vpci->lock. It's possible this
approach doesn't require moving the lock outside of the vpci struct.

> This is only going to work in case if vpci_write always takes the write lock
> and vpci_read takes a read lock and no path in vpci_read is allowed to
> perform write path operations.

I think that's likely too strong?

You could get away with both vpci_{read,write} only taking the read
lock and use a per-device vpci lock?

Otherwise you are likely to introduce contention in msix_write if a
guest makes heavy use of the MSI-X entry mask bit.

> vpci_process_pending uses write lock as it have vpci_remove_device in its
> error path.
> 
> Pros:
> - no per-device vpci lock is needed?
> - solves overlap code ABBA in modify_bars
> 
> Cons:
> - all writes are serialized
> - need to carefully select read paths, 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-08 Thread Oleksandr Andrushchenko


On 08.02.22 12:11, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 05:37:49PM +0100, Jan Beulich wrote:
>> On 07.02.2022 17:21, Oleksandr Andrushchenko wrote:
>>>
>>> On 07.02.22 18:15, Jan Beulich wrote:
 On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
> On 07.02.22 17:26, Jan Beulich wrote:
>> 1b. Make vpci_write use write lock for writes to command register and 
>> BARs
>> only; keep using the read lock for all other writes.
> I am not quite sure how to do that. Do you mean something like:
> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>    uint32_t data)
> [snip]
>    list_for_each_entry ( r, >vpci->handlers, node )
> {
> [snip]
>    if ( r->needs_write_lock)
>        write_lock(d->vpci_lock)
>    else
>        read_lock(d->vpci_lock)
> 
>
> And provide rw as an argument to:
>
> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>  vpci_write_t *write_handler, unsigned int offset,
>  unsigned int size, void *data, --->>> bool 
> write_path <<<-)
>
> Is this what you mean?
 This sounds overly complicated. You can derive locally in vpci_write(),
 from just its "reg" and "size" parameters, whether the lock needs taking
 in write mode.
>>> Yes, I started writing a reply with that. So, the summary (ROM
>>> position depends on header type):
>>> if ( (reg == PCI_COMMAND) || (reg == ROM) )
>>> {
>>>       read PCI_COMMAND and see if memory or IO decoding are enabled.
>>>       if ( enabled )
>>>           write_lock(d->vpci_lock)
>>>       else
>>>           read_lock(d->vpci_lock)
>>> }
>> Hmm, yes, you can actually get away without using "size", since both
>> command register and ROM BAR are 32-bit aligned registers, and 64-bit
>> accesses get split in vpci_ecam_write().
>>
>> For the command register the memory- / IO-decoding-enabled check may
>> end up a little more complicated, as the value to be written also
>> matters. Maybe read the command register only for the ROM BAR write,
>> using the write lock uniformly for all command register writes?
>>
>>> Do you also think we can drop pdev->vpci (or currently pdev->vpci->lock)
>>> at all then?
>> I haven't looked at this in any detail, sorry. It sounds possible,
>> yes.
> AFAICT you should avoid taking the per-device vpci lock when you take
> the per-domain lock in write mode. Otherwise you still need the
> per-device vpci lock in order to keep consistency between concurrent
> accesses to the device registers.
I have sent an e-mail this morning describing possible locking schemes.
Could we please move there and continue if you don't mind?
>
> Thanks, Roger.
Thank you in advance,
Oleksandr

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-08 Thread Roger Pau Monné
On Mon, Feb 07, 2022 at 05:37:49PM +0100, Jan Beulich wrote:
> On 07.02.2022 17:21, Oleksandr Andrushchenko wrote:
> > 
> > 
> > On 07.02.22 18:15, Jan Beulich wrote:
> >> On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
> >>> On 07.02.22 17:26, Jan Beulich wrote:
>  1b. Make vpci_write use write lock for writes to command register and 
>  BARs
>  only; keep using the read lock for all other writes.
> >>> I am not quite sure how to do that. Do you mean something like:
> >>> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
> >>>       uint32_t data)
> >>> [snip]
> >>>       list_for_each_entry ( r, >vpci->handlers, node )
> >>> {
> >>> [snip]
> >>>       if ( r->needs_write_lock)
> >>>           write_lock(d->vpci_lock)
> >>>       else
> >>>           read_lock(d->vpci_lock)
> >>> 
> >>>
> >>> And provide rw as an argument to:
> >>>
> >>> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
> >>>     vpci_write_t *write_handler, unsigned int offset,
> >>>     unsigned int size, void *data, --->>> bool 
> >>> write_path <<<-)
> >>>
> >>> Is this what you mean?
> >> This sounds overly complicated. You can derive locally in vpci_write(),
> >> from just its "reg" and "size" parameters, whether the lock needs taking
> >> in write mode.
> > Yes, I started writing a reply with that. So, the summary (ROM
> > position depends on header type):
> > if ( (reg == PCI_COMMAND) || (reg == ROM) )
> > {
> >      read PCI_COMMAND and see if memory or IO decoding are enabled.
> >      if ( enabled )
> >          write_lock(d->vpci_lock)
> >      else
> >          read_lock(d->vpci_lock)
> > }
> 
> Hmm, yes, you can actually get away without using "size", since both
> command register and ROM BAR are 32-bit aligned registers, and 64-bit
> accesses get split in vpci_ecam_write().
> 
> For the command register the memory- / IO-decoding-enabled check may
> end up a little more complicated, as the value to be written also
> matters. Maybe read the command register only for the ROM BAR write,
> using the write lock uniformly for all command register writes?
> 
> > Do you also think we can drop pdev->vpci (or currently pdev->vpci->lock)
> > at all then?
> 
> I haven't looked at this in any detail, sorry. It sounds possible,
> yes.

AFAICT you should avoid taking the per-device vpci lock when you take
the per-domain lock in write mode. Otherwise you still need the
per-device vpci lock in order to keep consistency between concurrent
accesses to the device registers.

Thanks, Roger.



Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-08 Thread Oleksandr Andrushchenko


On 08.02.22 10:57, Jan Beulich wrote:
> On 08.02.2022 08:35, Oleksandr Andrushchenko wrote:
>> 1.1. Semi read lock upgrade in modify bars
>> --
>> In this case both vpci_read and vpci_write take a read lock and when it comes
>> to modify_bars:
>>
>> 1. read_unlock(d->vpci_lock)
>> 2. write_lock(d->vpci_lock)
>> 3. Check that pdev->vpci is still available and is the same object:
>> if (pdev->vpci && (pdev->vpci == old_vpci) )
>> {
>>       /* vpci structure is valid and can be used. */
>> }
>> else
>> {
>>       /* vpci has gone, return an error. */
>> }
>>
>> Pros:
>> - no per-device vpci lock is needed?
>> - solves overlap code ABBA in modify_bars
>> - readers and writers are NOT serialized
>> - NO need to carefully select read paths, so they are guaranteed not to lead
>>     to lock upgrade use-cases
>>
>> Cons:
>> - ???
> The "pdev->vpci == old_vpci" is fragile: The struct may have got re-
> allocated, and it just so happened that the two pointers are identical.
>
> Same then for the subsequent variant 2.
Yes, it is possible. We can add an ID number to pdev->vpci,
so each new allocated vpci structure has a unique ID which can be used
to compare vpci structures. It can be something like pdev->vpci->id = 
d->vpci_id++;
with id being uint32_t for example
>
> Jan
>
>


Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-08 Thread Oleksandr Andrushchenko


On 08.02.22 10:53, Jan Beulich wrote:
> On 07.02.2022 17:44, Oleksandr Andrushchenko wrote:
>>
>> On 07.02.22 18:37, Jan Beulich wrote:
>>> On 07.02.2022 17:21, Oleksandr Andrushchenko wrote:
 On 07.02.22 18:15, Jan Beulich wrote:
> On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
>> On 07.02.22 17:26, Jan Beulich wrote:
>>> 1b. Make vpci_write use write lock for writes to command register and 
>>> BARs
>>> only; keep using the read lock for all other writes.
>> I am not quite sure how to do that. Do you mean something like:
>> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>     uint32_t data)
>> [snip]
>>     list_for_each_entry ( r, >vpci->handlers, node )
>> {
>> [snip]
>>     if ( r->needs_write_lock)
>>         write_lock(d->vpci_lock)
>>     else
>>         read_lock(d->vpci_lock)
>> 
>>
>> And provide rw as an argument to:
>>
>> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>>   vpci_write_t *write_handler, unsigned int 
>> offset,
>>   unsigned int size, void *data, --->>> bool 
>> write_path <<<-)
>>
>> Is this what you mean?
> This sounds overly complicated. You can derive locally in vpci_write(),
> from just its "reg" and "size" parameters, whether the lock needs taking
> in write mode.
 Yes, I started writing a reply with that. So, the summary (ROM
 position depends on header type):
 if ( (reg == PCI_COMMAND) || (reg == ROM) )
 {
    read PCI_COMMAND and see if memory or IO decoding are enabled.
    if ( enabled )
        write_lock(d->vpci_lock)
    else
        read_lock(d->vpci_lock)
 }
>>> Hmm, yes, you can actually get away without using "size", since both
>>> command register and ROM BAR are 32-bit aligned registers, and 64-bit
>>> accesses get split in vpci_ecam_write().
>> But, OS may want reading a single byte of ROM BAR, so I think
>> I'll need to check if reg+size fall into PCI_COMAND and ROM BAR
>> ranges
>>> For the command register the memory- / IO-decoding-enabled check may
>>> end up a little more complicated, as the value to be written also
>>> matters. Maybe read the command register only for the ROM BAR write,
>>> using the write lock uniformly for all command register writes?
>> Sounds good for the start.
>> Another concern is that if we go with a read_lock and then in the
>> underlying code we disable memory decoding and try doing
>> something and calling cmd_write handler for any reason then
>>
>> I mean that the check in the vpci_write is somewhat we can tolerate,
>> but then it is must be considered that no code in the read path
>> is allowed to perform write path functions. Which brings a pretty
>> valid use-case: say in read mode we detect an unrecoverable error
>> and need to remove the device:
>> vpci_process_pending -> ERROR -> vpci_remove_device or similar.
>>
>> What do we do then? It is all going to be fragile...
> Real hardware won't cause a device to disappear upon a problem with
> a read access. There shouldn't be any need to remove a passed-through
> device either; such problems (if any) need handling differently imo.
Yes, at the moment there is a single place in the code which
removes the device (besides normal use-cases such as
pci_add_device on fail path and PHYSDEVOP_manage_pci_remove):

bool vpci_process_pending(struct vcpu *v)
{
[snip]
     if ( rc )
     /*
  * FIXME: in case of failure remove the device from the domain.
  * Note that there might still be leftover mappings. While this is
  * safe for Dom0, for DomUs the domain will likely need to be
  * killed in order to avoid leaking stale p2m mappings on
  * failure.
  */
     vpci_remove_device(v->vpci.pdev);

>
> Jan
>
>


Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-08 Thread Jan Beulich
On 08.02.2022 08:35, Oleksandr Andrushchenko wrote:
> 1.1. Semi read lock upgrade in modify bars
> --
> In this case both vpci_read and vpci_write take a read lock and when it comes
> to modify_bars:
> 
> 1. read_unlock(d->vpci_lock)
> 2. write_lock(d->vpci_lock)
> 3. Check that pdev->vpci is still available and is the same object:
> if (pdev->vpci && (pdev->vpci == old_vpci) )
> {
>      /* vpci structure is valid and can be used. */
> }
> else
> {
>      /* vpci has gone, return an error. */
> }
> 
> Pros:
> - no per-device vpci lock is needed?
> - solves overlap code ABBA in modify_bars
> - readers and writers are NOT serialized
> - NO need to carefully select read paths, so they are guaranteed not to lead
>    to lock upgrade use-cases
> 
> Cons:
> - ???

The "pdev->vpci == old_vpci" is fragile: The struct may have got re-
allocated, and it just so happened that the two pointers are identical.

Same then for the subsequent variant 2.

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-08 Thread Jan Beulich
On 07.02.2022 17:44, Oleksandr Andrushchenko wrote:
> 
> 
> On 07.02.22 18:37, Jan Beulich wrote:
>> On 07.02.2022 17:21, Oleksandr Andrushchenko wrote:
>>>
>>> On 07.02.22 18:15, Jan Beulich wrote:
 On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
> On 07.02.22 17:26, Jan Beulich wrote:
>> 1b. Make vpci_write use write lock for writes to command register and 
>> BARs
>> only; keep using the read lock for all other writes.
> I am not quite sure how to do that. Do you mean something like:
> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>    uint32_t data)
> [snip]
>    list_for_each_entry ( r, >vpci->handlers, node )
> {
> [snip]
>    if ( r->needs_write_lock)
>        write_lock(d->vpci_lock)
>    else
>        read_lock(d->vpci_lock)
> 
>
> And provide rw as an argument to:
>
> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>  vpci_write_t *write_handler, unsigned int offset,
>  unsigned int size, void *data, --->>> bool 
> write_path <<<-)
>
> Is this what you mean?
 This sounds overly complicated. You can derive locally in vpci_write(),
 from just its "reg" and "size" parameters, whether the lock needs taking
 in write mode.
>>> Yes, I started writing a reply with that. So, the summary (ROM
>>> position depends on header type):
>>> if ( (reg == PCI_COMMAND) || (reg == ROM) )
>>> {
>>>       read PCI_COMMAND and see if memory or IO decoding are enabled.
>>>       if ( enabled )
>>>           write_lock(d->vpci_lock)
>>>       else
>>>           read_lock(d->vpci_lock)
>>> }
>> Hmm, yes, you can actually get away without using "size", since both
>> command register and ROM BAR are 32-bit aligned registers, and 64-bit
>> accesses get split in vpci_ecam_write().
> But, OS may want reading a single byte of ROM BAR, so I think
> I'll need to check if reg+size fall into PCI_COMAND and ROM BAR
> ranges
>>
>> For the command register the memory- / IO-decoding-enabled check may
>> end up a little more complicated, as the value to be written also
>> matters. Maybe read the command register only for the ROM BAR write,
>> using the write lock uniformly for all command register writes?
> Sounds good for the start.
> Another concern is that if we go with a read_lock and then in the
> underlying code we disable memory decoding and try doing
> something and calling cmd_write handler for any reason then
> 
> I mean that the check in the vpci_write is somewhat we can tolerate,
> but then it is must be considered that no code in the read path
> is allowed to perform write path functions. Which brings a pretty
> valid use-case: say in read mode we detect an unrecoverable error
> and need to remove the device:
> vpci_process_pending -> ERROR -> vpci_remove_device or similar.
> 
> What do we do then? It is all going to be fragile...

Real hardware won't cause a device to disappear upon a problem with
a read access. There shouldn't be any need to remove a passed-through
device either; such problems (if any) need handling differently imo.

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 18:44, Oleksandr Andrushchenko wrote:
>
> On 07.02.22 18:37, Jan Beulich wrote:
>> On 07.02.2022 17:21, Oleksandr Andrushchenko wrote:
>>> On 07.02.22 18:15, Jan Beulich wrote:
 On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
> On 07.02.22 17:26, Jan Beulich wrote:
>> 1b. Make vpci_write use write lock for writes to command register and 
>> BARs
>> only; keep using the read lock for all other writes.
> I am not quite sure how to do that. Do you mean something like:
> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>     uint32_t data)
> [snip]
>     list_for_each_entry ( r, >vpci->handlers, node )
> {
> [snip]
>     if ( r->needs_write_lock)
>         write_lock(d->vpci_lock)
>     else
>         read_lock(d->vpci_lock)
> 
>
> And provide rw as an argument to:
>
> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>   vpci_write_t *write_handler, unsigned int 
> offset,
>   unsigned int size, void *data, --->>> bool 
> write_path <<<-)
>
> Is this what you mean?
 This sounds overly complicated. You can derive locally in vpci_write(),
 from just its "reg" and "size" parameters, whether the lock needs taking
 in write mode.
>>> Yes, I started writing a reply with that. So, the summary (ROM
>>> position depends on header type):
>>> if ( (reg == PCI_COMMAND) || (reg == ROM) )
>>> {
>>>    read PCI_COMMAND and see if memory or IO decoding are enabled.
>>>    if ( enabled )
>>>        write_lock(d->vpci_lock)
>>>    else
>>>        read_lock(d->vpci_lock)
>>> }
>> Hmm, yes, you can actually get away without using "size", since both
>> command register and ROM BAR are 32-bit aligned registers, and 64-bit
>> accesses get split in vpci_ecam_write().
> But, OS may want reading a single byte of ROM BAR, so I think
> I'll need to check if reg+size fall into PCI_COMAND and ROM BAR
> ranges
>> For the command register the memory- / IO-decoding-enabled check may
>> end up a little more complicated, as the value to be written also
>> matters. Maybe read the command register only for the ROM BAR write,
>> using the write lock uniformly for all command register writes?
> Sounds good for the start.
> Another concern is that if we go with a read_lock and then in the
> underlying code we disable memory decoding and try doing
> something and calling cmd_write handler for any reason then
>
> I mean that the check in the vpci_write is somewhat we can tolerate,
> but then it is must be considered that no code in the read path
> is allowed to perform write path functions. Which brings a pretty
> valid use-case: say in read mode we detect an unrecoverable error
> and need to remove the device:
> vpci_process_pending -> ERROR -> vpci_remove_device or similar.
>
> What do we do then? It is all going to be fragile...
I have tried to summarize the options we have wrt locking
and would love to hear from @Roger and @Jan.

In every variant there is a task of dealing with the overlap
detection in modify_bars, so this is the only place as of now
which needs special treatment.

Existing limitations: there is no way to upgrade a read lock to a write
lock, so paths which may require write lock protection need to use
write lock from the very beginning. Workarounds can be applied.

1. Per-domain rw lock, aka d->vpci_lock
==
Note: with per-domain rw lock it is possible to do without introducing
per-device locks, so pdev->vpci->lock can be removed and no pdev->vpci_lock
should be required.

This is only going to work in case if vpci_write always takes the write lock
and vpci_read takes a read lock and no path in vpci_read is allowed to
perform write path operations.
vpci_process_pending uses write lock as it have vpci_remove_device in its
error path.

Pros:
- no per-device vpci lock is needed?
- solves overlap code ABBA in modify_bars

Cons:
- all writes are serialized
- need to carefully select read paths, so they are guaranteed not to lead
   to lock upgrade use-cases

1.1. Semi read lock upgrade in modify bars
--
In this case both vpci_read and vpci_write take a read lock and when it comes
to modify_bars:

1. read_unlock(d->vpci_lock)
2. write_lock(d->vpci_lock)
3. Check that pdev->vpci is still available and is the same object:
if (pdev->vpci && (pdev->vpci == old_vpci) )
{
     /* vpci structure is valid and can be used. */
}
else
{
     /* vpci has gone, return an error. */
}

Pros:
- no per-device vpci lock is needed?
- solves overlap code ABBA in modify_bars
- readers and writers are NOT serialized
- NO need to carefully select read paths, so they are guaranteed not to lead
   to lock upgrade use-cases

Cons:
- ???

2. per-device lock 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 18:37, Jan Beulich wrote:
> On 07.02.2022 17:21, Oleksandr Andrushchenko wrote:
>>
>> On 07.02.22 18:15, Jan Beulich wrote:
>>> On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
 On 07.02.22 17:26, Jan Beulich wrote:
> 1b. Make vpci_write use write lock for writes to command register and BARs
> only; keep using the read lock for all other writes.
 I am not quite sure how to do that. Do you mean something like:
 void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
    uint32_t data)
 [snip]
    list_for_each_entry ( r, >vpci->handlers, node )
 {
 [snip]
    if ( r->needs_write_lock)
        write_lock(d->vpci_lock)
    else
        read_lock(d->vpci_lock)
 

 And provide rw as an argument to:

 int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
  vpci_write_t *write_handler, unsigned int offset,
  unsigned int size, void *data, --->>> bool 
 write_path <<<-)

 Is this what you mean?
>>> This sounds overly complicated. You can derive locally in vpci_write(),
>>> from just its "reg" and "size" parameters, whether the lock needs taking
>>> in write mode.
>> Yes, I started writing a reply with that. So, the summary (ROM
>> position depends on header type):
>> if ( (reg == PCI_COMMAND) || (reg == ROM) )
>> {
>>       read PCI_COMMAND and see if memory or IO decoding are enabled.
>>       if ( enabled )
>>           write_lock(d->vpci_lock)
>>       else
>>           read_lock(d->vpci_lock)
>> }
> Hmm, yes, you can actually get away without using "size", since both
> command register and ROM BAR are 32-bit aligned registers, and 64-bit
> accesses get split in vpci_ecam_write().
But, OS may want reading a single byte of ROM BAR, so I think
I'll need to check if reg+size fall into PCI_COMAND and ROM BAR
ranges
>
> For the command register the memory- / IO-decoding-enabled check may
> end up a little more complicated, as the value to be written also
> matters. Maybe read the command register only for the ROM BAR write,
> using the write lock uniformly for all command register writes?
Sounds good for the start.
Another concern is that if we go with a read_lock and then in the
underlying code we disable memory decoding and try doing
something and calling cmd_write handler for any reason then

I mean that the check in the vpci_write is somewhat we can tolerate,
but then it is must be considered that no code in the read path
is allowed to perform write path functions. Which brings a pretty
valid use-case: say in read mode we detect an unrecoverable error
and need to remove the device:
vpci_process_pending -> ERROR -> vpci_remove_device or similar.

What do we do then? It is all going to be fragile...
>
>> Do you also think we can drop pdev->vpci (or currently pdev->vpci->lock)
>> at all then?
> I haven't looked at this in any detail, sorry. It sounds possible,
> yes.
>
> Jan
>
Thank you,
Oleksandr

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Jan Beulich
On 07.02.2022 17:21, Oleksandr Andrushchenko wrote:
> 
> 
> On 07.02.22 18:15, Jan Beulich wrote:
>> On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
>>> On 07.02.22 17:26, Jan Beulich wrote:
 1b. Make vpci_write use write lock for writes to command register and BARs
 only; keep using the read lock for all other writes.
>>> I am not quite sure how to do that. Do you mean something like:
>>> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>>       uint32_t data)
>>> [snip]
>>>       list_for_each_entry ( r, >vpci->handlers, node )
>>> {
>>> [snip]
>>>       if ( r->needs_write_lock)
>>>           write_lock(d->vpci_lock)
>>>       else
>>>           read_lock(d->vpci_lock)
>>> 
>>>
>>> And provide rw as an argument to:
>>>
>>> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>>>     vpci_write_t *write_handler, unsigned int offset,
>>>     unsigned int size, void *data, --->>> bool 
>>> write_path <<<-)
>>>
>>> Is this what you mean?
>> This sounds overly complicated. You can derive locally in vpci_write(),
>> from just its "reg" and "size" parameters, whether the lock needs taking
>> in write mode.
> Yes, I started writing a reply with that. So, the summary (ROM
> position depends on header type):
> if ( (reg == PCI_COMMAND) || (reg == ROM) )
> {
>      read PCI_COMMAND and see if memory or IO decoding are enabled.
>      if ( enabled )
>          write_lock(d->vpci_lock)
>      else
>          read_lock(d->vpci_lock)
> }

Hmm, yes, you can actually get away without using "size", since both
command register and ROM BAR are 32-bit aligned registers, and 64-bit
accesses get split in vpci_ecam_write().

For the command register the memory- / IO-decoding-enabled check may
end up a little more complicated, as the value to be written also
matters. Maybe read the command register only for the ROM BAR write,
using the write lock uniformly for all command register writes?

> Do you also think we can drop pdev->vpci (or currently pdev->vpci->lock)
> at all then?

I haven't looked at this in any detail, sorry. It sounds possible,
yes.

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 18:15, Jan Beulich wrote:
> On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
>> On 07.02.22 17:26, Jan Beulich wrote:
>>> 1b. Make vpci_write use write lock for writes to command register and BARs
>>> only; keep using the read lock for all other writes.
>> I am not quite sure how to do that. Do you mean something like:
>> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>       uint32_t data)
>> [snip]
>>       list_for_each_entry ( r, >vpci->handlers, node )
>> {
>> [snip]
>>       if ( r->needs_write_lock)
>>           write_lock(d->vpci_lock)
>>       else
>>           read_lock(d->vpci_lock)
>> 
>>
>> And provide rw as an argument to:
>>
>> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>>     vpci_write_t *write_handler, unsigned int offset,
>>     unsigned int size, void *data, --->>> bool 
>> write_path <<<-)
>>
>> Is this what you mean?
> This sounds overly complicated. You can derive locally in vpci_write(),
> from just its "reg" and "size" parameters, whether the lock needs taking
> in write mode.
Yes, I started writing a reply with that. So, the summary (ROM
position depends on header type):
if ( (reg == PCI_COMMAND) || (reg == ROM) )
{
     read PCI_COMMAND and see if memory or IO decoding are enabled.
     if ( enabled )
         write_lock(d->vpci_lock)
     else
         read_lock(d->vpci_lock)
}

Do you also think we can drop pdev->vpci (or currently pdev->vpci->lock)
at all then?
> Jan
>
>
Thank you,
Oleksandr

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Jan Beulich
On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
> On 07.02.22 17:26, Jan Beulich wrote:
>> 1b. Make vpci_write use write lock for writes to command register and BARs
>> only; keep using the read lock for all other writes.
> I am not quite sure how to do that. Do you mean something like:
> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>      uint32_t data)
> [snip]
>      list_for_each_entry ( r, >vpci->handlers, node )
> {
> [snip]
>      if ( r->needs_write_lock)
>          write_lock(d->vpci_lock)
>      else
>          read_lock(d->vpci_lock)
> 
> 
> And provide rw as an argument to:
> 
> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>    vpci_write_t *write_handler, unsigned int offset,
>    unsigned int size, void *data, --->>> bool write_path 
> <<<-)
> 
> Is this what you mean?

This sounds overly complicated. You can derive locally in vpci_write(),
from just its "reg" and "size" parameters, whether the lock needs taking
in write mode.

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Jan Beulich
On 07.02.2022 17:08, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 04:26:56PM +0100, Jan Beulich wrote:
>> On 07.02.2022 16:11, Oleksandr Andrushchenko wrote:
>>>
>>>
>>> On 07.02.22 16:35, Oleksandr Andrushchenko wrote:

 On 07.02.22 16:27, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 03:11:03PM +0100, Jan Beulich wrote:
>> On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
>>> On 07.02.22 14:46, Roger Pau Monné wrote:
 I think the per-domain rwlock seems like a good option. I would do
 that as a pre-patch.
>>> It is. But it seems it won't solve the thing we started this adventure 
>>> for:
>>>
>>> With per-domain read lock and still ABBA in modify_bars (hope the below
>>> is correctly seen with a monospace font):
>>>
>>> cpu0: vpci_write-> d->RLock -> pdev1->lock ->   
>>>    rom_write -> modify_bars: tmp (pdev2) ->lock
>>> cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> 
>>> modify_bars: tmp (pdev1) ->lock
>>>
>>> There is no API to upgrade read lock to write lock in modify_bars which 
>>> could help,
>>> so in both cases vpci_write should take write lock.
>> Hmm, yes, I think you're right: It's not modify_bars() itself which needs
>> to acquire the write lock, but its (perhaps indirect) caller. Effectively
>> vpci_write() would need to take the write lock if the range written
>> overlaps the BARs or the command register.
> I'm confused. If we use a per-domain rwlock approach there would be no
> need to lock tmp again in modify_bars, because we should hold the
> rwlock in write mode, so there's no ABBA?
 this is only possible with what you wrote below:
> We will have however to drop the per domain read and vpci locks and
> pick the per-domain lock in write mode.
 I think this is going to be unreliable. We need a reliable way to
 upgrade read lock to write lock.
 Then, we can drop pdev->vpci_lock at all, because we are always
 protected with d->rwlock and those who want to free pdev->vpci
 will use write lock.

 So, per-domain rwlock with write upgrade implemented minus pdev->vpci
 should do the trick
>>> Linux doesn't implement write upgrade and it seems for a reason [1]:
>>> "Also, you cannot “upgrade” a read-lock to a write-lock, so if you at _any_ 
>>> time
>>> need to do any changes (even if you don’t do it every time), you have to get
>>> the write-lock at the very beginning."
>>>
>>> So, I am not sure we can have the same for Xen...
>>>
>>> At the moment I see at least two possible ways to solve the issue:
>>> 1. Make vpci_write use write lock, thus make all write accesses synchronized
>>> for the given domain, read are fully parallel
>>
>> 1b. Make vpci_write use write lock for writes to command register and BARs
>> only; keep using the read lock for all other writes.
> 
> We do not support writing to the BARs with memory decoding enabled
> currently for dom0, so we would only need to pick the lock in write
> mode for the command register and ROM BAR write handler AFAICT.

Oh, right - this then makes for even less contention due to needing to
acquire the lock in write mode.

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Roger Pau Monné
On Mon, Feb 07, 2022 at 04:26:56PM +0100, Jan Beulich wrote:
> On 07.02.2022 16:11, Oleksandr Andrushchenko wrote:
> > 
> > 
> > On 07.02.22 16:35, Oleksandr Andrushchenko wrote:
> >>
> >> On 07.02.22 16:27, Roger Pau Monné wrote:
> >>> On Mon, Feb 07, 2022 at 03:11:03PM +0100, Jan Beulich wrote:
>  On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
> > On 07.02.22 14:46, Roger Pau Monné wrote:
> >> I think the per-domain rwlock seems like a good option. I would do
> >> that as a pre-patch.
> > It is. But it seems it won't solve the thing we started this adventure 
> > for:
> >
> > With per-domain read lock and still ABBA in modify_bars (hope the below
> > is correctly seen with a monospace font):
> >
> > cpu0: vpci_write-> d->RLock -> pdev1->lock ->   
> >    rom_write -> modify_bars: tmp (pdev2) ->lock
> > cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> 
> > modify_bars: tmp (pdev1) ->lock
> >
> > There is no API to upgrade read lock to write lock in modify_bars which 
> > could help,
> > so in both cases vpci_write should take write lock.
>  Hmm, yes, I think you're right: It's not modify_bars() itself which needs
>  to acquire the write lock, but its (perhaps indirect) caller. Effectively
>  vpci_write() would need to take the write lock if the range written
>  overlaps the BARs or the command register.
> >>> I'm confused. If we use a per-domain rwlock approach there would be no
> >>> need to lock tmp again in modify_bars, because we should hold the
> >>> rwlock in write mode, so there's no ABBA?
> >> this is only possible with what you wrote below:
> >>> We will have however to drop the per domain read and vpci locks and
> >>> pick the per-domain lock in write mode.
> >> I think this is going to be unreliable. We need a reliable way to
> >> upgrade read lock to write lock.
> >> Then, we can drop pdev->vpci_lock at all, because we are always
> >> protected with d->rwlock and those who want to free pdev->vpci
> >> will use write lock.
> >>
> >> So, per-domain rwlock with write upgrade implemented minus pdev->vpci
> >> should do the trick
> > Linux doesn't implement write upgrade and it seems for a reason [1]:
> > "Also, you cannot “upgrade” a read-lock to a write-lock, so if you at _any_ 
> > time
> > need to do any changes (even if you don’t do it every time), you have to get
> > the write-lock at the very beginning."
> > 
> > So, I am not sure we can have the same for Xen...
> > 
> > At the moment I see at least two possible ways to solve the issue:
> > 1. Make vpci_write use write lock, thus make all write accesses synchronized
> > for the given domain, read are fully parallel
> 
> 1b. Make vpci_write use write lock for writes to command register and BARs
> only; keep using the read lock for all other writes.

We do not support writing to the BARs with memory decoding enabled
currently for dom0, so we would only need to pick the lock in write
mode for the command register and ROM BAR write handler AFAICT.

Thanks, Roger.



Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 17:26, Jan Beulich wrote:
> On 07.02.2022 16:11, Oleksandr Andrushchenko wrote:
>>
>> On 07.02.22 16:35, Oleksandr Andrushchenko wrote:
>>> On 07.02.22 16:27, Roger Pau Monné wrote:
 On Mon, Feb 07, 2022 at 03:11:03PM +0100, Jan Beulich wrote:
> On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
>> On 07.02.22 14:46, Roger Pau Monné wrote:
>>> I think the per-domain rwlock seems like a good option. I would do
>>> that as a pre-patch.
>> It is. But it seems it won't solve the thing we started this adventure 
>> for:
>>
>> With per-domain read lock and still ABBA in modify_bars (hope the below
>> is correctly seen with a monospace font):
>>
>> cpu0: vpci_write-> d->RLock -> pdev1->lock ->
>>   rom_write -> modify_bars: tmp (pdev2) ->lock
>> cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> 
>> modify_bars: tmp (pdev1) ->lock
>>
>> There is no API to upgrade read lock to write lock in modify_bars which 
>> could help,
>> so in both cases vpci_write should take write lock.
> Hmm, yes, I think you're right: It's not modify_bars() itself which needs
> to acquire the write lock, but its (perhaps indirect) caller. Effectively
> vpci_write() would need to take the write lock if the range written
> overlaps the BARs or the command register.
 I'm confused. If we use a per-domain rwlock approach there would be no
 need to lock tmp again in modify_bars, because we should hold the
 rwlock in write mode, so there's no ABBA?
>>> this is only possible with what you wrote below:
 We will have however to drop the per domain read and vpci locks and
 pick the per-domain lock in write mode.
>>> I think this is going to be unreliable. We need a reliable way to
>>> upgrade read lock to write lock.
>>> Then, we can drop pdev->vpci_lock at all, because we are always
>>> protected with d->rwlock and those who want to free pdev->vpci
>>> will use write lock.
>>>
>>> So, per-domain rwlock with write upgrade implemented minus pdev->vpci
>>> should do the trick
>> Linux doesn't implement write upgrade and it seems for a reason [1]:
>> "Also, you cannot “upgrade” a read-lock to a write-lock, so if you at _any_ 
>> time
>> need to do any changes (even if you don’t do it every time), you have to get
>> the write-lock at the very beginning."
>>
>> So, I am not sure we can have the same for Xen...
>>
>> At the moment I see at least two possible ways to solve the issue:
>> 1. Make vpci_write use write lock, thus make all write accesses synchronized
>> for the given domain, read are fully parallel
> 1b. Make vpci_write use write lock for writes to command register and BARs
> only; keep using the read lock for all other writes.
I am not quite sure how to do that. Do you mean something like:
void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
     uint32_t data)
[snip]
     list_for_each_entry ( r, >vpci->handlers, node )
{
[snip]
     if ( r->needs_write_lock)
         write_lock(d->vpci_lock)
     else
         read_lock(d->vpci_lock)


And provide rw as an argument to:

int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
   vpci_write_t *write_handler, unsigned int offset,
   unsigned int size, void *data, --->>> bool write_path 
<<<-)

Is this what you mean?

With the above, if we have d->vpci_lock, I think we can drop
pdev->vpci_lock at all

Thank you,
Oleksandr

P.S. I don't think you mean we just drop the read lock and acquire write lock
as it leads to the mentioned before unreliability.


Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Jan Beulich
On 07.02.2022 16:11, Oleksandr Andrushchenko wrote:
> 
> 
> On 07.02.22 16:35, Oleksandr Andrushchenko wrote:
>>
>> On 07.02.22 16:27, Roger Pau Monné wrote:
>>> On Mon, Feb 07, 2022 at 03:11:03PM +0100, Jan Beulich wrote:
 On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
> On 07.02.22 14:46, Roger Pau Monné wrote:
>> I think the per-domain rwlock seems like a good option. I would do
>> that as a pre-patch.
> It is. But it seems it won't solve the thing we started this adventure 
> for:
>
> With per-domain read lock and still ABBA in modify_bars (hope the below
> is correctly seen with a monospace font):
>
> cpu0: vpci_write-> d->RLock -> pdev1->lock -> 
>  rom_write -> modify_bars: tmp (pdev2) ->lock
> cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> 
> modify_bars: tmp (pdev1) ->lock
>
> There is no API to upgrade read lock to write lock in modify_bars which 
> could help,
> so in both cases vpci_write should take write lock.
 Hmm, yes, I think you're right: It's not modify_bars() itself which needs
 to acquire the write lock, but its (perhaps indirect) caller. Effectively
 vpci_write() would need to take the write lock if the range written
 overlaps the BARs or the command register.
>>> I'm confused. If we use a per-domain rwlock approach there would be no
>>> need to lock tmp again in modify_bars, because we should hold the
>>> rwlock in write mode, so there's no ABBA?
>> this is only possible with what you wrote below:
>>> We will have however to drop the per domain read and vpci locks and
>>> pick the per-domain lock in write mode.
>> I think this is going to be unreliable. We need a reliable way to
>> upgrade read lock to write lock.
>> Then, we can drop pdev->vpci_lock at all, because we are always
>> protected with d->rwlock and those who want to free pdev->vpci
>> will use write lock.
>>
>> So, per-domain rwlock with write upgrade implemented minus pdev->vpci
>> should do the trick
> Linux doesn't implement write upgrade and it seems for a reason [1]:
> "Also, you cannot “upgrade” a read-lock to a write-lock, so if you at _any_ 
> time
> need to do any changes (even if you don’t do it every time), you have to get
> the write-lock at the very beginning."
> 
> So, I am not sure we can have the same for Xen...
> 
> At the moment I see at least two possible ways to solve the issue:
> 1. Make vpci_write use write lock, thus make all write accesses synchronized
> for the given domain, read are fully parallel

1b. Make vpci_write use write lock for writes to command register and BARs
only; keep using the read lock for all other writes.

Jan

> 2. Re-implement pdev/tmp overlapping detection with something which won't
> require pdev->vpci_lock/tmp->vpci_lock
> 
> 3. Drop read and acquire write lock in modify_bars... but this is not reliable
> and will hide a free(pdev->vpci) bug
> 
> @Roger, @Jan: Any other suggestions?
> 
> Thank you,
> Oleksandr
> 
> [1] 
> https://www.kernel.org/doc/html/latest/locking/spinlocks.html#lesson-2-reader-writer-spinlocks




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 16:35, Oleksandr Andrushchenko wrote:
>
> On 07.02.22 16:27, Roger Pau Monné wrote:
>> On Mon, Feb 07, 2022 at 03:11:03PM +0100, Jan Beulich wrote:
>>> On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
 On 07.02.22 14:46, Roger Pau Monné wrote:
> I think the per-domain rwlock seems like a good option. I would do
> that as a pre-patch.
 It is. But it seems it won't solve the thing we started this adventure for:

 With per-domain read lock and still ABBA in modify_bars (hope the below
 is correctly seen with a monospace font):

 cpu0: vpci_write-> d->RLock -> pdev1->lock ->  
     rom_write -> modify_bars: tmp (pdev2) ->lock
 cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> 
 modify_bars: tmp (pdev1) ->lock

 There is no API to upgrade read lock to write lock in modify_bars which 
 could help,
 so in both cases vpci_write should take write lock.
>>> Hmm, yes, I think you're right: It's not modify_bars() itself which needs
>>> to acquire the write lock, but its (perhaps indirect) caller. Effectively
>>> vpci_write() would need to take the write lock if the range written
>>> overlaps the BARs or the command register.
>> I'm confused. If we use a per-domain rwlock approach there would be no
>> need to lock tmp again in modify_bars, because we should hold the
>> rwlock in write mode, so there's no ABBA?
> this is only possible with what you wrote below:
>> We will have however to drop the per domain read and vpci locks and
>> pick the per-domain lock in write mode.
> I think this is going to be unreliable. We need a reliable way to
> upgrade read lock to write lock.
> Then, we can drop pdev->vpci_lock at all, because we are always
> protected with d->rwlock and those who want to free pdev->vpci
> will use write lock.
>
> So, per-domain rwlock with write upgrade implemented minus pdev->vpci
> should do the trick
Linux doesn't implement write upgrade and it seems for a reason [1]:
"Also, you cannot “upgrade” a read-lock to a write-lock, so if you at _any_ time
need to do any changes (even if you don’t do it every time), you have to get
the write-lock at the very beginning."

So, I am not sure we can have the same for Xen...

At the moment I see at least two possible ways to solve the issue:
1. Make vpci_write use write lock, thus make all write accesses synchronized
for the given domain, read are fully parallel

2. Re-implement pdev/tmp overlapping detection with something which won't
require pdev->vpci_lock/tmp->vpci_lock

3. Drop read and acquire write lock in modify_bars... but this is not reliable
and will hide a free(pdev->vpci) bug

@Roger, @Jan: Any other suggestions?

Thank you,
Oleksandr

[1] 
https://www.kernel.org/doc/html/latest/locking/spinlocks.html#lesson-2-reader-writer-spinlocks

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 16:27, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 03:11:03PM +0100, Jan Beulich wrote:
>> On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
>>> On 07.02.22 14:46, Roger Pau Monné wrote:
 I think the per-domain rwlock seems like a good option. I would do
 that as a pre-patch.
>>> It is. But it seems it won't solve the thing we started this adventure for:
>>>
>>> With per-domain read lock and still ABBA in modify_bars (hope the below
>>> is correctly seen with a monospace font):
>>>
>>> cpu0: vpci_write-> d->RLock -> pdev1->lock ->   
>>>    rom_write -> modify_bars: tmp (pdev2) ->lock
>>> cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> modify_bars: 
>>> tmp (pdev1) ->lock
>>>
>>> There is no API to upgrade read lock to write lock in modify_bars which 
>>> could help,
>>> so in both cases vpci_write should take write lock.
>> Hmm, yes, I think you're right: It's not modify_bars() itself which needs
>> to acquire the write lock, but its (perhaps indirect) caller. Effectively
>> vpci_write() would need to take the write lock if the range written
>> overlaps the BARs or the command register.
> I'm confused. If we use a per-domain rwlock approach there would be no
> need to lock tmp again in modify_bars, because we should hold the
> rwlock in write mode, so there's no ABBA?
this is only possible with what you wrote below:
>
> We will have however to drop the per domain read and vpci locks and
> pick the per-domain lock in write mode.
I think this is going to be unreliable. We need a reliable way to
upgrade read lock to write lock.
Then, we can drop pdev->vpci_lock at all, because we are always
protected with d->rwlock and those who want to free pdev->vpci
will use write lock.

So, per-domain rwlock with write upgrade implemented minus pdev->vpci
should do the trick
> Thanks, Roger.
Thank you,
Oleksandr

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Jan Beulich
On 07.02.2022 15:27, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 03:11:03PM +0100, Jan Beulich wrote:
>> On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
>>> On 07.02.22 14:46, Roger Pau Monné wrote:
 I think the per-domain rwlock seems like a good option. I would do
 that as a pre-patch.
>>> It is. But it seems it won't solve the thing we started this adventure for:
>>>
>>> With per-domain read lock and still ABBA in modify_bars (hope the below
>>> is correctly seen with a monospace font):
>>>
>>> cpu0: vpci_write-> d->RLock -> pdev1->lock ->   
>>>    rom_write -> modify_bars: tmp (pdev2) ->lock
>>> cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> modify_bars: 
>>> tmp (pdev1) ->lock
>>>
>>> There is no API to upgrade read lock to write lock in modify_bars which 
>>> could help,
>>> so in both cases vpci_write should take write lock.
>>
>> Hmm, yes, I think you're right: It's not modify_bars() itself which needs
>> to acquire the write lock, but its (perhaps indirect) caller. Effectively
>> vpci_write() would need to take the write lock if the range written
>> overlaps the BARs or the command register.
> 
> I'm confused. If we use a per-domain rwlock approach there would be no
> need to lock tmp again in modify_bars, because we should hold the
> rwlock in write mode, so there's no ABBA?
> 
> We will have however to drop the per domain read and vpci locks and
> pick the per-domain lock in write mode.

Well, yes, with intermediate dropping of the lock acquiring in write mode
can be done in modify_bars(). I'm not convinced (yet) that such intermediate
dropping is actually going to be okay.

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 16:11, Jan Beulich wrote:
> On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
>> On 07.02.22 14:46, Roger Pau Monné wrote:
>>> I think the per-domain rwlock seems like a good option. I would do
>>> that as a pre-patch.
>> It is. But it seems it won't solve the thing we started this adventure for:
>>
>> With per-domain read lock and still ABBA in modify_bars (hope the below
>> is correctly seen with a monospace font):
>>
>> cpu0: vpci_write-> d->RLock -> pdev1->lock ->
>>   rom_write -> modify_bars: tmp (pdev2) ->lock
>> cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> modify_bars: 
>> tmp (pdev1) ->lock
>>
>> There is no API to upgrade read lock to write lock in modify_bars which 
>> could help,
>> so in both cases vpci_write should take write lock.
> Hmm, yes, I think you're right: It's not modify_bars() itself which needs
> to acquire the write lock, but its (perhaps indirect) caller. Effectively
> vpci_write() would need to take the write lock if the range written
> overlaps the BARs or the command register.
Exactly, vpci_write needs a write lock, but it is not desirable.
And again, there is a single offending piece of code which wants that...
> Jan
>
Thank you,
Oleksandr

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 16:19, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 01:53:34PM +, Oleksandr Andrushchenko wrote:
>>
>> On 07.02.22 14:46, Roger Pau Monné wrote:
>>> On Mon, Feb 07, 2022 at 11:08:39AM +, Oleksandr Andrushchenko wrote:
 ==

 Bottom line:
 ==

 1. vpci_{read|write} are not protected with pcidevs_lock and can run in
 parallel with pci_remove_device which can remove pdev after 
 vpci_{read|write}
 acquired the pdev pointer. This may lead to a fail due to pdev dereference.

 So, to protect pdev dereference vpci_{read|write} must also use pdevs_lock.
>>> We would like to take the pcidevs_lock only while fetching the device
>>> (ie: pci_get_pdev_by_domain), afterwards it should be fine to lock the
>>> device using a vpci specific lock so calls to vpci_{read,write} can be
>>> partially concurrent across multiple domains.
>> This means this can't be done a pre-req patch, but as a part of the
>> patch which changes locking.
>>> In fact I think Jan had already pointed out that the pci lock would
>>> need taking while searching for the device in vpci_{read,write}.
>> I was referring to the time after we found pdev and it is currently
>> possible to free pdev while using it after the search
>>> It seems to me that if you implement option 3 below taking the
>>> per-domain rwlock in read mode in vpci_{read|write} will already
>>> protect you from the device being removed if the same per-domain lock
>>> is taken in write mode in vpci_remove_device.
>> Yes, it should. Again this can't be done as a pre-req patch because
>> this relies on pdev->vpci_lock
> Hm, no, I don't think so. You could introduce this per-domain rwlock
> in a prepatch, and then move the vpci lock outside of the vpci struct.
> I see no problem with that.
>
 2. The only offending place which is in the way of pci_dev->vpci_lock is
 modify_bars. If it can be re-worked to track already mapped and unmapped
 regions then we can avoid having a possible deadlock and can use
 pci_dev->vpci_lock (rangesets won't help here as we also need refcounting 
 be
 implemented).
>>> I think a refcounting based solution will be very complex to
>>> implement. I'm however happy to be proven wrong.
>> I can't estimate, but I have a feeling that all these plays around locking
>> is just because of this single piece of code. No other place suffer from
>> pdev->vpci_lock and no d->lock
 If pcidevs_lock is used for vpci_{read|write} then no deadlock is possible,
 but modify_bars code must be re-worked not to lock itself (pdev->vpci_lock 
 and
 tmp->vpci_lock when pdev == tmp, this is minor).
>>> Taking the pcidevs lock (a global lock) is out of the picture IMO, as
>>> it's going to serialize all calls of vpci_{read|write}, and would
>>> create too much contention on the pcidevs lock.
>> I understand that. But if we would like to fix the existing code I see
>> no other alternative.
 3. We may think about a per-domain rwlock and pdev->vpci_lock, so this 
 solves
 modify_bars's two pdevs access. But this doesn't solve possible pdev
 de-reference in vpci_{read|write} vs pci_remove_device.
>>> pci_remove device will call vpci_remove_device, so as long as
>>> vpci_remove_device taken the per-domain lock in write (exclusive) mode
>>> it should be fine.
>> I think I need to see if there are any other places which similarly
>> require the write lock
 @Roger, @Jan, I would like to hear what do you think about the above 
 analysis
 and how can we proceed with locking re-work?
>>> I think the per-domain rwlock seems like a good option. I would do
>>> that as a pre-patch.
>> It is. But it seems it won't solve the thing we started this adventure for:
>>
>> With per-domain read lock and still ABBA in modify_bars (hope the below
>> is correctly seen with a monospace font):
>>
>> cpu0: vpci_write-> d->RLock -> pdev1->lock ->
>>   rom_write -> modify_bars: tmp (pdev2) ->lock
>> cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> modify_bars: 
>> tmp (pdev1) ->lock
>>
>> There is no API to upgrade read lock to write lock in modify_bars which 
>> could help,
>> so in both cases vpci_write should take write lock.
> I've thought more than once that it would be nice to have a
> write_{upgrade,downgrade} (read_downgrade maybe?) or similar helper.
Yes, this is the real use-case for that
>
> I think you could also drop the read lock, take the write lock and
> check that >vpci->header == header in order to be sure
> pdev->vpci hasn't been recreated.
And have pdev freed in between
>   You would have to do similar in
> order to get back again from a write lock into a read one.
Not sure this is reliable.
>
> We should avoid taking the rwlock in write mode in vpci_write
> unconditionally.
Yes, but without upgrading the read lock I see no way 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Roger Pau Monné
On Mon, Feb 07, 2022 at 03:11:03PM +0100, Jan Beulich wrote:
> On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
> > On 07.02.22 14:46, Roger Pau Monné wrote:
> >> I think the per-domain rwlock seems like a good option. I would do
> >> that as a pre-patch.
> > It is. But it seems it won't solve the thing we started this adventure for:
> > 
> > With per-domain read lock and still ABBA in modify_bars (hope the below
> > is correctly seen with a monospace font):
> > 
> > cpu0: vpci_write-> d->RLock -> pdev1->lock ->   
> >    rom_write -> modify_bars: tmp (pdev2) ->lock
> > cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> modify_bars: 
> > tmp (pdev1) ->lock
> > 
> > There is no API to upgrade read lock to write lock in modify_bars which 
> > could help,
> > so in both cases vpci_write should take write lock.
> 
> Hmm, yes, I think you're right: It's not modify_bars() itself which needs
> to acquire the write lock, but its (perhaps indirect) caller. Effectively
> vpci_write() would need to take the write lock if the range written
> overlaps the BARs or the command register.

I'm confused. If we use a per-domain rwlock approach there would be no
need to lock tmp again in modify_bars, because we should hold the
rwlock in write mode, so there's no ABBA?

We will have however to drop the per domain read and vpci locks and
pick the per-domain lock in write mode.

Thanks, Roger.



Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Roger Pau Monné
On Mon, Feb 07, 2022 at 01:53:34PM +, Oleksandr Andrushchenko wrote:
> 
> 
> On 07.02.22 14:46, Roger Pau Monné wrote:
> > On Mon, Feb 07, 2022 at 11:08:39AM +, Oleksandr Andrushchenko wrote:
> >> ==
> >>
> >> Bottom line:
> >> ==
> >>
> >> 1. vpci_{read|write} are not protected with pcidevs_lock and can run in
> >> parallel with pci_remove_device which can remove pdev after 
> >> vpci_{read|write}
> >> acquired the pdev pointer. This may lead to a fail due to pdev dereference.
> >>
> >> So, to protect pdev dereference vpci_{read|write} must also use pdevs_lock.
> > We would like to take the pcidevs_lock only while fetching the device
> > (ie: pci_get_pdev_by_domain), afterwards it should be fine to lock the
> > device using a vpci specific lock so calls to vpci_{read,write} can be
> > partially concurrent across multiple domains.
> This means this can't be done a pre-req patch, but as a part of the
> patch which changes locking.
> >
> > In fact I think Jan had already pointed out that the pci lock would
> > need taking while searching for the device in vpci_{read,write}.
> I was referring to the time after we found pdev and it is currently
> possible to free pdev while using it after the search
> >
> > It seems to me that if you implement option 3 below taking the
> > per-domain rwlock in read mode in vpci_{read|write} will already
> > protect you from the device being removed if the same per-domain lock
> > is taken in write mode in vpci_remove_device.
> Yes, it should. Again this can't be done as a pre-req patch because
> this relies on pdev->vpci_lock

Hm, no, I don't think so. You could introduce this per-domain rwlock
in a prepatch, and then move the vpci lock outside of the vpci struct.
I see no problem with that.

> >
> >> 2. The only offending place which is in the way of pci_dev->vpci_lock is
> >> modify_bars. If it can be re-worked to track already mapped and unmapped
> >> regions then we can avoid having a possible deadlock and can use
> >> pci_dev->vpci_lock (rangesets won't help here as we also need refcounting 
> >> be
> >> implemented).
> > I think a refcounting based solution will be very complex to
> > implement. I'm however happy to be proven wrong.
> I can't estimate, but I have a feeling that all these plays around locking
> is just because of this single piece of code. No other place suffer from
> pdev->vpci_lock and no d->lock
> >
> >> If pcidevs_lock is used for vpci_{read|write} then no deadlock is possible,
> >> but modify_bars code must be re-worked not to lock itself (pdev->vpci_lock 
> >> and
> >> tmp->vpci_lock when pdev == tmp, this is minor).
> > Taking the pcidevs lock (a global lock) is out of the picture IMO, as
> > it's going to serialize all calls of vpci_{read|write}, and would
> > create too much contention on the pcidevs lock.
> I understand that. But if we would like to fix the existing code I see
> no other alternative.
> >
> >> 3. We may think about a per-domain rwlock and pdev->vpci_lock, so this 
> >> solves
> >> modify_bars's two pdevs access. But this doesn't solve possible pdev
> >> de-reference in vpci_{read|write} vs pci_remove_device.
> > pci_remove device will call vpci_remove_device, so as long as
> > vpci_remove_device taken the per-domain lock in write (exclusive) mode
> > it should be fine.
> I think I need to see if there are any other places which similarly
> require the write lock
> >
> >> @Roger, @Jan, I would like to hear what do you think about the above 
> >> analysis
> >> and how can we proceed with locking re-work?
> > I think the per-domain rwlock seems like a good option. I would do
> > that as a pre-patch.
> It is. But it seems it won't solve the thing we started this adventure for:
> 
> With per-domain read lock and still ABBA in modify_bars (hope the below
> is correctly seen with a monospace font):
> 
> cpu0: vpci_write-> d->RLock -> pdev1->lock -> 
>  rom_write -> modify_bars: tmp (pdev2) ->lock
> cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> modify_bars: 
> tmp (pdev1) ->lock
> 
> There is no API to upgrade read lock to write lock in modify_bars which could 
> help,
> so in both cases vpci_write should take write lock.

I've thought more than once that it would be nice to have a
write_{upgrade,downgrade} (read_downgrade maybe?) or similar helper.

I think you could also drop the read lock, take the write lock and
check that >vpci->header == header in order to be sure
pdev->vpci hasn't been recreated. You would have to do similar in
order to get back again from a write lock into a read one.

We should avoid taking the rwlock in write mode in vpci_write
unconditionally.

Thanks, Roger.



Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Jan Beulich
On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
> On 07.02.22 14:46, Roger Pau Monné wrote:
>> I think the per-domain rwlock seems like a good option. I would do
>> that as a pre-patch.
> It is. But it seems it won't solve the thing we started this adventure for:
> 
> With per-domain read lock and still ABBA in modify_bars (hope the below
> is correctly seen with a monospace font):
> 
> cpu0: vpci_write-> d->RLock -> pdev1->lock -> 
>  rom_write -> modify_bars: tmp (pdev2) ->lock
> cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> modify_bars: 
> tmp (pdev1) ->lock
> 
> There is no API to upgrade read lock to write lock in modify_bars which could 
> help,
> so in both cases vpci_write should take write lock.

Hmm, yes, I think you're right: It's not modify_bars() itself which needs
to acquire the write lock, but its (perhaps indirect) caller. Effectively
vpci_write() would need to take the write lock if the range written
overlaps the BARs or the command register.

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 14:46, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 11:08:39AM +, Oleksandr Andrushchenko wrote:
>> Hello,
>>
>> On 04.02.22 16:57, Roger Pau Monné wrote:
>>> On Fri, Feb 04, 2022 at 02:43:07PM +, Oleksandr Andrushchenko wrote:
 On 04.02.22 15:06, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 12:53:20PM +, Oleksandr Andrushchenko wrote:
>> On 04.02.22 14:47, Jan Beulich wrote:
>>> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
 On 04.02.22 13:37, Jan Beulich wrote:
> On 04.02.2022 12:13, Roger Pau Monné wrote:
>> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
>>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
 On 04.02.22 11:15, Jan Beulich wrote:
> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>> On 04.02.22 09:52, Jan Beulich wrote:
>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
 @@ -285,6 +286,12 @@ static int modify_bars(const struct 
 pci_dev *pdev, uint16_t cmd, bool rom_only)
 continue;
 }
 
 +spin_lock(>vpci_lock);
 +if ( !tmp->vpci )
 +{
 +spin_unlock(>vpci_lock);
 +continue;
 +}
 for ( i = 0; i < 
 ARRAY_SIZE(tmp->vpci->header.bars); i++ )
 {
 const struct vpci_bar *bar = 
 >vpci->header.bars[i];
 @@ -303,12 +310,14 @@ static int modify_bars(const struct 
 pci_dev *pdev, uint16_t cmd, bool rom_only)
 rc = rangeset_remove_range(mem, start, 
 end);
 if ( rc )
 {
 +spin_unlock(>vpci_lock);
 printk(XENLOG_G_WARNING "Failed to 
 remove [%lx, %lx]: %d\n",
start, end, rc);
 rangeset_destroy(mem);
 return rc;
 }
 }
 +spin_unlock(>vpci_lock);
 }
>>> At the first glance this simply looks like another unjustified 
>>> (in the
>>> description) change, as you're not converting anything here but 
>>> you
>>> actually add locking (and I realize this was there before, so 
>>> I'm sorry
>>> for not pointing this out earlier).
>> Well, I thought that the description already has "...the lock 
>> can be
>> used (and in a few cases is used right away) to check whether 
>> vpci
>> is present" and this is enough for such uses as here.
>>> But then I wonder whether you
>>> actually tested this, since I can't help getting the impression 
>>> that
>>> you're introducing a live-lock: The function is called from 
>>> cmd_write()
>>> and rom_write(), which in turn are called out of vpci_write(). 
>>> Yet that
>>> function already holds the lock, and the lock is not (currently)
>>> recursive. (For the 3rd caller of the function - init_bars() - 
>>> otoh
>>> the locking looks to be entirely unnecessary.)
>> Well, you are correct: if tmp != pdev then it is correct to 
>> acquire
>> the lock. But if tmp == pdev and rom_only == true
>> then we'll deadlock.
>>
>> It seems we need to have the locking conditional, e.g. only lock
>> if tmp != pdev
> Which will address the live-lock, but introduce ABBA deadlock 
> potential
> between the two locks.
 I am not sure I can suggest a better solution here
 @Roger, @Jan, could you please help here?
>>> Well, first of all I'd like to mention that while it may have been 
>>> okay to
>>> not hold pcidevs_lock here for Dom0, it surely needs acquiring when 
>>> dealing
>>> with DomU-s' lists of PCI devices. The requirement really applies 
>>> to the
>>> other use of for_each_pdev() as well (in vpci_dump_msi()), except 
>>> that
>>> there it probably wants to be a try-lock.
>>>
>>> Next I'd like to point out that here we have the still pending 
>>> issue of
>>> how to deal with hidden devices, which Dom0 can access. See my 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Jan Beulich
On 07.02.2022 13:57, Oleksandr Andrushchenko wrote:
> 
> 
> On 07.02.22 14:34, Jan Beulich wrote:
>> On 07.02.2022 12:08, Oleksandr Andrushchenko wrote:
>>> 1. vpci_{read|write} are not protected with pcidevs_lock and can run in
>>> parallel with pci_remove_device which can remove pdev after 
>>> vpci_{read|write}
>>> acquired the pdev pointer. This may lead to a fail due to pdev dereference.
>>>
>>> So, to protect pdev dereference vpci_{read|write} must also use pdevs_lock.
>> I think this is not the only place where there is a theoretical race
>> against pci_remove_device().
> Not at all, that was just to demonstrate one of the possible sources of races.
>>   I would recommend to separate the
>> overall situation with pcidevs_lock from the issue here.
> Do you agree that there is already an issue with that? In the currently 
> existing code?
>>   I don't view
>> it as an option to acquire pcidevs_lock in vpci_{read,write}().
> Yes, that would hurt too much, I agree. But this needs to be solved
>>   If
>> anything, we need proper refcounting of PCI devices (at which point
>> likely a number of lock uses can go away).
> It seems so. Then not only pdev's need refcounting, but pdev->vpci as well
> 
> What's your view on how can we achieve both goals?
> pdev and pdev->vpci and locking/refcounting

I don't see why pdev->vpci might need refcounting. And just to state it
in different words: I'd like to suggest to leave aside the pdev locking
as long as it's _just_ to protect against hot remove of a device. That's
orthogonal to what you need for vPCI, where you need to protect
against the device disappearing from a guest (without at the same time
disappearing from the host).

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 14:34, Jan Beulich wrote:
> On 07.02.2022 12:08, Oleksandr Andrushchenko wrote:
>> 1. vpci_{read|write} are not protected with pcidevs_lock and can run in
>> parallel with pci_remove_device which can remove pdev after vpci_{read|write}
>> acquired the pdev pointer. This may lead to a fail due to pdev dereference.
>>
>> So, to protect pdev dereference vpci_{read|write} must also use pdevs_lock.
> I think this is not the only place where there is a theoretical race
> against pci_remove_device().
Not at all, that was just to demonstrate one of the possible sources of races.
>   I would recommend to separate the
> overall situation with pcidevs_lock from the issue here.
Do you agree that there is already an issue with that? In the currently 
existing code?
>   I don't view
> it as an option to acquire pcidevs_lock in vpci_{read,write}().
Yes, that would hurt too much, I agree. But this needs to be solved
>   If
> anything, we need proper refcounting of PCI devices (at which point
> likely a number of lock uses can go away).
It seems so. Then not only pdev's need refcounting, but pdev->vpci as well

What's your view on how can we achieve both goals?
pdev and pdev->vpci and locking/refcounting
This is really crucial for all the code for PCI passthrough on Arm because
without this ground work done we can't accept all the patches which rely
on this: vPCI changes, MSI/MSI-X etc.
>
> Jan
>
Thank you,
Oleksandr

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Roger Pau Monné
On Mon, Feb 07, 2022 at 11:08:39AM +, Oleksandr Andrushchenko wrote:
> Hello,
> 
> On 04.02.22 16:57, Roger Pau Monné wrote:
> > On Fri, Feb 04, 2022 at 02:43:07PM +, Oleksandr Andrushchenko wrote:
> >>
> >> On 04.02.22 15:06, Roger Pau Monné wrote:
> >>> On Fri, Feb 04, 2022 at 12:53:20PM +, Oleksandr Andrushchenko wrote:
>  On 04.02.22 14:47, Jan Beulich wrote:
> > On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
> >> On 04.02.22 13:37, Jan Beulich wrote:
> >>> On 04.02.2022 12:13, Roger Pau Monné wrote:
>  On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
> > On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
> >> On 04.02.22 11:15, Jan Beulich wrote:
> >>> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>  On 04.02.22 09:52, Jan Beulich wrote:
> > On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> >> @@ -285,6 +286,12 @@ static int modify_bars(const struct 
> >> pci_dev *pdev, uint16_t cmd, bool rom_only)
> >>continue;
> >>}
> >>
> >> +spin_lock(>vpci_lock);
> >> +if ( !tmp->vpci )
> >> +{
> >> +spin_unlock(>vpci_lock);
> >> +continue;
> >> +}
> >>for ( i = 0; i < 
> >> ARRAY_SIZE(tmp->vpci->header.bars); i++ )
> >>{
> >>const struct vpci_bar *bar = 
> >> >vpci->header.bars[i];
> >> @@ -303,12 +310,14 @@ static int modify_bars(const struct 
> >> pci_dev *pdev, uint16_t cmd, bool rom_only)
> >>rc = rangeset_remove_range(mem, start, end);
> >>if ( rc )
> >>{
> >> +spin_unlock(>vpci_lock);
> >>printk(XENLOG_G_WARNING "Failed to 
> >> remove [%lx, %lx]: %d\n",
> >>   start, end, rc);
> >>rangeset_destroy(mem);
> >>return rc;
> >>}
> >>}
> >> +spin_unlock(>vpci_lock);
> >>}
> > At the first glance this simply looks like another unjustified 
> > (in the
> > description) change, as you're not converting anything here but 
> > you
> > actually add locking (and I realize this was there before, so 
> > I'm sorry
> > for not pointing this out earlier).
>  Well, I thought that the description already has "...the lock 
>  can be
>  used (and in a few cases is used right away) to check whether 
>  vpci
>  is present" and this is enough for such uses as here.
> >But then I wonder whether you
> > actually tested this, since I can't help getting the impression 
> > that
> > you're introducing a live-lock: The function is called from 
> > cmd_write()
> > and rom_write(), which in turn are called out of vpci_write(). 
> > Yet that
> > function already holds the lock, and the lock is not (currently)
> > recursive. (For the 3rd caller of the function - init_bars() - 
> > otoh
> > the locking looks to be entirely unnecessary.)
>  Well, you are correct: if tmp != pdev then it is correct to 
>  acquire
>  the lock. But if tmp == pdev and rom_only == true
>  then we'll deadlock.
> 
>  It seems we need to have the locking conditional, e.g. only lock
>  if tmp != pdev
> >>> Which will address the live-lock, but introduce ABBA deadlock 
> >>> potential
> >>> between the two locks.
> >> I am not sure I can suggest a better solution here
> >> @Roger, @Jan, could you please help here?
> > Well, first of all I'd like to mention that while it may have been 
> > okay to
> > not hold pcidevs_lock here for Dom0, it surely needs acquiring when 
> > dealing
> > with DomU-s' lists of PCI devices. The requirement really applies 
> > to the
> > other use of for_each_pdev() as well (in vpci_dump_msi()), except 
> > that
> > there it probably wants to be a try-lock.
> >
> > Next I'd like to point out that here we have the still pending 
> > issue of
> > how to deal with hidden devices, which Dom0 can access. See my RFC 
> > patch
> > "vPCI: account for hidden devices in 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Jan Beulich
On 07.02.2022 12:08, Oleksandr Andrushchenko wrote:
> 1. vpci_{read|write} are not protected with pcidevs_lock and can run in
> parallel with pci_remove_device which can remove pdev after vpci_{read|write}
> acquired the pdev pointer. This may lead to a fail due to pdev dereference.
> 
> So, to protect pdev dereference vpci_{read|write} must also use pdevs_lock.

I think this is not the only place where there is a theoretical race
against pci_remove_device(). I would recommend to separate the
overall situation with pcidevs_lock from the issue here. I don't view
it as an option to acquire pcidevs_lock in vpci_{read,write}(). If
anything, we need proper refcounting of PCI devices (at which point
likely a number of lock uses can go away).

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko
Hello,

On 04.02.22 16:57, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 02:43:07PM +, Oleksandr Andrushchenko wrote:
>>
>> On 04.02.22 15:06, Roger Pau Monné wrote:
>>> On Fri, Feb 04, 2022 at 12:53:20PM +, Oleksandr Andrushchenko wrote:
 On 04.02.22 14:47, Jan Beulich wrote:
> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
>> On 04.02.22 13:37, Jan Beulich wrote:
>>> On 04.02.2022 12:13, Roger Pau Monné wrote:
 On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
>> On 04.02.22 11:15, Jan Beulich wrote:
>>> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
 On 04.02.22 09:52, Jan Beulich wrote:
> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev 
>> *pdev, uint16_t cmd, bool rom_only)
>>continue;
>>}
>>
>> +spin_lock(>vpci_lock);
>> +if ( !tmp->vpci )
>> +{
>> +spin_unlock(>vpci_lock);
>> +continue;
>> +}
>>for ( i = 0; i < 
>> ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>{
>>const struct vpci_bar *bar = 
>> >vpci->header.bars[i];
>> @@ -303,12 +310,14 @@ static int modify_bars(const struct 
>> pci_dev *pdev, uint16_t cmd, bool rom_only)
>>rc = rangeset_remove_range(mem, start, end);
>>if ( rc )
>>{
>> +spin_unlock(>vpci_lock);
>>printk(XENLOG_G_WARNING "Failed to remove 
>> [%lx, %lx]: %d\n",
>>   start, end, rc);
>>rangeset_destroy(mem);
>>return rc;
>>}
>>}
>> +spin_unlock(>vpci_lock);
>>}
> At the first glance this simply looks like another unjustified 
> (in the
> description) change, as you're not converting anything here but 
> you
> actually add locking (and I realize this was there before, so I'm 
> sorry
> for not pointing this out earlier).
 Well, I thought that the description already has "...the lock can 
 be
 used (and in a few cases is used right away) to check whether vpci
 is present" and this is enough for such uses as here.
>But then I wonder whether you
> actually tested this, since I can't help getting the impression 
> that
> you're introducing a live-lock: The function is called from 
> cmd_write()
> and rom_write(), which in turn are called out of vpci_write(). 
> Yet that
> function already holds the lock, and the lock is not (currently)
> recursive. (For the 3rd caller of the function - init_bars() - 
> otoh
> the locking looks to be entirely unnecessary.)
 Well, you are correct: if tmp != pdev then it is correct to acquire
 the lock. But if tmp == pdev and rom_only == true
 then we'll deadlock.

 It seems we need to have the locking conditional, e.g. only lock
 if tmp != pdev
>>> Which will address the live-lock, but introduce ABBA deadlock 
>>> potential
>>> between the two locks.
>> I am not sure I can suggest a better solution here
>> @Roger, @Jan, could you please help here?
> Well, first of all I'd like to mention that while it may have been 
> okay to
> not hold pcidevs_lock here for Dom0, it surely needs acquiring when 
> dealing
> with DomU-s' lists of PCI devices. The requirement really applies to 
> the
> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
> there it probably wants to be a try-lock.
>
> Next I'd like to point out that here we have the still pending issue 
> of
> how to deal with hidden devices, which Dom0 can access. See my RFC 
> patch
> "vPCI: account for hidden devices in modify_bars()". Whatever the 
> solution
> here, I think it wants to at least account for the extra need there.
 Yes, sorry, I should take care of that.

> Now it is quite clear that pcidevs_lock isn't going to help with 
> avoiding
> the deadlock, as it's 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Roger Pau Monné
On Fri, Feb 04, 2022 at 02:43:07PM +, Oleksandr Andrushchenko wrote:
> 
> 
> On 04.02.22 15:06, Roger Pau Monné wrote:
> > On Fri, Feb 04, 2022 at 12:53:20PM +, Oleksandr Andrushchenko wrote:
> >>
> >> On 04.02.22 14:47, Jan Beulich wrote:
> >>> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
>  On 04.02.22 13:37, Jan Beulich wrote:
> > On 04.02.2022 12:13, Roger Pau Monné wrote:
> >> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
> >>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
>  On 04.02.22 11:15, Jan Beulich wrote:
> > On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> >> On 04.02.22 09:52, Jan Beulich wrote:
> >>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>  @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev 
>  *pdev, uint16_t cmd, bool rom_only)
>    continue;
>    }
>    
>  +spin_lock(>vpci_lock);
>  +if ( !tmp->vpci )
>  +{
>  +spin_unlock(>vpci_lock);
>  +continue;
>  +}
>    for ( i = 0; i < 
>  ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>    {
>    const struct vpci_bar *bar = 
>  >vpci->header.bars[i];
>  @@ -303,12 +310,14 @@ static int modify_bars(const struct 
>  pci_dev *pdev, uint16_t cmd, bool rom_only)
>    rc = rangeset_remove_range(mem, start, end);
>    if ( rc )
>    {
>  +spin_unlock(>vpci_lock);
>    printk(XENLOG_G_WARNING "Failed to remove 
>  [%lx, %lx]: %d\n",
>   start, end, rc);
>    rangeset_destroy(mem);
>    return rc;
>    }
>    }
>  +spin_unlock(>vpci_lock);
>    }
> >>> At the first glance this simply looks like another unjustified 
> >>> (in the
> >>> description) change, as you're not converting anything here but 
> >>> you
> >>> actually add locking (and I realize this was there before, so I'm 
> >>> sorry
> >>> for not pointing this out earlier).
> >> Well, I thought that the description already has "...the lock can 
> >> be
> >> used (and in a few cases is used right away) to check whether vpci
> >> is present" and this is enough for such uses as here.
> >>>   But then I wonder whether you
> >>> actually tested this, since I can't help getting the impression 
> >>> that
> >>> you're introducing a live-lock: The function is called from 
> >>> cmd_write()
> >>> and rom_write(), which in turn are called out of vpci_write(). 
> >>> Yet that
> >>> function already holds the lock, and the lock is not (currently)
> >>> recursive. (For the 3rd caller of the function - init_bars() - 
> >>> otoh
> >>> the locking looks to be entirely unnecessary.)
> >> Well, you are correct: if tmp != pdev then it is correct to acquire
> >> the lock. But if tmp == pdev and rom_only == true
> >> then we'll deadlock.
> >>
> >> It seems we need to have the locking conditional, e.g. only lock
> >> if tmp != pdev
> > Which will address the live-lock, but introduce ABBA deadlock 
> > potential
> > between the two locks.
>  I am not sure I can suggest a better solution here
>  @Roger, @Jan, could you please help here?
> >>> Well, first of all I'd like to mention that while it may have been 
> >>> okay to
> >>> not hold pcidevs_lock here for Dom0, it surely needs acquiring when 
> >>> dealing
> >>> with DomU-s' lists of PCI devices. The requirement really applies to 
> >>> the
> >>> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
> >>> there it probably wants to be a try-lock.
> >>>
> >>> Next I'd like to point out that here we have the still pending issue 
> >>> of
> >>> how to deal with hidden devices, which Dom0 can access. See my RFC 
> >>> patch
> >>> "vPCI: account for hidden devices in modify_bars()". Whatever the 
> >>> solution
> >>> here, I think it wants to at least account for the extra need there.
> >> Yes, sorry, I should take care of that.
> >>
> >>> Now it is quite clear that pcidevs_lock isn't going to help with 
> >>> avoiding
> >>> the deadlock, as it's imo not an option at all to acquire that lock
> >>> 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Oleksandr Andrushchenko


On 04.02.22 15:06, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 12:53:20PM +, Oleksandr Andrushchenko wrote:
>>
>> On 04.02.22 14:47, Jan Beulich wrote:
>>> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
 On 04.02.22 13:37, Jan Beulich wrote:
> On 04.02.2022 12:13, Roger Pau Monné wrote:
>> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
>>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
 On 04.02.22 11:15, Jan Beulich wrote:
> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>> On 04.02.22 09:52, Jan Beulich wrote:
>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
 @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev 
 *pdev, uint16_t cmd, bool rom_only)
   continue;
   }
   
 +spin_lock(>vpci_lock);
 +if ( !tmp->vpci )
 +{
 +spin_unlock(>vpci_lock);
 +continue;
 +}
   for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); 
 i++ )
   {
   const struct vpci_bar *bar = 
 >vpci->header.bars[i];
 @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev 
 *pdev, uint16_t cmd, bool rom_only)
   rc = rangeset_remove_range(mem, start, end);
   if ( rc )
   {
 +spin_unlock(>vpci_lock);
   printk(XENLOG_G_WARNING "Failed to remove 
 [%lx, %lx]: %d\n",
  start, end, rc);
   rangeset_destroy(mem);
   return rc;
   }
   }
 +spin_unlock(>vpci_lock);
   }
>>> At the first glance this simply looks like another unjustified (in 
>>> the
>>> description) change, as you're not converting anything here but you
>>> actually add locking (and I realize this was there before, so I'm 
>>> sorry
>>> for not pointing this out earlier).
>> Well, I thought that the description already has "...the lock can be
>> used (and in a few cases is used right away) to check whether vpci
>> is present" and this is enough for such uses as here.
>>>   But then I wonder whether you
>>> actually tested this, since I can't help getting the impression that
>>> you're introducing a live-lock: The function is called from 
>>> cmd_write()
>>> and rom_write(), which in turn are called out of vpci_write(). Yet 
>>> that
>>> function already holds the lock, and the lock is not (currently)
>>> recursive. (For the 3rd caller of the function - init_bars() - otoh
>>> the locking looks to be entirely unnecessary.)
>> Well, you are correct: if tmp != pdev then it is correct to acquire
>> the lock. But if tmp == pdev and rom_only == true
>> then we'll deadlock.
>>
>> It seems we need to have the locking conditional, e.g. only lock
>> if tmp != pdev
> Which will address the live-lock, but introduce ABBA deadlock 
> potential
> between the two locks.
 I am not sure I can suggest a better solution here
 @Roger, @Jan, could you please help here?
>>> Well, first of all I'd like to mention that while it may have been okay 
>>> to
>>> not hold pcidevs_lock here for Dom0, it surely needs acquiring when 
>>> dealing
>>> with DomU-s' lists of PCI devices. The requirement really applies to the
>>> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
>>> there it probably wants to be a try-lock.
>>>
>>> Next I'd like to point out that here we have the still pending issue of
>>> how to deal with hidden devices, which Dom0 can access. See my RFC patch
>>> "vPCI: account for hidden devices in modify_bars()". Whatever the 
>>> solution
>>> here, I think it wants to at least account for the extra need there.
>> Yes, sorry, I should take care of that.
>>
>>> Now it is quite clear that pcidevs_lock isn't going to help with 
>>> avoiding
>>> the deadlock, as it's imo not an option at all to acquire that lock
>>> everywhere else you access ->vpci (or else the vpci lock itself would be
>>> pointless). But a per-domain auxiliary r/w lock may help: Other paths
>>> would acquire it in read mode, and here you'd acquire it in write mode 
>>> (in
>>> the former case around the vpci lock, while in the latter case there may
>>> then not be any need to 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Roger Pau Monné
On Fri, Feb 04, 2022 at 12:53:20PM +, Oleksandr Andrushchenko wrote:
> 
> 
> On 04.02.22 14:47, Jan Beulich wrote:
> > On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
> >>
> >> On 04.02.22 13:37, Jan Beulich wrote:
> >>> On 04.02.2022 12:13, Roger Pau Monné wrote:
>  On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
> > On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
> >> On 04.02.22 11:15, Jan Beulich wrote:
> >>> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>  On 04.02.22 09:52, Jan Beulich wrote:
> > On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> >> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev 
> >> *pdev, uint16_t cmd, bool rom_only)
> >>  continue;
> >>  }
> >>  
> >> +spin_lock(>vpci_lock);
> >> +if ( !tmp->vpci )
> >> +{
> >> +spin_unlock(>vpci_lock);
> >> +continue;
> >> +}
> >>  for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); 
> >> i++ )
> >>  {
> >>  const struct vpci_bar *bar = 
> >> >vpci->header.bars[i];
> >> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev 
> >> *pdev, uint16_t cmd, bool rom_only)
> >>  rc = rangeset_remove_range(mem, start, end);
> >>  if ( rc )
> >>  {
> >> +spin_unlock(>vpci_lock);
> >>  printk(XENLOG_G_WARNING "Failed to remove 
> >> [%lx, %lx]: %d\n",
> >> start, end, rc);
> >>  rangeset_destroy(mem);
> >>  return rc;
> >>  }
> >>  }
> >> +spin_unlock(>vpci_lock);
> >>  }
> > At the first glance this simply looks like another unjustified (in 
> > the
> > description) change, as you're not converting anything here but you
> > actually add locking (and I realize this was there before, so I'm 
> > sorry
> > for not pointing this out earlier).
>  Well, I thought that the description already has "...the lock can be
>  used (and in a few cases is used right away) to check whether vpci
>  is present" and this is enough for such uses as here.
> >  But then I wonder whether you
> > actually tested this, since I can't help getting the impression that
> > you're introducing a live-lock: The function is called from 
> > cmd_write()
> > and rom_write(), which in turn are called out of vpci_write(). Yet 
> > that
> > function already holds the lock, and the lock is not (currently)
> > recursive. (For the 3rd caller of the function - init_bars() - otoh
> > the locking looks to be entirely unnecessary.)
>  Well, you are correct: if tmp != pdev then it is correct to acquire
>  the lock. But if tmp == pdev and rom_only == true
>  then we'll deadlock.
> 
>  It seems we need to have the locking conditional, e.g. only lock
>  if tmp != pdev
> >>> Which will address the live-lock, but introduce ABBA deadlock 
> >>> potential
> >>> between the two locks.
> >> I am not sure I can suggest a better solution here
> >> @Roger, @Jan, could you please help here?
> > Well, first of all I'd like to mention that while it may have been okay 
> > to
> > not hold pcidevs_lock here for Dom0, it surely needs acquiring when 
> > dealing
> > with DomU-s' lists of PCI devices. The requirement really applies to the
> > other use of for_each_pdev() as well (in vpci_dump_msi()), except that
> > there it probably wants to be a try-lock.
> >
> > Next I'd like to point out that here we have the still pending issue of
> > how to deal with hidden devices, which Dom0 can access. See my RFC patch
> > "vPCI: account for hidden devices in modify_bars()". Whatever the 
> > solution
> > here, I think it wants to at least account for the extra need there.
>  Yes, sorry, I should take care of that.
> 
> > Now it is quite clear that pcidevs_lock isn't going to help with 
> > avoiding
> > the deadlock, as it's imo not an option at all to acquire that lock
> > everywhere else you access ->vpci (or else the vpci lock itself would be
> > pointless). But a per-domain auxiliary r/w lock may help: Other paths
> > would acquire it in read mode, and here you'd acquire it in write mode 
> > (in
> > the former case around the vpci lock, while in the latter case there may
> > then not be any need to acquire the individual vpci locks at all). 
> > FTAOD:
> 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Jan Beulich
On 04.02.2022 13:53, Oleksandr Andrushchenko wrote:
> 
> 
> On 04.02.22 14:47, Jan Beulich wrote:
>> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
>>>
>>> On 04.02.22 13:37, Jan Beulich wrote:
 On 04.02.2022 12:13, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
>>> On 04.02.22 11:15, Jan Beulich wrote:
 On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> On 04.02.22 09:52, Jan Beulich wrote:
>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev 
>>> *pdev, uint16_t cmd, bool rom_only)
>>>  continue;
>>>  }
>>>  
>>> +spin_lock(>vpci_lock);
>>> +if ( !tmp->vpci )
>>> +{
>>> +spin_unlock(>vpci_lock);
>>> +continue;
>>> +}
>>>  for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); 
>>> i++ )
>>>  {
>>>  const struct vpci_bar *bar = 
>>> >vpci->header.bars[i];
>>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev 
>>> *pdev, uint16_t cmd, bool rom_only)
>>>  rc = rangeset_remove_range(mem, start, end);
>>>  if ( rc )
>>>  {
>>> +spin_unlock(>vpci_lock);
>>>  printk(XENLOG_G_WARNING "Failed to remove 
>>> [%lx, %lx]: %d\n",
>>> start, end, rc);
>>>  rangeset_destroy(mem);
>>>  return rc;
>>>  }
>>>  }
>>> +spin_unlock(>vpci_lock);
>>>  }
>> At the first glance this simply looks like another unjustified (in 
>> the
>> description) change, as you're not converting anything here but you
>> actually add locking (and I realize this was there before, so I'm 
>> sorry
>> for not pointing this out earlier).
> Well, I thought that the description already has "...the lock can be
> used (and in a few cases is used right away) to check whether vpci
> is present" and this is enough for such uses as here.
>>  But then I wonder whether you
>> actually tested this, since I can't help getting the impression that
>> you're introducing a live-lock: The function is called from 
>> cmd_write()
>> and rom_write(), which in turn are called out of vpci_write(). Yet 
>> that
>> function already holds the lock, and the lock is not (currently)
>> recursive. (For the 3rd caller of the function - init_bars() - otoh
>> the locking looks to be entirely unnecessary.)
> Well, you are correct: if tmp != pdev then it is correct to acquire
> the lock. But if tmp == pdev and rom_only == true
> then we'll deadlock.
>
> It seems we need to have the locking conditional, e.g. only lock
> if tmp != pdev
 Which will address the live-lock, but introduce ABBA deadlock potential
 between the two locks.
>>> I am not sure I can suggest a better solution here
>>> @Roger, @Jan, could you please help here?
>> Well, first of all I'd like to mention that while it may have been okay 
>> to
>> not hold pcidevs_lock here for Dom0, it surely needs acquiring when 
>> dealing
>> with DomU-s' lists of PCI devices. The requirement really applies to the
>> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
>> there it probably wants to be a try-lock.
>>
>> Next I'd like to point out that here we have the still pending issue of
>> how to deal with hidden devices, which Dom0 can access. See my RFC patch
>> "vPCI: account for hidden devices in modify_bars()". Whatever the 
>> solution
>> here, I think it wants to at least account for the extra need there.
> Yes, sorry, I should take care of that.
>
>> Now it is quite clear that pcidevs_lock isn't going to help with avoiding
>> the deadlock, as it's imo not an option at all to acquire that lock
>> everywhere else you access ->vpci (or else the vpci lock itself would be
>> pointless). But a per-domain auxiliary r/w lock may help: Other paths
>> would acquire it in read mode, and here you'd acquire it in write mode 
>> (in
>> the former case around the vpci lock, while in the latter case there may
>> then not be any need to acquire the individual vpci locks at all). FTAOD:
>> I haven't fully thought through all implications (and hence whether this 
>> is
>> viable in the first place); I expect you 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Oleksandr Andrushchenko


On 04.02.22 14:47, Jan Beulich wrote:
> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
>>
>> On 04.02.22 13:37, Jan Beulich wrote:
>>> On 04.02.2022 12:13, Roger Pau Monné wrote:
 On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
>> On 04.02.22 11:15, Jan Beulich wrote:
>>> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
 On 04.02.22 09:52, Jan Beulich wrote:
> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev 
>> *pdev, uint16_t cmd, bool rom_only)
>>  continue;
>>  }
>>  
>> +spin_lock(>vpci_lock);
>> +if ( !tmp->vpci )
>> +{
>> +spin_unlock(>vpci_lock);
>> +continue;
>> +}
>>  for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); 
>> i++ )
>>  {
>>  const struct vpci_bar *bar = 
>> >vpci->header.bars[i];
>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev 
>> *pdev, uint16_t cmd, bool rom_only)
>>  rc = rangeset_remove_range(mem, start, end);
>>  if ( rc )
>>  {
>> +spin_unlock(>vpci_lock);
>>  printk(XENLOG_G_WARNING "Failed to remove [%lx, 
>> %lx]: %d\n",
>> start, end, rc);
>>  rangeset_destroy(mem);
>>  return rc;
>>  }
>>  }
>> +spin_unlock(>vpci_lock);
>>  }
> At the first glance this simply looks like another unjustified (in the
> description) change, as you're not converting anything here but you
> actually add locking (and I realize this was there before, so I'm 
> sorry
> for not pointing this out earlier).
 Well, I thought that the description already has "...the lock can be
 used (and in a few cases is used right away) to check whether vpci
 is present" and this is enough for such uses as here.
>  But then I wonder whether you
> actually tested this, since I can't help getting the impression that
> you're introducing a live-lock: The function is called from 
> cmd_write()
> and rom_write(), which in turn are called out of vpci_write(). Yet 
> that
> function already holds the lock, and the lock is not (currently)
> recursive. (For the 3rd caller of the function - init_bars() - otoh
> the locking looks to be entirely unnecessary.)
 Well, you are correct: if tmp != pdev then it is correct to acquire
 the lock. But if tmp == pdev and rom_only == true
 then we'll deadlock.

 It seems we need to have the locking conditional, e.g. only lock
 if tmp != pdev
>>> Which will address the live-lock, but introduce ABBA deadlock potential
>>> between the two locks.
>> I am not sure I can suggest a better solution here
>> @Roger, @Jan, could you please help here?
> Well, first of all I'd like to mention that while it may have been okay to
> not hold pcidevs_lock here for Dom0, it surely needs acquiring when 
> dealing
> with DomU-s' lists of PCI devices. The requirement really applies to the
> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
> there it probably wants to be a try-lock.
>
> Next I'd like to point out that here we have the still pending issue of
> how to deal with hidden devices, which Dom0 can access. See my RFC patch
> "vPCI: account for hidden devices in modify_bars()". Whatever the solution
> here, I think it wants to at least account for the extra need there.
 Yes, sorry, I should take care of that.

> Now it is quite clear that pcidevs_lock isn't going to help with avoiding
> the deadlock, as it's imo not an option at all to acquire that lock
> everywhere else you access ->vpci (or else the vpci lock itself would be
> pointless). But a per-domain auxiliary r/w lock may help: Other paths
> would acquire it in read mode, and here you'd acquire it in write mode (in
> the former case around the vpci lock, while in the latter case there may
> then not be any need to acquire the individual vpci locks at all). FTAOD:
> I haven't fully thought through all implications (and hence whether this 
> is
> viable in the first place); I expect you will, documenting what you've
> found in the resulting patch description. Of course the double lock
> acquire/release would then likely want hiding in helper functions.
 I've 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Jan Beulich
On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
> 
> 
> On 04.02.22 13:37, Jan Beulich wrote:
>> On 04.02.2022 12:13, Roger Pau Monné wrote:
>>> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
 On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
> On 04.02.22 11:15, Jan Beulich wrote:
>> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>>> On 04.02.22 09:52, Jan Beulich wrote:
 On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev 
> *pdev, uint16_t cmd, bool rom_only)
> continue;
> }
> 
> +spin_lock(>vpci_lock);
> +if ( !tmp->vpci )
> +{
> +spin_unlock(>vpci_lock);
> +continue;
> +}
> for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
> {
> const struct vpci_bar *bar = 
> >vpci->header.bars[i];
> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev 
> *pdev, uint16_t cmd, bool rom_only)
> rc = rangeset_remove_range(mem, start, end);
> if ( rc )
> {
> +spin_unlock(>vpci_lock);
> printk(XENLOG_G_WARNING "Failed to remove [%lx, 
> %lx]: %d\n",
>start, end, rc);
> rangeset_destroy(mem);
> return rc;
> }
> }
> +spin_unlock(>vpci_lock);
> }
 At the first glance this simply looks like another unjustified (in the
 description) change, as you're not converting anything here but you
 actually add locking (and I realize this was there before, so I'm sorry
 for not pointing this out earlier).
>>> Well, I thought that the description already has "...the lock can be
>>> used (and in a few cases is used right away) to check whether vpci
>>> is present" and this is enough for such uses as here.
 But then I wonder whether you
 actually tested this, since I can't help getting the impression that
 you're introducing a live-lock: The function is called from cmd_write()
 and rom_write(), which in turn are called out of vpci_write(). Yet that
 function already holds the lock, and the lock is not (currently)
 recursive. (For the 3rd caller of the function - init_bars() - otoh
 the locking looks to be entirely unnecessary.)
>>> Well, you are correct: if tmp != pdev then it is correct to acquire
>>> the lock. But if tmp == pdev and rom_only == true
>>> then we'll deadlock.
>>>
>>> It seems we need to have the locking conditional, e.g. only lock
>>> if tmp != pdev
>> Which will address the live-lock, but introduce ABBA deadlock potential
>> between the two locks.
> I am not sure I can suggest a better solution here
> @Roger, @Jan, could you please help here?
 Well, first of all I'd like to mention that while it may have been okay to
 not hold pcidevs_lock here for Dom0, it surely needs acquiring when dealing
 with DomU-s' lists of PCI devices. The requirement really applies to the
 other use of for_each_pdev() as well (in vpci_dump_msi()), except that
 there it probably wants to be a try-lock.

 Next I'd like to point out that here we have the still pending issue of
 how to deal with hidden devices, which Dom0 can access. See my RFC patch
 "vPCI: account for hidden devices in modify_bars()". Whatever the solution
 here, I think it wants to at least account for the extra need there.
>>> Yes, sorry, I should take care of that.
>>>
 Now it is quite clear that pcidevs_lock isn't going to help with avoiding
 the deadlock, as it's imo not an option at all to acquire that lock
 everywhere else you access ->vpci (or else the vpci lock itself would be
 pointless). But a per-domain auxiliary r/w lock may help: Other paths
 would acquire it in read mode, and here you'd acquire it in write mode (in
 the former case around the vpci lock, while in the latter case there may
 then not be any need to acquire the individual vpci locks at all). FTAOD:
 I haven't fully thought through all implications (and hence whether this is
 viable in the first place); I expect you will, documenting what you've
 found in the resulting patch description. Of course the double lock
 acquire/release would then likely want hiding in helper functions.
>>> I've been also thinking about this, and whether it's really worth to
>>> have a per-device lock rather than a per-domain one that protects all
>>> vpci regions of the devices assigned to the domain.
>>>
>>> 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Oleksandr Andrushchenko


On 04.02.22 13:37, Jan Beulich wrote:
> On 04.02.2022 12:13, Roger Pau Monné wrote:
>> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
>>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
 On 04.02.22 11:15, Jan Beulich wrote:
> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>> On 04.02.22 09:52, Jan Beulich wrote:
>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
 @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev 
 *pdev, uint16_t cmd, bool rom_only)
 continue;
 }
 
 +spin_lock(>vpci_lock);
 +if ( !tmp->vpci )
 +{
 +spin_unlock(>vpci_lock);
 +continue;
 +}
 for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
 {
 const struct vpci_bar *bar = 
 >vpci->header.bars[i];
 @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev 
 *pdev, uint16_t cmd, bool rom_only)
 rc = rangeset_remove_range(mem, start, end);
 if ( rc )
 {
 +spin_unlock(>vpci_lock);
 printk(XENLOG_G_WARNING "Failed to remove [%lx, 
 %lx]: %d\n",
start, end, rc);
 rangeset_destroy(mem);
 return rc;
 }
 }
 +spin_unlock(>vpci_lock);
 }
>>> At the first glance this simply looks like another unjustified (in the
>>> description) change, as you're not converting anything here but you
>>> actually add locking (and I realize this was there before, so I'm sorry
>>> for not pointing this out earlier).
>> Well, I thought that the description already has "...the lock can be
>> used (and in a few cases is used right away) to check whether vpci
>> is present" and this is enough for such uses as here.
>>> But then I wonder whether you
>>> actually tested this, since I can't help getting the impression that
>>> you're introducing a live-lock: The function is called from cmd_write()
>>> and rom_write(), which in turn are called out of vpci_write(). Yet that
>>> function already holds the lock, and the lock is not (currently)
>>> recursive. (For the 3rd caller of the function - init_bars() - otoh
>>> the locking looks to be entirely unnecessary.)
>> Well, you are correct: if tmp != pdev then it is correct to acquire
>> the lock. But if tmp == pdev and rom_only == true
>> then we'll deadlock.
>>
>> It seems we need to have the locking conditional, e.g. only lock
>> if tmp != pdev
> Which will address the live-lock, but introduce ABBA deadlock potential
> between the two locks.
 I am not sure I can suggest a better solution here
 @Roger, @Jan, could you please help here?
>>> Well, first of all I'd like to mention that while it may have been okay to
>>> not hold pcidevs_lock here for Dom0, it surely needs acquiring when dealing
>>> with DomU-s' lists of PCI devices. The requirement really applies to the
>>> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
>>> there it probably wants to be a try-lock.
>>>
>>> Next I'd like to point out that here we have the still pending issue of
>>> how to deal with hidden devices, which Dom0 can access. See my RFC patch
>>> "vPCI: account for hidden devices in modify_bars()". Whatever the solution
>>> here, I think it wants to at least account for the extra need there.
>> Yes, sorry, I should take care of that.
>>
>>> Now it is quite clear that pcidevs_lock isn't going to help with avoiding
>>> the deadlock, as it's imo not an option at all to acquire that lock
>>> everywhere else you access ->vpci (or else the vpci lock itself would be
>>> pointless). But a per-domain auxiliary r/w lock may help: Other paths
>>> would acquire it in read mode, and here you'd acquire it in write mode (in
>>> the former case around the vpci lock, while in the latter case there may
>>> then not be any need to acquire the individual vpci locks at all). FTAOD:
>>> I haven't fully thought through all implications (and hence whether this is
>>> viable in the first place); I expect you will, documenting what you've
>>> found in the resulting patch description. Of course the double lock
>>> acquire/release would then likely want hiding in helper functions.
>> I've been also thinking about this, and whether it's really worth to
>> have a per-device lock rather than a per-domain one that protects all
>> vpci regions of the devices assigned to the domain.
>>
>> The OS is likely to serialize accesses to the PCI config space anyway,
>> and the only place I could see a benefit of having per-device locks is
>> 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Roger Pau Monné
On Fri, Feb 04, 2022 at 11:37:50AM +, Oleksandr Andrushchenko wrote:
> 
> 
> On 04.02.22 13:13, Roger Pau Monné wrote:
> > On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
> >> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
> >>> On 04.02.22 11:15, Jan Beulich wrote:
>  On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> > On 04.02.22 09:52, Jan Beulich wrote:
> >> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> >>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev 
> >>> *pdev, uint16_t cmd, bool rom_only)
> >>> continue;
> >>> }
> >>> 
> >>> +spin_lock(>vpci_lock);
> >>> +if ( !tmp->vpci )
> >>> +{
> >>> +spin_unlock(>vpci_lock);
> >>> +continue;
> >>> +}
> >>> for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
> >>> {
> >>> const struct vpci_bar *bar = 
> >>> >vpci->header.bars[i];
> >>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev 
> >>> *pdev, uint16_t cmd, bool rom_only)
> >>> rc = rangeset_remove_range(mem, start, end);
> >>> if ( rc )
> >>> {
> >>> +spin_unlock(>vpci_lock);
> >>> printk(XENLOG_G_WARNING "Failed to remove [%lx, 
> >>> %lx]: %d\n",
> >>>start, end, rc);
> >>> rangeset_destroy(mem);
> >>> return rc;
> >>> }
> >>> }
> >>> +spin_unlock(>vpci_lock);
> >>> }
> >> At the first glance this simply looks like another unjustified (in the
> >> description) change, as you're not converting anything here but you
> >> actually add locking (and I realize this was there before, so I'm sorry
> >> for not pointing this out earlier).
> > Well, I thought that the description already has "...the lock can be
> > used (and in a few cases is used right away) to check whether vpci
> > is present" and this is enough for such uses as here.
> >> But then I wonder whether you
> >> actually tested this, since I can't help getting the impression that
> >> you're introducing a live-lock: The function is called from cmd_write()
> >> and rom_write(), which in turn are called out of vpci_write(). Yet that
> >> function already holds the lock, and the lock is not (currently)
> >> recursive. (For the 3rd caller of the function - init_bars() - otoh
> >> the locking looks to be entirely unnecessary.)
> > Well, you are correct: if tmp != pdev then it is correct to acquire
> > the lock. But if tmp == pdev and rom_only == true
> > then we'll deadlock.
> >
> > It seems we need to have the locking conditional, e.g. only lock
> > if tmp != pdev
>  Which will address the live-lock, but introduce ABBA deadlock potential
>  between the two locks.
> >>> I am not sure I can suggest a better solution here
> >>> @Roger, @Jan, could you please help here?
> >> Well, first of all I'd like to mention that while it may have been okay to
> >> not hold pcidevs_lock here for Dom0, it surely needs acquiring when dealing
> >> with DomU-s' lists of PCI devices. The requirement really applies to the
> >> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
> >> there it probably wants to be a try-lock.
> >>
> >> Next I'd like to point out that here we have the still pending issue of
> >> how to deal with hidden devices, which Dom0 can access. See my RFC patch
> >> "vPCI: account for hidden devices in modify_bars()". Whatever the solution
> >> here, I think it wants to at least account for the extra need there.
> > Yes, sorry, I should take care of that.
> >
> >> Now it is quite clear that pcidevs_lock isn't going to help with avoiding
> >> the deadlock, as it's imo not an option at all to acquire that lock
> >> everywhere else you access ->vpci (or else the vpci lock itself would be
> >> pointless). But a per-domain auxiliary r/w lock may help: Other paths
> >> would acquire it in read mode, and here you'd acquire it in write mode (in
> >> the former case around the vpci lock, while in the latter case there may
> >> then not be any need to acquire the individual vpci locks at all). FTAOD:
> >> I haven't fully thought through all implications (and hence whether this is
> >> viable in the first place); I expect you will, documenting what you've
> >> found in the resulting patch description. Of course the double lock
> >> acquire/release would then likely want hiding in helper functions.
> > I've been also thinking about this, and whether it's really worth to
> > have a per-device lock rather than a per-domain one that protects all
> > vpci regions of the devices assigned to the domain.
> >
> > The OS is likely to 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Oleksandr Andrushchenko


On 04.02.22 13:13, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
>>> On 04.02.22 11:15, Jan Beulich wrote:
 On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> On 04.02.22 09:52, Jan Beulich wrote:
>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, 
>>> uint16_t cmd, bool rom_only)
>>> continue;
>>> }
>>> 
>>> +spin_lock(>vpci_lock);
>>> +if ( !tmp->vpci )
>>> +{
>>> +spin_unlock(>vpci_lock);
>>> +continue;
>>> +}
>>> for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>> {
>>> const struct vpci_bar *bar = >vpci->header.bars[i];
>>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev 
>>> *pdev, uint16_t cmd, bool rom_only)
>>> rc = rangeset_remove_range(mem, start, end);
>>> if ( rc )
>>> {
>>> +spin_unlock(>vpci_lock);
>>> printk(XENLOG_G_WARNING "Failed to remove [%lx, 
>>> %lx]: %d\n",
>>>start, end, rc);
>>> rangeset_destroy(mem);
>>> return rc;
>>> }
>>> }
>>> +spin_unlock(>vpci_lock);
>>> }
>> At the first glance this simply looks like another unjustified (in the
>> description) change, as you're not converting anything here but you
>> actually add locking (and I realize this was there before, so I'm sorry
>> for not pointing this out earlier).
> Well, I thought that the description already has "...the lock can be
> used (and in a few cases is used right away) to check whether vpci
> is present" and this is enough for such uses as here.
>> But then I wonder whether you
>> actually tested this, since I can't help getting the impression that
>> you're introducing a live-lock: The function is called from cmd_write()
>> and rom_write(), which in turn are called out of vpci_write(). Yet that
>> function already holds the lock, and the lock is not (currently)
>> recursive. (For the 3rd caller of the function - init_bars() - otoh
>> the locking looks to be entirely unnecessary.)
> Well, you are correct: if tmp != pdev then it is correct to acquire
> the lock. But if tmp == pdev and rom_only == true
> then we'll deadlock.
>
> It seems we need to have the locking conditional, e.g. only lock
> if tmp != pdev
 Which will address the live-lock, but introduce ABBA deadlock potential
 between the two locks.
>>> I am not sure I can suggest a better solution here
>>> @Roger, @Jan, could you please help here?
>> Well, first of all I'd like to mention that while it may have been okay to
>> not hold pcidevs_lock here for Dom0, it surely needs acquiring when dealing
>> with DomU-s' lists of PCI devices. The requirement really applies to the
>> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
>> there it probably wants to be a try-lock.
>>
>> Next I'd like to point out that here we have the still pending issue of
>> how to deal with hidden devices, which Dom0 can access. See my RFC patch
>> "vPCI: account for hidden devices in modify_bars()". Whatever the solution
>> here, I think it wants to at least account for the extra need there.
> Yes, sorry, I should take care of that.
>
>> Now it is quite clear that pcidevs_lock isn't going to help with avoiding
>> the deadlock, as it's imo not an option at all to acquire that lock
>> everywhere else you access ->vpci (or else the vpci lock itself would be
>> pointless). But a per-domain auxiliary r/w lock may help: Other paths
>> would acquire it in read mode, and here you'd acquire it in write mode (in
>> the former case around the vpci lock, while in the latter case there may
>> then not be any need to acquire the individual vpci locks at all). FTAOD:
>> I haven't fully thought through all implications (and hence whether this is
>> viable in the first place); I expect you will, documenting what you've
>> found in the resulting patch description. Of course the double lock
>> acquire/release would then likely want hiding in helper functions.
> I've been also thinking about this, and whether it's really worth to
> have a per-device lock rather than a per-domain one that protects all
> vpci regions of the devices assigned to the domain.
>
> The OS is likely to serialize accesses to the PCI config space anyway,
> and the only place I could see a benefit of having per-device locks is
> in the handling of MSI-X tables, as the handling of the mask bit is
> likely very performance sensitive, so adding a per-domain lock there
> 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Jan Beulich
On 04.02.2022 12:13, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
>>> On 04.02.22 11:15, Jan Beulich wrote:
 On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> On 04.02.22 09:52, Jan Beulich wrote:
>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, 
>>> uint16_t cmd, bool rom_only)
>>>continue;
>>>}
>>>
>>> +spin_lock(>vpci_lock);
>>> +if ( !tmp->vpci )
>>> +{
>>> +spin_unlock(>vpci_lock);
>>> +continue;
>>> +}
>>>for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>>{
>>>const struct vpci_bar *bar = >vpci->header.bars[i];
>>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev 
>>> *pdev, uint16_t cmd, bool rom_only)
>>>rc = rangeset_remove_range(mem, start, end);
>>>if ( rc )
>>>{
>>> +spin_unlock(>vpci_lock);
>>>printk(XENLOG_G_WARNING "Failed to remove [%lx, 
>>> %lx]: %d\n",
>>>   start, end, rc);
>>>rangeset_destroy(mem);
>>>return rc;
>>>}
>>>}
>>> +spin_unlock(>vpci_lock);
>>>}
>> At the first glance this simply looks like another unjustified (in the
>> description) change, as you're not converting anything here but you
>> actually add locking (and I realize this was there before, so I'm sorry
>> for not pointing this out earlier).
> Well, I thought that the description already has "...the lock can be
> used (and in a few cases is used right away) to check whether vpci
> is present" and this is enough for such uses as here.
>>But then I wonder whether you
>> actually tested this, since I can't help getting the impression that
>> you're introducing a live-lock: The function is called from cmd_write()
>> and rom_write(), which in turn are called out of vpci_write(). Yet that
>> function already holds the lock, and the lock is not (currently)
>> recursive. (For the 3rd caller of the function - init_bars() - otoh
>> the locking looks to be entirely unnecessary.)
> Well, you are correct: if tmp != pdev then it is correct to acquire
> the lock. But if tmp == pdev and rom_only == true
> then we'll deadlock.
>
> It seems we need to have the locking conditional, e.g. only lock
> if tmp != pdev
 Which will address the live-lock, but introduce ABBA deadlock potential
 between the two locks.
>>> I am not sure I can suggest a better solution here
>>> @Roger, @Jan, could you please help here?
>>
>> Well, first of all I'd like to mention that while it may have been okay to
>> not hold pcidevs_lock here for Dom0, it surely needs acquiring when dealing
>> with DomU-s' lists of PCI devices. The requirement really applies to the
>> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
>> there it probably wants to be a try-lock.
>>
>> Next I'd like to point out that here we have the still pending issue of
>> how to deal with hidden devices, which Dom0 can access. See my RFC patch
>> "vPCI: account for hidden devices in modify_bars()". Whatever the solution
>> here, I think it wants to at least account for the extra need there.
> 
> Yes, sorry, I should take care of that.
> 
>> Now it is quite clear that pcidevs_lock isn't going to help with avoiding
>> the deadlock, as it's imo not an option at all to acquire that lock
>> everywhere else you access ->vpci (or else the vpci lock itself would be
>> pointless). But a per-domain auxiliary r/w lock may help: Other paths
>> would acquire it in read mode, and here you'd acquire it in write mode (in
>> the former case around the vpci lock, while in the latter case there may
>> then not be any need to acquire the individual vpci locks at all). FTAOD:
>> I haven't fully thought through all implications (and hence whether this is
>> viable in the first place); I expect you will, documenting what you've
>> found in the resulting patch description. Of course the double lock
>> acquire/release would then likely want hiding in helper functions.
> 
> I've been also thinking about this, and whether it's really worth to
> have a per-device lock rather than a per-domain one that protects all
> vpci regions of the devices assigned to the domain.
> 
> The OS is likely to serialize accesses to the PCI config space anyway,
> and the only place I could see a benefit of having per-device locks is
> in the handling of MSI-X tables, as the handling of the mask bit is
> likely very performance sensitive, so adding a per-domain lock there
> could 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Roger Pau Monné
On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
> > On 04.02.22 11:15, Jan Beulich wrote:
> >> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> >>> On 04.02.22 09:52, Jan Beulich wrote:
>  On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> > @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, 
> > uint16_t cmd, bool rom_only)
> >continue;
> >}
> >
> > +spin_lock(>vpci_lock);
> > +if ( !tmp->vpci )
> > +{
> > +spin_unlock(>vpci_lock);
> > +continue;
> > +}
> >for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
> >{
> >const struct vpci_bar *bar = >vpci->header.bars[i];
> > @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev 
> > *pdev, uint16_t cmd, bool rom_only)
> >rc = rangeset_remove_range(mem, start, end);
> >if ( rc )
> >{
> > +spin_unlock(>vpci_lock);
> >printk(XENLOG_G_WARNING "Failed to remove [%lx, 
> > %lx]: %d\n",
> >   start, end, rc);
> >rangeset_destroy(mem);
> >return rc;
> >}
> >}
> > +spin_unlock(>vpci_lock);
> >}
>  At the first glance this simply looks like another unjustified (in the
>  description) change, as you're not converting anything here but you
>  actually add locking (and I realize this was there before, so I'm sorry
>  for not pointing this out earlier).
> >>> Well, I thought that the description already has "...the lock can be
> >>> used (and in a few cases is used right away) to check whether vpci
> >>> is present" and this is enough for such uses as here.
> But then I wonder whether you
>  actually tested this, since I can't help getting the impression that
>  you're introducing a live-lock: The function is called from cmd_write()
>  and rom_write(), which in turn are called out of vpci_write(). Yet that
>  function already holds the lock, and the lock is not (currently)
>  recursive. (For the 3rd caller of the function - init_bars() - otoh
>  the locking looks to be entirely unnecessary.)
> >>> Well, you are correct: if tmp != pdev then it is correct to acquire
> >>> the lock. But if tmp == pdev and rom_only == true
> >>> then we'll deadlock.
> >>>
> >>> It seems we need to have the locking conditional, e.g. only lock
> >>> if tmp != pdev
> >> Which will address the live-lock, but introduce ABBA deadlock potential
> >> between the two locks.
> > I am not sure I can suggest a better solution here
> > @Roger, @Jan, could you please help here?
> 
> Well, first of all I'd like to mention that while it may have been okay to
> not hold pcidevs_lock here for Dom0, it surely needs acquiring when dealing
> with DomU-s' lists of PCI devices. The requirement really applies to the
> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
> there it probably wants to be a try-lock.
> 
> Next I'd like to point out that here we have the still pending issue of
> how to deal with hidden devices, which Dom0 can access. See my RFC patch
> "vPCI: account for hidden devices in modify_bars()". Whatever the solution
> here, I think it wants to at least account for the extra need there.

Yes, sorry, I should take care of that.

> Now it is quite clear that pcidevs_lock isn't going to help with avoiding
> the deadlock, as it's imo not an option at all to acquire that lock
> everywhere else you access ->vpci (or else the vpci lock itself would be
> pointless). But a per-domain auxiliary r/w lock may help: Other paths
> would acquire it in read mode, and here you'd acquire it in write mode (in
> the former case around the vpci lock, while in the latter case there may
> then not be any need to acquire the individual vpci locks at all). FTAOD:
> I haven't fully thought through all implications (and hence whether this is
> viable in the first place); I expect you will, documenting what you've
> found in the resulting patch description. Of course the double lock
> acquire/release would then likely want hiding in helper functions.

I've been also thinking about this, and whether it's really worth to
have a per-device lock rather than a per-domain one that protects all
vpci regions of the devices assigned to the domain.

The OS is likely to serialize accesses to the PCI config space anyway,
and the only place I could see a benefit of having per-device locks is
in the handling of MSI-X tables, as the handling of the mask bit is
likely very performance sensitive, so adding a per-domain lock there
could be a bottleneck.

We could alternatively do a per-domain rwlock for vpci and special case
the 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Roger Pau Monné
On Fri, Feb 04, 2022 at 10:12:46AM +, Oleksandr Andrushchenko wrote:
> Hi, Jan!
> 
> On 04.02.22 11:15, Jan Beulich wrote:
> > On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> >> On 04.02.22 09:52, Jan Beulich wrote:
> >>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>  @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, 
>  uint16_t cmd, bool rom_only)
> continue;
> }
> 
>  +spin_lock(>vpci_lock);
>  +if ( !tmp->vpci )
>  +{
>  +spin_unlock(>vpci_lock);
>  +continue;
>  +}
> for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
> {
> const struct vpci_bar *bar = >vpci->header.bars[i];
>  @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, 
>  uint16_t cmd, bool rom_only)
> rc = rangeset_remove_range(mem, start, end);
> if ( rc )
> {
>  +spin_unlock(>vpci_lock);
> printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: 
>  %d\n",
>    start, end, rc);
> rangeset_destroy(mem);
> return rc;
> }
> }
>  +spin_unlock(>vpci_lock);
> }
> >>> At the first glance this simply looks like another unjustified (in the
> >>> description) change, as you're not converting anything here but you
> >>> actually add locking (and I realize this was there before, so I'm sorry
> >>> for not pointing this out earlier).
> >> Well, I thought that the description already has "...the lock can be
> >> used (and in a few cases is used right away) to check whether vpci
> >> is present" and this is enough for such uses as here.
> >>>But then I wonder whether you
> >>> actually tested this, since I can't help getting the impression that
> >>> you're introducing a live-lock: The function is called from cmd_write()
> >>> and rom_write(), which in turn are called out of vpci_write(). Yet that
> >>> function already holds the lock, and the lock is not (currently)
> >>> recursive. (For the 3rd caller of the function - init_bars() - otoh
> >>> the locking looks to be entirely unnecessary.)
> >> Well, you are correct: if tmp != pdev then it is correct to acquire
> >> the lock. But if tmp == pdev and rom_only == true
> >> then we'll deadlock.
> >>
> >> It seems we need to have the locking conditional, e.g. only lock
> >> if tmp != pdev
> > Which will address the live-lock, but introduce ABBA deadlock potential
> > between the two locks.
> I am not sure I can suggest a better solution here
> @Roger, @Jan, could you please help here?

I think we could set the locking order based on the memory address of
the locks, ie:

if ( >vpci_lock < >vpci_lock )
{
spin_unlock(>vpci_lock);
spin_lock(>vpci_lock);
spin_lock(>vpci_lock);
if ( !pdev->vpci || >vpci->header != header )
/* ERROR: vpci removed or recreated. */
}
else
spin_lock(>vpci_lock);

That however creates a window where the address of the BARs on the
current device (pdev) could be changed, so the result of the mapping
might be skewed. I think the guest would only have itself to blame for
that, as changing the position of the BARs while toggling memory
decoding is not something sensible to do.

> >
>  @@ -222,10 +239,10 @@ static int msix_read(struct vcpu *v, unsigned long 
>  addr, unsigned int len,
> break;
> }
> 
>  +msix_put(msix);
> return X86EMUL_OKAY;
> }
> 
>  -spin_lock(>pdev->vpci->lock);
> entry = get_entry(msix, addr);
> offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
> >>> You're increasing the locked region quite a bit here. If this is really
> >>> needed, it wants explaining. And if this is deemed acceptable as a
> >>> "side effect", it wants justifying or at least stating imo. Same for
> >>> msix_write() then, obviously.
> >> Yes, I do increase the locking region here, but the msix variable needs
> >> to be protected all the time, so it seems to be obvious that it remains
> >> under the lock
> > What does the msix variable have to do with the vPCI lock? If you see
> > a need to grow the locked region here, then surely this is independent
> > of your conversion of the lock, and hence wants to be a prereq fix
> > (which may in fact want/need backporting).
> First of all, the implementation of msix_get is wrong and needs to be:
> 
> /*
>   * Note: if vpci_msix found, then this function returns with
>   * pdev->vpci_lock held. Use msix_put to unlock.
>   */
> static struct vpci_msix *msix_get(const struct domain *d, unsigned long addr)
> {
>      struct vpci_msix *msix;
> 
>      list_for_each_entry ( msix, >arch.hvm.msix_tables, next )


Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Jan Beulich
On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
> On 04.02.22 11:15, Jan Beulich wrote:
>> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>>> On 04.02.22 09:52, Jan Beulich wrote:
 On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, 
> uint16_t cmd, bool rom_only)
>continue;
>}
>
> +spin_lock(>vpci_lock);
> +if ( !tmp->vpci )
> +{
> +spin_unlock(>vpci_lock);
> +continue;
> +}
>for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>{
>const struct vpci_bar *bar = >vpci->header.bars[i];
> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, 
> uint16_t cmd, bool rom_only)
>rc = rangeset_remove_range(mem, start, end);
>if ( rc )
>{
> +spin_unlock(>vpci_lock);
>printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: 
> %d\n",
>   start, end, rc);
>rangeset_destroy(mem);
>return rc;
>}
>}
> +spin_unlock(>vpci_lock);
>}
 At the first glance this simply looks like another unjustified (in the
 description) change, as you're not converting anything here but you
 actually add locking (and I realize this was there before, so I'm sorry
 for not pointing this out earlier).
>>> Well, I thought that the description already has "...the lock can be
>>> used (and in a few cases is used right away) to check whether vpci
>>> is present" and this is enough for such uses as here.
But then I wonder whether you
 actually tested this, since I can't help getting the impression that
 you're introducing a live-lock: The function is called from cmd_write()
 and rom_write(), which in turn are called out of vpci_write(). Yet that
 function already holds the lock, and the lock is not (currently)
 recursive. (For the 3rd caller of the function - init_bars() - otoh
 the locking looks to be entirely unnecessary.)
>>> Well, you are correct: if tmp != pdev then it is correct to acquire
>>> the lock. But if tmp == pdev and rom_only == true
>>> then we'll deadlock.
>>>
>>> It seems we need to have the locking conditional, e.g. only lock
>>> if tmp != pdev
>> Which will address the live-lock, but introduce ABBA deadlock potential
>> between the two locks.
> I am not sure I can suggest a better solution here
> @Roger, @Jan, could you please help here?

Well, first of all I'd like to mention that while it may have been okay to
not hold pcidevs_lock here for Dom0, it surely needs acquiring when dealing
with DomU-s' lists of PCI devices. The requirement really applies to the
other use of for_each_pdev() as well (in vpci_dump_msi()), except that
there it probably wants to be a try-lock.

Next I'd like to point out that here we have the still pending issue of
how to deal with hidden devices, which Dom0 can access. See my RFC patch
"vPCI: account for hidden devices in modify_bars()". Whatever the solution
here, I think it wants to at least account for the extra need there.

Now it is quite clear that pcidevs_lock isn't going to help with avoiding
the deadlock, as it's imo not an option at all to acquire that lock
everywhere else you access ->vpci (or else the vpci lock itself would be
pointless). But a per-domain auxiliary r/w lock may help: Other paths
would acquire it in read mode, and here you'd acquire it in write mode (in
the former case around the vpci lock, while in the latter case there may
then not be any need to acquire the individual vpci locks at all). FTAOD:
I haven't fully thought through all implications (and hence whether this is
viable in the first place); I expect you will, documenting what you've
found in the resulting patch description. Of course the double lock
acquire/release would then likely want hiding in helper functions.

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Oleksandr Andrushchenko
Hi, Jan!

On 04.02.22 11:15, Jan Beulich wrote:
> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>> On 04.02.22 09:52, Jan Beulich wrote:
>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
 @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, 
 uint16_t cmd, bool rom_only)
continue;
}

 +spin_lock(>vpci_lock);
 +if ( !tmp->vpci )
 +{
 +spin_unlock(>vpci_lock);
 +continue;
 +}
for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
{
const struct vpci_bar *bar = >vpci->header.bars[i];
 @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, 
 uint16_t cmd, bool rom_only)
rc = rangeset_remove_range(mem, start, end);
if ( rc )
{
 +spin_unlock(>vpci_lock);
printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: 
 %d\n",
   start, end, rc);
rangeset_destroy(mem);
return rc;
}
}
 +spin_unlock(>vpci_lock);
}
>>> At the first glance this simply looks like another unjustified (in the
>>> description) change, as you're not converting anything here but you
>>> actually add locking (and I realize this was there before, so I'm sorry
>>> for not pointing this out earlier).
>> Well, I thought that the description already has "...the lock can be
>> used (and in a few cases is used right away) to check whether vpci
>> is present" and this is enough for such uses as here.
>>>But then I wonder whether you
>>> actually tested this, since I can't help getting the impression that
>>> you're introducing a live-lock: The function is called from cmd_write()
>>> and rom_write(), which in turn are called out of vpci_write(). Yet that
>>> function already holds the lock, and the lock is not (currently)
>>> recursive. (For the 3rd caller of the function - init_bars() - otoh
>>> the locking looks to be entirely unnecessary.)
>> Well, you are correct: if tmp != pdev then it is correct to acquire
>> the lock. But if tmp == pdev and rom_only == true
>> then we'll deadlock.
>>
>> It seems we need to have the locking conditional, e.g. only lock
>> if tmp != pdev
> Which will address the live-lock, but introduce ABBA deadlock potential
> between the two locks.
I am not sure I can suggest a better solution here
@Roger, @Jan, could you please help here?
>
 @@ -222,10 +239,10 @@ static int msix_read(struct vcpu *v, unsigned long 
 addr, unsigned int len,
break;
}

 +msix_put(msix);
return X86EMUL_OKAY;
}

 -spin_lock(>pdev->vpci->lock);
entry = get_entry(msix, addr);
offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
>>> You're increasing the locked region quite a bit here. If this is really
>>> needed, it wants explaining. And if this is deemed acceptable as a
>>> "side effect", it wants justifying or at least stating imo. Same for
>>> msix_write() then, obviously.
>> Yes, I do increase the locking region here, but the msix variable needs
>> to be protected all the time, so it seems to be obvious that it remains
>> under the lock
> What does the msix variable have to do with the vPCI lock? If you see
> a need to grow the locked region here, then surely this is independent
> of your conversion of the lock, and hence wants to be a prereq fix
> (which may in fact want/need backporting).
First of all, the implementation of msix_get is wrong and needs to be:

/*
  * Note: if vpci_msix found, then this function returns with
  * pdev->vpci_lock held. Use msix_put to unlock.
  */
static struct vpci_msix *msix_get(const struct domain *d, unsigned long addr)
{
     struct vpci_msix *msix;

     list_for_each_entry ( msix, >arch.hvm.msix_tables, next )
     {
     const struct vpci_bar *bars;
     unsigned int i;

     spin_lock(>pdev->vpci_lock);
     if ( !msix->pdev->vpci )
     {
     spin_unlock(>pdev->vpci_lock);
     continue;
     }

     bars = msix->pdev->vpci->header.bars;
     for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
     if ( bars[msix->tables[i] & PCI_MSIX_BIRMASK].enabled &&
  VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, i) )
     return msix;

     spin_unlock(>pdev->vpci_lock);
     }

     return NULL;
}

Then, both msix_{read|write} can dereference msix->pdev->vpci early,
this is why Roger suggested we move to msix_{get|put} here.
And yes, we grow the locked region here and yes this might want a
prereq fix. Or just be fixed while at it.

>
 @@ -327,7 +334,12 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, 
 unsigned int 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Jan Beulich
On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> On 04.02.22 09:52, Jan Beulich wrote:
>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, 
>>> uint16_t cmd, bool rom_only)
>>>   continue;
>>>   }
>>>   
>>> +spin_lock(>vpci_lock);
>>> +if ( !tmp->vpci )
>>> +{
>>> +spin_unlock(>vpci_lock);
>>> +continue;
>>> +}
>>>   for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>>   {
>>>   const struct vpci_bar *bar = >vpci->header.bars[i];
>>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, 
>>> uint16_t cmd, bool rom_only)
>>>   rc = rangeset_remove_range(mem, start, end);
>>>   if ( rc )
>>>   {
>>> +spin_unlock(>vpci_lock);
>>>   printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: 
>>> %d\n",
>>>  start, end, rc);
>>>   rangeset_destroy(mem);
>>>   return rc;
>>>   }
>>>   }
>>> +spin_unlock(>vpci_lock);
>>>   }
>> At the first glance this simply looks like another unjustified (in the
>> description) change, as you're not converting anything here but you
>> actually add locking (and I realize this was there before, so I'm sorry
>> for not pointing this out earlier).
> Well, I thought that the description already has "...the lock can be
> used (and in a few cases is used right away) to check whether vpci
> is present" and this is enough for such uses as here.
>>   But then I wonder whether you
>> actually tested this, since I can't help getting the impression that
>> you're introducing a live-lock: The function is called from cmd_write()
>> and rom_write(), which in turn are called out of vpci_write(). Yet that
>> function already holds the lock, and the lock is not (currently)
>> recursive. (For the 3rd caller of the function - init_bars() - otoh
>> the locking looks to be entirely unnecessary.)
> Well, you are correct: if tmp != pdev then it is correct to acquire
> the lock. But if tmp == pdev and rom_only == true
> then we'll deadlock.
> 
> It seems we need to have the locking conditional, e.g. only lock
> if tmp != pdev

Which will address the live-lock, but introduce ABBA deadlock potential
between the two locks.

>>> @@ -222,10 +239,10 @@ static int msix_read(struct vcpu *v, unsigned long 
>>> addr, unsigned int len,
>>>   break;
>>>   }
>>>   
>>> +msix_put(msix);
>>>   return X86EMUL_OKAY;
>>>   }
>>>   
>>> -spin_lock(>pdev->vpci->lock);
>>>   entry = get_entry(msix, addr);
>>>   offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
>> You're increasing the locked region quite a bit here. If this is really
>> needed, it wants explaining. And if this is deemed acceptable as a
>> "side effect", it wants justifying or at least stating imo. Same for
>> msix_write() then, obviously.
> Yes, I do increase the locking region here, but the msix variable needs
> to be protected all the time, so it seems to be obvious that it remains
> under the lock

What does the msix variable have to do with the vPCI lock? If you see
a need to grow the locked region here, then surely this is independent
of your conversion of the lock, and hence wants to be a prereq fix
(which may in fact want/need backporting).

>>> @@ -327,7 +334,12 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, 
>>> unsigned int size)
>>>   if ( !pdev )
>>>   return vpci_read_hw(sbdf, reg, size);
>>>   
>>> -spin_lock(>vpci->lock);
>>> +spin_lock(>vpci_lock);
>>> +if ( !pdev->vpci )
>>> +{
>>> +spin_unlock(>vpci_lock);
>>> +return vpci_read_hw(sbdf, reg, size);
>>> +}
>> Didn't you say you would add justification of this part of the change
>> (and its vpci_write() counterpart) to the description?
> Again, I am referring to the commit message as described above

No, sorry - that part applies only to what inside the parentheses of
if(). But on the intermediate version (post-v5 in a 4-patch series) I
did say:

"In this case as well as in its write counterpart it becomes even more
 important to justify (in the description) the new behavior. It is not
 obvious at all that the absence of a struct vpci should be taken as
 an indication that the underlying device needs accessing instead.
 This also cannot be inferred from the "!pdev" case visible in context.
 In that case we have no record of a device at this SBDF, and hence the
 fallback pretty clearly is a "just in case" one. Yet if we know of a
 device, the absence of a struct vpci may mean various possible things."

If it wasn't obvious: The comment was on the use of vpci_read_hw() on
this path, not redundant with the earlier one regarding the added
"is vpci non-NULL" in a few places.

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Oleksandr Andrushchenko
Hi, Jan!

On 04.02.22 09:52, Jan Beulich wrote:
> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, 
>> uint16_t cmd, bool rom_only)
>>   continue;
>>   }
>>   
>> +spin_lock(>vpci_lock);
>> +if ( !tmp->vpci )
>> +{
>> +spin_unlock(>vpci_lock);
>> +continue;
>> +}
>>   for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>   {
>>   const struct vpci_bar *bar = >vpci->header.bars[i];
>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, 
>> uint16_t cmd, bool rom_only)
>>   rc = rangeset_remove_range(mem, start, end);
>>   if ( rc )
>>   {
>> +spin_unlock(>vpci_lock);
>>   printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: 
>> %d\n",
>>  start, end, rc);
>>   rangeset_destroy(mem);
>>   return rc;
>>   }
>>   }
>> +spin_unlock(>vpci_lock);
>>   }
> At the first glance this simply looks like another unjustified (in the
> description) change, as you're not converting anything here but you
> actually add locking (and I realize this was there before, so I'm sorry
> for not pointing this out earlier).
Well, I thought that the description already has "...the lock can be
used (and in a few cases is used right away) to check whether vpci
is present" and this is enough for such uses as here.
>   But then I wonder whether you
> actually tested this, since I can't help getting the impression that
> you're introducing a live-lock: The function is called from cmd_write()
> and rom_write(), which in turn are called out of vpci_write(). Yet that
> function already holds the lock, and the lock is not (currently)
> recursive. (For the 3rd caller of the function - init_bars() - otoh
> the locking looks to be entirely unnecessary.)
Well, you are correct: if tmp != pdev then it is correct to acquire
the lock. But if tmp == pdev and rom_only == true
then we'll deadlock.

It seems we need to have the locking conditional, e.g. only lock
if tmp != pdev
>
> Then again this was present already even in Roger's original patch, so
> I guess I must be missing something ...
>
>> --- a/xen/drivers/vpci/msix.c
>> +++ b/xen/drivers/vpci/msix.c
>> @@ -138,7 +138,7 @@ static void control_write(const struct pci_dev *pdev, 
>> unsigned int reg,
>>   pci_conf_write16(pdev->sbdf, reg, val);
>>   }
>>   
>> -static struct vpci_msix *msix_find(const struct domain *d, unsigned long 
>> addr)
>> +static struct vpci_msix *msix_get(const struct domain *d, unsigned long 
>> addr)
>>   {
>>   struct vpci_msix *msix;
>>   
>> @@ -150,15 +150,29 @@ static struct vpci_msix *msix_find(const struct domain 
>> *d, unsigned long addr)
>>   for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
>>   if ( bars[msix->tables[i] & PCI_MSIX_BIRMASK].enabled &&
>>VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, i) )
>> +{
>> +spin_lock(>pdev->vpci_lock);
>>   return msix;
>> +}
> I think deliberately returning with a lock held requires a respective
> comment ahead of the function.
Ok, will add a comment
>
>>   }
>>   
>>   return NULL;
>>   }
>>   
>> +static void msix_put(struct vpci_msix *msix)
>> +{
>> +if ( !msix )
>> +return;
>> +
>> +spin_unlock(>pdev->vpci_lock);
>> +}
> Maybe shorter
>
>  if ( msix )
>  spin_unlock(>pdev->vpci_lock);
Looks good
>
> ? Yet there's only one case where you may pass NULL in here, so
> maybe it's better anyway to move the conditional ...
>
>>   static int msix_accept(struct vcpu *v, unsigned long addr)
>>   {
>> -return !!msix_find(v->domain, addr);
>> +struct vpci_msix *msix = msix_get(v->domain, addr);
>> +
>> +msix_put(msix);
>> +return !!msix;
>>   }
> ... here?
Yes, I can have that check here, but what if there is yet
another caller of the same? I am not sure whether it is better
to have the check in msix_get or at the caller site.
At the moment (with a single place with NULL possible) I can
move the check. @Roger?
>
>> @@ -186,7 +200,7 @@ static int msix_read(struct vcpu *v, unsigned long addr, 
>> unsigned int len,
>>unsigned long *data)
>>   {
>>   const struct domain *d = v->domain;
>> -struct vpci_msix *msix = msix_find(d, addr);
>> +struct vpci_msix *msix = msix_get(d, addr);
>>   const struct vpci_msix_entry *entry;
>>   unsigned int offset;
>>   
>> @@ -196,7 +210,10 @@ static int msix_read(struct vcpu *v, unsigned long 
>> addr, unsigned int len,
>>   return X86EMUL_RETRY;
>>   
>>   if ( !access_allowed(msix->pdev, addr, len) )
>> +{
>> +msix_put(msix);
>>   return X86EMUL_OKAY;
>> +}
>>   
>>   if ( 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Jan Beulich
On 04.02.2022 09:13, Oleksandr Andrushchenko wrote:
> On 04.02.22 09:52, Jan Beulich wrote:
>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>>
>> At the first glance this simply looks like another unjustified (in the
>> description) change, as you're not converting anything here but you
>> actually add locking (and I realize this was there before, so I'm sorry
>> for not pointing this out earlier). But then I wonder whether you
>> actually tested this
> This is already stated in the cover letter that I have tested two x86
> configurations and tested that on Arm...

Okay, I'm sorry then. But could you then please point out where I'm
wrong with my analysis?

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-04 Thread Oleksandr Andrushchenko
Hi, Jan!

On 04.02.22 09:52, Jan Beulich wrote:
> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>
> At the first glance this simply looks like another unjustified (in the
> description) change, as you're not converting anything here but you
> actually add locking (and I realize this was there before, so I'm sorry
> for not pointing this out earlier). But then I wonder whether you
> actually tested this
This is already stated in the cover letter that I have tested two x86
configurations and tested that on Arm...
Would you like to see the relevant logs?

Thank you,
Oleksandr

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-03 Thread Jan Beulich
On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, 
> uint16_t cmd, bool rom_only)
>  continue;
>  }
>  
> +spin_lock(>vpci_lock);
> +if ( !tmp->vpci )
> +{
> +spin_unlock(>vpci_lock);
> +continue;
> +}
>  for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>  {
>  const struct vpci_bar *bar = >vpci->header.bars[i];
> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, 
> uint16_t cmd, bool rom_only)
>  rc = rangeset_remove_range(mem, start, end);
>  if ( rc )
>  {
> +spin_unlock(>vpci_lock);
>  printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
> start, end, rc);
>  rangeset_destroy(mem);
>  return rc;
>  }
>  }
> +spin_unlock(>vpci_lock);
>  }

At the first glance this simply looks like another unjustified (in the
description) change, as you're not converting anything here but you
actually add locking (and I realize this was there before, so I'm sorry
for not pointing this out earlier). But then I wonder whether you
actually tested this, since I can't help getting the impression that
you're introducing a live-lock: The function is called from cmd_write()
and rom_write(), which in turn are called out of vpci_write(). Yet that
function already holds the lock, and the lock is not (currently)
recursive. (For the 3rd caller of the function - init_bars() - otoh
the locking looks to be entirely unnecessary.)

Then again this was present already even in Roger's original patch, so
I guess I must be missing something ...

> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -138,7 +138,7 @@ static void control_write(const struct pci_dev *pdev, 
> unsigned int reg,
>  pci_conf_write16(pdev->sbdf, reg, val);
>  }
>  
> -static struct vpci_msix *msix_find(const struct domain *d, unsigned long 
> addr)
> +static struct vpci_msix *msix_get(const struct domain *d, unsigned long addr)
>  {
>  struct vpci_msix *msix;
>  
> @@ -150,15 +150,29 @@ static struct vpci_msix *msix_find(const struct domain 
> *d, unsigned long addr)
>  for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
>  if ( bars[msix->tables[i] & PCI_MSIX_BIRMASK].enabled &&
>   VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, i) )
> +{
> +spin_lock(>pdev->vpci_lock);
>  return msix;
> +}

I think deliberately returning with a lock held requires a respective
comment ahead of the function.

>  }
>  
>  return NULL;
>  }
>  
> +static void msix_put(struct vpci_msix *msix)
> +{
> +if ( !msix )
> +return;
> +
> +spin_unlock(>pdev->vpci_lock);
> +}

Maybe shorter

if ( msix )
spin_unlock(>pdev->vpci_lock);

? Yet there's only one case where you may pass NULL in here, so
maybe it's better anyway to move the conditional ...

>  static int msix_accept(struct vcpu *v, unsigned long addr)
>  {
> -return !!msix_find(v->domain, addr);
> +struct vpci_msix *msix = msix_get(v->domain, addr);
> +
> +msix_put(msix);
> +return !!msix;
>  }

... here?

> @@ -186,7 +200,7 @@ static int msix_read(struct vcpu *v, unsigned long addr, 
> unsigned int len,
>   unsigned long *data)
>  {
>  const struct domain *d = v->domain;
> -struct vpci_msix *msix = msix_find(d, addr);
> +struct vpci_msix *msix = msix_get(d, addr);
>  const struct vpci_msix_entry *entry;
>  unsigned int offset;
>  
> @@ -196,7 +210,10 @@ static int msix_read(struct vcpu *v, unsigned long addr, 
> unsigned int len,
>  return X86EMUL_RETRY;
>  
>  if ( !access_allowed(msix->pdev, addr, len) )
> +{
> +msix_put(msix);
>  return X86EMUL_OKAY;
> +}
>  
>  if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
>  {
> @@ -222,10 +239,10 @@ static int msix_read(struct vcpu *v, unsigned long 
> addr, unsigned int len,
>  break;
>  }
>  
> +msix_put(msix);
>  return X86EMUL_OKAY;
>  }
>  
> -spin_lock(>pdev->vpci->lock);
>  entry = get_entry(msix, addr);
>  offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);

You're increasing the locked region quite a bit here. If this is really
needed, it wants explaining. And if this is deemed acceptable as a
"side effect", it wants justifying or at least stating imo. Same for
msix_write() then, obviously. (I'm not sure Roger actually implied this
when suggesting to switch to the get/put pair.)

> @@ -327,7 +334,12 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, 
> unsigned int size)
>  if ( !pdev )
>  return vpci_read_hw(sbdf, reg, size);
>  
> -spin_lock(>vpci->lock);
> +

[PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-03 Thread Oleksandr Andrushchenko
From: Roger Pau Monné 

This way the lock can be used (and in a few cases is used right away)
to check whether vpci is present, and removal can be performed while
holding the lock, in order to make sure there are no accesses to the
contents of the vpci struct.
Previously removal could race with vpci_read for example, since the
lock was dropped prior to freeing pdev->vpci.

Signed-off-by: Roger Pau Monné 
Signed-off-by: Oleksandr Andrushchenko 
---
Cc: Andrew Cooper 
Cc: Jan Beulich 
Cc: Julien Grall 
Cc: Stefano Stabellini 
---
New in v5 of this series: this is an updated version of the patch published at
https://lore.kernel.org/xen-devel/20180717094830.54806-2-roger@citrix.com/

Changes since v5:
 - vpci_lock in test_pdev is already initialized to false by default
 - introduce msix_{get|put} to protect former msix_find's result
 - add comments to vpci_{add|remove}_registers about pdev->vpci_lock must
   be held.
 - do not split code into vpci_remove_device_handlers_locked yet
 - move INIT_LIST_HEAD outside the locked region (Jan)
 - stripped out locking optimizations for vpci_{read|write} into a
   dedicated patch
Changes since v2:
 - fixed pdev->vpci = xzalloc(struct vpci); under spin_lock (Jan)
Changes since v1:
 - Assert that vpci_lock is locked in vpci_remove_device_locked.
 - Remove double newline.
 - Shrink critical section in vpci_{read/write}.
---
 tools/tests/vpci/emul.h   |  5 ++-
 tools/tests/vpci/main.c   |  3 +-
 xen/arch/x86/hvm/vmsi.c   |  8 ++---
 xen/drivers/passthrough/pci.c |  1 +
 xen/drivers/vpci/header.c | 21 +++
 xen/drivers/vpci/msi.c| 11 --
 xen/drivers/vpci/msix.c   | 39 -
 xen/drivers/vpci/vpci.c   | 65 ++-
 xen/include/xen/pci.h |  1 +
 xen/include/xen/vpci.h|  3 +-
 10 files changed, 106 insertions(+), 51 deletions(-)

diff --git a/tools/tests/vpci/emul.h b/tools/tests/vpci/emul.h
index 2e1d3057c9d8..d018fb5eef21 100644
--- a/tools/tests/vpci/emul.h
+++ b/tools/tests/vpci/emul.h
@@ -44,6 +44,7 @@ struct domain {
 };
 
 struct pci_dev {
+bool vpci_lock;
 struct vpci *vpci;
 };
 
@@ -53,10 +54,8 @@ struct vcpu
 };
 
 extern const struct vcpu *current;
-extern const struct pci_dev test_pdev;
+extern struct pci_dev test_pdev;
 
-typedef bool spinlock_t;
-#define spin_lock_init(l) (*(l) = false)
 #define spin_lock(l) (*(l) = true)
 #define spin_unlock(l) (*(l) = false)
 
diff --git a/tools/tests/vpci/main.c b/tools/tests/vpci/main.c
index b9a0a6006bb9..3b86ed232eb1 100644
--- a/tools/tests/vpci/main.c
+++ b/tools/tests/vpci/main.c
@@ -23,7 +23,7 @@ static struct vpci vpci;
 
 const static struct domain d;
 
-const struct pci_dev test_pdev = {
+struct pci_dev test_pdev = {
 .vpci = ,
 };
 
@@ -158,7 +158,6 @@ main(int argc, char **argv)
 int rc;
 
 INIT_LIST_HEAD();
-spin_lock_init();
 
 VPCI_ADD_REG(vpci_read32, vpci_write32, 0, 4, r0);
 VPCI_READ_CHECK(0, 4, r0);
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 13e2a190b439..1f7a37f78264 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -910,14 +910,14 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
 {
 struct pci_dev *pdev = msix->pdev;
 
-spin_unlock(>pdev->vpci->lock);
+spin_unlock(>pdev->vpci_lock);
 process_pending_softirqs();
 /* NB: we assume that pdev cannot go away for an alive domain. */
-if ( !pdev->vpci || !spin_trylock(>vpci->lock) )
+if ( !spin_trylock(>vpci_lock) )
 return -EBUSY;
-if ( pdev->vpci->msix != msix )
+if ( !pdev->vpci || pdev->vpci->msix != msix )
 {
-spin_unlock(>vpci->lock);
+spin_unlock(>vpci_lock);
 return -EAGAIN;
 }
 }
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index e8b09d77d880..50dec3bb73d0 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -397,6 +397,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 
bus, u8 devfn)
 *((u8*) >bus) = bus;
 *((u8*) >devfn) = devfn;
 pdev->domain = NULL;
+spin_lock_init(>vpci_lock);
 
 arch_pci_init_pdev(pdev);
 
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 40ff79c33f8f..bd23c0274d48 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -142,12 +142,13 @@ bool vpci_process_pending(struct vcpu *v)
 if ( rc == -ERESTART )
 return true;
 
-spin_lock(>vpci.pdev->vpci->lock);
-/* Disable memory decoding unconditionally on failure. */
-modify_decoding(v->vpci.pdev,
-rc ? v->vpci.cmd & ~PCI_COMMAND_MEMORY : v->vpci.cmd,
-!rc && v->vpci.rom_only);
-spin_unlock(>vpci.pdev->vpci->lock);
+spin_lock(>vpci.pdev->vpci_lock);