Re: [PATCH v3 3/3] vfio_pci: make use of update_irq_devid and optimize irq ops

2019-08-22 Thread luoben



在 2019/8/20 下午11:27, Liu, Jiang 写道:



On Aug 20, 2019, at 11:24 PM, luoben  wrote:



在 2019/8/16 上午12:45, Thomas Gleixner 写道:

On Thu, 15 Aug 2019, Ben Luo wrote:


if (vdev->ctx[vector].trigger) {
-   free_irq(irq, vdev->ctx[vector].trigger);
-   irq_bypass_unregister_producer(>ctx[vector].producer);
-   kfree(vdev->ctx[vector].name);
-   eventfd_ctx_put(vdev->ctx[vector].trigger);
-   vdev->ctx[vector].trigger = NULL;
+   if (vdev->ctx[vector].trigger == trigger) {
+   
irq_bypass_unregister_producer(>ctx[vector].producer);


What's the point of unregistering the producer, just to register it again below?

Whether producer token changed or not, irq_bypass_register_producer() is a way 
(seems the

only way) to update IRTE by VFIO for posted interrupt.

When posted interrupt is in use, the target IRTE will be used by IOMMU directly 
to find the

target CPU for an interrupt posted to VM, since hypervisor is bypassed.

irq_bypass_register_producer() will modify IRTE based on the information 
retrieved from KVM,


0x8150a920 : modify_irte+0x0/0x180 [kernel]
0x8150ab94 : intel_ir_set_vcpu_affinity+0xf4/0x150 [kernel]
0x81125f3c : irq_set_vcpu_affinity+0x5c/0xa0 [kernel]
0xa0550818 : vmx_update_pi_irte+0x178/0x290 [kvm_intel]// get 
pi_desc etc. from KVM
0xa052b789 : kvm_arch_irq_bypass_add_producer+0x39/0x50 [kvm_intel] 
(inexact)
0xa024a50b : __connect+0x7b/0xa0 [kvm] (inexact)
0xa024a6a8 : irq_bypass_register_producer+0x108/0x140 [kvm] (inexact)
0xa0338386 : vfio_msi_set_vector_signal+0x1b6/0x2c0 [vfio_pci] (inexact)

+   } else if (trigger) {
+   ret = update_irq_devid(irq,
+   vdev->ctx[vector].trigger, trigger);
+   if (unlikely(ret)) {
+   dev_info(>dev,
+"update_irq_devid %d (token %p) fails: 
%d\n",


s/fails/failed/



+irq, vdev->ctx[vector].trigger, ret);
+   eventfd_ctx_put(trigger);
+   return ret;
+   }


This lacks any form of comment why this is correct. dev_id is updated and
the producer with the old token is still registered.


ok, I will add comment like below:

For KVM-VFIO case, once producer and consumer connected successfully, interrupt 
from passthrough

device will be directly delivered to VM instead of triggering interrupt process 
in HOST. If producer and

consumer are disconnected, this interrupt process will fall back to remap mode, 
the handler vfio_msihandler()

registered in corresponding irqaction will use the dev_id to find the right way 
to deliver the interrupt to VM.

So, it is safe to update dev_id before unregistration of irq-bypass producer, 
i.e. switch back from posted

mode to remap mode, since only in remap mode the 'dev_id' will be used by 
interrupt handler. To producer

and consumer, dev_id is only a token for pairing them togather.

+   
irq_bypass_unregister_producer(>ctx[vector].producer);


Now it's unregistered.



+   eventfd_ctx_put(vdev->ctx[vector].trigger);
+   vdev->ctx[vector].producer.token = trigger;


The token is updated and then it's newly registered below.



+   vdev->ctx[vector].trigger = trigger;
-   vdev->ctx[vector].producer.token = trigger;
-   vdev->ctx[vector].producer.irq = irq;
+   /* always update irte for posted mode */
ret = irq_bypass_register_producer(>ctx[vector].producer);
if (unlikely(ret))
dev_info(>dev,
"irq bypass producer (token %p) registration fails: %d\n",
vdev->ctx[vector].producer.token, ret);


I know this code already existed, but again this looks inconsistent. If the
registration fails then a subsequent update will try to unregister a not
registered producer. Does not make any sense to me.


irq_bypass_register_producer() also fails on duplicated registration, so maybe 
an unconditional try to

unregistration is a easy way to keep consistent.

Maybe it's better to change these function names to 
irq_bypass_try_register_producer() and

irq_bypass_try_unregister_producer()  :)



Hi Ben,
The point is that we shouldn’t blindly try to register again  if we 
fails to unregister a posted interrupt producer. By this way, the fast path 
(posted interrupt) may get disabled, but it’s safer than blindly ignoring the 
failure of ungistration.
Thanks,
Gerry


This may need another patchset to enhance the irq_bypass 
register/unregister functions first, at least to return some error code 
when irq_bypass_try_unregister_producer() fails. I would like to have a 
try later.


Thanks,

    Ben



Re: [PATCH v3 3/3] vfio_pci: make use of update_irq_devid and optimize irq ops

2019-08-20 Thread Liu, Jiang



> On Aug 20, 2019, at 11:24 PM, luoben  wrote:
> 
> 
> 
> 在 2019/8/16 上午12:45, Thomas Gleixner 写道:
>> On Thu, 15 Aug 2019, Ben Luo wrote:
>> 
>>> if (vdev->ctx[vector].trigger) {
>>> -   free_irq(irq, vdev->ctx[vector].trigger);
>>> -   irq_bypass_unregister_producer(>ctx[vector].producer);
>>> -   kfree(vdev->ctx[vector].name);
>>> -   eventfd_ctx_put(vdev->ctx[vector].trigger);
>>> -   vdev->ctx[vector].trigger = NULL;
>>> +   if (vdev->ctx[vector].trigger == trigger) {
>>> +   
>>> irq_bypass_unregister_producer(>ctx[vector].producer);
>>> 
>> What's the point of unregistering the producer, just to register it again 
>> below?
> Whether producer token changed or not, irq_bypass_register_producer() is a 
> way (seems the
> 
> only way) to update IRTE by VFIO for posted interrupt.
> 
> When posted interrupt is in use, the target IRTE will be used by IOMMU 
> directly to find the
> 
> target CPU for an interrupt posted to VM, since hypervisor is bypassed.
> 
> irq_bypass_register_producer() will modify IRTE based on the information 
> retrieved from KVM,
> 
> 
> 0x8150a920 : modify_irte+0x0/0x180 [kernel]
> 0x8150ab94 : intel_ir_set_vcpu_affinity+0xf4/0x150 [kernel]
> 0x81125f3c : irq_set_vcpu_affinity+0x5c/0xa0 [kernel]
> 0xa0550818 : vmx_update_pi_irte+0x178/0x290 [kvm_intel]// get 
> pi_desc etc. from KVM
> 0xa052b789 : kvm_arch_irq_bypass_add_producer+0x39/0x50 [kvm_intel] 
> (inexact)
> 0xa024a50b : __connect+0x7b/0xa0 [kvm] (inexact)
> 0xa024a6a8 : irq_bypass_register_producer+0x108/0x140 [kvm] (inexact)
> 0xa0338386 : vfio_msi_set_vector_signal+0x1b6/0x2c0 [vfio_pci] 
> (inexact)
>> 
>>> +   } else if (trigger) {
>>> +   ret = update_irq_devid(irq,
>>> +   vdev->ctx[vector].trigger, trigger);
>>> +   if (unlikely(ret)) {
>>> +   dev_info(>dev,
>>> +"update_irq_devid %d (token %p) fails: 
>>> %d\n",
>>> 
>> s/fails/failed/
>> 
>> 
>>> +irq, vdev->ctx[vector].trigger, ret);
>>> +   eventfd_ctx_put(trigger);
>>> +   return ret;
>>> +   }
>>> 
>> This lacks any form of comment why this is correct. dev_id is updated and
>> the producer with the old token is still registered.
>> 
> ok, I will add comment like below:
> 
> For KVM-VFIO case, once producer and consumer connected successfully, 
> interrupt from passthrough
> 
> device will be directly delivered to VM instead of triggering interrupt 
> process in HOST. If producer and
> 
> consumer are disconnected, this interrupt process will fall back to remap 
> mode, the handler vfio_msihandler()
> 
> registered in corresponding irqaction will use the dev_id to find the right 
> way to deliver the interrupt to VM.
> 
> So, it is safe to update dev_id before unregistration of irq-bypass producer, 
> i.e. switch back from posted
> 
> mode to remap mode, since only in remap mode the 'dev_id' will be used by 
> interrupt handler. To producer
> 
> and consumer, dev_id is only a token for pairing them togather.
>> 
>>> +   
>>> irq_bypass_unregister_producer(>ctx[vector].producer);
>>> 
>> Now it's unregistered.
>> 
>> 
>>> +   eventfd_ctx_put(vdev->ctx[vector].trigger);
>>> +   vdev->ctx[vector].producer.token = trigger;
>>> 
>> The token is updated and then it's newly registered below.
>> 
>> 
>>> +   vdev->ctx[vector].trigger = trigger;
>>> -   vdev->ctx[vector].producer.token = trigger;
>>> -   vdev->ctx[vector].producer.irq = irq;
>>> +   /* always update irte for posted mode */
>>> ret = irq_bypass_register_producer(>ctx[vector].producer);
>>> if (unlikely(ret))
>>> dev_info(>dev,
>>> "irq bypass producer (token %p) registration fails: %d\n",
>>> vdev->ctx[vector].producer.token, ret);
>>> 
>> I know this code already existed, but again this looks inconsistent. If the
>> registration fails then a subsequent update will try to unregister a not
>> registered producer. Does not make any sense to me.
>> 
> irq_bypass_register_producer() also fails on duplicated registration, so 
> maybe an unconditional try to
> 
> unregistration is a easy way to keep consistent. 
> 
> Maybe it's better to change these function names to 
> irq_bypass_try_register_producer() and
> 
> irq_bypass_try_unregister_producer()  :)
> 
> 
Hi Ben,
The point is that we shouldn’t blindly try to register again  if we 
fails to unregister a posted interrupt producer. By this way, the fast path 
(posted interrupt) may get disabled, but it’s safer than blindly ignoring the 
failure of ungistration.
Thanks,
Gerry 
> 
> Thanks,
> 
> Ben



Re: [PATCH v3 3/3] vfio_pci: make use of update_irq_devid and optimize irq ops

2019-08-15 Thread Thomas Gleixner
On Thu, 15 Aug 2019, Ben Luo wrote:
>   if (vdev->ctx[vector].trigger) {
> - free_irq(irq, vdev->ctx[vector].trigger);
> - irq_bypass_unregister_producer(>ctx[vector].producer);
> - kfree(vdev->ctx[vector].name);
> - eventfd_ctx_put(vdev->ctx[vector].trigger);
> - vdev->ctx[vector].trigger = NULL;
> + if (vdev->ctx[vector].trigger == trigger) {
> + 
> irq_bypass_unregister_producer(>ctx[vector].producer);

What's the point of unregistering the producer, just to register it again below?

> + } else if (trigger) {
> + ret = update_irq_devid(irq,
> + vdev->ctx[vector].trigger, trigger);
> + if (unlikely(ret)) {
> + dev_info(>dev,
> +  "update_irq_devid %d (token %p) fails: 
> %d\n",

s/fails/failed/

> +  irq, vdev->ctx[vector].trigger, ret);
> + eventfd_ctx_put(trigger);
> + return ret;
> + }

This lacks any form of comment why this is correct. dev_id is updated and
the producer with the old token is still registered.

> + 
> irq_bypass_unregister_producer(>ctx[vector].producer);

Now it's unregistered.

> + eventfd_ctx_put(vdev->ctx[vector].trigger);
> + vdev->ctx[vector].producer.token = trigger;

The token is updated and then it's newly registered below.

> + vdev->ctx[vector].trigger = trigger;
> - vdev->ctx[vector].producer.token = trigger;
> - vdev->ctx[vector].producer.irq = irq;
> + /* always update irte for posted mode */
>   ret = irq_bypass_register_producer(>ctx[vector].producer);
>   if (unlikely(ret))
>   dev_info(>dev,
>   "irq bypass producer (token %p) registration fails: %d\n",
>   vdev->ctx[vector].producer.token, ret);

I know this code already existed, but again this looks inconsistent. If the
registration fails then a subsequent update will try to unregister a not
registered producer. Does not make any sense to me.

Thanks,

tglx


[PATCH v3 3/3] vfio_pci: make use of update_irq_devid and optimize irq ops

2019-08-15 Thread Ben Luo
When userspace (e.g. qemu) triggers a switch between KVM
irqfd and userspace eventfd, only dev_id of irq action
(i.e. the "trigger" in this patch's context) will be
changed, but a free-then-request-irq action is taken in
current code. And, irq affinity setting in VM will also
trigger a free-then-request-irq action, which actually
changes nothing, but only fires a producer re-registration
to update irte in case that posted-interrupt is in use.

This patch makes use of update_irq_devid() and optimize
both cases above, which reduces the risk of losing interrupt
and also cuts some overhead.

Signed-off-by: Ben Luo 
Reviewed-by: Liu Jiang 
Reviewed-by: Zou Nanhai 
Reviewed-by: Yunsheng Lin 
---
 drivers/vfio/pci/vfio_pci_intrs.c | 101 --
 1 file changed, 63 insertions(+), 38 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
b/drivers/vfio/pci/vfio_pci_intrs.c
index 3fa3f72..b2654ba 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -284,70 +284,95 @@ static int vfio_msi_enable(struct vfio_pci_device *vdev, 
int nvec, bool msix)
 static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
  int vector, int fd, bool msix)
 {
+   struct eventfd_ctx *trigger = NULL;
struct pci_dev *pdev = vdev->pdev;
-   struct eventfd_ctx *trigger;
int irq, ret;
 
if (vector < 0 || vector >= vdev->num_ctx)
return -EINVAL;
 
+   if (fd >= 0) {
+   trigger = eventfd_ctx_fdget(fd);
+   if (IS_ERR(trigger))
+   return PTR_ERR(trigger);
+   }
+
irq = pci_irq_vector(pdev, vector);
 
if (vdev->ctx[vector].trigger) {
-   free_irq(irq, vdev->ctx[vector].trigger);
-   irq_bypass_unregister_producer(>ctx[vector].producer);
-   kfree(vdev->ctx[vector].name);
-   eventfd_ctx_put(vdev->ctx[vector].trigger);
-   vdev->ctx[vector].trigger = NULL;
+   if (vdev->ctx[vector].trigger == trigger) {
+   
irq_bypass_unregister_producer(>ctx[vector].producer);
+   } else if (trigger) {
+   ret = update_irq_devid(irq,
+   vdev->ctx[vector].trigger, trigger);
+   if (unlikely(ret)) {
+   dev_info(>dev,
+"update_irq_devid %d (token %p) fails: 
%d\n",
+irq, vdev->ctx[vector].trigger, ret);
+   eventfd_ctx_put(trigger);
+   return ret;
+   }
+   
irq_bypass_unregister_producer(>ctx[vector].producer);
+   eventfd_ctx_put(vdev->ctx[vector].trigger);
+   vdev->ctx[vector].producer.token = trigger;
+   vdev->ctx[vector].trigger = trigger;
+   } else {
+   free_irq(irq, vdev->ctx[vector].trigger);
+   
irq_bypass_unregister_producer(>ctx[vector].producer);
+   kfree(vdev->ctx[vector].name);
+   eventfd_ctx_put(vdev->ctx[vector].trigger);
+   vdev->ctx[vector].trigger = NULL;
+   }
}
 
if (fd < 0)
return 0;
 
-   vdev->ctx[vector].name = kasprintf(GFP_KERNEL, "vfio-msi%s[%d](%s)",
-  msix ? "x" : "", vector,
-  pci_name(pdev));
-   if (!vdev->ctx[vector].name)
-   return -ENOMEM;
+   if (!vdev->ctx[vector].trigger) {
+   vdev->ctx[vector].name = kasprintf(GFP_KERNEL,
+  "vfio-msi%s[%d](%s)",
+  msix ? "x" : "", vector,
+  pci_name(pdev));
+   if (!vdev->ctx[vector].name) {
+   eventfd_ctx_put(trigger);
+   return -ENOMEM;
+   }
 
-   trigger = eventfd_ctx_fdget(fd);
-   if (IS_ERR(trigger)) {
-   kfree(vdev->ctx[vector].name);
-   return PTR_ERR(trigger);
-   }
+   /*
+* The MSIx vector table resides in device memory which may be
+* cleared via backdoor resets. We don't allow direct access to
+* the vector table so even if a userspace driver attempts to
+* save/restore around such a reset it would be unsuccessful.
+* To avoid this, restore the cached value of the message prior
+* to enabling.
+*/
+   if (msix) {
+   struct msi_msg msg;
 
-   /*
-* The MSIx vector table resides in device memory which may be cleared
-