Re: [PATCH v3 05/19] vdpa_sim: remove the limit of IOTLB entries

2020-12-09 Thread Jason Wang


On 2020/12/9 6:58 PM, Stefano Garzarella wrote:

On Mon, Dec 07, 2020 at 12:00:07PM +0800, Jason Wang wrote:


On 2020/12/4 1:04 AM, Stefano Garzarella wrote:

The simulated devices can support multiple queues, so this limit
should be defined according to the number of queues supported by
the device.

Since we are in a simulator, let's simply remove that limit.

Suggested-by: Jason Wang 
Acked-by: Jason Wang 
Signed-off-by: Stefano Garzarella 



Rethinking about this: since the simulator can be used by a VM, the 
allocation is actually guest-triggerable when a vIOMMU is enabled.


This means we need a limit somehow (e.g. I remember swiotlb is about 
64MB by default), or a module parameter for this.


Btw, have you hit any issue when using 2048? I guess it can happen 
when we run several processes in parallel.




No, I haven't tried with the limit.
This came from the review of Max's patches.

Anyway, I can add a module parameter to control that limit. Do you 
think it is better to set a limit per queue (the parameter multiplied by 
the number of queues), or just a single value for the entire device?



Per-device should be ok.
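
Something along these lines would work, I think (a rough sketch; the
parameter name and default are illustrative, not final code):

#include <linux/module.h>
#include <linux/vhost_iotlb.h>

/* Hypothetical per-device limit on simulated IOTLB entries; 0 could
 * mean "no limit", matching what this patch currently does. */
static unsigned int max_iotlb_entries = 2048;
module_param(max_iotlb_entries, uint, 0444);
MODULE_PARM_DESC(max_iotlb_entries,
		 "Max number of IOTLB entries per simulated device (0 = no limit)");

/* ... and in the device creation path, something like:
 *	vdpasim->iommu = vhost_iotlb_alloc(max_iotlb_entries, 0);
 */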

Thanks




Thanks,
Stefano




Re: [PATCH v2 03/12] x86/pv: switch SWAPGS to ALTERNATIVE

2020-12-09 Thread Thomas Gleixner
On Fri, Nov 20 2020 at 12:46, Juergen Gross wrote:
> SWAPGS is used only for interrupts coming from user mode or for
> returning to user mode. So there is no reason to use the PARAVIRT
> framework, as it can easily be replaced by an ALTERNATIVE depending
> on X86_FEATURE_XENPV.
>
> There are several instances using the PV-aware SWAPGS macro in paths
> which are never executed in a Xen PV guest. Replace those with the
> plain swapgs instruction. For SWAPGS_UNSAFE_STACK the same applies.
>
> Signed-off-by: Juergen Gross 
> Acked-by: Andy Lutomirski 
> Acked-by: Peter Zijlstra (Intel) 

Reviewed-by: Thomas Gleixner 


Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf

2020-12-09 Thread Thomas Gleixner
On Wed, Dec 09 2020 at 18:15, Mark Rutland wrote:
> In arch/x86/kernel/apic/io_apic.c's timer_irq_works() we do:
>
>   local_irq_save(flags);
>   local_irq_enable();
>
>   [ trigger an IRQ here ]
>
>   local_irq_restore(flags);
>
> ... and in check_timer() we call that a number of times after either a
> local_irq_save() or local_irq_disable(), eventually trailing with a
> local_irq_disable() that will balance things up before calling
> local_irq_restore().
>
> I guess that timer_irq_works() should instead do:
>
>   local_irq_save(flags);
>   local_irq_enable();
>   ...
>   local_irq_disable();
>   local_irq_restore(flags);
>
> ... assuming we consider that legitimate?

Nah. That's old and insane gunk.

Thanks,

tglx
---
 arch/x86/kernel/apic/io_apic.c |   22 ++
 1 file changed, 6 insertions(+), 16 deletions(-)

--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1618,21 +1618,16 @@ static void __init delay_without_tsc(voi
 static int __init timer_irq_works(void)
 {
unsigned long t1 = jiffies;
-   unsigned long flags;
 
if (no_timer_check)
return 1;
 
-   local_save_flags(flags);
local_irq_enable();
-
if (boot_cpu_has(X86_FEATURE_TSC))
delay_with_tsc();
else
delay_without_tsc();
 
-   local_irq_restore(flags);
-
/*
 * Expect a few ticks at least, to be sure some possible
 * glue logic does not lock up after one or two first
@@ -1641,10 +1636,10 @@ static int __init timer_irq_works(void)
 * least one tick may be lost due to delays.
 */
 
-   /* jiffies wrap? */
-   if (time_after(jiffies, t1 + 4))
-   return 1;
-   return 0;
+   local_irq_disable();
+
+   /* Did jiffies advance? */
+   return time_after(jiffies, t1 + 4);
 }
 
 /*
@@ -2117,13 +2112,12 @@ static inline void __init check_timer(vo
struct irq_cfg *cfg = irqd_cfg(irq_data);
int node = cpu_to_node(0);
int apic1, pin1, apic2, pin2;
-   unsigned long flags;
int no_pin1 = 0;
 
if (!global_clock_event)
return;
 
-   local_irq_save(flags);
+   local_irq_disable();
 
/*
 * get/set the timer IRQ vector:
@@ -2191,7 +2185,6 @@ static inline void __init check_timer(vo
goto out;
}
panic_if_irq_remap("timer doesn't work through 
Interrupt-remapped IO-APIC");
-   local_irq_disable();
clear_IO_APIC_pin(apic1, pin1);
if (!no_pin1)
apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
@@ -2215,7 +2208,6 @@ static inline void __init check_timer(vo
/*
 * Cleanup, just in case ...
 */
-   local_irq_disable();
legacy_pic->mask(0);
clear_IO_APIC_pin(apic2, pin2);
apic_printk(APIC_QUIET, KERN_INFO "... failed.\n");
@@ -2232,7 +2224,6 @@ static inline void __init check_timer(vo
apic_printk(APIC_QUIET, KERN_INFO ". works.\n");
goto out;
}
-   local_irq_disable();
legacy_pic->mask(0);
apic_write(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_FIXED | cfg->vector);
apic_printk(APIC_QUIET, KERN_INFO ". failed.\n");
@@ -2251,7 +2242,6 @@ static inline void __init check_timer(vo
apic_printk(APIC_QUIET, KERN_INFO ". works.\n");
goto out;
}
-   local_irq_disable();
apic_printk(APIC_QUIET, KERN_INFO ". failed :(.\n");
if (apic_is_x2apic_enabled())
apic_printk(APIC_QUIET, KERN_INFO
@@ -2260,7 +2250,7 @@ static inline void __init check_timer(vo
panic("IO-APIC + timer doesn't work!  Boot with apic=debug and send a "
"report.  Then try booting with the 'noapic' option.\n");
 out:
-   local_irq_restore(flags);
+   local_irq_enable();
 }
 
 /*


Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf

2020-12-09 Thread Mark Rutland
On Fri, Nov 20, 2020 at 12:59:43PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 20, 2020 at 12:46:23PM +0100, Juergen Gross wrote:
> > +static __always_inline void arch_local_irq_restore(unsigned long flags)
> > +{
> > +   if (!arch_irqs_disabled_flags(flags))
> > +   arch_local_irq_enable();
> > +}
> 
> If someone were to write horrible code like:
> 
>   local_irq_disable();
>   local_irq_save(flags);
>   local_irq_enable();
>   local_irq_restore(flags);
> 
> we'd be up some creek without a paddle... now I don't _think_ we have
> genius code like that, but I'd feel safer if we can haz an assertion in
> there somewhere...

I've cobbled that together locally (I'll post it momentarily), and gave it a
spin on both arm64 and x86, whereupon it exploded at boot time on x86.
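
(For reference, the assertion in question could look roughly like the
sketch below; this is illustrative, not the actual debug patch.)

static __always_inline void arch_local_irq_restore(unsigned long flags)
{
	/* Sketch only: warn if we are asked to restore while IRQs are
	 * still enabled, which would mean an unbalanced save/enable
	 * sequence on the current CPU. */
	if (IS_ENABLED(CONFIG_DEBUG_LOCKDEP))
		WARN_ON_ONCE(!arch_irqs_disabled());

	if (!arch_irqs_disabled_flags(flags))
		arch_local_irq_enable();
}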

In arch/x86/kernel/apic/io_apic.c's timer_irq_works() we do:

local_irq_save(flags);
local_irq_enable();

[ trigger an IRQ here ]

local_irq_restore(flags);

... and in check_timer() we call that a number of times after either a
local_irq_save() or local_irq_disable(), eventually trailing with a
local_irq_disable() that will balance things up before calling
local_irq_restore().

I guess that timer_irq_works() should instead do:

local_irq_save(flags);
local_irq_enable();
...
local_irq_disable();
local_irq_restore(flags);

... assuming we consider that legitimate?

With that, and all the calls to local_irq_disable() in check_timer() removed
(diff below) I get a clean boot under QEMU with the assertion hacked in and
DEBUG_LOCKDEP enabled.

Thanks
Mark.

>8
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 7b3c7e0d4a09..e79e665a3aeb 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1631,6 +1631,7 @@ static int __init timer_irq_works(void)
else
delay_without_tsc();
 
+   local_irq_disable();
local_irq_restore(flags);
 
/*
@@ -2191,7 +2192,6 @@ static inline void __init check_timer(void)
goto out;
}
panic_if_irq_remap("timer doesn't work through 
Interrupt-remapped IO-APIC");
-   local_irq_disable();
clear_IO_APIC_pin(apic1, pin1);
if (!no_pin1)
apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
@@ -2215,7 +2215,6 @@ static inline void __init check_timer(void)
/*
 * Cleanup, just in case ...
 */
-   local_irq_disable();
legacy_pic->mask(0);
clear_IO_APIC_pin(apic2, pin2);
apic_printk(APIC_QUIET, KERN_INFO "... failed.\n");
@@ -2232,7 +2231,6 @@ static inline void __init check_timer(void)
apic_printk(APIC_QUIET, KERN_INFO ". works.\n");
goto out;
}
-   local_irq_disable();
legacy_pic->mask(0);
apic_write(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_FIXED | cfg->vector);
apic_printk(APIC_QUIET, KERN_INFO ". failed.\n");
@@ -2251,7 +2249,6 @@ static inline void __init check_timer(void)
apic_printk(APIC_QUIET, KERN_INFO ". works.\n");
goto out;
}
-   local_irq_disable();
apic_printk(APIC_QUIET, KERN_INFO ". failed :(.\n");
if (apic_is_x2apic_enabled())
apic_printk(APIC_QUIET, KERN_INFO


Re: [PATCH v2 13/20] drm/nouveau: Remove references to struct drm_device.pdev

2020-12-09 Thread Jeremy Cline
Hi,

On Tue, Dec 01, 2020 at 11:35:35AM +0100, Thomas Zimmermann wrote:
> Using struct drm_device.pdev is deprecated. Convert nouveau to struct
> drm_device.dev. No functional changes.
> 
> Signed-off-by: Thomas Zimmermann 
> Cc: Ben Skeggs 
> ---
>  drivers/gpu/drm/nouveau/dispnv04/arb.c  | 12 +++-
>  drivers/gpu/drm/nouveau/dispnv04/disp.h | 14 --
>  drivers/gpu/drm/nouveau/dispnv04/hw.c   | 10 ++
>  drivers/gpu/drm/nouveau/nouveau_abi16.c |  7 ---
>  drivers/gpu/drm/nouveau/nouveau_acpi.c  |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_bios.c  | 11 ---
>  drivers/gpu/drm/nouveau/nouveau_connector.c | 10 ++
>  drivers/gpu/drm/nouveau/nouveau_drm.c   |  5 ++---
>  drivers/gpu/drm/nouveau/nouveau_fbcon.c |  6 --
>  drivers/gpu/drm/nouveau/nouveau_vga.c   | 20 
>  10 files changed, 58 insertions(+), 39 deletions(-)
> 

I believe there's a use of drm_device.pdev in
drivers/gpu/drm/nouveau/dispnv04/dfp.c in the
nv04_dfp_update_backlight() function.

Other than that, this looks good to me.

> diff --git a/drivers/gpu/drm/nouveau/dispnv04/arb.c 
> b/drivers/gpu/drm/nouveau/dispnv04/arb.c
> index 9d4a2d97507e..1d3542d6006b 100644
> --- a/drivers/gpu/drm/nouveau/dispnv04/arb.c
> +++ b/drivers/gpu/drm/nouveau/dispnv04/arb.c
> @@ -200,16 +200,17 @@ nv04_update_arb(struct drm_device *dev, int VClk, int 
> bpp,
>   int MClk = nouveau_hw_get_clock(dev, PLL_MEMORY);
>   int NVClk = nouveau_hw_get_clock(dev, PLL_CORE);
>   uint32_t cfg1 = nvif_rd32(device, NV04_PFB_CFG1);
> + struct pci_dev *pdev = to_pci_dev(dev->dev);
>  
>   sim_data.pclk_khz = VClk;
>   sim_data.mclk_khz = MClk;
>   sim_data.nvclk_khz = NVClk;
>   sim_data.bpp = bpp;
>   sim_data.two_heads = nv_two_heads(dev);
> - if ((dev->pdev->device & 0xffff) == 0x01a0 /*CHIPSET_NFORCE*/ ||
> - (dev->pdev->device & 0xffff) == 0x01f0 /*CHIPSET_NFORCE2*/) {
> + if ((pdev->device & 0xffff) == 0x01a0 /*CHIPSET_NFORCE*/ ||
> + (pdev->device & 0xffff) == 0x01f0 /*CHIPSET_NFORCE2*/) {
>   uint32_t type;
> - int domain = pci_domain_nr(dev->pdev->bus);
> + int domain = pci_domain_nr(pdev->bus);
>  
>   pci_read_config_dword(pci_get_domain_bus_and_slot(domain, 0, 1),
> 0x7c, &type);
> @@ -251,11 +252,12 @@ void
>  nouveau_calc_arb(struct drm_device *dev, int vclk, int bpp, int *burst, int 
> *lwm)
>  {
>   struct nouveau_drm *drm = nouveau_drm(dev);
> + struct pci_dev *pdev = to_pci_dev(dev->dev);
>  
>   if (drm->client.device.info.family < NV_DEVICE_INFO_V0_KELVIN)
>   nv04_update_arb(dev, vclk, bpp, burst, lwm);
> - else if ((dev->pdev->device & 0xfff0) == 0x0240 /*CHIPSET_C51*/ ||
> -  (dev->pdev->device & 0xfff0) == 0x03d0 /*CHIPSET_C512*/) {
> + else if ((pdev->device & 0xfff0) == 0x0240 /*CHIPSET_C51*/ ||
> +  (pdev->device & 0xfff0) == 0x03d0 /*CHIPSET_C512*/) {
>   *burst = 128;
>   *lwm = 0x0480;
>   } else
> diff --git a/drivers/gpu/drm/nouveau/dispnv04/disp.h 
> b/drivers/gpu/drm/nouveau/dispnv04/disp.h
> index 5ace5e906949..f0a24126641a 100644
> --- a/drivers/gpu/drm/nouveau/dispnv04/disp.h
> +++ b/drivers/gpu/drm/nouveau/dispnv04/disp.h
> @@ -130,7 +130,7 @@ static inline bool
>  nv_two_heads(struct drm_device *dev)
>  {
>   struct nouveau_drm *drm = nouveau_drm(dev);
> - const int impl = dev->pdev->device & 0x0ff0;
> + const int impl = to_pci_dev(dev->dev)->device & 0x0ff0;
>  
>   if (drm->client.device.info.family >= NV_DEVICE_INFO_V0_CELSIUS && impl 
> != 0x0100 &&
>   impl != 0x0150 && impl != 0x01a0 && impl != 0x0200)
> @@ -142,14 +142,14 @@ nv_two_heads(struct drm_device *dev)
>  static inline bool
>  nv_gf4_disp_arch(struct drm_device *dev)
>  {
> - return nv_two_heads(dev) && (dev->pdev->device & 0x0ff0) != 0x0110;
> + return nv_two_heads(dev) && (to_pci_dev(dev->dev)->device & 0x0ff0) != 
> 0x0110;
>  }
>  
>  static inline bool
>  nv_two_reg_pll(struct drm_device *dev)
>  {
>   struct nouveau_drm *drm = nouveau_drm(dev);
> - const int impl = dev->pdev->device & 0x0ff0;
> + const int impl = to_pci_dev(dev->dev)->device & 0x0ff0;
>  
>   if (impl == 0x0310 || impl == 0x0340 || drm->client.device.info.family 
> >= NV_DEVICE_INFO_V0_CURIE)
>   return true;
> @@ -160,9 +160,11 @@ static inline bool
>  nv_match_device(struct drm_device *dev, unsigned device,
>   unsigned sub_vendor, unsigned sub_device)
>  {
> - return dev->pdev->device == device &&
> - dev->pdev->subsystem_vendor == sub_vendor &&
> - dev->pdev->subsystem_device == sub_device;
> + struct pci_dev *pdev = to_pci_dev(dev->dev);
> +
> + return pdev->device == device &&
> + pdev->subsystem_vendor == sub_vendor &&
> + pdev->subsystem_device == sub_device;

Re: [RFC PATCH 04/27] vhost: add vhost_kernel_set_vring_enable

2020-12-09 Thread Stefan Hajnoczi
On Wed, Dec 09, 2020 at 01:00:19PM +0100, Eugenio Perez Martin wrote:
> On Mon, Dec 7, 2020 at 5:43 PM Stefan Hajnoczi  wrote:
> >
> > On Fri, Nov 20, 2020 at 07:50:42PM +0100, Eugenio Pérez wrote:
> > > Signed-off-by: Eugenio Pérez 
> > > ---
> > >  hw/virtio/vhost-backend.c | 29 +
> > >  1 file changed, 29 insertions(+)
> > >
> > > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> > > index 222bbcc62d..317f1f96fa 100644
> > > --- a/hw/virtio/vhost-backend.c
> > > +++ b/hw/virtio/vhost-backend.c
> > > @@ -201,6 +201,34 @@ static int vhost_kernel_get_vq_index(struct 
> > > vhost_dev *dev, int idx)
> > >  return idx - dev->vq_index;
> > >  }
> > >
> > > +static int vhost_kernel_set_vq_enable(struct vhost_dev *dev, unsigned 
> > > idx,
> > > +  bool enable)
> > > +{
> > > +struct vhost_vring_file file = {
> > > +.index = idx,
> > > +};
> > > +
> > > +if (!enable) {
> > > +file.fd = -1; /* Pass -1 to unbind from file. */
> > > +} else {
> > > +struct vhost_net *vn_dev = container_of(dev, struct vhost_net, 
> > > dev);
> > > +file.fd = vn_dev->backend;
> > > +}
> > > +
> > > +return vhost_kernel_net_set_backend(dev, &file);
> >
> > This is vhost-net specific even though the function appears to be
> > generic. Is there a plan to extend this to all devices?
> >
> 
> I expected each vhost backend to enable/disable on its own terms, but
> I think it could be 100% virtio-device generic with something like the
> device state capability:
> https://lists.oasis-open.org/archives/virtio-comment/202012/msg5.html
> .

Great, thanks for the link!

> > > +}
> > > +
> > > +static int vhost_kernel_set_vring_enable(struct vhost_dev *dev, int 
> > > enable)
> > > +{
> > > +int i;
> > > +
> > > +for (i = 0; i < dev->nvqs; ++i) {
> > > +vhost_kernel_set_vq_enable(dev, i, enable);
> > > +}
> > > +
> > > +return 0;
> > > +}
> >
> > I suggest exposing the per-vq interface (vhost_kernel_set_vq_enable())
> > in VhostOps so it follows the ioctl interface.
> 
> It was actually the initial plan; I left it as all-or-nothing to make fewer 
> changes.
> 
> > vhost_kernel_set_vring_enable() can be moved to vhost.c, which can loop
> > over all vqs if callers find that convenient.
> 
> I'm ok with it. Thinking out loud, I don't know if it is easier for
> some devices to enable/disable everything at once (fewer syscalls? less
> downtime somehow?), but I find the per-virtqueue approach more generic
> and useful.

That's an interesting question, the ability to enable/disable specific
virtqueues seems like it could be useful. For example, guests with vCPU
hotplug may want to enable/disable virtqueues so that multi-queue
adapts as the number of vCPUs changes. A per-vq interface is needed for
that.
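
Roughly, the split could look like this on the QEMU side (a sketch; the
per-vq VhostOps callback name vhost_set_vq_enable is hypothetical):

/* Generic helper in hw/virtio/vhost.c; backends only implement the
 * per-vq hook. Sketch only - this callback does not exist today. */
static int vhost_dev_set_vring_enable(struct vhost_dev *dev, bool enable)
{
    int i, r;

    if (!dev->vhost_ops->vhost_set_vq_enable) {
        return 0;
    }

    for (i = 0; i < dev->nvqs; ++i) {
        r = dev->vhost_ops->vhost_set_vq_enable(dev, i, enable);
        if (r < 0) {
            return r;
        }
    }
    return 0;
}

Callers that only care about a single vq could then invoke the hook
directly instead of going through the loop.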

I'm a little worried that some device types might not cope well with
quiescing individual vqs. Here "quiesce" means completing in-flight
requests. This would be where two or more vqs have a relationship and
disabling one vq could cause a deadlock when trying to disable the other
one. However, I can't think of a case where this happens.

virtio-vsock is the closest example, but luckily we don't need to complete
in-flight requests; we can just stop the vq immediately. So although
there is a dependency on the other vq it won't deadlock in this case.

Stefan



Re: [RFC PATCH 0/8] vhost: allow userspace to control vq cpu affinity

2020-12-09 Thread Stefano Garzarella

Hi Mike,
sorry for the delay but there were holidays.

On Fri, Dec 04, 2020 at 11:33:11AM -0600, Mike Christie wrote:

On 12/4/20 11:10 AM, Mike Christie wrote:

On 12/4/20 10:06 AM, Stefano Garzarella wrote:

Hi Mike,

On Fri, Dec 04, 2020 at 01:56:25AM -0600, Mike Christie wrote:

The following patches, made over mst's vhost branch, allow userspace
to set each vq's cpu affinity. Currently, with cgroups the worker thread
inherits the affinity settings, but we are at the mercy of the CPU
scheduler for where the vq's IO will be executed. This can result in
the scheduler sometimes hammering a couple of queues on the host instead of
spreading the work out the way the guest's application might have intended
if it is mq-aware.

This version of the patches is not what you were initially discussing,
i.e. an interface similar to nbd's old (3.x kernel days) NBD_DO_IT
ioctl, where userspace calls down to the
kernel and we run from that context. These patches instead just
allow userspace to tell the kernel which CPU a vq should run on.
We then use the kernel's workqueue code to handle the thread
management.


I agree that reusing kernel's workqueue code would be a good strategy.

One concern is how easy it is to implement an adaptive polling 
strategy using workqueues. From what I've seen, adding some 
polling of both backend and virtqueue helps to eliminate 
interrupts and reduce latency.


Would the polling you need to do be similar to the vhost net poll 
code like in vhost_net_busy_poll (different algorithm though)? But, 
we want to be able to poll multiple devs/vqs from the same CPU 
right? Something like:


retry:

for each poller on CPU N
    if poller has work
        driver->run work fn

if (poll limit hit)
    return
else
    cpu_relax();
goto retry:

?


Yeah, something similar. IIUC vhost_net_busy_poll() polls both vring and 
backend (socket).


Maybe we need to limit the amount of work done in each work->fn to avoid starvation.
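
Something like the following, as a hedged C sketch of that loop (the
types and callbacks are hypothetical; the per-round budget is what would
bound starvation):

#include <linux/list.h>
#include <asm/processor.h>	/* cpu_relax() */

struct vq_poller {
	struct list_head node;
	bool (*has_work)(struct vq_poller *p);
	void (*run_work)(struct vq_poller *p, unsigned int budget);
};

/* Poll every poller bound to this CPU for at most @poll_limit rounds;
 * each poller may process at most @budget items per round. */
static void vq_poll_cpu(struct list_head *pollers,
			unsigned int poll_limit, unsigned int budget)
{
	unsigned int round;

	for (round = 0; round < poll_limit; round++) {
		struct vq_poller *p;
		bool did_work = false;

		list_for_each_entry(p, pollers, node) {
			if (p->has_work(p)) {
				p->run_work(p, budget);
				did_work = true;
			}
		}

		if (!did_work)
			cpu_relax();
	}
}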



If so, I had an idea for it. Let me send an additional patch on top 
of this set.


Sure :-)



Oh yeah, just to make sure I am on the same page for vdpa, because 
scsi and net work so differently.


Were you thinking that you would initially run from

vhost_poll_wakeup -> work->fn

then in the vdpa work->fn you would do the kick_vq still, but then 
also kick off a group backend/vq poller. This would then poll the 
vqs/devs that were bound to that CPU from the worker/wq thread.


Yes, this seems reasonable!



So I was thinking you want something similar to network's NAPI. Here 


I don't know NAPI very well, but IIUC the goal is the same: try to avoid 
notifications (IRQs from the device, vm-exits from the guest) by doing 
adaptive polling.


our work->fn is the hard irq, and then the worker is like their softirq 
we poll from.




I'm a little lost here...

Thanks,
Stefano



Re: [RFC PATCH 00/27] vDPA software assisted live migration

2020-12-09 Thread Stefan Hajnoczi
On Wed, Dec 09, 2020 at 04:26:50AM -0500, Jason Wang wrote:
> - Original Message -
> > On Fri, Nov 20, 2020 at 07:50:38PM +0100, Eugenio Pérez wrote:
> > > This series enable vDPA software assisted live migration for vhost-net
> > > devices. This is a new method of vhost devices migration: Instead of
> > > relay on vDPA device's dirty logging capability, SW assisted LM
> > > intercepts dataplane, forwarding the descriptors between VM and device.
> > 
> > Pros:
> > + vhost/vDPA devices don't need to implement dirty memory logging
> > + Obsoletes ioctl(VHOST_SET_LOG_BASE) and friends
> > 
> > Cons:
> > - Not generic, relies on vhost-net-specific ioctls
> > - Doesn't support VIRTIO Shared Memory Regions
> >   https://github.com/oasis-tcs/virtio-spec/blob/master/shared-mem.tex
> 
> I may be missing something, but my understanding is that it's the
> responsibility of the device to migrate this part?

Good point. You're right.

> > - Performance (see below)
> > 
> > I think performance will be significantly lower when the shadow vq is
> > enabled. Imagine a vDPA device with hardware vq doorbell registers
> > mapped into the guest so the guest driver can directly kick the device.
> > When the shadow vq is enabled a vmexit is needed to write to the shadow
> > vq ioeventfd, then the host kernel scheduler switches to a QEMU thread
> > to read the ioeventfd, the descriptors are translated, QEMU writes to
> > the vhost hdev kick fd, the host kernel scheduler switches to the vhost
> > worker thread, vhost/vDPA notifies the virtqueue, and finally the
> > vDPA driver writes to the hardware vq doorbell register. That is a lot
> > of overhead compared to writing to an exitless MMIO register!
> 
> I think it's a balance. E.g. we can poll the virtqueue to have an
> exitless doorbell.
> 
> > 
> > If the shadow vq was implemented in drivers/vhost/ and QEMU used the
> > existing ioctl(VHOST_SET_LOG_BASE) approach, then the overhead would be
> > reduced to just one set of ioeventfd/irqfd. In other words, the QEMU
> > dirty memory logging happens asynchronously and isn't in the dataplane.
> > 
> > In addition, hardware that supports dirty memory logging as well as
> > software vDPA devices could completely eliminate the shadow vq for even
> > better performance.
> 
> Yes. That's our plan. But the interface might require more thought.
> 
> E.g. is the bitmap a good approach? To me, reporting dirty pages via a
> virtqueue is better since it has a smaller footprint and is self-throttled.
> 
> And we need an address space other than the one used by the guest for
> either the bitmap or the virtqueue.
> 
> > 
> > But performance is a question of "is it good enough?". Maybe this
> > approach is okay and users don't expect good performance while dirty
> > memory logging is enabled.
> 
> Yes, and actually such a slowdown may help the migration converge.
> 
> Note that the whole idea is to try to have a generic solution for all
> types of devices. It's good to consider performance, but for the
> first stage it should be sufficient to make it work and consider
> optimizing on top.

Moving the shadow vq to the kernel later would be quite a big change
requiring rewriting much of the code. That's why I mentioned this now
before a lot of effort is invested in a QEMU implementation.

> > I just wanted to share the idea of moving the
> > shadow vq into the kernel in case you like that approach better.
> 
> My understanding is to keep the kernel as simple as possible and leave the
> policies to userspace as much as possible. E.g. it requires us to
> disable doorbell mapping and irq offloading, all of which were under
> the control of userspace.

If the performance is acceptable with the QEMU approach then I think
that's the best place to implement it. It looks high-overhead though so
maybe one of the first things to do is to run benchmarks to collect data
on how it performs?

Stefan



Re: [PATCH net v2] tun: fix ubuf refcount incorrectly on error path

2020-12-09 Thread Willem de Bruijn
On Wed, Dec 9, 2020 at 8:03 AM wangyunjian  wrote:
>
> From: Yunjian Wang 
>
> After setting the callback for the skb's ubuf_info, the callback
> (vhost_net_zerocopy_callback) will be called to decrease
> the refcount when freeing the skb. But when an exception occurs

With exception, you mean if tun_get_user returns an error that
propagates to the sendmsg call in vhost handle_tx, correct?

> afterwards, the error handling in vhost handle_tx() will
> try to decrease the same refcount again. This is wrong; fix
> it by delaying the copy of ubuf_info until we're sure
> there are no errors.

I think the right approach is to address this in the error paths,
rather than complicate the normal datapath.

Is it sufficient to suppress the call to vhost_net_ubuf_put in the
handle_tx sendmsg error path, given that vhost_zerocopy_callback
will be called on kfree_skb?

Or alternatively clear the destructor in drop:
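
Roughly like this (a hypothetical helper sketched from the idea above,
not a tested patch):

#include <linux/skbuff.h>

/* Sketch: on tun_get_user()'s error path, detach the zerocopy ubuf_info
 * before kfree_skb() so the zerocopy callback is not invoked there,
 * leaving the caller's error handling as the only place that drops the
 * ubuf reference. */
static void tun_detach_ubuf(struct sk_buff *skb)
{
	if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
		skb_shinfo(skb)->destructor_arg = NULL;
		skb_shinfo(skb)->tx_flags &=
			~(SKBTX_DEV_ZEROCOPY | SKBTX_SHARED_FRAG);
	}
}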

>
> Fixes: 4477138fa0ae ("tun: properly test for IFF_UP")
> Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver")
>
> Signed-off-by: Yunjian Wang 
> ---
> v2:
>Updated code, fix by delay copying ubuf_info
> ---
>  drivers/net/tun.c | 29 +++--
>  1 file changed, 19 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 2dc1988a8973..2ea822328e73 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1637,6 +1637,20 @@ static struct sk_buff *tun_build_skb(struct tun_struct 
> *tun,
> return NULL;
>  }
>
> +/* copy ubuf_info for callback when skb has no error */
> +static inline void tun_copy_ubuf_info(struct sk_buff *skb, bool zerocopy, 
> void *msg_control)
> +{
> +   if (zerocopy) {
> +   skb_shinfo(skb)->destructor_arg = msg_control;
> +   skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
> +   skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
> +   } else if (msg_control) {
> +   struct ubuf_info *uarg = msg_control;
> +
> +   uarg->callback(uarg, false);
> +   }
> +}
> +
>  /* Get packet from user space buffer */
>  static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
> void *msg_control, struct iov_iter *from,
> @@ -1812,16 +1826,6 @@ static ssize_t tun_get_user(struct tun_struct *tun, 
> struct tun_file *tfile,
> break;
> }
>
> -   /* copy skb_ubuf_info for callback when skb has no error */
> -   if (zerocopy) {
> -   skb_shinfo(skb)->destructor_arg = msg_control;
> -   skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
> -   skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
> -   } else if (msg_control) {
> -   struct ubuf_info *uarg = msg_control;
> -   uarg->callback(uarg, false);
> -   }
> -
> skb_reset_network_header(skb);
> skb_probe_transport_header(skb);
> skb_record_rx_queue(skb, tfile->queue_index);
> @@ -1830,6 +1834,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, 
> struct tun_file *tfile,
> struct bpf_prog *xdp_prog;
> int ret;
>
> +   tun_copy_ubuf_info(skb, zerocopy, msg_control);
> local_bh_disable();
> rcu_read_lock();
> xdp_prog = rcu_dereference(tun->xdp_prog);
> @@ -1881,6 +1886,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, 
> struct tun_file *tfile,
> return -ENOMEM;
> }
>
> +   tun_copy_ubuf_info(skb, zerocopy, msg_control);
> local_bh_disable();
> napi_gro_frags(&tfile->napi);
> local_bh_enable();
> @@ -1889,6 +1895,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, 
> struct tun_file *tfile,
> struct sk_buff_head *queue = &tfile->sk.sk_write_queue;
> int queue_len;
>
> +   tun_copy_ubuf_info(skb, zerocopy, msg_control);
> spin_lock_bh(&queue->lock);
> __skb_queue_tail(queue, skb);
> queue_len = skb_queue_len(queue);
> @@ -1899,8 +1906,10 @@ static ssize_t tun_get_user(struct tun_struct *tun, 
> struct tun_file *tfile,
>
> local_bh_enable();
> } else if (!IS_ENABLED(CONFIG_4KSTACKS)) {
> +   tun_copy_ubuf_info(skb, zerocopy, msg_control);
> tun_rx_batched(tun, tfile, skb, more);
> } else {
> +   tun_copy_ubuf_info(skb, zerocopy, msg_control);
> netif_rx_ni(skb);
> }
> rcu_read_unlock();
> --
> 2.23.0
>


Re: [PATCH v3 3/8] dma-buf: Add vmap_local and vunmap_local operations

2020-12-09 Thread Thomas Zimmermann

Hi

On 09.12.20 at 15:25, Thomas Zimmermann wrote:

The existing dma-buf calls dma_buf_vmap() and dma_buf_vunmap() are
allowed to pin the buffer or acquire the buffer's reservation object
lock.

This is a problem for callers that only require a short-term mapping
of the buffer without the pinning, or callers that have special locking
requirements. These may suffer from unnecessary overhead or interfere
with regular pin operations.

The new interfaces dma_buf_vmap_local(), dma_buf_vunmap_local(), and
their respective callbacks in struct dma_buf_ops provide an alternative
without pinning or reservation locking. Callers are responsible for these
operations.
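
In practice, a caller would use the new interfaces roughly like this (a
sketch of the intended calling convention; error handling trimmed to the
essentials):

#include <linux/dma-buf.h>
#include <linux/dma-buf-map.h>
#include <linux/dma-resv.h>

/* Sketch of an importer's short-term access: take the reservation lock,
 * map, use the mapping, unmap, unlock. */
static int importer_peek(struct dma_buf *dmabuf)
{
	struct dma_buf_map map;
	int ret;

	ret = dma_resv_lock(dmabuf->resv, NULL);
	if (ret)
		return ret;

	ret = dma_buf_vmap_local(dmabuf, &map);
	if (ret)
		goto out_unlock;

	/* ... short-term access via map.vaddr or map.vaddr_iomem ... */

	dma_buf_vunmap_local(dmabuf, &map);
out_unlock:
	dma_resv_unlock(dmabuf->resv);
	return ret;
}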

Signed-off-by: Thomas Zimmermann 


Before merging, this patch will get a Suggested-by tag to give Daniel 
credit. I only noticed after hitting the send button.


Best regards
Thomas


---
  drivers/dma-buf/dma-buf.c | 80 +++
  include/linux/dma-buf.h   | 34 +
  2 files changed, 114 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index e63684d4cd90..be9f80190a66 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1265,6 +1265,86 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct 
dma_buf_map *map)
  }
  EXPORT_SYMBOL_GPL(dma_buf_vunmap);
  
+/**

+ * dma_buf_vmap_local - Create virtual mapping for the buffer object into 
kernel
+ * address space.
+ * @dmabuf:[in]buffer to vmap
+ * @map:   [out]   returns the vmap pointer
+ *
+ * This call may fail due to lack of virtual mapping address space.
+ * These calls are optional in drivers. The intended use for them
+ * is for mapping objects linear in kernel space for high use objects.
+ * Please attempt to use kmap/kunmap before thinking about these interfaces.
+ *
+ * Returns:
+ * 0 on success, or a negative errno code otherwise.
+ */
+int dma_buf_vmap_local(struct dma_buf *dmabuf, struct dma_buf_map *map)
+{
+   struct dma_buf_map ptr;
+   int ret = 0;
+
+   dma_buf_map_clear(map);
+
+   if (WARN_ON(!dmabuf))
+   return -EINVAL;
+
+   dma_resv_assert_held(dmabuf->resv);
+
+   if (!dmabuf->ops->vmap_local)
+   return -EINVAL;
+
+   mutex_lock(&dmabuf->lock);
+   if (dmabuf->vmapping_counter) {
+   dmabuf->vmapping_counter++;
+   BUG_ON(dma_buf_map_is_null(&dmabuf->vmap_ptr));
+   *map = dmabuf->vmap_ptr;
+   goto out_unlock;
+   }
+
+   BUG_ON(dma_buf_map_is_set(&dmabuf->vmap_ptr));
+
+   ret = dmabuf->ops->vmap_local(dmabuf, &ptr);
+   if (WARN_ON_ONCE(ret))
+   goto out_unlock;
+
+   dmabuf->vmap_ptr = ptr;
+   dmabuf->vmapping_counter = 1;
+
+   *map = dmabuf->vmap_ptr;
+
+out_unlock:
+   mutex_unlock(&dmabuf->lock);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(dma_buf_vmap_local);
+
+/**
+ * dma_buf_vunmap_local - Unmap a vmap obtained by dma_buf_vmap_local.
+ * @dmabuf:[in]buffer to vunmap
+ * @map:   [in]vmap pointer to vunmap
+ */
+void dma_buf_vunmap_local(struct dma_buf *dmabuf, struct dma_buf_map *map)
+{
+   if (WARN_ON(!dmabuf))
+   return;
+
+   dma_resv_assert_held(dmabuf->resv);
+
+   BUG_ON(dma_buf_map_is_null(&dmabuf->vmap_ptr));
+   BUG_ON(dmabuf->vmapping_counter == 0);
+   BUG_ON(!dma_buf_map_is_equal(&dmabuf->vmap_ptr, map));
+
+   mutex_lock(&dmabuf->lock);
+   if (--dmabuf->vmapping_counter == 0) {
+   if (dmabuf->ops->vunmap_local)
+   dmabuf->ops->vunmap_local(dmabuf, map);
+   dma_buf_map_clear(&dmabuf->vmap_ptr);
+   }
+   mutex_unlock(&dmabuf->lock);
+}
+EXPORT_SYMBOL_GPL(dma_buf_vunmap_local);
+
  #ifdef CONFIG_DEBUG_FS
  static int dma_buf_debug_show(struct seq_file *s, void *unused)
  {
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index cf72699cb2bc..f66580d23a9b 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -269,6 +269,38 @@ struct dma_buf_ops {
  
  	int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);

void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
+
+   /**
+* @vmap_local:
+*
+* Creates a virtual mapping for the buffer into kernel address space.
+*
+* This callback establishes short-term mappings for situations where
+* callers only use the buffer for a bounded amount of time; such as
+* updates to the framebuffer or reading back contained information.
+* In contrast to the regular @vmap callback, vmap_local does never pin
+* the buffer to a specific domain or acquire the buffer's reservation
+* lock.
+*
+* This is called with the dmabuf->resv object locked. Callers must hold
+* the lock until after removing the mapping with @vunmap_local.
+*
+* This callback is optional.
+*
+* Returns:
+*
+* 0 on success or a negative error code on failure.
+*/

[PATCH v3 5/8] drm/cma-helper: Provide a vmap function for short-term mappings

2020-12-09 Thread Thomas Zimmermann
Implementations of the vmap/vunmap GEM callbacks may perform pinning
of the BO and may acquire the associated reservation object's lock.
Callers that only require a mapping of the contained memory can thus
interfere with other tasks that require exact pinning, such as scanout.
This is less of an issue with private CMA buffers, but may happen
with imported ones.

Therefore provide the new interface drm_gem_cma_vmap_local(), which only
performs the vmap operations. Callers have to hold the reservation lock
while the mapping persists.

This patch also connects GEM CMA helpers to the GEM object function with
equivalent functionality.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/drm_gem_cma_helper.c | 35 
 drivers/gpu/drm/vc4/vc4_bo.c | 13 +++
 drivers/gpu/drm/vc4/vc4_drv.h|  1 +
 include/drm/drm_gem_cma_helper.h |  1 +
 4 files changed, 50 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c 
b/drivers/gpu/drm/drm_gem_cma_helper.c
index 7942cf05cd93..40b3e8e3fc42 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -38,6 +38,7 @@ static const struct drm_gem_object_funcs 
drm_gem_cma_default_funcs = {
.print_info = drm_gem_cma_print_info,
.get_sg_table = drm_gem_cma_get_sg_table,
.vmap = drm_gem_cma_vmap,
+   .vmap_local = drm_gem_cma_vmap_local,
.mmap = drm_gem_cma_mmap,
.vm_ops = &drm_gem_cma_vm_ops,
 };
@@ -471,6 +472,40 @@ int drm_gem_cma_vmap(struct drm_gem_object *obj, struct 
dma_buf_map *map)
 }
 EXPORT_SYMBOL_GPL(drm_gem_cma_vmap);
 
+/**
+ * drm_gem_cma_vmap_local - map a CMA GEM object into the kernel's virtual
+ * address space
+ * @obj: GEM object
+ * @map: Returns the kernel virtual address of the CMA GEM object's backing
+ *   store.
+ *
+ * This function maps a buffer into the kernel's
+ * virtual address space. Since the CMA buffers are already mapped into the
+ * kernel virtual address space this simply returns the cached virtual
+ * address. Drivers using the CMA helpers should set this as their DRM
+ * driver's _gem_object_funcs.vmap_local callback.
+ *
+ * Returns:
+ * 0 on success, or a negative error code otherwise.
+ */
+int drm_gem_cma_vmap_local(struct drm_gem_object *obj, struct dma_buf_map *map)
+{
+   struct drm_gem_cma_object *cma_obj = to_drm_gem_cma_obj(obj);
+
+   /*
+* TODO: The code in drm_gem_cma_prime_import_sg_table_vmap()
+*   establishes this mapping. The correct solution would
+*   be to call dma_buf_vmap_local() here.
+*
+*   If we find a case where we absolutely have to call
+*   dma_buf_vmap_local(), the code needs to be restructured.
+*/
+   dma_buf_map_set_vaddr(map, cma_obj->vaddr);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(drm_gem_cma_vmap_local);
+
 /**
  * drm_gem_cma_mmap - memory-map an exported CMA GEM object
  * @obj: GEM object
diff --git a/drivers/gpu/drm/vc4/vc4_bo.c b/drivers/gpu/drm/vc4/vc4_bo.c
index dc316cb79e00..ec57326c69c4 100644
--- a/drivers/gpu/drm/vc4/vc4_bo.c
+++ b/drivers/gpu/drm/vc4/vc4_bo.c
@@ -387,6 +387,7 @@ static const struct drm_gem_object_funcs 
vc4_gem_object_funcs = {
.export = vc4_prime_export,
.get_sg_table = drm_gem_cma_get_sg_table,
.vmap = vc4_prime_vmap,
+   .vmap_local = vc4_prime_vmap_local,
.vm_ops = &vc4_vm_ops,
 };
 
@@ -797,6 +798,18 @@ int vc4_prime_vmap(struct drm_gem_object *obj, struct 
dma_buf_map *map)
return drm_gem_cma_vmap(obj, map);
 }
 
+int vc4_prime_vmap_local(struct drm_gem_object *obj, struct dma_buf_map *map)
+{
+   struct vc4_bo *bo = to_vc4_bo(obj);
+
+   if (bo->validated_shader) {
+   DRM_DEBUG("mmaping of shader BOs not allowed.\n");
+   return -EINVAL;
+   }
+
+   return drm_gem_cma_vmap_local(obj, map);
+}
+
 struct drm_gem_object *
 vc4_prime_import_sg_table(struct drm_device *dev,
  struct dma_buf_attachment *attach,
diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h
index 43a1af110b3e..efb6c47d318f 100644
--- a/drivers/gpu/drm/vc4/vc4_drv.h
+++ b/drivers/gpu/drm/vc4/vc4_drv.h
@@ -812,6 +812,7 @@ struct drm_gem_object *vc4_prime_import_sg_table(struct 
drm_device *dev,
 struct dma_buf_attachment 
*attach,
 struct sg_table *sgt);
 int vc4_prime_vmap(struct drm_gem_object *obj, struct dma_buf_map *map);
+int vc4_prime_vmap_local(struct drm_gem_object *obj, struct dma_buf_map *map);
 int vc4_bo_cache_init(struct drm_device *dev);
 int vc4_bo_inc_usecnt(struct vc4_bo *bo);
 void vc4_bo_dec_usecnt(struct vc4_bo *bo);
diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
index 0a9711caa3e8..05122e71bc6d 100644
--- a/include/drm/drm_gem_cma_helper.h
+++ b/include/drm/drm_gem_cma_helper.h
@@ -99,6 +99,7 @@ 

[PATCH v3 8/8] drm/fb-helper: Move BO locking from DRM client to fbdev damage worker

2020-12-09 Thread Thomas Zimmermann
Fbdev emulation has to lock the BO into place while flushing the shadow
buffer into the BO's memory. Remove any interference with pinning by
using vmap_local functionality (instead of full vmap). This requires
BO reservation locking in fbdev's damage worker.

The new DRM client functions for locking and vmap_local functionality
are added for consistency with the existing style.
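
The resulting blit sequence in the damage worker then looks roughly like
this (a sketch built on the client helpers added below; the helper name
and the plain memcpy stand in for the actual blit code):

#include <linux/dma-buf-map.h>
#include <linux/io.h>
#include <linux/string.h>

#include <drm/drm_client.h>

/* Sketch: lock the client buffer, map it for a short term, flush the
 * shadow buffer into the BO's memory, then unmap and unlock. */
static int fbdev_flush_shadow(struct drm_client_buffer *buffer,
			      const void *shadow, size_t size)
{
	struct dma_buf_map map;
	int ret;

	ret = drm_client_buffer_lock(buffer);
	if (ret)
		return ret;

	ret = drm_client_buffer_vmap_local(buffer, &map);
	if (ret)
		goto out_unlock;

	if (map.is_iomem)
		memcpy_toio(map.vaddr_iomem, shadow, size);
	else
		memcpy(map.vaddr, shadow, size);

	drm_client_buffer_vunmap_local(buffer);
out_unlock:
	drm_client_buffer_unlock(buffer);
	return ret;
}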

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/drm_client.c| 91 +
 drivers/gpu/drm/drm_fb_helper.c | 41 +++
 include/drm/drm_client.h|  4 ++
 3 files changed, 116 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/drm_client.c b/drivers/gpu/drm/drm_client.c
index ce45e380f4a2..795f5cb052ba 100644
--- a/drivers/gpu/drm/drm_client.c
+++ b/drivers/gpu/drm/drm_client.c
@@ -288,6 +288,37 @@ drm_client_buffer_create(struct drm_client_dev *client, 
u32 width, u32 height, u
return ERR_PTR(ret);
 }
 
+/**
+ * drm_client_buffer_lock - Locks the DRM client buffer
+ * @buffer: DRM client buffer
+ *
+ * This function locks the client buffer by acquiring the buffer
+ * object's reservation lock.
+ *
+ * Unlock the buffer with drm_client_buffer_unlock().
+ *
+ * Returns:
+ * 0 on success, or a negative errno code otherwise.
+ */
+int
+drm_client_buffer_lock(struct drm_client_buffer *buffer)
+{
+   return dma_resv_lock(buffer->gem->resv, NULL);
+}
+EXPORT_SYMBOL(drm_client_buffer_lock);
+
+/**
+ * drm_client_buffer_unlock - Unlock DRM client buffer
+ * @buffer: DRM client buffer
+ *
+ * Unlocks a client buffer. See drm_client_buffer_lock().
+ */
+void drm_client_buffer_unlock(struct drm_client_buffer *buffer)
+{
+   dma_resv_unlock(buffer->gem->resv);
+}
+EXPORT_SYMBOL(drm_client_buffer_unlock);
+
 /**
  * drm_client_buffer_vmap - Map DRM client buffer into address space
  * @buffer: DRM client buffer
@@ -348,6 +379,66 @@ void drm_client_buffer_vunmap(struct drm_client_buffer 
*buffer)
 }
 EXPORT_SYMBOL(drm_client_buffer_vunmap);
 
+/**
+ * drm_client_buffer_vmap_local - Map DRM client buffer into address space
+ * @buffer: DRM client buffer
+ * @map_copy: Returns the mapped memory's address
+ *
+ * This function maps a client buffer into kernel address space. If the
+ * buffer is already mapped, it returns the existing mapping's address.
+ *
+ * Client buffer mappings are not ref'counted. Each call to
+ * drm_client_buffer_vmap_local() should be followed by a call to
+ * drm_client_buffer_vunmap_local(); or the client buffer should be mapped
+ * throughout its lifetime.
+ *
+ * The returned address is a copy of the internal value. In contrast to
+ * other vmap interfaces, you don't need it for the client's vunmap
+ * function. So you can modify it at will during blit and draw operations.
+ *
+ * Returns:
+ * 0 on success, or a negative errno code otherwise.
+ */
+int
+drm_client_buffer_vmap_local(struct drm_client_buffer *buffer, struct 
dma_buf_map *map_copy)
+{
+   struct dma_buf_map *map = &buffer->map;
+   int ret;
+
+   /*
+* FIXME: The dependency on GEM here isn't required, we could
+* convert the driver handle to a dma-buf instead and use the
+* backend-agnostic dma-buf vmap_local support instead. This would
+* require that the handle2fd prime ioctl is reworked to pull the
+* fd_install step out of the driver backend hooks, to make that
+* final step optional for internal users.
+*/
+   ret = drm_gem_vmap_local(buffer->gem, map);
+   if (ret)
+   return ret;
+
+   *map_copy = *map;
+
+   return 0;
+}
+EXPORT_SYMBOL(drm_client_buffer_vmap_local);
+
+/**
+ * drm_client_buffer_vunmap_local - Unmap DRM client buffer
+ * @buffer: DRM client buffer
+ *
+ * This function removes a client buffer's memory mapping. Calling this
+ * function is only required by clients that manage their buffer mappings
+ * by themselves.
+ */
+void drm_client_buffer_vunmap_local(struct drm_client_buffer *buffer)
+{
+   struct dma_buf_map *map = &buffer->map;
+
+   drm_gem_vunmap_local(buffer->gem, map);
+}
+EXPORT_SYMBOL(drm_client_buffer_vunmap_local);
+
 static void drm_client_buffer_rmfb(struct drm_client_buffer *buffer)
 {
int ret;
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index e82db0f4e771..a56a7d9f7e35 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -399,28 +399,34 @@ static int drm_fb_helper_damage_blit(struct drm_fb_helper 
*fb_helper,
int ret;
 
/*
-* We have to pin the client buffer to its current location while
-* flushing the shadow buffer. In the general case, concurrent
-* modesetting operations could try to move the buffer and would
-* fail. The modeset has to be serialized by acquiring the reservation
-* object of the underlying BO here.
-*
 * For fbdev emulation, we only have to protect against fbdev modeset
  

[PATCH v3 3/8] dma-buf: Add vmap_local and vunmap_local operations

2020-12-09 Thread Thomas Zimmermann
The existing dma-buf calls dma_buf_vmap() and dma_buf_vunmap() are
allowed to pin the buffer or acquire the buffer's reservation object
lock.

This is a problem for callers that only require a short-term mapping
of the buffer without the pinning, or callers that have special locking
requirements. These may suffer from unnecessary overhead or interfere
with regular pin operations.

The new interfaces dma_buf_vmap_local(), dma_buf_vunmap_local(), and
their respective callbacks in struct dma_buf_ops provide an alternative
without pinning or reservation locking. Callers are responsible for these
operations.

Signed-off-by: Thomas Zimmermann 
---
 drivers/dma-buf/dma-buf.c | 80 +++
 include/linux/dma-buf.h   | 34 +
 2 files changed, 114 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index e63684d4cd90..be9f80190a66 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1265,6 +1265,86 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct 
dma_buf_map *map)
 }
 EXPORT_SYMBOL_GPL(dma_buf_vunmap);
 
+/**
+ * dma_buf_vmap_local - Create virtual mapping for the buffer object into 
kernel
+ * address space.
+ * @dmabuf:[in]buffer to vmap
+ * @map:   [out]   returns the vmap pointer
+ *
+ * This call may fail due to lack of virtual mapping address space.
+ * These calls are optional in drivers. The intended use for them
+ * is for mapping objects linear in kernel space for high use objects.
+ * Please attempt to use kmap/kunmap before thinking about these interfaces.
+ *
+ * Returns:
+ * 0 on success, or a negative errno code otherwise.
+ */
+int dma_buf_vmap_local(struct dma_buf *dmabuf, struct dma_buf_map *map)
+{
+   struct dma_buf_map ptr;
+   int ret = 0;
+
+   dma_buf_map_clear(map);
+
+   if (WARN_ON(!dmabuf))
+   return -EINVAL;
+
+   dma_resv_assert_held(dmabuf->resv);
+
+   if (!dmabuf->ops->vmap_local)
+   return -EINVAL;
+
+   mutex_lock(&dmabuf->lock);
+   if (dmabuf->vmapping_counter) {
+   dmabuf->vmapping_counter++;
+   BUG_ON(dma_buf_map_is_null(&dmabuf->vmap_ptr));
+   *map = dmabuf->vmap_ptr;
+   goto out_unlock;
+   }
+
+   BUG_ON(dma_buf_map_is_set(&dmabuf->vmap_ptr));
+
+   ret = dmabuf->ops->vmap_local(dmabuf, &ptr);
+   if (WARN_ON_ONCE(ret))
+   goto out_unlock;
+
+   dmabuf->vmap_ptr = ptr;
+   dmabuf->vmapping_counter = 1;
+
+   *map = dmabuf->vmap_ptr;
+
+out_unlock:
+   mutex_unlock(&dmabuf->lock);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(dma_buf_vmap_local);
+
+/**
+ * dma_buf_vunmap_local - Unmap a vmap obtained by dma_buf_vmap_local.
+ * @dmabuf:[in]buffer to vunmap
+ * @map:   [in]vmap pointer to vunmap
+ */
+void dma_buf_vunmap_local(struct dma_buf *dmabuf, struct dma_buf_map *map)
+{
+   if (WARN_ON(!dmabuf))
+   return;
+
+   dma_resv_assert_held(dmabuf->resv);
+
+   BUG_ON(dma_buf_map_is_null(&dmabuf->vmap_ptr));
+   BUG_ON(dmabuf->vmapping_counter == 0);
+   BUG_ON(!dma_buf_map_is_equal(&dmabuf->vmap_ptr, map));
+
+   mutex_lock(&dmabuf->lock);
+   if (--dmabuf->vmapping_counter == 0) {
+   if (dmabuf->ops->vunmap_local)
+   dmabuf->ops->vunmap_local(dmabuf, map);
+   dma_buf_map_clear(&dmabuf->vmap_ptr);
+   }
+   mutex_unlock(&dmabuf->lock);
+}
+EXPORT_SYMBOL_GPL(dma_buf_vunmap_local);
+
 #ifdef CONFIG_DEBUG_FS
 static int dma_buf_debug_show(struct seq_file *s, void *unused)
 {
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index cf72699cb2bc..f66580d23a9b 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -269,6 +269,38 @@ struct dma_buf_ops {
 
int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map);
+
+   /**
+* @vmap_local:
+*
+* Creates a virtual mapping for the buffer into kernel address space.
+*
+* This callback establishes short-term mappings for situations where
+* callers only use the buffer for a bounded amount of time; such as
+* updates to the framebuffer or reading back contained information.
+* In contrast to the regular @vmap callback, vmap_local does never pin
+* the buffer to a specific domain or acquire the buffer's reservation
+* lock.
+*
+* This is called with the dmabuf->resv object locked. Callers must hold
+* the lock until after removing the mapping with @vunmap_local.
+*
+* This callback is optional.
+*
+* Returns:
+*
+* 0 on success or a negative error code on failure.
+*/
+   int (*vmap_local)(struct dma_buf *dmabuf, struct dma_buf_map *map);
+
+   /**
+* @vunmap_local:
+*
+* Removes a virtual mapping that was established by @vmap_local.
+*
+ 

[PATCH v3 7/8] drm/vram-helper: Provide a vmap function for short-term mappings

2020-12-09 Thread Thomas Zimmermann
Implementations of the vmap/vunmap GEM callbacks may perform pinning
of the BO and may acquire the associated reservation object's lock.
It's somewhat inconvenient for callers that simply require a mapping of
the contained memory, and it also implies a certain overhead.

Therefore provide drm_gem_vram_vmap_local() and drm_gem_vram_vunmap_local(),
which only perform the vmap/vunmap operations. Callers have to hold the
reservation lock while the mapping persists; or have to pin the BO by
themselves.

The affected callers are cursor updates in ast and vboxvideo. Both
are being changed to the new interface.

This patch connects GEM VRAM helpers to GEM object functions with
equivalent functionality.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/ast/ast_cursor.c  |  37 +--
 drivers/gpu/drm/drm_gem_vram_helper.c | 142 +-
 drivers/gpu/drm/vboxvideo/vbox_mode.c |  15 +--
 include/drm/drm_gem_vram_helper.h |   2 +
 4 files changed, 132 insertions(+), 64 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_cursor.c b/drivers/gpu/drm/ast/ast_cursor.c
index fac1ee79c372..c38f435bcde2 100644
--- a/drivers/gpu/drm/ast/ast_cursor.c
+++ b/drivers/gpu/drm/ast/ast_cursor.c
@@ -159,6 +159,8 @@ int ast_cursor_blit(struct ast_private *ast, struct 
drm_framebuffer *fb)
struct drm_device *dev = >base;
struct drm_gem_vram_object *dst_gbo = 
ast->cursor.gbo[ast->cursor.next_index];
struct drm_gem_vram_object *src_gbo = drm_gem_vram_of_gem(fb->obj[0]);
+   struct drm_gem_object *objs[] = {&dst_gbo->bo.base, &src_gbo->bo.base};
+   struct ww_acquire_ctx ctx;
struct dma_buf_map src_map, dst_map;
void __iomem *dst;
void *src;
@@ -168,26 +170,34 @@ int ast_cursor_blit(struct ast_private *ast, struct 
drm_framebuffer *fb)
drm_WARN_ON_ONCE(dev, fb->height > AST_MAX_HWC_HEIGHT))
return -EINVAL;
 
-   ret = drm_gem_vram_vmap(src_gbo, &src_map);
+   ret = drm_gem_lock_reservations(objs, ARRAY_SIZE(objs), &ctx);
if (ret)
return ret;
+
+   ret = drm_gem_vram_vmap_local(src_gbo, &src_map);
+   if (ret)
+   goto err_drm_gem_unlock_reservations;
src = src_map.vaddr; /* TODO: Use mapping abstraction properly */
 
-   ret = drm_gem_vram_vmap(dst_gbo, &dst_map);
+   ret = drm_gem_vram_vmap_local(dst_gbo, &dst_map);
if (ret)
-   goto err_drm_gem_vram_vunmap;
+   goto err_drm_gem_vram_vunmap_local;
dst = dst_map.vaddr_iomem; /* TODO: Use mapping abstraction properly */
 
/* do data transfer to cursor BO */
update_cursor_image(dst, src, fb->width, fb->height);
 
-   drm_gem_vram_vunmap(dst_gbo, &dst_map);
-   drm_gem_vram_vunmap(src_gbo, &src_map);
+   drm_gem_vram_vunmap_local(dst_gbo, &dst_map);
+   drm_gem_vram_vunmap_local(src_gbo, &src_map);
+
+   drm_gem_unlock_reservations(objs, ARRAY_SIZE(objs), &ctx);
 
return 0;
 
-err_drm_gem_vram_vunmap:
-   drm_gem_vram_vunmap(src_gbo, &src_map);
+err_drm_gem_vram_vunmap_local:
+   drm_gem_vram_vunmap_local(src_gbo, &src_map);
+err_drm_gem_unlock_reservations:
+   drm_gem_unlock_reservations(objs, ARRAY_SIZE(objs), &ctx);
return ret;
 }
 
@@ -241,6 +251,7 @@ void ast_cursor_show(struct ast_private *ast, int x, int y,
 {
struct drm_device *dev = >base;
struct drm_gem_vram_object *gbo = 
ast->cursor.gbo[ast->cursor.next_index];
+   struct drm_gem_object *obj = &gbo->bo.base;
struct dma_buf_map map;
u8 x_offset, y_offset;
u8 __iomem *dst;
@@ -248,16 +259,22 @@ void ast_cursor_show(struct ast_private *ast, int x, int 
y,
u8 jreg;
int ret;
 
-   ret = drm_gem_vram_vmap(gbo, &map);
-   if (drm_WARN_ONCE(dev, ret, "drm_gem_vram_vmap() failed, ret=%d\n", 
ret))
+   ret = dma_resv_lock(obj->resv, NULL);
+   if (ret)
+   return;
+   ret = drm_gem_vram_vmap_local(gbo, &map);
+   if (drm_WARN_ONCE(dev, ret, "drm_gem_vram_vmap_local() failed, 
ret=%d\n", ret)) {
+   dma_resv_unlock(obj->resv);
return;
+   }
dst = map.vaddr_iomem; /* TODO: Use mapping abstraction properly */
 
sig = dst + AST_HWC_SIZE;
writel(x, sig + AST_HWC_SIGNATURE_X);
writel(y, sig + AST_HWC_SIGNATURE_Y);
 
-   drm_gem_vram_vunmap(gbo, &map);
+   drm_gem_vram_vunmap_local(gbo, &map);
+   dma_resv_unlock(obj->resv);
 
if (x < 0) {
x_offset = (-x) + offset_x;
diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c 
b/drivers/gpu/drm/drm_gem_vram_helper.c
index 02ca22e90290..08a713993896 100644
--- a/drivers/gpu/drm/drm_gem_vram_helper.c
+++ b/drivers/gpu/drm/drm_gem_vram_helper.c
@@ -379,47 +379,6 @@ int drm_gem_vram_unpin(struct drm_gem_vram_object *gbo)
 }
 EXPORT_SYMBOL(drm_gem_vram_unpin);
 
-static int drm_gem_vram_kmap_locked(struct drm_gem_vram_object *gbo,
-   struct dma_buf_map *map)
-{
-   int ret;
-
-   

[PATCH v3 0/8] drm: Support short-term vmap via vmap_local

2020-12-09 Thread Thomas Zimmermann
(was: drm/vram-helper: Lock GEM BOs while they are mapped)

GEM VRAM helpers used to pin the BO in their implementation of vmap, so
that it could not be relocated. In recent discussions [1][2], it became
clear that this is incorrect for in-kernel use cases, such as fbdev
emulation, which should rather depend on the reservation lock to prevent
relocation.

This patchset addresses the issue by introducing the new interfaces
vmap_local and vunmap_local throughout dma-buf and GEM. It further adds
support to DRM's CMA, SHMEM and VRAM helpers and finally converts fbdev
emulation to the new interface.

Patches 1 and 2 prepare the ast cursor code for the later changes.

Patches 3 and 4 add the vmap_local infrastructure throughout dma-buf,
GEM and PRIME.

Patches 5 to 7 add implementations of vmap_local to DRM's various GEM
helper libraries. Due to the simple nature of these libraries, existing
vmap code can be reused easily. Several drivers are updated as well to
use the new interfaces.

Patch 8 converts generic fbdev emulation to use vmap_local. Only DRM
drivers that use GEM helpers currently use fbdev emulation, so patches
5 to 7 covered all necessary instances.

I smoke-tested the patchset with ast (VRAM helpers), mgag200 (SHMEM) and
vc4 (CMA). I also tested with a version of radeon (raw TTM) that has been
converted to generic fbdev emulation.

v3:
* rewrite patchset around vmap_local
v2:
* make importers acquire resv locks by themselves
* document dma-buf vmap/vunmap ops

[1] https://patchwork.freedesktop.org/patch/400054/?series=83765&rev=1
[2] https://patchwork.freedesktop.org/patch/405407/?series=84401&rev=2

Thomas Zimmermann (8):
  drm/ast: Don't pin cursor source BO explicitly during update
  drm/ast: Only map cursor BOs during updates
  dma-buf: Add vmap_local and vunmap_local operations
  drm/gem: Create infrastructure for GEM vmap_local
  drm/cma-helper: Provide a vmap function for short-term mappings
  drm/shmem-helper: Provide a vmap function for short-term mappings
  drm/vram-helper: Provide a vmap function for short-term mappings
  drm/fb-helper: Move BO locking from DRM client to fbdev damage worker

 drivers/dma-buf/dma-buf.c  |  80 ++
 drivers/gpu/drm/ast/ast_cursor.c   |  70 +++-
 drivers/gpu/drm/ast/ast_drv.h  |   2 -
 drivers/gpu/drm/drm_client.c   |  91 
 drivers/gpu/drm/drm_fb_helper.c|  41 +++
 drivers/gpu/drm/drm_gem.c  |  28 +
 drivers/gpu/drm/drm_gem_cma_helper.c   |  35 ++
 drivers/gpu/drm/drm_gem_shmem_helper.c |  71 -
 drivers/gpu/drm/drm_gem_vram_helper.c  | 142 -
 drivers/gpu/drm/drm_internal.h |   2 +
 drivers/gpu/drm/drm_prime.c|  39 +++
 drivers/gpu/drm/mgag200/mgag200_mode.c |  16 ++-
 drivers/gpu/drm/tiny/cirrus.c  |  10 +-
 drivers/gpu/drm/tiny/gm12u320.c|  14 ++-
 drivers/gpu/drm/udl/udl_modeset.c  |  18 ++--
 drivers/gpu/drm/vboxvideo/vbox_mode.c  |  15 +--
 drivers/gpu/drm/vc4/vc4_bo.c   |  13 +++
 drivers/gpu/drm/vc4/vc4_drv.h  |   1 +
 drivers/gpu/drm/virtio/virtgpu_prime.c |   2 +
 include/drm/drm_client.h   |   4 +
 include/drm/drm_gem.h  |  20 
 include/drm/drm_gem_cma_helper.h   |   1 +
 include/drm/drm_gem_shmem_helper.h |   2 +
 include/drm/drm_gem_vram_helper.h  |   2 +
 include/drm/drm_prime.h|   2 +
 include/linux/dma-buf.h|  34 ++
 26 files changed, 635 insertions(+), 120 deletions(-)

--
2.29.2



[PATCH v3 6/8] drm/shmem-helper: Provide a vmap function for short-term mappings

2020-12-09 Thread Thomas Zimmermann
Implementations of the vmap/vunmap GEM callbacks may perform pinning
of the BO and may acquire the associated reservation object's lock.
Callers that only require a mapping of the contained memory can thus
interfere with other tasks that require exact pinning, such as scanout.
This is less of an issue with private SHMEM buffers, but may happen
with imported ones.

Therefore provide the new interfaces drm_gem_shmem_vmap_local() and
drm_gem_shmem_vunmap_local(), which only perform the vmap/vunmap
operations. Callers have to hold the reservation lock while the mapping
persists.

The affected callers are display updates in cirrus, gm12u320, mgag200
and udl. All are being changed to the new interface.

This patch also connects GEM SHMEM helpers to GEM object functions with
equivalent functionality.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/drm_gem_shmem_helper.c | 71 +-
 drivers/gpu/drm/mgag200/mgag200_mode.c | 16 --
 drivers/gpu/drm/tiny/cirrus.c  | 10 +++-
 drivers/gpu/drm/tiny/gm12u320.c| 14 +++--
 drivers/gpu/drm/udl/udl_modeset.c  | 18 ---
 include/drm/drm_gem_shmem_helper.h |  2 +
 6 files changed, 115 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
b/drivers/gpu/drm/drm_gem_shmem_helper.c
index 9825c378dfa6..41663f48d46a 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -32,6 +32,8 @@ static const struct drm_gem_object_funcs drm_gem_shmem_funcs 
= {
.get_sg_table = drm_gem_shmem_get_sg_table,
.vmap = drm_gem_shmem_vmap,
.vunmap = drm_gem_shmem_vunmap,
+   .vmap_local = drm_gem_shmem_vmap_local,
+   .vunmap_local = drm_gem_shmem_vunmap_local,
.mmap = drm_gem_shmem_mmap,
 };
 
@@ -313,7 +315,7 @@ static int drm_gem_shmem_vmap_locked(struct 
drm_gem_shmem_object *shmem, struct
return ret;
 }
 
-/*
+/**
  * drm_gem_shmem_vmap - Create a virtual mapping for a shmem GEM object
  * @shmem: shmem GEM object
  * @map: Returns the kernel virtual address of the SHMEM GEM object's backing
@@ -346,6 +348,44 @@ int drm_gem_shmem_vmap(struct drm_gem_object *obj, struct 
dma_buf_map *map)
 }
 EXPORT_SYMBOL(drm_gem_shmem_vmap);
 
+/**
+ * drm_gem_shmem_vmap_local - Create a virtual mapping for a shmem GEM object
+ * @shmem: shmem GEM object
+ * @map: Returns the kernel virtual address of the SHMEM GEM object's backing
+ *   store.
+ *
+ * This function makes sure that a contiguous kernel virtual address mapping
+ * exists for the buffer backing the shmem GEM object.
+ *
+ * The function is called with the BO's reservation object locked. Callers must
+ * hold the lock until after unmapping the buffer.
+ *
+ * This function can be used to implement &drm_gem_object_funcs.vmap_local. But
+ * it can also be called by drivers directly, in which case it will hide the
+ * differences between dma-buf imported and natively allocated objects.
+ *
+ * Acquired mappings should be cleaned up by calling 
drm_gem_shmem_vunmap_local().
+ *
+ * Returns:
+ * 0 on success or a negative error code on failure.
+ */
+int drm_gem_shmem_vmap_local(struct drm_gem_object *obj, struct dma_buf_map 
*map)
+{
+   struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
+   int ret;
+
+   dma_resv_assert_held(obj->resv);
+
+   ret = mutex_lock_interruptible(&shmem->vmap_lock);
+   if (ret)
+   return ret;
+   ret = drm_gem_shmem_vmap_locked(shmem, map);
+   mutex_unlock(&shmem->vmap_lock);
+
+   return ret;
+}
+EXPORT_SYMBOL(drm_gem_shmem_vmap_local);
+
 static void drm_gem_shmem_vunmap_locked(struct drm_gem_shmem_object *shmem,
struct dma_buf_map *map)
 {
@@ -366,7 +406,7 @@ static void drm_gem_shmem_vunmap_locked(struct 
drm_gem_shmem_object *shmem,
drm_gem_shmem_put_pages(shmem);
 }
 
-/*
+/**
  * drm_gem_shmem_vunmap - Unmap a virtual mapping fo a shmem GEM object
  * @shmem: shmem GEM object
  * @map: Kernel virtual address where the SHMEM GEM object was mapped
@@ -389,6 +429,33 @@ void drm_gem_shmem_vunmap(struct drm_gem_object *obj, 
struct dma_buf_map *map)
 }
 EXPORT_SYMBOL(drm_gem_shmem_vunmap);
 
+/**
+ * drm_gem_shmem_vunmap_local - Unmap a virtual mapping of a shmem GEM object
+ * @shmem: shmem GEM object
+ * @map: Kernel virtual address where the SHMEM GEM object was mapped
+ *
+ * This function cleans up a kernel virtual address mapping acquired by
+ * drm_gem_shmem_vmap_local(). The mapping is only removed when the use count
+ * drops to zero.
+ *
+ * The function is called with the BO's reservation object locked.
+ *
+ * This function can be used to implement &drm_gem_object_funcs.vunmap_local.
+ * But it can also be called by drivers directly, in which case it will hide
+ * the differences between dma-buf imported and natively allocated objects.
+ */
+void drm_gem_shmem_vunmap_local(struct drm_gem_object *obj, struct dma_buf_map 
*map)
+{
+   struct 

[PATCH v3 2/8] drm/ast: Only map cursor BOs during updates

2020-12-09 Thread Thomas Zimmermann
The HW cursor's BO used to be mapped permanently into the kernel's
address space. GEM's vmap operation will be protected by locks, and
we don't want to lock the BOs for an indefinite period of time.

Change the cursor code to map the HW BOs only during updates. The
vmap operation in VRAM helpers is cheap, as a once-established mapping
is reused until the BO actually moves. As the HW cursor BOs are
permanently pinned, they never move at all.

v2:
* fix typos in commit description

Signed-off-by: Thomas Zimmermann 
Acked-by: Christian König 
---
 drivers/gpu/drm/ast/ast_cursor.c | 51 ++--
 drivers/gpu/drm/ast/ast_drv.h|  2 --
 2 files changed, 28 insertions(+), 25 deletions(-)
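
For reference, the per-update pattern used below boils down to this (sketch
only; example_touch_cursor_bo() is a made-up name):

static int example_touch_cursor_bo(struct drm_gem_vram_object *gbo)
{
	struct dma_buf_map map;
	int ret;

	ret = drm_gem_vram_vmap(gbo, &map); /* cheap: a pinned BO keeps its mapping */
	if (ret)
		return ret;

	/* ... update the cursor image via map.vaddr_iomem ... */

	drm_gem_vram_vunmap(gbo, &map);
	return 0;
}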

diff --git a/drivers/gpu/drm/ast/ast_cursor.c b/drivers/gpu/drm/ast/ast_cursor.c
index 68bf3d33f1ed..fac1ee79c372 100644
--- a/drivers/gpu/drm/ast/ast_cursor.c
+++ b/drivers/gpu/drm/ast/ast_cursor.c
@@ -39,7 +39,6 @@ static void ast_cursor_fini(struct ast_private *ast)
 
for (i = 0; i < ARRAY_SIZE(ast->cursor.gbo); ++i) {
gbo = ast->cursor.gbo[i];
-   drm_gem_vram_vunmap(gbo, &ast->cursor.map[i]);
drm_gem_vram_unpin(gbo);
drm_gem_vram_put(gbo);
}
@@ -53,14 +52,13 @@ static void ast_cursor_release(struct drm_device *dev, void 
*ptr)
 }
 
 /*
- * Allocate cursor BOs and pins them at the end of VRAM.
+ * Allocate cursor BOs and pin them at the end of VRAM.
  */
 int ast_cursor_init(struct ast_private *ast)
 {
	struct drm_device *dev = &ast->base;
size_t size, i;
struct drm_gem_vram_object *gbo;
-   struct dma_buf_map map;
int ret;
 
size = roundup(AST_HWC_SIZE + AST_HWC_SIGNATURE_SIZE, PAGE_SIZE);
@@ -77,15 +75,7 @@ int ast_cursor_init(struct ast_private *ast)
drm_gem_vram_put(gbo);
goto err_drm_gem_vram_put;
}
-   ret = drm_gem_vram_vmap(gbo, &map);
-   if (ret) {
-   drm_gem_vram_unpin(gbo);
-   drm_gem_vram_put(gbo);
-   goto err_drm_gem_vram_put;
-   }
-
ast->cursor.gbo[i] = gbo;
-   ast->cursor.map[i] = map;
}
 
return drmm_add_action_or_reset(dev, ast_cursor_release, NULL);
@@ -94,7 +84,6 @@ int ast_cursor_init(struct ast_private *ast)
while (i) {
--i;
gbo = ast->cursor.gbo[i];
-   drm_gem_vram_vunmap(gbo, &ast->cursor.map[i]);
drm_gem_vram_unpin(gbo);
drm_gem_vram_put(gbo);
}
@@ -168,31 +157,38 @@ static void update_cursor_image(u8 __iomem *dst, const u8 
*src, int width, int h
 int ast_cursor_blit(struct ast_private *ast, struct drm_framebuffer *fb)
 {
	struct drm_device *dev = &ast->base;
-   struct drm_gem_vram_object *gbo;
-   struct dma_buf_map map;
-   int ret;
-   void *src;
+   struct drm_gem_vram_object *dst_gbo = 
ast->cursor.gbo[ast->cursor.next_index];
+   struct drm_gem_vram_object *src_gbo = drm_gem_vram_of_gem(fb->obj[0]);
+   struct dma_buf_map src_map, dst_map;
void __iomem *dst;
+   void *src;
+   int ret;
 
if (drm_WARN_ON_ONCE(dev, fb->width > AST_MAX_HWC_WIDTH) ||
drm_WARN_ON_ONCE(dev, fb->height > AST_MAX_HWC_HEIGHT))
return -EINVAL;
 
-   gbo = drm_gem_vram_of_gem(fb->obj[0]);
-
-   ret = drm_gem_vram_vmap(gbo, &map);
+   ret = drm_gem_vram_vmap(src_gbo, &src_map);
if (ret)
return ret;
-   src = map.vaddr; /* TODO: Use mapping abstraction properly */
+   src = src_map.vaddr; /* TODO: Use mapping abstraction properly */
 
-   dst = ast->cursor.map[ast->cursor.next_index].vaddr_iomem;
+   ret = drm_gem_vram_vmap(dst_gbo, &dst_map);
+   if (ret)
+   goto err_drm_gem_vram_vunmap;
+   dst = dst_map.vaddr_iomem; /* TODO: Use mapping abstraction properly */
 
/* do data transfer to cursor BO */
update_cursor_image(dst, src, fb->width, fb->height);
 
-   drm_gem_vram_vunmap(gbo, &map);
+   drm_gem_vram_vunmap(dst_gbo, &dst_map);
+   drm_gem_vram_vunmap(src_gbo, &src_map);
 
return 0;
+
+err_drm_gem_vram_vunmap:
+   drm_gem_vram_vunmap(src_gbo, _map);
+   return ret;
 }
 
 static void ast_cursor_set_base(struct ast_private *ast, u64 address)
@@ -243,17 +239,26 @@ static void ast_cursor_set_location(struct ast_private 
*ast, u16 x, u16 y,
 void ast_cursor_show(struct ast_private *ast, int x, int y,
 unsigned int offset_x, unsigned int offset_y)
 {
+   struct drm_device *dev = &ast->base;
+   struct drm_gem_vram_object *gbo = 
ast->cursor.gbo[ast->cursor.next_index];
+   struct dma_buf_map map;
u8 x_offset, y_offset;
u8 __iomem *dst;
u8 __iomem *sig;
u8 jreg;
+   int ret;
 
-   dst = ast->cursor.map[ast->cursor.next_index].vaddr;
+   ret = 

[PATCH v3 4/8] drm/gem: Create infrastructure for GEM vmap_local

2020-12-09 Thread Thomas Zimmermann
This patch adds vmap_local and vunmap_local to struct drm_gem_object_funcs,
including the PRIME helpers that connect them to dma-buf's corresponding interfaces.

Besides the generic DRM core, this will become relevant for fbdev emulation
with virtio, so virtio-gpu's PRIME ops are updated as well.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/drm_gem.c  | 28 ++
 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_prime.c| 39 ++
 drivers/gpu/drm/virtio/virtgpu_prime.c |  2 ++
 include/drm/drm_gem.h  | 20 +
 include/drm/drm_prime.h|  2 ++
 6 files changed, 93 insertions(+)
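
For reference, a driver that already provides vmap/vunmap would wire up the
new callbacks along these lines (illustrative sketch; the example_* names are
placeholders, the real helpers are added in patches 5 to 7):

static int example_gem_vmap(struct drm_gem_object *obj, struct dma_buf_map *map);
static void example_gem_vunmap(struct drm_gem_object *obj, struct dma_buf_map *map);
static int example_gem_vmap_local(struct drm_gem_object *obj, struct dma_buf_map *map);
static void example_gem_vunmap_local(struct drm_gem_object *obj, struct dma_buf_map *map);

static const struct drm_gem_object_funcs example_gem_funcs = {
	.vmap		= example_gem_vmap,
	.vunmap		= example_gem_vunmap,
	.vmap_local	= example_gem_vmap_local,	/* new in this patch */
	.vunmap_local	= example_gem_vunmap_local,	/* new in this patch */
};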

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 92f89cee213e..6e131d9bb7bd 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1234,6 +1234,34 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct 
dma_buf_map *map)
dma_buf_map_clear(map);
 }
 
+int drm_gem_vmap_local(struct drm_gem_object *obj, struct dma_buf_map *map)
+{
+   int ret;
+
+   if (!obj->funcs->vmap_local)
+   return -EOPNOTSUPP;
+
+   ret = obj->funcs->vmap_local(obj, map);
+   if (ret)
+   return ret;
+   else if (dma_buf_map_is_null(map))
+   return -ENOMEM;
+
+   return 0;
+}
+
+void drm_gem_vunmap_local(struct drm_gem_object *obj, struct dma_buf_map *map)
+{
+   if (dma_buf_map_is_null(map))
+   return;
+
+   if (obj->funcs->vunmap_local)
+   obj->funcs->vunmap_local(obj, map);
+
+   /* Always set the mapping to NULL. Callers may rely on this. */
+   dma_buf_map_clear(map);
+}
+
 /**
  * drm_gem_lock_reservations - Sets up the ww context and acquires
  * the lock on an array of GEM objects.
diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 81d386b5b92a..b0bf6aba763a 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -190,6 +190,8 @@ int drm_gem_pin(struct drm_gem_object *obj);
 void drm_gem_unpin(struct drm_gem_object *obj);
 int drm_gem_vmap(struct drm_gem_object *obj, struct dma_buf_map *map);
 void drm_gem_vunmap(struct drm_gem_object *obj, struct dma_buf_map *map);
+int drm_gem_vmap_local(struct drm_gem_object *obj, struct dma_buf_map *map);
+void drm_gem_vunmap_local(struct drm_gem_object *obj, struct dma_buf_map *map);
 
 /* drm_debugfs.c drm_debugfs_crc.c */
 #if defined(CONFIG_DEBUG_FS)
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 683aa29ecd3b..633edea76985 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -695,6 +695,43 @@ void drm_gem_dmabuf_vunmap(struct dma_buf *dma_buf, struct 
dma_buf_map *map)
 }
 EXPORT_SYMBOL(drm_gem_dmabuf_vunmap);
 
+/**
+ * drm_gem_dmabuf_vmap_local - dma_buf vmap_local implementation for GEM
+ * @dma_buf: buffer to be mapped
+ * @map: the virtual address of the buffer
+ *
+ * Sets up a kernel virtual mapping. This can be used as the &dma_buf_ops.vmap_local
+ * callback. Calls into &drm_gem_object_funcs.vmap_local for device specific handling.
+ * The kernel virtual address is returned in map.
+ *
+ * Returns:
+ * 0 on success or a negative errno code otherwise.
+ */
+int drm_gem_dmabuf_vmap_local(struct dma_buf *dma_buf, struct dma_buf_map *map)
+{
+   struct drm_gem_object *obj = dma_buf->priv;
+
+   return drm_gem_vmap_local(obj, map);
+}
+EXPORT_SYMBOL(drm_gem_dmabuf_vmap_local);
+
+/**
+ * drm_gem_dmabuf_vunmap_local - dma_buf vunmap_local implementation for GEM
+ * @dma_buf: buffer to be unmapped
+ * @map: the virtual address of the buffer
+ *
+ * Releases a kernel virtual mapping. This can be used as the
+ * &dma_buf_ops.vunmap_local callback. Calls into &drm_gem_object_funcs.vunmap_local
+ * for device specific handling.
+ */
+void drm_gem_dmabuf_vunmap_local(struct dma_buf *dma_buf, struct dma_buf_map 
*map)
+{
+   struct drm_gem_object *obj = dma_buf->priv;
+
+   drm_gem_vunmap_local(obj, map);
+}
+EXPORT_SYMBOL(drm_gem_dmabuf_vunmap_local);
+
 /**
  * drm_gem_prime_mmap - PRIME mmap function for GEM drivers
  * @obj: GEM object
@@ -787,6 +824,8 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops =  
{
.mmap = drm_gem_dmabuf_mmap,
.vmap = drm_gem_dmabuf_vmap,
.vunmap = drm_gem_dmabuf_vunmap,
+   .vmap_local = drm_gem_dmabuf_vmap_local,
+   .vunmap_local = drm_gem_dmabuf_vunmap_local,
 };
 
 /**
diff --git a/drivers/gpu/drm/virtio/virtgpu_prime.c 
b/drivers/gpu/drm/virtio/virtgpu_prime.c
index 807a27a16365..fea11a53d8fc 100644
--- a/drivers/gpu/drm/virtio/virtgpu_prime.c
+++ b/drivers/gpu/drm/virtio/virtgpu_prime.c
@@ -54,6 +54,8 @@ static const struct virtio_dma_buf_ops virtgpu_dmabuf_ops =  {
.mmap = drm_gem_dmabuf_mmap,
.vmap = drm_gem_dmabuf_vmap,
.vunmap = drm_gem_dmabuf_vunmap,
+   .vmap_local = drm_gem_dmabuf_vmap_local,
+   .vunmap_local = drm_gem_dmabuf_vunmap_local,

[PATCH v3 1/8] drm/ast: Don't pin cursor source BO explicitly during update

2020-12-09 Thread Thomas Zimmermann
Vmapping the cursor source BO contains an implicit pin operation,
so there's no need to do this manually.

Signed-off-by: Thomas Zimmermann 
---
 drivers/gpu/drm/ast/ast_cursor.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)
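
The call sequence after this patch is effectively (sketch):

	ret = drm_gem_vram_vmap(gbo, &map);	/* pins and maps */
	if (ret)
		return ret;
	/* ... */
	drm_gem_vram_vunmap(gbo, &map);		/* unmaps and unpins */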

diff --git a/drivers/gpu/drm/ast/ast_cursor.c b/drivers/gpu/drm/ast/ast_cursor.c
index 742d43a7edf4..68bf3d33f1ed 100644
--- a/drivers/gpu/drm/ast/ast_cursor.c
+++ b/drivers/gpu/drm/ast/ast_cursor.c
@@ -180,12 +180,9 @@ int ast_cursor_blit(struct ast_private *ast, struct 
drm_framebuffer *fb)
 
gbo = drm_gem_vram_of_gem(fb->obj[0]);
 
-   ret = drm_gem_vram_pin(gbo, 0);
-   if (ret)
-   return ret;
ret = drm_gem_vram_vmap(gbo, );
if (ret)
-   goto err_drm_gem_vram_unpin;
+   return ret;
src = map.vaddr; /* TODO: Use mapping abstraction properly */
 
dst = ast->cursor.map[ast->cursor.next_index].vaddr_iomem;
@@ -194,13 +191,8 @@ int ast_cursor_blit(struct ast_private *ast, struct 
drm_framebuffer *fb)
update_cursor_image(dst, src, fb->width, fb->height);
 
drm_gem_vram_vunmap(gbo, );
-   drm_gem_vram_unpin(gbo);
 
return 0;
-
-err_drm_gem_vram_unpin:
-   drm_gem_vram_unpin(gbo);
-   return ret;
 }
 
 static void ast_cursor_set_base(struct ast_private *ast, u64 address)
-- 
2.29.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf

2020-12-09 Thread Jürgen Groß via Virtualization

On 09.12.20 15:02, Mark Rutland wrote:

On Wed, Dec 09, 2020 at 01:27:10PM +, Mark Rutland wrote:

On Sun, Nov 22, 2020 at 01:44:53PM -0800, Andy Lutomirski wrote:

On Sat, Nov 21, 2020 at 10:55 PM Jürgen Groß  wrote:

On 20.11.20 12:59, Peter Zijlstra wrote:

If someone were to write horrible code like:

   local_irq_disable();
   local_irq_save(flags);
   local_irq_enable();
   local_irq_restore(flags);

we'd be up some creek without a paddle... now I don't _think_ we have
genius code like that, but I'd feel saver if we can haz an assertion in
there somewhere...



I was just talking to Peter on IRC about implementing the same thing for
arm64, so could we put this in the generic irqflags code? IIUC we can
use raw_irqs_disabled() to do the check.

As this isn't really entry specific (and IIUC the cases this should
catch would break lockdep today), maybe we should add a new
DEBUG_IRQFLAGS for this, that DEBUG_LOCKDEP can also select?

Something like:

#define local_irq_restore(flags)   \
do {\
if (!raw_irqs_disabled_flags(flags)) {  \
trace_hardirqs_on();\
} else if (IS_ENABLED(CONFIG_DEBUG_IRQFLAGS) {  \
if (unlikely(raw_irqs_disabled())   \


Whoops; that should be !raw_irqs_disabled().


warn_bogus_irqrestore();\
}   \
raw_local_irq_restore(flags);   \
 } while (0)

... perhaps? (ignoring however we deal with once-ness).


If no-one shouts in the next day or two I'll spin this as its own patch.


Fine with me. So I'll just ignore a potential error case in my patch.

Thanks,


Juergen



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf

2020-12-09 Thread Mark Rutland
On Wed, Dec 09, 2020 at 01:27:10PM +, Mark Rutland wrote:
> On Sun, Nov 22, 2020 at 01:44:53PM -0800, Andy Lutomirski wrote:
> > On Sat, Nov 21, 2020 at 10:55 PM Jürgen Groß  wrote:
> > > On 20.11.20 12:59, Peter Zijlstra wrote:
> > > > If someone were to write horrible code like:
> > > >
> > > >   local_irq_disable();
> > > >   local_irq_save(flags);
> > > >   local_irq_enable();
> > > >   local_irq_restore(flags);
> > > >
> > > > we'd be up some creek without a paddle... now I don't _think_ we have
> > > > genius code like that, but I'd feel saver if we can haz an assertion in
> > > > there somewhere...

> I was just talking to Peter on IRC about implementing the same thing for
> arm64, so could we put this in the generic irqflags code? IIUC we can
> use raw_irqs_disabled() to do the check.
> 
> As this isn't really entry specific (and IIUC the cases this should
> catch would break lockdep today), maybe we should add a new
> DEBUG_IRQFLAGS for this, that DEBUG_LOCKDEP can also select?
> 
> Something like:
> 
> #define local_irq_restore(flags)   \
>do {\
>if (!raw_irqs_disabled_flags(flags)) {  \
>trace_hardirqs_on();\
>} else if (IS_ENABLED(CONFIG_DEBUG_IRQFLAGS) {  \
>if (unlikely(raw_irqs_disabled())   \

Whoops; that should be !raw_irqs_disabled().

>warn_bogus_irqrestore();\
>}   \
>raw_local_irq_restore(flags);   \
> } while (0)
> 
> ... perhaps? (ignoring however we deal with once-ness).
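
I.e., written out in full with the parentheses balanced, that would be:

#define local_irq_restore(flags)					\
	do {								\
		if (!raw_irqs_disabled_flags(flags)) {			\
			trace_hardirqs_on();				\
		} else if (IS_ENABLED(CONFIG_DEBUG_IRQFLAGS)) {		\
			if (unlikely(!raw_irqs_disabled()))		\
				warn_bogus_irqrestore();		\
		}							\
		raw_local_irq_restore(flags);				\
	} while (0)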

If no-one shouts in the next day or two I'll spin this as its own patch.

Mark.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v2 05/12] x86: rework arch_local_irq_restore() to not use popf

2020-12-09 Thread Mark Rutland
On Sun, Nov 22, 2020 at 01:44:53PM -0800, Andy Lutomirski wrote:
> On Sat, Nov 21, 2020 at 10:55 PM Jürgen Groß  wrote:
> >
> > On 20.11.20 12:59, Peter Zijlstra wrote:
> > > On Fri, Nov 20, 2020 at 12:46:23PM +0100, Juergen Gross wrote:
> > >> +static __always_inline void arch_local_irq_restore(unsigned long flags)
> > >> +{
> > >> +if (!arch_irqs_disabled_flags(flags))
> > >> +arch_local_irq_enable();
> > >> +}
> > >
> > > If someone were to write horrible code like:
> > >
> > >   local_irq_disable();
> > >   local_irq_save(flags);
> > >   local_irq_enable();
> > >   local_irq_restore(flags);
> > >
> > > we'd be up some creek without a paddle... now I don't _think_ we have
> > > genius code like that, but I'd feel saver if we can haz an assertion in
> > > there somewhere...
> > >
> > > Maybe something like:
> > >
> > > #ifdef CONFIG_DEBUG_ENTRY // for lack of something saner
> > >   WARN_ON_ONCE((arch_local_save_flags() ^ flags) & X86_EFLAGS_IF);
> > > #endif
> > >
> > > At the end?
> >
> > I'd like to, but using WARN_ON_ONCE() in include/asm/irqflags.h sounds
> > like a perfect recipe for include dependency hell.
> >
> > We could use a plain asm("ud2") instead.
> 
> How about out-of-lining it:
> 
> #ifdef CONFIG_DEBUG_ENTRY
> extern void warn_bogus_irqrestore();
> #endif
> 
> static __always_inline void arch_local_irq_restore(unsigned long flags)
> {
>if (!arch_irqs_disabled_flags(flags)) {
>arch_local_irq_enable();
>} else {
> #ifdef CONFIG_DEBUG_ENTRY
>if (unlikely(arch_local_irq_save() & X86_EFLAGS_IF))
> warn_bogus_irqrestore();
> #endif
> }

I was just talking to Peter on IRC about implementing the same thing for
arm64, so could we put this in the generic irqflags code? IIUC we can
use raw_irqs_disabled() to do the check.

As this isn't really entry specific (and IIUC the cases this should
catch would break lockdep today), maybe we should add a new
DEBUG_IRQFLAGS for this, that DEBUG_LOCKDEP can also select?

Something like:

#define local_irq_restore(flags)   \
   do {\
   if (!raw_irqs_disabled_flags(flags)) {  \
   trace_hardirqs_on();\
   } else if (IS_ENABLED(CONFIG_DEBUG_IRQFLAGS) {  \
   if (unlikely(raw_irqs_disabled())   \
   warn_bogus_irqrestore();\
   }   \
   raw_local_irq_restore(flags);   \
} while (0)

... perhaps? (ignoring however we deal with once-ness).

Thanks,
Mark.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: Howto listen to/handle gpio state changes ? Re: [PATCH v2 2/2] drivers: gpio: add virtio-gpio guest driver

2020-12-09 Thread Linus Walleij
On Wed, Dec 9, 2020 at 12:19 PM Arnd Bergmann  wrote:
> On Wed, Dec 9, 2020 at 9:51 AM Linus Walleij  wrote:
> > On Tue, Dec 8, 2020 at 3:07 PM Enrico Weigelt, metux IT consult 
> >  wrote:
>
> > What we need to understand is if your new usecase is an outlier
> > so it is simplest modeled by a "mock" irq_chip or we have to design
> > something new altogether like notifications on changes. I suspect
> > irq_chip would be best because all drivers using GPIOs for interrupts
> > are expecting interrupts, and it would be an enormous task to
> > change them all and really annoying to create a new mechanism
> > on the side.
>
> I would expect the platform abstraction to actually be close enough
> to a chained irqchip that it actually works: the notification should
> come in via vring_interrupt(), which is a normal interrupt handler
> that calls vq->vq.callback(), calling generic_handle_irq() (and
> possibly chained_irq_enter()/chained_irq_exit() around it) like the
> other gpio drivers do should just work here I think, and if it did
> not, then I would expect this to be just a bug in the driver rather
> than something missing in the gpio framework.

Performance/latency-wise that would also be strongly encouraged.

Tglx isn't super-happy about the chained interrupts at times, as they
can create really nasty bugs, but a pure IRQ in the fastpath of some
kind is preferable and intuitive either way.
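
For concreteness, the dispatch Arnd describes would look roughly like this
(the struct and the event parsing are hypothetical, virtio-gpio has no such
code yet):

struct virtio_gpio {
	struct gpio_chip gc;
	unsigned int last_event_line;	/* parsed from the used buffer */
};

static void virtio_gpio_event_cb(struct virtqueue *vq)
{
	struct virtio_gpio *vgpio = vq->vdev->priv;
	unsigned int irq;

	/* runs from vring_interrupt(), i.e. ordinary hardirq context */
	irq = irq_find_mapping(vgpio->gc.irq.domain, vgpio->last_event_line);
	if (irq)
		generic_handle_irq(irq);
}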

Yours,
Linus Walleij
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] vhost_net: fix high cpu load when sendmsg fails

2020-12-09 Thread Michael S. Tsirkin
On Wed, Dec 09, 2020 at 07:48:24PM +0800, wangyunjian wrote:
> From: Yunjian Wang 
> 
> Currently we break the loop and wake up the vhost_worker when
> sendmsg fails. When the worker wakes up again, we'll meet the
> same error. This will cause high CPU load. To fix this issue,
> we can skip this descriptor by ignoring the error.
> 
> Signed-off-by: Yunjian Wang 
> ---
>  drivers/vhost/net.c | 24 +---
>  1 file changed, 5 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 531a00d703cd..ac950b1120f5 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -829,14 +829,8 @@ static void handle_tx_copy(struct vhost_net *net, struct 
> socket *sock)
>  
>   /* TODO: Check specific error and bomb out unless ENOBUFS? */
>   err = sock->ops->sendmsg(sock, &msg, len);
> - if (unlikely(err < 0)) {
> - vhost_discard_vq_desc(vq, 1);
> - vhost_net_enable_vq(net, vq);
> - break;
> - }
> - if (err != len)
> - pr_debug("Truncated TX packet: len %d != %zd\n",
> -  err, len);
> + if (unlikely(err < 0 || err != len))
> + vq_err(vq, "Fail to sending packets err : %d, len : 
> %zd\n", err, len);
>  done:
>   vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
>   vq->heads[nvq->done_idx].len = 0;

One of the reasons for sendmsg to fail is ENOBUFS.
In that case we certainly don't want to drop the packet.
There could be other transient errors.
Which error did you encounter, specifically?
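
E.g. something along these lines would still fix the busy loop without
dropping packets on transient errors (rough sketch, not a proposal):

		err = sock->ops->sendmsg(sock, &msg, len);
		if (unlikely(err < 0)) {
			if (err == -ENOBUFS || err == -EAGAIN) {
				/* transient: requeue the descriptor, retry later */
				vhost_discard_vq_desc(vq, 1);
				vhost_net_enable_vq(net, vq);
				break;
			}
			/* anything else: drop the packet and report it */
			vq_err(vq, "sendmsg failed: %d\n", err);
		}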

> @@ -925,19 +919,11 @@ static void handle_tx_zerocopy(struct vhost_net *net, 
> struct socket *sock)
>  
>   /* TODO: Check specific error and bomb out unless ENOBUFS? */
>   err = sock->ops->sendmsg(sock, &msg, len);
> - if (unlikely(err < 0)) {
> - if (zcopy_used) {
> + if (unlikely(err < 0 || err != len)) {
> + if (zcopy_used && err < 0)
>   vhost_net_ubuf_put(ubufs);
> - nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
> - % UIO_MAXIOV;
> - }
> - vhost_discard_vq_desc(vq, 1);
> - vhost_net_enable_vq(net, vq);
> - break;
> + vq_err(vq, "Fail to sending packets err : %d, len : 
> %zd\n", err, len);
>   }
> - if (err != len)
> - pr_debug("Truncated TX packet: "
> -  " len %d != %zd\n", err, len);
>   if (!zcopy_used)
>   vhost_add_used_and_signal(&net->dev, vq, head, 0);
>   else
> -- 
> 2.23.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] vdpa/mlx5: Use write memory barrier after updating CQ index

2020-12-09 Thread Michael S. Tsirkin
On Wed, Dec 09, 2020 at 11:38:36AM +0200, Eli Cohen wrote:
> On Wed, Dec 09, 2020 at 03:05:42AM -0500, Michael S. Tsirkin wrote:
> > On Wed, Dec 09, 2020 at 08:58:46AM +0200, Eli Cohen wrote:
> > > On Wed, Dec 09, 2020 at 01:46:22AM -0500, Michael S. Tsirkin wrote:
> > > > On Wed, Dec 09, 2020 at 08:02:30AM +0200, Eli Cohen wrote:
> > > > > On Tue, Dec 08, 2020 at 04:45:04PM -0500, Michael S. Tsirkin wrote:
> > > > > > On Sun, Dec 06, 2020 at 12:57:19PM +0200, Eli Cohen wrote:
> > > > > > > Make sure to put write memory barrier after updating CQ consumer 
> > > > > > > index
> > > > > > > so the hardware knows that there are available CQE slots in the 
> > > > > > > queue.
> > > > > > > 
> > > > > > > Failure to do this can cause the update of the RX doorbell record 
> > > > > > > to get
> > > > > > > updated before the CQ consumer index resulting in CQ overrun.
> > > > > > > 
> > > > > > > Change-Id: Ib0ae4c118cce524c9f492b32569179f3c1f04cc1
> > > > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported 
> > > > > > > mlx5 devices")
> > > > > > > Signed-off-by: Eli Cohen 
> > > > > > 
> > > > > > Aren't both memory writes?
> > > > > 
> > > > > Not sure what exactly you mean here.
> > > > 
> > > > Both updates are CPU writes into RAM that hardware then reads
> > > > using DMA.
> > > > 
> > > 
> > > You mean why I did not put a memory barrier right after updating the
> > > receive doorbell record?
> > 
> > Sorry about being unclear.  I just tried to give justification for why
> > dma_wmb seems more appropriate than wmb here. If you need to
> > order memory writes wrt writes to card, that is different, but generally
> > writeX and friends will handle the ordering for you, except when
> > using relaxed memory mappings - then wmb is generally necessary.
> > 
> 
> Bear in mind, we're writing to memory (not io memory). In this case, we
> want this write to be visible my the DMA device.
> 
> https://www.kernel.org/doc/Documentation/memory-barriers.txt gives a
> similar example using dma_wmb() to flush updates to make them visible
> by the hardware before notifying the hardware to come and inspect this
> memory.

Exactly.
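
I.e. the dma_wmb() variant under discussion is simply (sketch):

	mlx5_cq_set_ci(&mvq->cq.mcq);
	/* publish the CQ consumer index before posting more RX buffers */
	dma_wmb();
	rx_post(&mvq->vqqp, num);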

> 
> > > I thought about this and I think it is not required. Suppose it takes a
> > > very long time till the hardware can actually see this update. The worst
> > > effect would be that the hardware will drop received packets if it
> > > sees none available due to the delayed update. Eventually it will see
> > > the update and will continue working.
> > > 
> > > If I put a memory barrier, I put some delay waiting for the CPU to flush
> > > the write before continuing. I tried both options while checking packet
> > > rate and couldn't see a noticeable difference in either case.
> > 
> > 
> > makes sense.
> > 
> > > > > > And given that, isn't dma_wmb() sufficient here?
> > > > > 
> > > > > I agree that dma_wmb() is more appropriate here.
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > > ---
> > > > > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 5 +
> > > > > > >  1 file changed, 5 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > index 1f4089c6f9d7..295f46eea2a5 100644
> > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > @@ -478,6 +478,11 @@ static int mlx5_vdpa_poll_one(struct 
> > > > > > > mlx5_vdpa_cq *vcq)
> > > > > > >  static void mlx5_vdpa_handle_completions(struct 
> > > > > > > mlx5_vdpa_virtqueue *mvq, int num)
> > > > > > >  {
> > > > > > >   mlx5_cq_set_ci(&mvq->cq.mcq);
> > > > > > > +
> > > > > > > + /* make sure CQ consumer update is visible to the hardware 
> > > > > > > before updating
> > > > > > > +  * RX doorbell record.
> > > > > > > +  */
> > > > > > > + wmb();
> > > > > > >   rx_post(&mvq->vqqp, num);
> > > > > > >   if (mvq->event_cb.callback)
> > > > > > >   mvq->event_cb.callback(mvq->event_cb.private);
> > > > > > > -- 
> > > > > > > 2.27.0
> > > > > > 
> > > > 
> > 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 05/19] vdpa_sim: remove the limit of IOTLB entries

2020-12-09 Thread Stefano Garzarella

On Mon, Dec 07, 2020 at 12:00:07PM +0800, Jason Wang wrote:


On 2020/12/4 上午1:04, Stefano Garzarella wrote:

The simulated devices can support multiple queues, so this limit
should be defined according to the number of queues supported by
the device.

Since we are in a simulator, let's simply remove that limit.

Suggested-by: Jason Wang 
Acked-by: Jason Wang 
Signed-off-by: Stefano Garzarella 



Rethink about this, since simulator can be used by VM, so the 
allocation is actually guest trigger-able when vIOMMU is enabled.


This means we need a limit somehow, (e.g I remember swiotlb is about 
64MB by default). Or having a module parameter for this.


Btw, have you met any issue when using 2048, I guess it can happen 
when we run several processes in parallel?




No, I didn't try with the limit.
This came from the reviews to Max's patches.

Anyway I can add a module parameter to control that limit, do you think 
is better to set a limit per queue (the parameter per number of queues), 
or just a value for the entire device?
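
E.g. a minimal sketch of such a knob (name, type and default are
illustrative only):

static unsigned int max_iotlb_entries = 2048;
module_param(max_iotlb_entries, uint, 0444);
MODULE_PARM_DESC(max_iotlb_entries,
		 "Maximum number of IOTLB entries per device (0 = unlimited)");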


Thanks,
Stefano

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v2 2/2] drivers: gpio: add virtio-gpio guest driver

2020-12-09 Thread Jason Wang


On 2020/12/8 下午3:02, Enrico Weigelt, metux IT consult wrote:

On 08.12.20 03:36, Jason Wang wrote:

Hi,


So we end up with two solutions (without a prompt):

1) using select, user may end up with driver without transport

IMHO not an entirely unusual situation in other places of the kernel,
eg. one can enable USB devices, w/o having an usb host adapter enabled.

And even if some USB-HA driver is enabled, the actual machine doesn't
necessarily have the corresponding device.



Ok, then select works for me.





2) using depends, users need to enable at least one transport

2) looks a little bit better I admit.

So, all virtio devices should depend on TRANSPORT_A || TRANSPORT_B ||
TRANSPORT_C || ... ? (and also change all these places if another
transport is added) ?



I think not. The idea is: if none of the transports (which select VIRTIO) is 
enabled, the user cannot enable any virtio drivers (which depend on VIRTIO).


Thanks




--mtx



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next] tun: fix ubuf refcount incorrectly on error path

2020-12-09 Thread Jason Wang


On 2020/12/8 上午10:32, Jason Wang wrote:


On 2020/12/7 下午9:38, wangyunjian wrote:

I think the newly added code makes it easy to miss this problem, so I want to
copy ubuf_info until we're sure there are no errors.

Thanks,
Yunjian



But isn't this actually a disabling of zerocopy?

Thanks




Sorry, I misread the patch.

Please send a formal version, and let's move the discussion there.

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH 00/27] vDPA software assisted live migration

2020-12-09 Thread Jason Wang
- Original Message -
> On Fri, Nov 20, 2020 at 07:50:38PM +0100, Eugenio Pérez wrote:
> > This series enable vDPA software assisted live migration for vhost-net
> > devices. This is a new method of vhost devices migration: Instead of
> > relay on vDPA device's dirty logging capability, SW assisted LM
> > intercepts dataplane, forwarding the descriptors between VM and device.
> 
> Pros:
> + vhost/vDPA devices don't need to implement dirty memory logging
> + Obsoletes ioctl(VHOST_SET_LOG_BASE) and friends
> 
> Cons:
> - Not generic, relies on vhost-net-specific ioctls
> - Doesn't support VIRTIO Shared Memory Regions
>   https://github.com/oasis-tcs/virtio-spec/blob/master/shared-mem.tex

I may be missing something, but my understanding is that it's the
responsibility of the device to migrate this part?

> - Performance (see below)
> 
> I think performance will be significantly lower when the shadow vq is
> enabled. Imagine a vDPA device with hardware vq doorbell registers
> mapped into the guest so the guest driver can directly kick the device.
> When the shadow vq is enabled a vmexit is needed to write to the shadow
> vq ioeventfd, then the host kernel scheduler switches to a QEMU thread
> to read the ioeventfd, the descriptors are translated, QEMU writes to
> the vhost hdev kick fd, the host kernel scheduler switches to the vhost
> worker thread, vhost/vDPA notifies the virtqueue, and finally the
> vDPA driver writes to the hardware vq doorbell register. That is a lot
> of overhead compared to writing to an exitless MMIO register!

I think it's a balance. E.g. we can poll the virtqueue to have an
exitless doorbell.

> 
> If the shadow vq was implemented in drivers/vhost/ and QEMU used the
> existing ioctl(VHOST_SET_LOG_BASE) approach, then the overhead would be
> reduced to just one set of ioeventfd/irqfd. In other words, the QEMU
> dirty memory logging happens asynchronously and isn't in the dataplane.
> 
> In addition, hardware that supports dirty memory logging as well as
> software vDPA devices could completely eliminate the shadow vq for even
> better performance.

Yes. That's our plan. But the interface might require more thought.

E.g. is the bitmap a good approach? To me, reporting dirty pages via a
virtqueue is better since it has a smaller footprint and is self-throttled.

And we need an address space other than the one used by the guest for
either the bitmap or the virtqueue.

> 
> But performance is a question of "is it good enough?". Maybe this
> approach is okay and users don't expect good performance while dirty
> memory logging is enabled.

Yes, and actually such a slowdown may help the migration converge.

Note that the whole idea is to have a generic solution for all
types of devices. It's good to consider performance, but for the
first stage it should be sufficient to make it work and then
optimize on top.

> I just wanted to share the idea of moving the
> shadow vq into the kernel in case you like that approach better.

My understanding is to keep the kernel as simple as possible and leave the
policies to userspace as much as possible. E.g. it requires us to
disable doorbell mapping and irq offloading, both of which are under
the control of userspace.

Thanks

> 
> Stefan
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: Howto listen to/handle gpio state changes ? Re: [PATCH v2 2/2] drivers: gpio: add virtio-gpio guest driver

2020-12-09 Thread Linus Walleij
On Tue, Dec 8, 2020 at 3:07 PM Enrico Weigelt, metux IT consult
 wrote:

> I've been looking for some more direct notification callback for gpio
> consumers: here the consumer would register itself as a listener on
> some gpio_desc and be called back when something changes (with data on what
> exactly changed, eg. "gpio #3 input switched to high").
>
> Seems we currently just have the indirect path via interrupts.

I don't know how indirect it is, it seems pretty direct to me. The subsystem
was designed in response to how the hardware in front of the developers
worked.

So far we have had:
- Cascaded interrupts
- Dedicated (hieararchical) interrupts
- Message Signalled Interrupts

And if you now bring something else to the show, then note that the
subsystem was not designed for some abstract quality such as
generic notification of events that occurred; all practical instances
have been around actual IRQs, and that is why it is using
struct irq_chip.

What we need to understand is if your new usecase is an outlier
so it is simplest modeled by a "mock" irq_chip or we have to design
something new altogether like notifications on changes. I suspect
irq_chip would be best because all drivers using GPIOs for interrupts
are expecting interrupts, and it would be an enormous task to
change them all and really annoying to create a new mechanism
on the side.

Yours,
Linus Walleij
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RESEND v2] virtio-input: add multi-touch support

2020-12-09 Thread Michael S. Tsirkin
On Tue, Dec 08, 2020 at 11:01:50PM +0200, Vasyl Vavrychuk wrote:
> From: Mathias Crombez 
> Cc: sta...@vger.kernel.org

I don't believe this is appropriate for stable, looks like
a new feature to me.


> 
> Without multi-touch slots allocated, ABS_MT_SLOT events will be lost by
> input_handle_abs_event.
> 
> Signed-off-by: Mathias Crombez 
> Signed-off-by: Vasyl Vavrychuk 
> Tested-by: Vasyl Vavrychuk 
> ---
> v2: fix patch corrupted by corporate email server
> 
>  drivers/virtio/Kconfig| 11 +++
>  drivers/virtio/virtio_input.c |  8 
>  2 files changed, 19 insertions(+)
> 
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 7b41130d3f35..2cfd5b01d96d 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -111,6 +111,17 @@ config VIRTIO_INPUT
>  
>If unsure, say M.
>  
> +config VIRTIO_INPUT_MULTITOUCH_SLOTS
> + depends on VIRTIO_INPUT
> + int "Number of multitouch slots"
> + range 0 64
> + default 10
> + help
> +  Define the number of multitouch slots used. Default to 10.
> +  This parameter is unused if there is no multitouch capability.
> +
> +  0 will disable the feature.
> +

Most people won't be using this config so the defaults matter. So why 10? 10 
fingers?

And where does 64 come from?


>  config VIRTIO_MMIO
>   tristate "Platform bus driver for memory mapped virtio devices"
>   depends on HAS_IOMEM && HAS_DMA
> diff --git a/drivers/virtio/virtio_input.c b/drivers/virtio/virtio_input.c
> index f1f6208edcf5..13f3d90e6c30 100644
> --- a/drivers/virtio/virtio_input.c
> +++ b/drivers/virtio/virtio_input.c
> @@ -7,6 +7,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  struct virtio_input {
>   struct virtio_device   *vdev;
> @@ -205,6 +206,7 @@ static int virtinput_probe(struct virtio_device *vdev)
>   unsigned long flags;
>   size_t size;
>   int abs, err;
> + bool is_mt = false;
>  
>   if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
>   return -ENODEV;
> @@ -287,9 +289,15 @@ static int virtinput_probe(struct virtio_device *vdev)
>   for (abs = 0; abs < ABS_CNT; abs++) {
>   if (!test_bit(abs, vi->idev->absbit))
>   continue;
> + if (input_is_mt_value(abs))
> + is_mt = true;
>   virtinput_cfg_abs(vi, abs);
>   }
>   }
> + if (is_mt)
> + input_mt_init_slots(vi->idev,
> + CONFIG_VIRTIO_INPUT_MULTITOUCH_SLOTS,
> + INPUT_MT_DIRECT);


Do we need the number in config space maybe? And maybe with a feature
bit so host can find out whether guest supports MT?

>  
>   virtio_device_ready(vdev);
>   vi->ready = true;
> -- 
> 2.23.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] vdpa/mlx5: Use write memory barrier after updating CQ index

2020-12-09 Thread Michael S. Tsirkin
On Wed, Dec 09, 2020 at 08:58:46AM +0200, Eli Cohen wrote:
> On Wed, Dec 09, 2020 at 01:46:22AM -0500, Michael S. Tsirkin wrote:
> > On Wed, Dec 09, 2020 at 08:02:30AM +0200, Eli Cohen wrote:
> > > On Tue, Dec 08, 2020 at 04:45:04PM -0500, Michael S. Tsirkin wrote:
> > > > On Sun, Dec 06, 2020 at 12:57:19PM +0200, Eli Cohen wrote:
> > > > > Make sure to put write memory barrier after updating CQ consumer index
> > > > > so the hardware knows that there are available CQE slots in the queue.
> > > > > 
> > > > > Failure to do this can cause the update of the RX doorbell record to 
> > > > > get
> > > > > updated before the CQ consumer index resulting in CQ overrun.
> > > > > 
> > > > > Change-Id: Ib0ae4c118cce524c9f492b32569179f3c1f04cc1
> > > > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
> > > > > devices")
> > > > > Signed-off-by: Eli Cohen 
> > > > 
> > > > Aren't both memory writes?
> > > 
> > > Not sure what exactly you mean here.
> > 
> > Both updates are CPU writes into RAM that hardware then reads
> > using DMA.
> > 
> 
> You mean why I did not put a memory barrier right after updating the
> receive doorbell record?

Sorry about being unclear.  I just tried to give justification for why
dma_wmb seems more appropriate than wmb here. If you need to
order memory writes wrt writes to card, that is different, but generally
writeX and friends will handle the ordering for you, except when
using relaxed memory mappings - then wmb is generally necessary.

> I thought about this and I think it is not required. Suppose it takes a
> very long time till the hardware can actually see this update. The worst
> effect would be that the hardware will drop received packets if it
> sees none available due to the delayed update. Eventually it will see
> the update and will continue working.
> 
> If I put a memory barrier, I put some delay waiting for the CPU to flush
> the write before continuing. I tried both options while checking packet
> rate and couldn't see a noticeable difference in either case.


makes sense.

> > > > And given that, isn't dma_wmb() sufficient here?
> > > 
> > > I agree that dma_wmb() is more appropriate here.
> > > 
> > > > 
> > > > 
> > > > > ---
> > > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 5 +
> > > > >  1 file changed, 5 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > index 1f4089c6f9d7..295f46eea2a5 100644
> > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > @@ -478,6 +478,11 @@ static int mlx5_vdpa_poll_one(struct 
> > > > > mlx5_vdpa_cq *vcq)
> > > > >  static void mlx5_vdpa_handle_completions(struct mlx5_vdpa_virtqueue 
> > > > > *mvq, int num)
> > > > >  {
> > > > >   mlx5_cq_set_ci(&mvq->cq.mcq);
> > > > > +
> > > > > + /* make sure CQ consumer update is visible to the hardware 
> > > > > before updating
> > > > > +  * RX doorbell record.
> > > > > +  */
> > > > > + wmb();
> > > > >   rx_post(&mvq->vqqp, num);
> > > > >   if (mvq->event_cb.callback)
> > > > >   mvq->event_cb.callback(mvq->event_cb.private);
> > > > > -- 
> > > > > 2.27.0
> > > > 
> > 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization