Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ

2022-02-23 Thread Jason Wang
On Wed, Feb 23, 2022 at 4:06 PM Eugenio Perez Martin
 wrote:
>
> On Wed, Feb 23, 2022 at 4:47 AM Jason Wang  wrote:
> >
> > On Tue, Feb 22, 2022 at 4:06 PM Eugenio Perez Martin
> >  wrote:
> > >
> > > On Tue, Feb 22, 2022 at 8:41 AM Jason Wang  wrote:
> > > >
> > > >
> > > > 在 2022/2/17 下午4:22, Eugenio Perez Martin 写道:
> > > > > On Thu, Feb 17, 2022 at 7:02 AM Jason Wang  
> > > > > wrote:
> > > > >> On Wed, Feb 16, 2022 at 11:54 PM Eugenio Perez Martin
> > > > >>  wrote:
> > > > >>> On Tue, Feb 8, 2022 at 9:25 AM Jason Wang  
> > > > >>> wrote:
> > > > 
> > > >  在 2022/2/1 下午7:45, Eugenio Perez Martin 写道:
> > > > > On Sun, Jan 30, 2022 at 7:50 AM Jason Wang  
> > > > > wrote:
> > > > >> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > > > >>> SVQ is able to log the dirty bits by itself, so let's use it to 
> > > > >>> not
> > > > >>> block migration.
> > > > >>>
> > > > >>> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features 
> > > > >>> if SVQ is
> > > > >>> enabled. Even if the device supports it, the reports would be 
> > > > >>> nonsense
> > > > >>> because SVQ memory is in the qemu region.
> > > > >>>
> > > > >>> The log region is still allocated. Future changes might skip 
> > > > >>> that, but
> > > > >>> this series is already long enough.
> > > > >>>
> > > > >>> Signed-off-by: Eugenio Pérez 
> > > > >>> ---
> > > > >>> hw/virtio/vhost-vdpa.c | 20 
> > > > >>> 1 file changed, 20 insertions(+)
> > > > >>>
> > > > >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > > >>> index fb0a338baa..75090d65e8 100644
> > > > >>> --- a/hw/virtio/vhost-vdpa.c
> > > > >>> +++ b/hw/virtio/vhost-vdpa.c
> > > > >>> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct 
> > > > >>> vhost_dev *dev, uint64_t *features)
> > > > >>> if (ret == 0 && v->shadow_vqs_enabled) {
> > > > >>> /* Filter only features that SVQ can offer to guest 
> > > > >>> */
> > > > >>> vhost_svq_valid_guest_features(features);
> > > > >>> +
> > > > >>> +/* Add SVQ logging capabilities */
> > > > >>> +*features |= BIT_ULL(VHOST_F_LOG_ALL);
> > > > >>> }
> > > > >>>
> > > > >>> return ret;
> > > > >>> @@ -1039,8 +1042,25 @@ static int 
> > > > >>> vhost_vdpa_set_features(struct vhost_dev *dev,
> > > > >>>
> > > > >>> if (v->shadow_vqs_enabled) {
> > > > >>> uint64_t dev_features, svq_features, acked_features;
> > > > >>> +uint8_t status = 0;
> > > > >>> bool ok;
> > > > >>>
> > > > >>> +ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, 
> > > > >>> );
> > > > >>> +if (unlikely(ret)) {
> > > > >>> +return ret;
> > > > >>> +}
> > > > >>> +
> > > > >>> +if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > > > >>> +/*
> > > > >>> + * vhost is trying to enable or disable _F_LOG, 
> > > > >>> and the device
> > > > >>> + * would report wrong dirty pages. SVQ handles it.
> > > > >>> + */
> > > > >> I fail to understand this comment, I'd think there's no way to 
> > > > >> disable
> > > > >> dirty page tracking for SVQ.
> > > > >>
> > > > > vhost_log_global_{start,stop} are called at the beginning and end 
> > > > > of
> > > > > migration. To inform the device that it should start logging, 
> > > > > they set
> > > > > or clean VHOST_F_LOG_ALL at vhost_dev_set_log.
> > > > 
> > > >  Yes, but for SVQ, we can't disable dirty page tracking, isn't it? 
> > > >  The
> > > >  only thing is to ignore or filter out the F_LOG_ALL and pretend to 
> > > >  be
> > > >  enabled and disabled.
> > > > 
> > > > >>> Yes, that's what this patch does.
> > > > >>>
> > > > > While SVQ does not use VHOST_F_LOG_ALL, it exports the feature 
> > > > > bit so
> > > > > vhost does not block migration. Maybe we need to look for another 
> > > > > way
> > > > > to do this?
> > > > 
> > > >  I'm fine with filtering since it's much more simpler, but I fail to
> > > >  understand why we need to check DRIVER_OK.
> > > > 
> > > > >>> Ok maybe I can make that part more clear,
> > > > >>>
> > > > >>> Since both operations use vhost_vdpa_set_features we must just 
> > > > >>> filter
> > > > >>> the one that actually sets or removes VHOST_F_LOG_ALL, without
> > > > >>> affecting other features.
> > > > >>>
> > > > >>> In practice, that means to not forward the set features after
> > > > >>> DRIVER_OK. The device is not expecting them anymore.
> > > > >> I wonder what happens if we don't do this.
> > > > >>
> > > > > If we simply delete the check vhost_dev_set_features will return an
> > > > > error, failing the 

Re: [RFC PATCH 1/5] uapi/linux/if_tun.h: Added new ioctl for tun/tap.

2022-02-23 Thread Jason Wang
On Wed, Feb 23, 2022 at 9:31 PM Yuri Benditovich
 wrote:
>
> Hi Jason,
> We agree that the same can be done also using the old way, i.e. try to
> set specific offload - if failed, probably it is not supported.
> We think this is a little not scalable and we suggest adding the ioctl
> that will allow us to query allo the supported features in a single
> call.

Possibly but then you need some kind of probing. E.g we need endup
with probing TUNGETSUPPORTEDOFFLOADS iotctl itself.

> We think this will make QEMU code more simple also in future.

We can discuss this when qemu patches were sent.

> Do I understand correctly that you suggest to skip this new ioctl and
> use the old way of query for this (USO) feature and all future
> extensions?

Yes, since it's not a must. And we can do the TUNGETSUPPORTEDOFFLOADS
in a separate series.

Thanks

>
> Thanks
>
>
> On Wed, Feb 23, 2022 at 5:53 AM Jason Wang  wrote:
> >
> > On Tue, Feb 22, 2022 at 9:28 PM Andrew Melnichenko  
> > wrote:
> > >
> > > Hi all,
> > >
> > > On Wed, Feb 9, 2022 at 6:26 AM Jason Wang  wrote:
> > > >
> > > >
> > > > 在 2022/1/25 下午4:46, Andrew Melnychenko 写道:
> > > > > Added TUNGETSUPPORTEDOFFLOADS that should allow
> > > > > to get bits of supported offloads.
> > > >
> > > >
> > > > So we don't use dedicated ioctls in the past, instead, we just probing
> > > > by checking the return value of TUNSETOFFLOADS.
> > > >
> > > > E.g qemu has the following codes:
> > > >
> > > > int tap_probe_has_ufo(int fd)
> > > > {
> > > >  unsigned offload;
> > > >
> > > >  offload = TUN_F_CSUM | TUN_F_UFO;
> > > >
> > > >  if (ioctl(fd, TUNSETOFFLOAD, offload) < 0)
> > > >  return 0;
> > > >
> > > >  return 1;
> > > > }
> > > >
> > > > Any reason we can't keep using that?
> > > >
> > > > Thanks
> > > >
> > >
> > > Well, even in this example. To check the ufo feature, we trying to set it.
> > > What if we don't need to "enable" UFO and/or do not change its state?
> >
> > So at least Qemu doesn't have such a requirement since during the
> > probe the virtual networking backend is not even started.
> >
> > > I think it's a good idea to have the ability to get supported offloads
> > > without changing device behavior.
> >
> > Do you see a real user for this?
> >
> > Btw, we still need to probe this new ioctl anyway.
> >
> > Thanks
> >
> > >
> > > >
> > > > > Added 2 additional offlloads for USO(IPv4 & IPv6).
> > > > > Separate offloads are required for Windows VM guests,
> > > > > g.e. Windows may set USO rx only for IPv4.
> > > > >
> > > > > Signed-off-by: Andrew Melnychenko 
> > > > > ---
> > > > >   include/uapi/linux/if_tun.h | 3 +++
> > > > >   1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
> > > > > index 454ae31b93c7..07680fae6e18 100644
> > > > > --- a/include/uapi/linux/if_tun.h
> > > > > +++ b/include/uapi/linux/if_tun.h
> > > > > @@ -61,6 +61,7 @@
> > > > >   #define TUNSETFILTEREBPF _IOR('T', 225, int)
> > > > >   #define TUNSETCARRIER _IOW('T', 226, int)
> > > > >   #define TUNGETDEVNETNS _IO('T', 227)
> > > > > +#define TUNGETSUPPORTEDOFFLOADS _IOR('T', 228, unsigned int)
> > > > >
> > > > >   /* TUNSETIFF ifr flags */
> > > > >   #define IFF_TUN 0x0001
> > > > > @@ -88,6 +89,8 @@
> > > > >   #define TUN_F_TSO6  0x04/* I can handle TSO for IPv6 packets */
> > > > >   #define TUN_F_TSO_ECN   0x08/* I can handle TSO with ECN 
> > > > > bits. */
> > > > >   #define TUN_F_UFO   0x10/* I can handle UFO packets */
> > > > > +#define TUN_F_USO4   0x20/* I can handle USO for IPv4 packets */
> > > > > +#define TUN_F_USO6   0x40/* I can handle USO for IPv6 packets */
> > > > >
> > > > >   /* Protocol info prepended to the packets (when IFF_NO_PI is not 
> > > > > set) */
> > > > >   #define TUN_PKT_STRIP   0x0001
> > > >
> > >
> >
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v2 01/11] driver: platform: add and use helper for safer setting of driver_override

2022-02-23 Thread Bjorn Helgaas
On Wed, Feb 23, 2022 at 08:13:00PM +0100, Krzysztof Kozlowski wrote:
> Several core drivers and buses expect that driver_override is a
> dynamically allocated memory thus later they can kfree() it.
> ...

> + * set_driver_override() - Helper to set or clear driver override.

Doesn't match actual function name.

> + * @dev: Device to change
> + * @override: Address of string to change (e.g. >driver_override);
> + *The contents will be freed and hold newly allocated override.
> + * @s: NULL terminated string, new driver name to force a match, pass empty
> + * string to clear it
> + *
> + * Helper to setr or clear driver override in a device, intended for the 
> cases
> + * when the driver_override field is allocated by driver/bus code.

s/setr/set/

> + * Returns: 0 on success or a negative error code on failure.
> + */
> +int driver_set_override(struct device *dev, char **override, const char *s)
> +{
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 05/11] pci: use helper for safer setting of driver_override

2022-02-23 Thread Bjorn Helgaas
In subject, to match drivers/pci/ convention, do something like:

  PCI: Use driver_set_override() instead of open-coding

On Wed, Feb 23, 2022 at 08:13:04PM +0100, Krzysztof Kozlowski wrote:
> Use a helper for seting driver_override to reduce amount of duplicated
> code.

s/seting/setting/

> Signed-off-by: Krzysztof Kozlowski 
> ---
>  drivers/pci/pci-sysfs.c | 24 
>  1 file changed, 4 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 602f0fb0b007..16a163d4623e 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -567,31 +567,15 @@ static ssize_t driver_override_store(struct device *dev,
>const char *buf, size_t count)
>  {
>   struct pci_dev *pdev = to_pci_dev(dev);
> - char *driver_override, *old, *cp;
> + int ret;
>  
>   /* We need to keep extra room for a newline */
>   if (count >= (PAGE_SIZE - 1))
>   return -EINVAL;

This check makes no sense in the new function.  Michael alluded to
this as well.

> - driver_override = kstrndup(buf, count, GFP_KERNEL);
> - if (!driver_override)
> - return -ENOMEM;
> -
> - cp = strchr(driver_override, '\n');
> - if (cp)
> - *cp = '\0';
> -
> - device_lock(dev);
> - old = pdev->driver_override;
> - if (strlen(driver_override)) {
> - pdev->driver_override = driver_override;
> - } else {
> - kfree(driver_override);
> - pdev->driver_override = NULL;
> - }
> - device_unlock(dev);
> -
> - kfree(old);
> + ret = driver_set_override(dev, >driver_override, buf);
> + if (ret)
> + return ret;
>  
>   return count;
>  }
> -- 
> 2.32.0
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 01/11] driver: platform: add and use helper for safer setting of driver_override

2022-02-23 Thread Michael S. Tsirkin
On Wed, Feb 23, 2022 at 08:13:00PM +0100, Krzysztof Kozlowski wrote:
> Several core drivers and buses expect that driver_override is a
> dynamically allocated memory thus later they can kfree() it.
> 
> However such assumption is not documented, there were in the past and
> there are already users setting it to a string literal. This leads to
> kfree() of static memory during device release (e.g. in error paths or
> during unbind):
> 
> kernel BUG at ../mm/slub.c:3960!
> Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
> ...
> (kfree) from [] (platform_device_release+0x88/0xb4)
> (platform_device_release) from [] (device_release+0x2c/0x90)
> (device_release) from [] (kobject_put+0xec/0x20c)
> (kobject_put) from [] (exynos5_clk_probe+0x154/0x18c)
> (exynos5_clk_probe) from [] (platform_drv_probe+0x6c/0xa4)
> (platform_drv_probe) from [] (really_probe+0x280/0x414)
> (really_probe) from [] (driver_probe_device+0x78/0x1c4)
> (driver_probe_device) from [] (bus_for_each_drv+0x74/0xb8)
> (bus_for_each_drv) from [] (__device_attach+0xd4/0x16c)
> (__device_attach) from [] (bus_probe_device+0x88/0x90)
> (bus_probe_device) from [] (device_add+0x3dc/0x62c)
> (device_add) from [] (of_platform_device_create_pdata+0x94/0xbc)
> (of_platform_device_create_pdata) from [] 
> (of_platform_bus_create+0x1a8/0x4fc)
> (of_platform_bus_create) from [] 
> (of_platform_bus_create+0x20c/0x4fc)
> (of_platform_bus_create) from [] 
> (of_platform_populate+0x84/0x118)
> (of_platform_populate) from [] 
> (of_platform_default_populate_init+0xa0/0xb8)
> (of_platform_default_populate_init) from [] 
> (do_one_initcall+0x8c/0x404)
> (do_one_initcall) from [] (kernel_init_freeable+0x3d0/0x4d8)
> (kernel_init_freeable) from [] (kernel_init+0x8/0x114)
> (kernel_init) from [] (ret_from_fork+0x14/0x20)
> 
> Provide a helper which clearly documents the usage of driver_override.
> This will allow later to reuse the helper and reduce amount of
> duplicated code.
> 
> Signed-off-by: Krzysztof Kozlowski 
> ---
>  drivers/base/driver.c   | 44 +
>  drivers/base/platform.c | 24 +++---
>  include/linux/device/driver.h   |  1 +
>  include/linux/platform_device.h |  6 -
>  4 files changed, 54 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/base/driver.c b/drivers/base/driver.c
> index 8c0d33e182fd..79efe51bb4c0 100644
> --- a/drivers/base/driver.c
> +++ b/drivers/base/driver.c
> @@ -30,6 +30,50 @@ static struct device *next_device(struct klist_iter *i)
>   return dev;
>  }
>  
> +/*
> + * set_driver_override() - Helper to set or clear driver override.
> + * @dev: Device to change
> + * @override: Address of string to change (e.g. >driver_override);
> + *The contents will be freed and hold newly allocated override.
> + * @s: NULL terminated string, new driver name to force a match, pass empty

Don't you mean NUL terminated?
Do all callers really validate that it's NUL terminated?

> + * string to clear it
> + *
> + * Helper to setr or clear driver override in a device, intended for the 
> cases

set?

> + * when the driver_override field is allocated by driver/bus code.
> + *
> + * Returns: 0 on success or a negative error code on failure.
> + */
> +int driver_set_override(struct device *dev, char **override, const char *s)
> +{
> + char *new, *old, *cp;
> +
> + if (!dev || !override || !s)
> + return -EINVAL;
> +
> + new = kstrndup(s, strlen(s), GFP_KERNEL);


what's the point of this kstrndup then? why not just kstrdup?

> + if (!new)
> + return -ENOMEM;
> +
> + cp = strchr(new, '\n');
> + if (cp)
> + *cp = '\0';
> +
> + device_lock(dev);
> + old = *override;
> + if (strlen(new)) {

We are re-reading the string like 3 times here.

> + *override = new;
> + } else {
> + kfree(new);
> + *override = NULL;
> + }
> + device_unlock(dev);
> +
> + kfree(old);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(driver_set_override);
> +
>  /**
>   * driver_for_each_device - Iterator for devices bound to a driver.
>   * @drv: Driver we're iterating.
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 6cb04ac48bf0..d8853b32ea10 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -1275,31 +1275,15 @@ static ssize_t driver_override_store(struct device 
> *dev,
>const char *buf, size_t count)
>  {
>   struct platform_device *pdev = to_platform_device(dev);
> - char *driver_override, *old, *cp;
> + int ret;
>  
>   /* We need to keep extra room for a newline */
>   if (count >= (PAGE_SIZE - 1))
>   return -EINVAL;

Given everyone seems to repeat this check, how about passing
in count and doing the validation in the helper?
We will then also avoid the need to do 

Re: [PATCH v2 07/11] spi: use helper for safer setting of driver_override

2022-02-23 Thread Mark Brown
On Wed, Feb 23, 2022 at 08:14:37PM +0100, Krzysztof Kozlowski wrote:

> Remove also "const" from the definition of spi_device.driver_override,
> because it is not correct.  The SPI driver already treats it as
> dynamic, not const, memory.

We don't modify the string do we, we just allocate a new one?


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] vhost: validate range size before adding to iotlb

2022-02-23 Thread Michael S. Tsirkin
On Wed, Feb 23, 2022 at 10:19:23PM +0530, Anirudh Rayabharam wrote:
> On Wed, Feb 23, 2022 at 10:15:01AM -0500, Michael S. Tsirkin wrote:
> > On Wed, Feb 23, 2022 at 07:48:18PM +0530, Anirudh Rayabharam wrote:
> > > On Tue, Feb 22, 2022 at 06:21:50PM -0500, Michael S. Tsirkin wrote:
> > > > On Tue, Feb 22, 2022 at 10:57:41PM +0530, Anirudh Rayabharam wrote:
> > > > > On Tue, Feb 22, 2022 at 10:02:29AM -0500, Michael S. Tsirkin wrote:
> > > > > > On Tue, Feb 22, 2022 at 03:11:07PM +0800, Jason Wang wrote:
> > > > > > > On Tue, Feb 22, 2022 at 12:57 PM Anirudh Rayabharam 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, Feb 22, 2022 at 10:50:20AM +0800, Jason Wang wrote:
> > > > > > > > > On Tue, Feb 22, 2022 at 3:53 AM Anirudh Rayabharam 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > In vhost_iotlb_add_range_ctx(), validate the range size is 
> > > > > > > > > > non-zero
> > > > > > > > > > before proceeding with adding it to the iotlb.
> > > > > > > > > >
> > > > > > > > > > Range size can overflow to 0 when start is 0 and last is 
> > > > > > > > > > (2^64 - 1).
> > > > > > > > > > One instance where it can happen is when userspace sends an 
> > > > > > > > > > IOTLB
> > > > > > > > > > message with iova=size=uaddr=0 (vhost_process_iotlb_msg). 
> > > > > > > > > > So, an
> > > > > > > > > > entry with size = 0, start = 0, last = (2^64 - 1) ends up 
> > > > > > > > > > in the
> > > > > > > > > > iotlb. Next time a packet is sent, iotlb_access_ok() loops
> > > > > > > > > > indefinitely due to that erroneous entry:
> > > > > > > > > >
> > > > > > > > > > Call Trace:
> > > > > > > > > >  
> > > > > > > > > >  iotlb_access_ok+0x21b/0x3e0 
> > > > > > > > > > drivers/vhost/vhost.c:1340
> > > > > > > > > >  vq_meta_prefetch+0xbc/0x280 
> > > > > > > > > > drivers/vhost/vhost.c:1366
> > > > > > > > > >  vhost_transport_do_send_pkt+0xe0/0xfd0 
> > > > > > > > > > drivers/vhost/vsock.c:104
> > > > > > > > > >  vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
> > > > > > > > > >  kthread+0x2e9/0x3a0 kernel/kthread.c:377
> > > > > > > > > >  ret_from_fork+0x1f/0x30 
> > > > > > > > > > arch/x86/entry/entry_64.S:295
> > > > > > > > > >  
> > > > > > > > > >
> > > > > > > > > > Reported by syzbot at:
> > > > > > > > > > 
> > > > > > > > > > https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87
> > > > > > > > > >
> > > > > > > > > > Reported-by: 
> > > > > > > > > > syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
> > > > > > > > > > Tested-by: 
> > > > > > > > > > syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
> > > > > > > > > > Signed-off-by: Anirudh Rayabharam 
> > > > > > > > > > ---
> > > > > > > > > >  drivers/vhost/iotlb.c | 6 --
> > > > > > > > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
> > > > > > > > > > index 670d56c879e5..b9de74bd2f9c 100644
> > > > > > > > > > --- a/drivers/vhost/iotlb.c
> > > > > > > > > > +++ b/drivers/vhost/iotlb.c
> > > > > > > > > > @@ -53,8 +53,10 @@ int vhost_iotlb_add_range_ctx(struct 
> > > > > > > > > > vhost_iotlb *iotlb,
> > > > > > > > > >   void *opaque)
> > > > > > > > > >  {
> > > > > > > > > > struct vhost_iotlb_map *map;
> > > > > > > > > > +   u64 size = last - start + 1;
> > > > > > > > > >
> > > > > > > > > > -   if (last < start)
> > > > > > > > > > +   // size can overflow to 0 when start is 0 and last 
> > > > > > > > > > is (2^64 - 1).
> > > > > > > > > > +   if (last < start || size == 0)
> > > > > > > > > > return -EFAULT;
> > > > > > > > >
> > > > > > > > > I'd move this check to vhost_chr_iter_write(), then for the 
> > > > > > > > > device who
> > > > > > > > > has its own msg handler (e.g vDPA) can benefit from it as 
> > > > > > > > > well.
> > > > > > > >
> > > > > > > > Thanks for reviewing!
> > > > > > > >
> > > > > > > > I kept the check here thinking that all devices would benefit 
> > > > > > > > from it
> > > > > > > > because they would need to call vhost_iotlb_add_range() to add 
> > > > > > > > an entry
> > > > > > > > to the iotlb. Isn't that correct?
> > > > > > > 
> > > > > > > Correct for now but not for the future, it's not guaranteed that 
> > > > > > > the
> > > > > > > per device iotlb message handler will use vhost iotlb.
> > > > > > > 
> > > > > > > But I agree that we probably don't need to care about it too much 
> > > > > > > now.
> > > > > > > 
> > > > > > > > Do you see any other benefit in moving
> > > > > > > > it to vhost_chr_iter_write()?
> > > > > > > >
> > > > > > > > One concern I have is that if we move it out some future caller 
> > > > > > > > to
> > > > > > > > vhost_iotlb_add_range() might forget to handle this case.
> > > > > > > 
> > > > > > > Yes.
> > > > > > > 
> > > > > > > Rethink the whole fix, we're 

Re: [PATCH] vhost: validate range size before adding to iotlb

2022-02-23 Thread Michael S. Tsirkin
On Wed, Feb 23, 2022 at 07:48:18PM +0530, Anirudh Rayabharam wrote:
> On Tue, Feb 22, 2022 at 06:21:50PM -0500, Michael S. Tsirkin wrote:
> > On Tue, Feb 22, 2022 at 10:57:41PM +0530, Anirudh Rayabharam wrote:
> > > On Tue, Feb 22, 2022 at 10:02:29AM -0500, Michael S. Tsirkin wrote:
> > > > On Tue, Feb 22, 2022 at 03:11:07PM +0800, Jason Wang wrote:
> > > > > On Tue, Feb 22, 2022 at 12:57 PM Anirudh Rayabharam 
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Feb 22, 2022 at 10:50:20AM +0800, Jason Wang wrote:
> > > > > > > On Tue, Feb 22, 2022 at 3:53 AM Anirudh Rayabharam 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > In vhost_iotlb_add_range_ctx(), validate the range size is 
> > > > > > > > non-zero
> > > > > > > > before proceeding with adding it to the iotlb.
> > > > > > > >
> > > > > > > > Range size can overflow to 0 when start is 0 and last is (2^64 
> > > > > > > > - 1).
> > > > > > > > One instance where it can happen is when userspace sends an 
> > > > > > > > IOTLB
> > > > > > > > message with iova=size=uaddr=0 (vhost_process_iotlb_msg). So, an
> > > > > > > > entry with size = 0, start = 0, last = (2^64 - 1) ends up in the
> > > > > > > > iotlb. Next time a packet is sent, iotlb_access_ok() loops
> > > > > > > > indefinitely due to that erroneous entry:
> > > > > > > >
> > > > > > > > Call Trace:
> > > > > > > >  
> > > > > > > >  iotlb_access_ok+0x21b/0x3e0 drivers/vhost/vhost.c:1340
> > > > > > > >  vq_meta_prefetch+0xbc/0x280 drivers/vhost/vhost.c:1366
> > > > > > > >  vhost_transport_do_send_pkt+0xe0/0xfd0 
> > > > > > > > drivers/vhost/vsock.c:104
> > > > > > > >  vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
> > > > > > > >  kthread+0x2e9/0x3a0 kernel/kthread.c:377
> > > > > > > >  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> > > > > > > >  
> > > > > > > >
> > > > > > > > Reported by syzbot at:
> > > > > > > > 
> > > > > > > > https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87
> > > > > > > >
> > > > > > > > Reported-by: 
> > > > > > > > syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
> > > > > > > > Tested-by: syzbot+0abd373e2e50d704d...@syzkaller.appspotmail.com
> > > > > > > > Signed-off-by: Anirudh Rayabharam 
> > > > > > > > ---
> > > > > > > >  drivers/vhost/iotlb.c | 6 --
> > > > > > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
> > > > > > > > index 670d56c879e5..b9de74bd2f9c 100644
> > > > > > > > --- a/drivers/vhost/iotlb.c
> > > > > > > > +++ b/drivers/vhost/iotlb.c
> > > > > > > > @@ -53,8 +53,10 @@ int vhost_iotlb_add_range_ctx(struct 
> > > > > > > > vhost_iotlb *iotlb,
> > > > > > > >   void *opaque)
> > > > > > > >  {
> > > > > > > > struct vhost_iotlb_map *map;
> > > > > > > > +   u64 size = last - start + 1;
> > > > > > > >
> > > > > > > > -   if (last < start)
> > > > > > > > +   // size can overflow to 0 when start is 0 and last is 
> > > > > > > > (2^64 - 1).
> > > > > > > > +   if (last < start || size == 0)
> > > > > > > > return -EFAULT;
> > > > > > >
> > > > > > > I'd move this check to vhost_chr_iter_write(), then for the 
> > > > > > > device who
> > > > > > > has its own msg handler (e.g vDPA) can benefit from it as well.
> > > > > >
> > > > > > Thanks for reviewing!
> > > > > >
> > > > > > I kept the check here thinking that all devices would benefit from 
> > > > > > it
> > > > > > because they would need to call vhost_iotlb_add_range() to add an 
> > > > > > entry
> > > > > > to the iotlb. Isn't that correct?
> > > > > 
> > > > > Correct for now but not for the future, it's not guaranteed that the
> > > > > per device iotlb message handler will use vhost iotlb.
> > > > > 
> > > > > But I agree that we probably don't need to care about it too much now.
> > > > > 
> > > > > > Do you see any other benefit in moving
> > > > > > it to vhost_chr_iter_write()?
> > > > > >
> > > > > > One concern I have is that if we move it out some future caller to
> > > > > > vhost_iotlb_add_range() might forget to handle this case.
> > > > > 
> > > > > Yes.
> > > > > 
> > > > > Rethink the whole fix, we're basically rejecting [0, ULONG_MAX] range
> > > > > which seems a little bit odd.
> > > > 
> > > > Well, I guess ideally we'd split this up as two entries - this kind of
> > > > thing is after all one of the reasons we initially used first,last as
> > > > the API - as opposed to first,size.
> > > 
> > > IIUC, the APIs exposed to userspace accept first,size.
> > 
> > Some of them.
> > 
> > 
> > /* vhost vdpa IOVA range
> >  * @first: First address that can be mapped by vhost-vDPA
> >  * @last: Last address that can be mapped by vhost-vDPA
> >  */
> > struct vhost_vdpa_iova_range {
> > __u64 first;
> > __u64 last;
> > };
> 
> Alright, I will split it into two entries. That 

Re: [RFC PATCH 1/5] uapi/linux/if_tun.h: Added new ioctl for tun/tap.

2022-02-23 Thread Yuri Benditovich
Hi Jason,
We agree that the same can be done also using the old way, i.e. try to
set specific offload - if failed, probably it is not supported.
We think this is a little not scalable and we suggest adding the ioctl
that will allow us to query allo the supported features in a single
call.
We think this will make QEMU code more simple also in future.
Do I understand correctly that you suggest to skip this new ioctl and
use the old way of query for this (USO) feature and all future
extensions?

Thanks


On Wed, Feb 23, 2022 at 5:53 AM Jason Wang  wrote:
>
> On Tue, Feb 22, 2022 at 9:28 PM Andrew Melnichenko  wrote:
> >
> > Hi all,
> >
> > On Wed, Feb 9, 2022 at 6:26 AM Jason Wang  wrote:
> > >
> > >
> > > 在 2022/1/25 下午4:46, Andrew Melnychenko 写道:
> > > > Added TUNGETSUPPORTEDOFFLOADS that should allow
> > > > to get bits of supported offloads.
> > >
> > >
> > > So we don't use dedicated ioctls in the past, instead, we just probing
> > > by checking the return value of TUNSETOFFLOADS.
> > >
> > > E.g qemu has the following codes:
> > >
> > > int tap_probe_has_ufo(int fd)
> > > {
> > >  unsigned offload;
> > >
> > >  offload = TUN_F_CSUM | TUN_F_UFO;
> > >
> > >  if (ioctl(fd, TUNSETOFFLOAD, offload) < 0)
> > >  return 0;
> > >
> > >  return 1;
> > > }
> > >
> > > Any reason we can't keep using that?
> > >
> > > Thanks
> > >
> >
> > Well, even in this example. To check the ufo feature, we trying to set it.
> > What if we don't need to "enable" UFO and/or do not change its state?
>
> So at least Qemu doesn't have such a requirement since during the
> probe the virtual networking backend is not even started.
>
> > I think it's a good idea to have the ability to get supported offloads
> > without changing device behavior.
>
> Do you see a real user for this?
>
> Btw, we still need to probe this new ioctl anyway.
>
> Thanks
>
> >
> > >
> > > > Added 2 additional offlloads for USO(IPv4 & IPv6).
> > > > Separate offloads are required for Windows VM guests,
> > > > g.e. Windows may set USO rx only for IPv4.
> > > >
> > > > Signed-off-by: Andrew Melnychenko 
> > > > ---
> > > >   include/uapi/linux/if_tun.h | 3 +++
> > > >   1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
> > > > index 454ae31b93c7..07680fae6e18 100644
> > > > --- a/include/uapi/linux/if_tun.h
> > > > +++ b/include/uapi/linux/if_tun.h
> > > > @@ -61,6 +61,7 @@
> > > >   #define TUNSETFILTEREBPF _IOR('T', 225, int)
> > > >   #define TUNSETCARRIER _IOW('T', 226, int)
> > > >   #define TUNGETDEVNETNS _IO('T', 227)
> > > > +#define TUNGETSUPPORTEDOFFLOADS _IOR('T', 228, unsigned int)
> > > >
> > > >   /* TUNSETIFF ifr flags */
> > > >   #define IFF_TUN 0x0001
> > > > @@ -88,6 +89,8 @@
> > > >   #define TUN_F_TSO6  0x04/* I can handle TSO for IPv6 packets */
> > > >   #define TUN_F_TSO_ECN   0x08/* I can handle TSO with ECN 
> > > > bits. */
> > > >   #define TUN_F_UFO   0x10/* I can handle UFO packets */
> > > > +#define TUN_F_USO4   0x20/* I can handle USO for IPv4 packets */
> > > > +#define TUN_F_USO6   0x40/* I can handle USO for IPv6 packets */
> > > >
> > > >   /* Protocol info prepended to the packets (when IFF_NO_PI is not set) 
> > > > */
> > > >   #define TUN_PKT_STRIP   0x0001
> > >
> >
>
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v2 3/3] virtio-crypto: implement RSA algorithm

2022-02-23 Thread zhenwei pi


On 2/18/22 11:12 AM, zhenwei pi wrote:

+void virtio_crypto_akcipher_algs_unregister(struct virtio_crypto
+*vcrypto) {
+    int i = 0;
+
+    mutex_lock(_lock);
+
+    for (i = 0; i < ARRAY_SIZE(virtio_crypto_akcipher_algs); i++) {
+    uint32_t service = virtio_crypto_akcipher_algs[i].service;
+    uint32_t algonum = virtio_crypto_akcipher_algs[i].algonum;
+
+    if (virtio_crypto_akcipher_algs[i].active_devs == 0 ||
+    !virtcrypto_algo_is_supported(vcrypto, service, algonum))
+    continue;
+
+    if (virtio_crypto_akcipher_algs[i].active_devs == 1)
+
crypto_unregister_akcipher(_crypto_akcipher_algs[i].algo);
+
+    virtio_crypto_akcipher_algs[i].active_devs--;
+    }
+
+    mutex_unlock(_lock);
+}


Why don't you reuse the virtio_crypto_algs_register/unregister functions?
The current code is too repetitive. Maybe we don't need create the new 
file virtio_crypto_akcipher_algo.c

because we had virtio_crypto_algs.c which includes all algorithms.



Yes, this looks similar to virtio_crypto_algs_register/unregister.

Let's look at the difference:
struct virtio_crypto_akcipher_algo {
     uint32_t algonum;
     uint32_t service;
     unsigned int active_devs;
     struct akcipher_alg algo;
};

struct virtio_crypto_algo {
     uint32_t algonum;
     uint32_t service;
     unsigned int active_devs;
     struct skcipher_alg algo; /* akcipher_alg VS skcipher_alg */
};

If reusing virtio_crypto_algs_register/unregister, we need to modify the 
data structure like this:

struct virtio_crypto_akcipher_algo {
     uint32_t algonum;
     uint32_t service;    /* use service to distinguish 
akcipher/skcipher */

     unsigned int active_devs;
 union {
     struct skcipher_alg skcipher;
     struct akcipher_alg akcipher;
 } alg;
};

int virtio_crypto_akcipher_algs_register(struct virtio_crypto *vcrypto)
{
 ...
     for (i = 0; i < ARRAY_SIZE(virtio_crypto_akcipher_algs); i++) {
     uint32_t service = virtio_crypto_akcipher_algs[i].service;
     ...
     /* test service type then call 
crypto_register_akcipher/crypto_register_skcipher */

     if (service == VIRTIO_CRYPTO_SERVICE_AKCIPHER)
 
crypto_register_akcipher(_crypto_akcipher_algs[i].algo.akcipher);

     else
 
crypto_register_skcipher(_crypto_skcipher_algs[i].algo.skcipher);

     ...
     }
 ...
}

Also test service type and call 
crypto_unregister_skcipher/crypto_unregister_akcipher.


This gets unclear from current v2 version.

On the other hand, the kernel side prefers to separate skcipher and 
akcipher(separated header files and implementations).



Hi, Lei
I also take a look at other crypto drivers at qat/ccp/hisilicon, they 
separate akcipher/skcipher algo. If you consider that reusing 
virtio_crypto_algs_register/unregister seems better, I will try to merge 
them into a single function.


--
zhenwei pi
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v1 3/6] virtio: remove flags check for unmap packed indirect desc

2022-02-23 Thread Xuan Zhuo
On Wed, 23 Feb 2022 10:53:26 +0800, Jason Wang  wrote:
>
> 在 2022/2/10 下午4:51, Xuan Zhuo 写道:
> > When calling vring_unmap_desc_packed(), it will not encounter the
> > situation that the flags contains VRING_DESC_F_INDIRECT. So remove this
> > logic.
>
>
> This seems not correct.
>
> There's a call from detach_buf_packed() that can work for indirect
> descriptors?
>

Yes, it works with indirect descriptors. But these descriptors do not contain
VRING_DESC_F_INDIRECT.

The one that contains VRING_DESC_F_INDIRECT is vq->packed.desc_extra[id].flags.
This is handled by vring_unmap_state_packed().

Thanks.

> Thanks
>
>
> >
> > Signed-off-by: Xuan Zhuo 
> > ---
> >   drivers/virtio/virtio_ring.c | 18 +-
> >   1 file changed, 5 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index fadd0a7503e9..cfb028ca238e 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1009,19 +1009,11 @@ static void vring_unmap_desc_packed(const struct 
> > vring_virtqueue *vq,
> >
> > flags = le16_to_cpu(desc->flags);
> >
> > -   if (flags & VRING_DESC_F_INDIRECT) {
> > -   dma_unmap_single(vring_dma_dev(vq),
> > -le64_to_cpu(desc->addr),
> > -le32_to_cpu(desc->len),
> > -(flags & VRING_DESC_F_WRITE) ?
> > -DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > -   } else {
> > -   dma_unmap_page(vring_dma_dev(vq),
> > -  le64_to_cpu(desc->addr),
> > -  le32_to_cpu(desc->len),
> > -  (flags & VRING_DESC_F_WRITE) ?
> > -  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > -   }
> > +   dma_unmap_page(vring_dma_dev(vq),
> > +  le64_to_cpu(desc->addr),
> > +  le32_to_cpu(desc->len),
> > +  (flags & VRING_DESC_F_WRITE) ?
> > +  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> >   }
> >
> >   static struct vring_packed_desc *alloc_indirect_packed(unsigned int 
> > total_sg,
>
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization