Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-09-29 Thread Michael S. Tsirkin
On Wed, Sep 29, 2021 at 04:08:29PM -0700, Wei Wang wrote:
> On Wed, Sep 29, 2021 at 2:53 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Sep 29, 2021 at 01:21:58PM -0700, Wei Wang wrote:
> > > On Mon, Apr 12, 2021 at 10:16 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Fri, Feb 05, 2021 at 02:28:33PM -0800, Wei Wang wrote:
> > > > > On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn
> > > > >  wrote:
> > > > > >
> > > > > > On Wed, Feb 3, 2021 at 6:53 PM Wei Wang  wrote:
> > > > > > >
> > > > > > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn 
> > > > > > > > wrote:
> > > > > > > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn 
> > > > > > > > > > wrote:
> > > > > > > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang 
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > With the implementation of napi-tx in virtio 
> > > > > > > > > > > > > > > driver, we clean tx
> > > > > > > > > > > > > > > descriptors from rx napi handler, for the purpose 
> > > > > > > > > > > > > > > of reducing tx
> > > > > > > > > > > > > > > complete interrupts. But this could introduce a 
> > > > > > > > > > > > > > > race where tx complete
> > > > > > > > > > > > > > > interrupt has been raised, but the handler found 
> > > > > > > > > > > > > > > there is no work to do
> > > > > > > > > > > > > > > because we have done the work in the previous rx 
> > > > > > > > > > > > > > > interrupt handler.
> > > > > > > > > > > > > > > This could lead to the following warning msg:
> > > > > > > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting 
> > > > > > > > > > > > > > > with the
> > > > > > > > > > > > > > > "irqpoll" option)
> > > > > > > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not 
> > > > > > > > > > > > > > > tainted
> > > > > > > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > > > > > > > > > > > [ 3588.017940] Call Trace:
> > > > > > > > > > > > > > > [ 3588.017942]  
> > > > > > > > > > > > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > > > > > > > > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > > > > > > > > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > > > > > > > > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > > > > > > > > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > > > > > > > > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > > > > > > > > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > > > > > > > > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > > > > > > > > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > > > > > > > > > > > [ 3588.017966]  
> > > > > > > > > > > > > > > [ 3588.017989] handlers:
> > > > > > > > > > > > > > > [ 3588.020374] [<1b9f1da8>] 
> > > > > > > > > > > > > > > vring_interrupt
> > > > > > > > > > > > > > > [ 3588.025099] Disabling IRQ #38
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This patch adds a new param to struct 
> > > > > > > > > > > > > > > vring_virtqueue, and we set it for
> > > > > > > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress 
> > > > > > > > > > > > > > > the warning in such
> > > > > > > > > > > > > > > case.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx 
> > > > > > > > > > > > > > > descriptors from rx napi")
> > > > > > > > > > > > > > > Reported-by: Rick Jones 
> > > > > > > > > > > > > > > Signed-off-by: Wei Wang 
> > > > > > > > > > > > > > > Signed-off-by: Willem de Bruijn 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This description does not make sense to me.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > irq X: nobody cared
> > > > > > > > > > > > > > only triggers after an interrupt is unhandled 
> > > > > > > > > > > > > > repeatedly.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So something causes a storm of useless tx 
> > > > > > > > > > > > > > interrupts here.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Let's find out what it was please. What you are 
> > > > > > > > > > > > > > doing is
> > > > > > > > > > > > > > just preventing linux from complaining.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The traffic that causes this warning is a netperf 
> > > > > > 

Re: [PATCH v2 1/6] driver core: Move the "authorized" attribute from USB/Thunderbolt to core

2021-09-29 Thread Dan Williams
On Wed, Sep 29, 2021 at 7:39 PM Kuppuswamy, Sathyanarayanan
 wrote:
>
>
>
> On 9/29/21 6:55 PM, Dan Williams wrote:
> >> Also, you ignored the usb_[de]authorize_interface() functions and
> >> their friends.
> > Ugh, yes.
>
> I did not change it because I am not sure about the interface vs device
> dependency.
>

This was the rationale for the has_probe_authorization flag. USB
performs authorization of child devices based on the authorization
state of the parent interface.

> I think following change should work.
>
> diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c
> index f57b5a7a90ca..84969732d09c 100644
> --- a/drivers/usb/core/driver.c
> +++ b/drivers/usb/core/driver.c
> @@ -334,7 +334,7 @@ static int usb_probe_interface(struct device *dev)
> if (udev->dev.authorized == false) {
> dev_err(&intf->dev, "Device is not authorized for usage\n");
> return error;
> -   } else if (intf->authorized == 0) {
> +   } else if (intf->dev.authorized == 0) {

== false.

> dev_err(&intf->dev, "Interface %d is not authorized for usage\n",
> intf->altsetting->desc.bInterfaceNumber);
> return error;
> @@ -546,7 +546,7 @@ int usb_driver_claim_interface(struct usb_driver *driver,
> return -EBUSY;
>
> /* reject claim if interface is not authorized */
> -   if (!iface->authorized)
> +   if (!iface->dev.authorized)

I'd do == false to keep it consistent with other conversions.

> return -ENODEV;
>
> dev->driver = &driver->drvwrap.driver;
> diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
> index 47548ce1cfb1..ab3c8d1e4db9 100644
> --- a/drivers/usb/core/message.c
> +++ b/drivers/usb/core/message.c
> @@ -1791,9 +1791,9 @@ void usb_deauthorize_interface(struct usb_interface *intf)
>
> device_lock(dev->parent);
>
> -   if (intf->authorized) {
> +   if (intf->dev.authorized) {
> device_lock(dev);
> -   intf->authorized = 0;
> +   intf->dev.authorized = 0;

= false;

> device_unlock(dev);
>
> usb_forced_unbind_intf(intf);
> @@ -1811,9 +1811,9 @@ void usb_authorize_interface(struct usb_interface *intf)
>   {
> struct device *dev = &intf->dev;
>
> -   if (!intf->authorized) {
> +   if (!intf->dev.authorized) {
> device_lock(dev);
> -   intf->authorized = 1; /* authorize interface */
> +   intf->dev.authorized = 1; /* authorize interface */

= true

...not sure that comment is worth preserving.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 1/6] driver core: Move the "authorized" attribute from USB/Thunderbolt to core

2021-09-29 Thread Dan Williams
On Wed, Sep 29, 2021 at 6:43 PM Alan Stern  wrote:
>
> On Wed, Sep 29, 2021 at 06:05:06PM -0700, Kuppuswamy Sathyanarayanan wrote:
> > Currently bus drivers like "USB" or "Thunderbolt" implement a custom
> > version of device authorization to selectively authorize the driver
> > probes. Since there is a common requirement, move the "authorized"
> > attribute support to the driver core in order to allow it to be used
> > by other subsystems / buses.
> >
> > Similar requirements have been discussed in the PCI [1] community for
> > PCI bus drivers as well.
> >
> > No functional changes are intended. It just converts authorized
> > attribute from int to bool and moves it to the driver core. There
> > should be no user-visible change in the location or semantics of
> > attributes for USB devices.
> >
> > Regarding thunderbolt driver, although it declares sw->authorized as
> > "int" and allows 0,1,2 as valid values for sw->authorized attribute,
> > but within the driver, in all authorized attribute related checks,
> > it is treated as bool value. So when converting the authorized
> > attribute from int to bool value, there should be no functional
> > changes other than value 2 being not visible to the user.
> >
> > [1]: 
> > https://lore.kernel.org/all/CACK8Z6E8pjVeC934oFgr=vb3pulx_gyt2nkzaogdrqj9tks...@mail.gmail.com/
> >
> > Reviewed-by: Dan Williams 
> > Signed-off-by: Kuppuswamy Sathyanarayanan 
> > 
>
> Since you're moving the authorized flag from the USB core to the
> driver core, the corresponding sysfs attribute functions should be
> moved as well.

Unlike when 'removable' moved from USB to the driver core there isn't
a common definition for how the 'authorized' sysfs-attribute behaves
across buses. The only common piece is where this flag is stored in
the data structure, i.e. the 'authorized' sysfs interface is
purposefully left bus specific.

> Also, you ignored the usb_[de]authorize_interface() functions and
> their friends.

Ugh, yes.


Re: [PATCH v2 1/6] driver core: Move the "authorized" attribute from USB/Thunderbolt to core

2021-09-29 Thread Alan Stern
On Wed, Sep 29, 2021 at 06:05:06PM -0700, Kuppuswamy Sathyanarayanan wrote:
> Currently bus drivers like "USB" or "Thunderbolt" implement a custom
> version of device authorization to selectively authorize the driver
> probes. Since there is a common requirement, move the "authorized"
> attribute support to the driver core in order to allow it to be used
> by other subsystems / buses.
> 
> Similar requirements have been discussed in the PCI [1] community for
> PCI bus drivers as well.
> 
> No functional changes are intended. It just converts authorized
> attribute from int to bool and moves it to the driver core. There
> should be no user-visible change in the location or semantics of
> attributes for USB devices.
> 
> Regarding thunderbolt driver, although it declares sw->authorized as
> "int" and allows 0,1,2 as valid values for sw->authorized attribute,
> but within the driver, in all authorized attribute related checks,
> it is treated as bool value. So when converting the authorized
> attribute from int to bool value, there should be no functional
> changes other than value 2 being not visible to the user.
> 
> [1]: 
> https://lore.kernel.org/all/CACK8Z6E8pjVeC934oFgr=vb3pulx_gyt2nkzaogdrqj9tks...@mail.gmail.com/
> 
> Reviewed-by: Dan Williams 
> Signed-off-by: Kuppuswamy Sathyanarayanan 
> 

Since you're moving the authorized flag from the USB core to the
driver core, the corresponding sysfs attribute functions should be
moved as well.

Also, you ignored the usb_[de]authorize_interface() functions and 
their friends.

Alan Stern


[RFC PATCH 1/1] virtio: write back features before verify

2021-09-29 Thread Halil Pasic
This patch fixes a regression introduced by commit 82e89ea077b9
("virtio-blk: Add validation for block size in config space") and
enables similar checks in verify() on big endian platforms.

The problem with checking multi-byte config fields in the verify
callback, on big endian platforms, and with a possibly transitional
device is the following. The verify() callback is called between
config->get_features() and virtio_finalize_features(). If the device
offered F_VERSION_1, there are two possibilities: either the device is
transitional, in which case it has to present the legacy interface, i.e.
a big endian config space, until F_VERSION_1 is negotiated, or the device
is non-transitional, which makes F_VERSION_1 mandatory, and it implements
only the non-legacy interface and thus presents a little endian config
space. Because at this point we can't know whether the device is
transitional or non-transitional, we can't know whether we need to byte
swap or not.
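
To make the ambiguity concrete, here is a minimal user-space sketch, not
kernel code. The helper names (le32_decode, be32_decode, cfg_read32) are
hypothetical, and it assumes a big endian guest, so that the legacy
interface presents big endian fields while the modern interface presents
little endian ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decode a 32-bit config field from raw bytes as little endian. */
static uint32_t le32_decode(const uint8_t raw[4])
{
    return (uint32_t)raw[0] | ((uint32_t)raw[1] << 8) |
           ((uint32_t)raw[2] << 16) | ((uint32_t)raw[3] << 24);
}

/* Decode the same bytes as big endian. */
static uint32_t be32_decode(const uint8_t raw[4])
{
    return ((uint32_t)raw[0] << 24) | ((uint32_t)raw[1] << 16) |
           ((uint32_t)raw[2] << 8) | (uint32_t)raw[3];
}

/*
 * Hypothetical helper for a big endian guest: the correct decoding
 * depends on whether VERSION_1 has been negotiated, which is exactly
 * what is unknown when the validate callback runs too early.
 */
static uint32_t cfg_read32(const uint8_t raw[4], bool version_1_negotiated)
{
    assert(raw != 0);
    return version_1_negotiated ? le32_decode(raw) : be32_decode(raw);
}
```

With the raw bytes 00 02 00 00, the little endian reading gives a block
size of 512, while the big endian reading gives 131072, so a size check
made under the wrong assumption can reject a perfectly valid device.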

The virtio spec explicitly states that the driver MAY read config
between reading and writing the features, so forbidding config access
before feature negotiation is done is not an option. The
specification ain't clear about setting the features multiple times
before FEATURES_OK, so I guess that should be fine.

I don't consider this patch super clean, but frankly I don't think we
have a ton of options. Another option that may or may not be cleaner,
but is also IMHO much uglier, is to figure out whether the device is
transitional by rejecting _F_VERSION_1, then resetting it and proceeding
according to what we have figured out, hoping that the characteristics
of the device didn't change.

Signed-off-by: Halil Pasic 
Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
Reported-by: mark...@us.ibm.com
---
 drivers/virtio/virtio.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 0a5b54034d4b..9dc3cfa17b1c 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -249,6 +249,10 @@ static int virtio_dev_probe(struct device *_d)
if (device_features & (1ULL << i))
__virtio_set_bit(dev, i);
 
+   /* Write back features before validate to know endianness */
+   if (device_features & (1ULL << VIRTIO_F_VERSION_1))
+   dev->config->finalize_features(dev);
+
if (drv->validate) {
err = drv->validate(dev);
if (err)

base-commit: 02d5e016800d082058b3d3b7c3ede136cdc6ddcb
-- 
2.25.1



Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-09-29 Thread Michael S. Tsirkin
On Wed, Sep 29, 2021 at 01:21:58PM -0700, Wei Wang wrote:
> On Mon, Apr 12, 2021 at 10:16 PM Michael S. Tsirkin  wrote:
> >
> > On Fri, Feb 05, 2021 at 02:28:33PM -0800, Wei Wang wrote:
> > > On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn
> > >  wrote:
> > > >
> > > > On Wed, Feb 3, 2021 at 6:53 PM Wei Wang  wrote:
> > > > >
> > > > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote:
> > > > > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn 
> > > > > > > > wrote:
> > > > > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang  
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > With the implementation of napi-tx in virtio driver, 
> > > > > > > > > > > > > we clean tx
> > > > > > > > > > > > > descriptors from rx napi handler, for the purpose of 
> > > > > > > > > > > > > reducing tx
> > > > > > > > > > > > > complete interrupts. But this could introduce a race 
> > > > > > > > > > > > > where tx complete
> > > > > > > > > > > > > interrupt has been raised, but the handler found 
> > > > > > > > > > > > > there is no work to do
> > > > > > > > > > > > > because we have done the work in the previous rx 
> > > > > > > > > > > > > interrupt handler.
> > > > > > > > > > > > > This could lead to the following warning msg:
> > > > > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with 
> > > > > > > > > > > > > the
> > > > > > > > > > > > > "irqpoll" option)
> > > > > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not 
> > > > > > > > > > > > > tainted
> > > > > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > > > > > > > > > [ 3588.017940] Call Trace:
> > > > > > > > > > > > > [ 3588.017942]  
> > > > > > > > > > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > > > > > > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > > > > > > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > > > > > > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > > > > > > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > > > > > > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > > > > > > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > > > > > > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > > > > > > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > > > > > > > > > [ 3588.017966]  
> > > > > > > > > > > > > [ 3588.017989] handlers:
> > > > > > > > > > > > > [ 3588.020374] [<1b9f1da8>] vring_interrupt
> > > > > > > > > > > > > [ 3588.025099] Disabling IRQ #38
> > > > > > > > > > > > >
> > > > > > > > > > > > > This patch adds a new param to struct 
> > > > > > > > > > > > > vring_virtqueue, and we set it for
> > > > > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the 
> > > > > > > > > > > > > warning in such
> > > > > > > > > > > > > case.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx 
> > > > > > > > > > > > > descriptors from rx napi")
> > > > > > > > > > > > > Reported-by: Rick Jones 
> > > > > > > > > > > > > Signed-off-by: Wei Wang 
> > > > > > > > > > > > > Signed-off-by: Willem de Bruijn 
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > This description does not make sense to me.
> > > > > > > > > > > >
> > > > > > > > > > > > irq X: nobody cared
> > > > > > > > > > > > only triggers after an interrupt is unhandled 
> > > > > > > > > > > > repeatedly.
> > > > > > > > > > > >
> > > > > > > > > > > > So something causes a storm of useless tx interrupts 
> > > > > > > > > > > > here.
> > > > > > > > > > > >
> > > > > > > > > > > > Let's find out what it was please. What you are doing is
> > > > > > > > > > > > just preventing linux from complaining.
> > > > > > > > > > >
> > > > > > > > > > > The traffic that causes this warning is a netperf 
> > > > > > > > > > > tcp_stream with at
> > > > > > > > > > > least 128 flows between 2 hosts. And the warning gets 
> > > > > > > > > > > triggered on the
> > > > > > > > > > > receiving host, which has a lot of rx interrupts firing 
> > > > > > > > > > > on all queues,
> > > > > > > > > > > and a few tx interrupts.
> > > > > > > > > > > And I think the scenario is: when the tx interrupt gets 
> > > > > > > > > > > fired, it gets
> > > > > > > > > > > coalesced with the rx interrupt. Basically, the rx and tx 
> > > > > > > > > > > interrupts
> > > > > > > > > > > get triggered very close to ea

[RFC PATCH 02/10] vhost: push virtqueue area pointers into a user struct

2021-09-29 Thread Vincent Whitchurch
In order to prepare for allowing vhost to operate on kernel buffers,
push the virtqueue desc/avail/used area pointers down to a new "user"
struct.

No functional change intended.
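
As a rough illustration of the shape of this refactoring (not the actual
vhost.h change, which is not shown in this excerpt), the three area
pointers move from the virtqueue into an embedded struct. All type and
function names below (toy_desc, toy_vq_user, toy_vq, toy_vq_reset,
toy_vq_areas_set) are hypothetical, and the ring types are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder ring types; the real layouts come from the vring headers. */
struct toy_desc  { int unused; };
struct toy_avail { int unused; };
struct toy_used  { int unused; };

/* The desc/avail/used area pointers, grouped in one struct. */
struct toy_vq_user {
    struct toy_desc  *desc;
    struct toy_avail *avail;
    struct toy_used  *used;
};

struct toy_vq {
    unsigned int num;
    struct toy_vq_user user;   /* was: three separate pointer fields */
};

/* Same reset logic as before, only the field paths change. */
static void toy_vq_reset(struct toy_vq *vq)
{
    assert(vq != NULL);
    vq->num = 1;
    vq->user.desc = NULL;
    vq->user.avail = NULL;
    vq->user.used = NULL;
}

/* True once all three areas have been assigned. */
static bool toy_vq_areas_set(const struct toy_vq *vq)
{
    return vq->user.avail && vq->user.desc && vq->user.used;
}
```

Grouping the pointers this way would let a second, kernel-side set of
pointers be added later next to "user" without touching every access
site again.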

Signed-off-by: Vincent Whitchurch 
---
 drivers/vhost/vdpa.c  |  6 +--
 drivers/vhost/vhost.c | 90 +--
 drivers/vhost/vhost.h |  8 ++--
 3 files changed, 53 insertions(+), 51 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index f41d081777f5..6f05388f5a21 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -400,9 +400,9 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
switch (cmd) {
case VHOST_SET_VRING_ADDR:
if (ops->set_vq_address(vdpa, idx,
-   (u64)(uintptr_t)vq->desc,
-   (u64)(uintptr_t)vq->avail,
-   (u64)(uintptr_t)vq->used))
+   (u64)(uintptr_t)vq->user.desc,
+   (u64)(uintptr_t)vq->user.avail,
+   (u64)(uintptr_t)vq->user.used))
r = -EINVAL;
break;
 
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 59edb5a1ffe2..108994f386f7 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -46,8 +46,8 @@ enum {
VHOST_MEMORY_F_LOG = 0x1,
 };
 
-#define vhost_used_event(vq) ((__virtio16 __user *)&vq->avail->ring[vq->num])
-#define vhost_avail_event(vq) ((__virtio16 __user *)&vq->used->ring[vq->num])
+#define vhost_used_event(vq) ((__virtio16 __user *)&vq->user.avail->ring[vq->num])
+#define vhost_avail_event(vq) ((__virtio16 __user *)&vq->user.used->ring[vq->num])
 
 #ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY
 static void vhost_disable_cross_endian(struct vhost_virtqueue *vq)
@@ -306,7 +306,7 @@ static void vhost_vring_call_reset(struct vhost_vring_call *call_ctx)
 
 bool vhost_vq_is_setup(struct vhost_virtqueue *vq)
 {
-   return vq->avail && vq->desc && vq->used && vhost_vq_access_ok(vq);
+   return vq->user.avail && vq->user.desc && vq->user.used && vhost_vq_access_ok(vq);
 }
 EXPORT_SYMBOL_GPL(vhost_vq_is_setup);
 
@@ -314,9 +314,9 @@ static void vhost_vq_reset(struct vhost_dev *dev,
   struct vhost_virtqueue *vq)
 {
vq->num = 1;
-   vq->desc = NULL;
-   vq->avail = NULL;
-   vq->used = NULL;
+   vq->user.desc = NULL;
+   vq->user.avail = NULL;
+   vq->user.used = NULL;
vq->last_avail_idx = 0;
vq->avail_idx = 0;
vq->last_used_idx = 0;
@@ -444,8 +444,8 @@ static size_t vhost_get_avail_size(struct vhost_virtqueue *vq,
size_t event __maybe_unused =
   vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
 
-   return sizeof(*vq->avail) +
-  sizeof(*vq->avail->ring) * num + event;
+   return sizeof(*vq->user.avail) +
+  sizeof(*vq->user.avail->ring) * num + event;
 }
 
 static size_t vhost_get_used_size(struct vhost_virtqueue *vq,
@@ -454,14 +454,14 @@ static size_t vhost_get_used_size(struct vhost_virtqueue *vq,
size_t event __maybe_unused =
   vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
 
-   return sizeof(*vq->used) +
-  sizeof(*vq->used->ring) * num + event;
+   return sizeof(*vq->user.used) +
+  sizeof(*vq->user.used->ring) * num + event;
 }
 
 static size_t vhost_get_desc_size(struct vhost_virtqueue *vq,
  unsigned int num)
 {
-   return sizeof(*vq->desc) * num;
+   return sizeof(*vq->user.desc) * num;
 }
 
 void vhost_dev_init(struct vhost_dev *dev,
@@ -959,7 +959,7 @@ static inline int vhost_put_used(struct vhost_virtqueue *vq,
 struct vring_used_elem *head, int idx,
 int count)
 {
-   return vhost_copy_to_user(vq, vq->used->ring + idx, head,
+   return vhost_copy_to_user(vq, vq->user.used->ring + idx, head,
  count * sizeof(*head));
 }
 
@@ -967,14 +967,14 @@ static inline int vhost_put_used_flags(struct vhost_virtqueue *vq)
 
 {
return vhost_put_user(vq, cpu_to_vhost16(vq, vq->used_flags),
- &vq->used->flags);
+ &vq->user.used->flags);
 }
 
 static inline int vhost_put_used_idx(struct vhost_virtqueue *vq)
 
 {
return vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx),
- &vq->used->idx);
+ &vq->user.used->idx);
 }
 
 #define vhost_get_user(vq, x, ptr, type)   \
@@ -1018,20 +1018,20 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d)
 static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq,
  __virtio16 *idx)
 {
-   return vhost_get_avail(vq, *idx, &vq->avail->i

[RFC PATCH 06/10] vhost: extract ioctl locking to common code

2021-09-29 Thread Vincent Whitchurch
Extract the mutex locking for the vhost ioctl into common code.  This
will allow the common code to easily add some extra handling required
for adding a kernel API to control vhost.
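
The pattern can be sketched in user space as follows. This is only an
illustration under stated assumptions: toy_mutex stands in for the
kernel mutex, and toy_dev, toy_ops, common_ioctl and net_ioctl are
hypothetical names, not the vhost API.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a kernel mutex so the sketch runs in user space. */
struct toy_mutex { bool held; };

static void toy_lock(struct toy_mutex *m)
{
    assert(!m->held);
    m->held = true;
}

static void toy_unlock(struct toy_mutex *m)
{
    assert(m->held);
    m->held = false;
}

struct toy_dev;

struct toy_ops {
    long (*ioctl)(struct toy_dev *dev, unsigned int cmd, unsigned long arg);
};

struct toy_dev {
    struct toy_mutex mutex;
    const struct toy_ops *ops;
};

/* Common entry point: takes the device mutex once, for every driver. */
static long common_ioctl(struct toy_dev *dev, unsigned int cmd,
                         unsigned long arg)
{
    long ret;

    toy_lock(&dev->mutex);
    ret = dev->ops->ioctl(dev, cmd, arg);
    toy_unlock(&dev->mutex);

    return ret;
}

/* Driver handler: no locking of its own, relies on the caller. */
static long net_ioctl(struct toy_dev *dev, unsigned int cmd,
                      unsigned long arg)
{
    assert(dev->mutex.held);
    return (long)cmd + (long)arg;
}

static const struct toy_ops net_ops = { .ioctl = net_ioctl };
```

Once the lock is taken in one place, the per-driver handlers no longer
need their own lock/unlock pairs on every exit path, which is what most
of the removed lines in the diff amount to.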

Signed-off-by: Vincent Whitchurch 
---
 drivers/vhost/common.c |  7 ++-
 drivers/vhost/net.c| 14 +-
 drivers/vhost/vhost.c  | 10 --
 drivers/vhost/vhost.h  |  1 +
 drivers/vhost/vsock.c  | 12 
 5 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/drivers/vhost/common.c b/drivers/vhost/common.c
index 27d4672b15d3..a5722ad65e24 100644
--- a/drivers/vhost/common.c
+++ b/drivers/vhost/common.c
@@ -60,8 +60,13 @@ static long vhost_ioctl(struct file *file, unsigned int ioctl, unsigned long arg
 {
struct vhost_dev *dev = file->private_data;
struct vhost *vhost = dev->vhost;
+   long ret;
 
-   return vhost->ops->ioctl(dev, ioctl, arg);
+   mutex_lock(&dev->mutex);
+   ret = vhost->ops->ioctl(dev, ioctl, arg);
+   mutex_unlock(&dev->mutex);
+
+   return ret;
 }
 
 static ssize_t vhost_read_iter(struct kiocb *iocb, struct iov_iter *to)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8910d9e2a74e..b5590b7862a9 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1505,7 +1505,6 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
struct vhost_net_ubuf_ref *ubufs, *oldubufs = NULL;
int r;
 
-   mutex_lock(&n->dev.mutex);
r = vhost_dev_check_owner(&n->dev);
if (r)
goto err;
@@ -1573,7 +1572,6 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
sockfd_put(oldsock);
}
 
-   mutex_unlock(&n->dev.mutex);
return 0;
 
 err_used:
@@ -1587,7 +1585,6 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 err_vq:
mutex_unlock(&vq->mutex);
 err:
-   mutex_unlock(&n->dev.mutex);
return r;
 }
 
@@ -1598,7 +1595,6 @@ static long vhost_net_reset_owner(struct vhost_net *n)
long err;
struct vhost_iotlb *umem;
 
-   mutex_lock(&n->dev.mutex);
err = vhost_dev_check_owner(&n->dev);
if (err)
goto done;
@@ -1613,7 +1609,6 @@ static long vhost_net_reset_owner(struct vhost_net *n)
vhost_dev_reset_owner(&n->dev, umem);
vhost_net_vq_reset(n);
 done:
-   mutex_unlock(&n->dev.mutex);
if (tx_sock)
sockfd_put(tx_sock);
if (rx_sock)
@@ -1639,7 +1634,6 @@ static int vhost_net_set_features(struct vhost_net *n, u64 features)
vhost_hlen = 0;
sock_hlen = hdr_len;
}
-   mutex_lock(&n->dev.mutex);
if ((features & (1 << VHOST_F_LOG_ALL)) &&
!vhost_log_access_ok(&n->dev))
goto out_unlock;
@@ -1656,11 +1650,9 @@ static int vhost_net_set_features(struct vhost_net *n, u64 features)
n->vqs[i].sock_hlen = sock_hlen;
mutex_unlock(&n->vqs[i].vq.mutex);
}
-   mutex_unlock(&n->dev.mutex);
return 0;
 
 out_unlock:
-   mutex_unlock(&n->dev.mutex);
return -EFAULT;
 }
 
@@ -1668,7 +1660,6 @@ static long vhost_net_set_owner(struct vhost_net *n)
 {
int r;
 
-   mutex_lock(&n->dev.mutex);
if (vhost_dev_has_owner(&n->dev)) {
r = -EBUSY;
goto out;
@@ -1681,7 +1672,6 @@ static long vhost_net_set_owner(struct vhost_net *n)
vhost_net_clear_ubuf_info(n);
vhost_net_flush(n);
 out:
-   mutex_unlock(&n->dev.mutex);
return r;
 }
 
@@ -1721,20 +1711,18 @@ static long vhost_net_ioctl(struct vhost_dev *dev, unsigned int ioctl,
return -EFAULT;
if (features & ~VHOST_NET_BACKEND_FEATURES)
return -EOPNOTSUPP;
-   vhost_set_backend_features(&n->dev, features);
+   __vhost_set_backend_features(&n->dev, features);
return 0;
case VHOST_RESET_OWNER:
return vhost_net_reset_owner(n);
case VHOST_SET_OWNER:
return vhost_net_set_owner(n);
default:
-   mutex_lock(&n->dev.mutex);
r = vhost_dev_ioctl(&n->dev, ioctl, argp);
if (r == -ENOIOCTLCMD)
r = vhost_vring_ioctl(&n->dev, ioctl, argp);
else
vhost_net_flush(n);
-   mutex_unlock(&n->dev.mutex);
return r;
}
 }
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 9354061ce75e..9d6496b7ad85 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2821,18 +2821,24 @@ struct vhost_msg_node *vhost_dequeue_msg(struct vhost_dev *dev,
 }
 EXPORT_SYMBOL_GPL(vhost_dequeue_msg);
 
-void vhost_set_backend_features(struct vhost_dev *dev, u64 features)
+void __vhost_set_backend_features(struct vhost_dev *dev, u64 features)
 {

[RFC PATCH 08/10] vhost: net: add support for kernel control

2021-09-29 Thread Vincent Whitchurch
Add support for kernel control to vhost-net.  For the vhost-*-kernel
devices, the ioctl to set the backend only provides the socket to
vhost-net but does not start the handling of the virtqueues.  The
handling of the virtqueues is started and stopped by the kernel.
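
The hand-over of the backend socket can be sketched roughly like this.
It is a user-space toy under stated assumptions: toy_sock, toy_vq,
toy_net and the toy_* functions are hypothetical names, and the
assumption that the set-backend step stashes the socket in the device
is inferred from the new tx_sock/rx_sock fields, not confirmed by the
truncated diff.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sock { int fd; };

struct toy_vq {
    struct toy_sock *backend;   /* non-NULL only while the vq is running */
};

struct toy_net {
    struct toy_vq vq;
    struct toy_sock *stashed;   /* holds the socket while the vq is stopped */
};

/* Under kernel control (assumed): only remember the socket. */
static void toy_set_backend(struct toy_net *n, struct toy_sock *sock)
{
    assert(n != NULL);
    n->stashed = sock;
}

/* Kernel starts the vq: hand the stashed socket over to it. */
static void toy_start_vq(struct toy_net *n)
{
    n->vq.backend = n->stashed;
    n->stashed = NULL;
}

/* Kernel stops the vq: take the socket back so it can be restarted. */
static void toy_stop_vq(struct toy_net *n)
{
    if (!n->vq.backend)
        return;
    n->stashed = n->vq.backend;
    n->vq.backend = NULL;
}
```

At any time the socket is owned either by the virtqueue or by the stash,
never both, which mirrors the WARN_ON checks in the release path of the
patch.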

Signed-off-by: Vincent Whitchurch 
---
 drivers/vhost/net.c | 106 
 1 file changed, 98 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index b5590b7862a9..977cfa89b216 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -144,6 +144,9 @@ struct vhost_net {
struct page_frag page_frag;
/* Refcount bias of page frag */
int refcnt_bias;
+   /* Used for storing backend sockets when stopped under kernel control */
+   struct socket *tx_sock;
+   struct socket *rx_sock;
 };
 
 static unsigned vhost_net_zcopy_mask __read_mostly;
@@ -1293,6 +1296,8 @@ static struct vhost_dev *vhost_net_open(struct vhost *vhost)
n = kvmalloc(sizeof *n, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
if (!n)
return ERR_PTR(-ENOMEM);
+   n->tx_sock = NULL;
+   n->rx_sock = NULL;
vqs = kmalloc_array(VHOST_NET_VQ_MAX, sizeof(*vqs), GFP_KERNEL);
if (!vqs) {
kvfree(n);
@@ -1364,6 +1369,20 @@ static struct socket *vhost_net_stop_vq(struct vhost_net *n,
return sock;
 }
 
+/* Stops the virtqueue but doesn't unconsume the tap ring */
+static struct socket *__vhost_net_stop_vq(struct vhost_net *n,
+ struct vhost_virtqueue *vq)
+{
+   struct socket *sock;
+
+   mutex_lock(&vq->mutex);
+   sock = vhost_vq_get_backend(vq);
+   vhost_net_disable_vq(n, vq);
+   vhost_vq_set_backend(vq, NULL);
+   mutex_unlock(&vq->mutex);
+   return sock;
+}
+
 static void vhost_net_stop(struct vhost_net *n, struct socket **tx_sock,
   struct socket **rx_sock)
 {
@@ -1394,6 +1413,57 @@ static void vhost_net_flush(struct vhost_net *n)
}
 }
 
+static void vhost_net_start_vq(struct vhost_net *n, struct vhost_virtqueue *vq,
+  struct socket *sock)
+{
+   mutex_lock(&vq->mutex);
+   vhost_vq_set_backend(vq, sock);
+   vhost_vq_init_access(vq);
+   vhost_net_enable_vq(n, vq);
+   mutex_unlock(&vq->mutex);
+}
+
+static void vhost_net_dev_start_vq(struct vhost_dev *dev, u16 idx)
+{
+   struct vhost_net *n = container_of(dev, struct vhost_net, dev);
+
+   if (WARN_ON(idx >= ARRAY_SIZE(n->vqs)))
+   return;
+
+   if (idx == VHOST_NET_VQ_RX) {
+   vhost_net_start_vq(n, &n->vqs[idx].vq, n->rx_sock);
+   n->rx_sock = NULL;
+   } else if (idx == VHOST_NET_VQ_TX) {
+   vhost_net_start_vq(n, &n->vqs[idx].vq, n->tx_sock);
+   n->tx_sock = NULL;
+   }
+
+   vhost_net_flush_vq(n, idx);
+}
+
+static void vhost_net_dev_stop_vq(struct vhost_dev *dev, u16 idx)
+{
+   struct vhost_net *n = container_of(dev, struct vhost_net, dev);
+   struct socket *sock;
+
+   if (WARN_ON(idx >= ARRAY_SIZE(n->vqs)))
+   return;
+
+   if (!vhost_vq_get_backend(&n->vqs[idx].vq))
+   return;
+
+   sock = __vhost_net_stop_vq(n, &n->vqs[idx].vq);
+
+   vhost_net_flush(n);
+   synchronize_rcu();
+   vhost_net_flush(n);
+
+   if (idx == VHOST_NET_VQ_RX)
+   n->rx_sock = sock;
+   else if (idx == VHOST_NET_VQ_TX)
+   n->tx_sock = sock;
+}
+
 static void vhost_net_release(struct vhost_dev *dev)
 {
struct vhost_net *n = container_of(dev, struct vhost_net, dev);
@@ -1405,6 +1475,14 @@ static void vhost_net_release(struct vhost_dev *dev)
vhost_dev_stop(&n->dev);
vhost_dev_cleanup(&n->dev);
vhost_net_vq_reset(n);
+   if (n->tx_sock) {
+   WARN_ON(tx_sock);
+   tx_sock = n->tx_sock;
+   }
+   if (n->rx_sock) {
+   WARN_ON(rx_sock);
+   rx_sock = n->rx_sock;
+   }
if (tx_sock)
sockfd_put(tx_sock);
if (rx_sock)
@@ -1518,7 +1596,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
mutex_lock(&vq->mutex);
 
/* Verify that ring has been setup correctly. */
-   if (!vhost_vq_access_ok(vq)) {
+   if (!vhost_kernel(vq) && !vhost_vq_access_ok(vq)) {
r = -EFAULT;
goto err_vq;
}
@@ -1539,14 +1617,17 @@ static long vhost_net_set_backend(struct vhost_net *n, 
unsigned index, int fd)
}
 
vhost_net_disable_vq(n, vq);
-   vhost_vq_set_backend(vq, sock);
+   if (!vhost_kernel(vq))
+   vhost_vq_set_backend(vq, sock);
vhost_net_buf_unproduce(nvq);
-   r = vhost_vq_init_access(vq);
-   if (r)
-   goto err_used;
-   

[RFC PATCH 05/10] vhost: extract common code for file_operations handling

2021-09-29 Thread Vincent Whitchurch
There is some duplicated code for handling of file_operations among
vhost drivers.  Move this to a common file.

Having file_operations in a common place also makes it simpler to add
functions for obtaining a handle to a vhost device from a file descriptor.

Signed-off-by: Vincent Whitchurch 
---
 drivers/vhost/Makefile |   3 +
 drivers/vhost/common.c | 134 +
 drivers/vhost/net.c|  79 +++-
 drivers/vhost/vhost.h  |  15 +
 drivers/vhost/vsock.c  |  75 +++
 5 files changed, 197 insertions(+), 109 deletions(-)
 create mode 100644 drivers/vhost/common.c

diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index f3e1897cce85..b1ddc976aede 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -15,5 +15,8 @@ vhost_vdpa-y := vdpa.o
 
 obj-$(CONFIG_VHOST)+= vhost.o
 
+obj-$(CONFIG_VHOST)+= vhost_common.o
+vhost_common-y := common.o
+
 obj-$(CONFIG_VHOST_IOTLB) += vhost_iotlb.o
 vhost_iotlb-y := iotlb.o
diff --git a/drivers/vhost/common.c b/drivers/vhost/common.c
new file mode 100644
index ..27d4672b15d3
--- /dev/null
+++ b/drivers/vhost/common.c
@@ -0,0 +1,134 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vhost.h"
+
+struct vhost_ops;
+
+struct vhost {
+   struct miscdevice misc;
+   const struct vhost_ops *ops;
+};
+
+static int vhost_open(struct inode *inode, struct file *file)
+{
+   struct miscdevice *misc = file->private_data;
+   struct vhost *vhost = container_of(misc, struct vhost, misc);
+   struct vhost_dev *dev;
+
+   dev = vhost->ops->open(vhost);
+   if (IS_ERR(dev))
+   return PTR_ERR(dev);
+
+   dev->vhost = vhost;
+   dev->file = file;
+   file->private_data = dev;
+
+   return 0;
+}
+
+static int vhost_release(struct inode *inode, struct file *file)
+{
+   struct vhost_dev *dev = file->private_data;
+   struct vhost *vhost = dev->vhost;
+
+   vhost->ops->release(dev);
+
+   return 0;
+}
+
+static long vhost_ioctl(struct file *file, unsigned int ioctl, unsigned long 
arg)
+{
+   struct vhost_dev *dev = file->private_data;
+   struct vhost *vhost = dev->vhost;
+
+   return vhost->ops->ioctl(dev, ioctl, arg);
+}
+
+static ssize_t vhost_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+   struct file *file = iocb->ki_filp;
+   struct vhost_dev *dev = file->private_data;
+   int noblock = file->f_flags & O_NONBLOCK;
+
+   return vhost_chr_read_iter(dev, to, noblock);
+}
+
+static ssize_t vhost_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+   struct file *file = iocb->ki_filp;
+   struct vhost_dev *dev = file->private_data;
+
+   return vhost_chr_write_iter(dev, from);
+}
+
+static __poll_t vhost_poll(struct file *file, poll_table *wait)
+{
+   struct vhost_dev *dev = file->private_data;
+
+   return vhost_chr_poll(file, dev, wait);
+}
+
+static const struct file_operations vhost_fops = {
+   .owner  = THIS_MODULE,
+   .open   = vhost_open,
+   .release= vhost_release,
+   .llseek = noop_llseek,
+   .unlocked_ioctl = vhost_ioctl,
+   .compat_ioctl   = compat_ptr_ioctl,
+   .read_iter  = vhost_read_iter,
+   .write_iter = vhost_write_iter,
+   .poll   = vhost_poll,
+};
+
+struct vhost *vhost_register(const struct vhost_ops *ops)
+{
+   struct vhost *vhost;
+   int ret;
+
+   vhost = kzalloc(sizeof(*vhost), GFP_KERNEL);
+   if (!vhost)
+   return ERR_PTR(-ENOMEM);
+
+   vhost->misc.minor = ops->minor;
+   vhost->misc.name = ops->name;
+   vhost->misc.fops = &vhost_fops;
+   vhost->ops = ops;
+
+   ret = misc_register(&vhost->misc);
+   if (ret) {
+   kfree(vhost);
+   return ERR_PTR(ret);
+   }
+
+   return vhost;
+}
+EXPORT_SYMBOL_GPL(vhost_register);
+
+void vhost_unregister(struct vhost *vhost)
+{
+   misc_deregister(&vhost->misc);
+   kfree(vhost);
+}
+EXPORT_SYMBOL_GPL(vhost_unregister);
+
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8f82b646d4af..8910d9e2a74e 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1281,7 +1281,7 @@ static void handle_rx_net(struct vhost_work *work)
handle_rx(net);
 }
 
-static int vhost_net_open(struct inode *inode, struct file *f)
+static struct vhost_dev *vhost_net_open(struct vhost *vhost)
 {
struct vhost_net *n;
struct vhost_dev *dev;
@@ -1292,11 +1292,11 @@ static int vhost_net_open(struct inode *inode, struct 
file *f)
 
n = kvmalloc(sizeof *n, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
if (!n)
-   return -ENO
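
The dispatch pattern this patch introduces (one shared set of file
operations that recovers the per-driver ops table with container_of()) can
be modelled in userspace. This is only a sketch: every name below
(struct registry, struct device_ops, common_open() and so on) is
hypothetical and stands in for struct vhost, struct vhost_ops and
vhost_open() from the patch; it is not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as the kernel's container_of(): member pointer -> outer struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct registry;

/* Hypothetical stand-in for struct vhost_dev. */
struct device {
	struct registry *owner;
	int id;
};

/* Hypothetical stand-in for struct vhost_ops. */
struct device_ops {
	const char *name;
	struct device *(*open)(struct registry *reg);
	long (*ioctl)(struct device *dev, unsigned int cmd, unsigned long arg);
};

/* Hypothetical stand-in for struct miscdevice. */
struct misc_node {
	const char *name;
};

/* Hypothetical stand-in for struct vhost: misc node plus driver ops. */
struct registry {
	struct misc_node misc;
	const struct device_ops *ops;
};

/* Shared open: find the registry from the misc node, then call the driver. */
struct device *common_open(struct misc_node *misc)
{
	struct registry *reg = container_of(misc, struct registry, misc);
	struct device *dev = reg->ops->open(reg);

	if (dev)
		dev->owner = reg;
	return dev;
}

/* Shared ioctl: forward to whichever driver owns this device. */
long common_ioctl(struct device *dev, unsigned int cmd, unsigned long arg)
{
	assert(dev && dev->owner);
	return dev->owner->ops->ioctl(dev, cmd, arg);
}

/* A toy driver that plugs into the shared entry points. */
static struct device net_dev;

static struct device *net_open(struct registry *reg)
{
	(void)reg;
	net_dev.id = 1;
	return &net_dev;
}

static long net_ioctl(struct device *dev, unsigned int cmd, unsigned long arg)
{
	(void)dev;
	return (long)cmd + (long)arg;	/* arbitrary result for the demo */
}

const struct device_ops net_ops = {
	.name = "demo-net",
	.open = net_open,
	.ioctl = net_ioctl,
};
```

A driver only supplies its ops table; the shared code never needs to know
which driver it is talking to, which is what lets net.c and vsock.c drop
their private copies of the file_operations boilerplate.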

[RFC PATCH 09/10] vdpa: add test driver for kernel buffers in vhost

2021-09-29 Thread Vincent Whitchurch
Add a driver which uses the kernel buffer support in vhost to allow
virtio-net and vhost-net to be run in a loopback setup on the same
system.

While this feature could be useful on its own (for example for
development of the vhost/virtio drivers), this driver is primarily
intended to be used for testing the support for kernel buffers in vhost.

A selftest which uses this driver will be added.

Signed-off-by: Vincent Whitchurch 
---
 drivers/vdpa/Kconfig  |   8 +
 drivers/vdpa/Makefile |   1 +
 drivers/vdpa/vhost_kernel_test/Makefile   |   2 +
 .../vhost_kernel_test/vhost_kernel_test.c | 575 ++
 4 files changed, 586 insertions(+)
 create mode 100644 drivers/vdpa/vhost_kernel_test/Makefile
 create mode 100644 drivers/vdpa/vhost_kernel_test/vhost_kernel_test.c

diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index 3d91982d8371..308e5f11d2a9 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -43,6 +43,14 @@ config VDPA_USER
  With VDUSE it is possible to emulate a vDPA Device
  in a userspace program.
 
+config VHOST_KERNEL_TEST
+   tristate "vhost kernel test driver"
+   depends on EVENTFD
+   select VHOST
+   select VHOST_KERNEL
+   help
+ Test driver for the vhost kernel-space buffer support.
+
 config IFCVF
tristate "Intel IFC VF vDPA driver"
depends on PCI_MSI
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index f02ebed33f19..4ba8a4b350c4 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -2,6 +2,7 @@
 obj-$(CONFIG_VDPA) += vdpa.o
 obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
 obj-$(CONFIG_VDPA_USER) += vdpa_user/
+obj-$(CONFIG_VHOST_KERNEL_TEST) += vhost_kernel_test/
 obj-$(CONFIG_IFCVF)+= ifcvf/
 obj-$(CONFIG_MLX5_VDPA) += mlx5/
 obj-$(CONFIG_VP_VDPA)+= virtio_pci/
diff --git a/drivers/vdpa/vhost_kernel_test/Makefile 
b/drivers/vdpa/vhost_kernel_test/Makefile
new file mode 100644
index ..7e0c7bdb3c0e
--- /dev/null
+++ b/drivers/vdpa/vhost_kernel_test/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_VHOST_KERNEL_TEST) += vhost_kernel_test.o
diff --git a/drivers/vdpa/vhost_kernel_test/vhost_kernel_test.c 
b/drivers/vdpa/vhost_kernel_test/vhost_kernel_test.c
new file mode 100644
index ..82364cd02667
--- /dev/null
+++ b/drivers/vdpa/vhost_kernel_test/vhost_kernel_test.c
@@ -0,0 +1,575 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct vktest_vq {
+   struct vktest *vktest;
+   struct eventfd_ctx *kick;
+   struct eventfd_ctx *call;
+   u64 desc_addr;
+   u64 device_addr;
+   u64 driver_addr;
+   u32 num;
+   bool ready;
+   wait_queue_entry_t call_wait;
+   wait_queue_head_t *wqh;
+   poll_table call_pt;
+   struct vdpa_callback cb;
+   struct irq_work irq_work;
+};
+
+struct vktest {
+   struct vdpa_device vdpa;
+   struct mutex mutex;
+   struct vhost_dev *vhost;
+   struct virtio_net_config config;
+   struct vktest_vq vqs[2];
+   u8 status;
+};
+
+static struct vktest *vdpa_to_vktest(struct vdpa_device *vdpa)
+{
+   return container_of(vdpa, struct vktest, vdpa);
+}
+
+static int vktest_set_vq_address(struct vdpa_device *vdpa, u16 idx,
+u64 desc_area, u64 driver_area,
+u64 device_area)
+{
+   struct vktest *vktest = vdpa_to_vktest(vdpa);
+   struct vktest_vq *vq = &vktest->vqs[idx];
+
+   vq->desc_addr = desc_area;
+   vq->driver_addr = driver_area;
+   vq->device_addr = device_area;
+
+   return 0;
+}
+
+static void vktest_set_vq_num(struct vdpa_device *vdpa, u16 idx, u32 num)
+{
+   struct vktest *vktest = vdpa_to_vktest(vdpa);
+   struct vktest_vq *vq = &vktest->vqs[idx];
+
+   vq->num = num;
+}
+
+static void vktest_kick_vq(struct vdpa_device *vdpa, u16 idx)
+{
+   struct vktest *vktest = vdpa_to_vktest(vdpa);
+   struct vktest_vq *vq = &vktest->vqs[idx];
+
+   if (vq->kick)
+   eventfd_signal(vq->kick, 1);
+}
+
+static void vktest_set_vq_cb(struct vdpa_device *vdpa, u16 idx,
+struct vdpa_callback *cb)
+{
+   struct vktest *vktest = vdpa_to_vktest(vdpa);
+   struct vktest_vq *vq = &vktest->vqs[idx];
+
+   vq->cb = *cb;
+}
+
+static void vktest_set_vq_ready(struct vdpa_device *vdpa, u16 idx, bool ready)
+{
+   struct vktest *vktest = vdpa_to_vktest(vdpa);
+   struct vktest_vq *vq = &vktest->vqs[idx];
+   struct vhost_dev *vhost = vktest->vhost;
+
+   if (!ready) {
+   vq->ready = false;
+   vhost_dev_stop_vq(vhost, idx);
+   

[RFC PATCH 07/10] vhost: add support for kernel control

2021-09-29 Thread Vincent Whitchurch
Add functions to allow vhost buffers to be placed in kernel space and
for the vhost driver to be controlled from a kernel driver after initial
setup by userspace.

The kernel control is only possible on new /dev/vhost-*-kernel devices,
and on these devices userspace cannot write to the iotlb, nor can it
control the placement and attributes of the virtqueues, nor start/stop
the virtqueue handling after the kernel starts using it.

Signed-off-by: Vincent Whitchurch 
---
 drivers/vhost/common.c | 201 +
 drivers/vhost/vhost.c  |  92 +--
 drivers/vhost/vhost.h  |   3 +
 include/linux/vhost.h  |  23 +
 4 files changed, 310 insertions(+), 9 deletions(-)
 create mode 100644 include/linux/vhost.h

diff --git a/drivers/vhost/common.c b/drivers/vhost/common.c
index a5722ad65e24..f9758920a33a 100644
--- a/drivers/vhost/common.c
+++ b/drivers/vhost/common.c
@@ -25,7 +25,9 @@
 struct vhost_ops;
 
 struct vhost {
+   char kernelname[128];
struct miscdevice misc;
+   struct miscdevice kernelmisc;
const struct vhost_ops *ops;
 };
 
@@ -46,6 +48,24 @@ static int vhost_open(struct inode *inode, struct file *file)
return 0;
 }
 
+static int vhost_kernel_open(struct inode *inode, struct file *file)
+{
+   struct miscdevice *misc = file->private_data;
+   struct vhost *vhost = container_of(misc, struct vhost, kernelmisc);
+   struct vhost_dev *dev;
+
+   dev = vhost->ops->open(vhost);
+   if (IS_ERR(dev))
+   return PTR_ERR(dev);
+
+   dev->vhost = vhost;
+   dev->file = file;
+   dev->kernel = true;
+   file->private_data = dev;
+
+   return 0;
+}
+
 static int vhost_release(struct inode *inode, struct file *file)
 {
struct vhost_dev *dev = file->private_data;
@@ -69,6 +89,46 @@ static long vhost_ioctl(struct file *file, unsigned int 
ioctl, unsigned long arg
return ret;
 }
 
+static long vhost_kernel_ioctl(struct file *file, unsigned int ioctl, unsigned 
long arg)
+{
+   struct vhost_dev *dev = file->private_data;
+   struct vhost *vhost = dev->vhost;
+   long ret;
+
+   /* Only the kernel is allowed to control virtqueue attributes */
+   switch (ioctl) {
+   case VHOST_SET_VRING_NUM:
+   case VHOST_SET_VRING_ADDR:
+   case VHOST_SET_VRING_BASE:
+   case VHOST_SET_VRING_ENDIAN:
+   case VHOST_SET_MEM_TABLE:
+   case VHOST_SET_LOG_BASE:
+   case VHOST_SET_LOG_FD:
+   return -EPERM;
+   }
+
+   mutex_lock(&dev->mutex);
+
+   /*
+* Userspace should perform all required setup on the vhost device
+* _before_ asking the kernel to start using it.
+*
+* Note that ->kernel_attached is never reset, if userspace wants to
+* attach again it should open the device again.
+*/
+   if (dev->kernel_attached) {
+   ret = -EPERM;
+   goto out_unlock;
+   }
+
+   ret = vhost->ops->ioctl(dev, ioctl, arg);
+
+out_unlock:
+   mutex_unlock(&dev->mutex);
+
+   return ret;
+}
+
 static ssize_t vhost_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
struct file *file = iocb->ki_filp;
@@ -105,6 +165,129 @@ static const struct file_operations vhost_fops = {
.poll   = vhost_poll,
 };
 
+static const struct file_operations vhost_kernel_fops = {
+   .owner  = THIS_MODULE,
+   .open   = vhost_kernel_open,
+   .release= vhost_release,
+   .llseek = noop_llseek,
+   .unlocked_ioctl = vhost_kernel_ioctl,
+   .compat_ioctl   = compat_ptr_ioctl,
+};
+
+static void vhost_dev_lock_vqs(struct vhost_dev *d)
+{
+   int i;
+
+   for (i = 0; i < d->nvqs; ++i)
+   mutex_lock_nested(&d->vqs[i]->mutex, i);
+}
+
+static void vhost_dev_unlock_vqs(struct vhost_dev *d)
+{
+   int i;
+
+   for (i = 0; i < d->nvqs; ++i)
+   mutex_unlock(&d->vqs[i]->mutex);
+}
+
+struct vhost_dev *vhost_dev_get(int fd)
+{
+   struct file *file;
+   struct vhost_dev *dev;
+   struct vhost_dev *ret;
+   int err;
+   int i;
+
+   file = fget(fd);
+   if (!file)
+   return ERR_PTR(-EBADF);
+
+   if (file->f_op != &vhost_kernel_fops) {
+   ret = ERR_PTR(-EINVAL);
+   goto err_fput;
+   }
+
+   dev = file->private_data;
+
+   mutex_lock(&dev->mutex);
+   vhost_dev_lock_vqs(dev);
+
+   err = vhost_dev_check_owner(dev);
+   if (err) {
+   ret = ERR_PTR(err);
+   goto err_unlock;
+   }
+
+   if (dev->kernel_attached) {
+   ret = ERR_PTR(-EBUSY);
+   goto err_unlock;
+   }
+
+   if (!dev->iotlb) {
+   ret = ERR_PTR(-EINVAL);
+   goto err_unlock;
+   }
+
+   for (i = 0; i < dev->nvqs; i++) {
+   struct vhost_virtqueue *vq = dev->vqs[i];
+
+   if (vq->private_da
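
The gating done by vhost_kernel_ioctl() above can be sketched as a small
userspace model. The command names and struct demo_dev are hypothetical
placeholders, not the real VHOST_* ioctl numbers, and the dev->mutex
locking of the real function is left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical commands; the first two model virtqueue attribute ioctls. */
enum demo_cmd {
	DEMO_SET_VRING_NUM,
	DEMO_SET_VRING_ADDR,
	DEMO_SET_VRING_KICK,
	DEMO_SET_FEATURES,
};

/* Hypothetical stand-in for the parts of struct vhost_dev used here. */
struct demo_dev {
	bool kernel_attached;
	long (*ioctl)(struct demo_dev *dev, enum demo_cmd cmd);
};

long demo_kernel_ioctl(struct demo_dev *dev, enum demo_cmd cmd)
{
	assert(dev && dev->ioctl);

	/* Virtqueue placement and attributes are reserved for the kernel. */
	switch (cmd) {
	case DEMO_SET_VRING_NUM:
	case DEMO_SET_VRING_ADDR:
		return -EPERM;
	default:
		break;
	}

	/* Once a kernel driver has attached, userspace control is frozen. */
	if (dev->kernel_attached)
		return -EPERM;

	return dev->ioctl(dev, cmd);
}

/* Toy backend that accepts everything that gets past the gate. */
long demo_backend_ioctl(struct demo_dev *dev, enum demo_cmd cmd)
{
	(void)dev;
	(void)cmd;
	return 0;
}
```

The two checks are independent: the first is a static deny list, the
second depends on device state and, as the comment in the patch notes, is
never reset for the lifetime of the open file.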

[RFC PATCH 03/10] vhost: add iov wrapper

2021-09-29 Thread Vincent Whitchurch
In order to prepare for supporting buffers in kernel space, add a
vhost_iov struct to wrap the userspace iovec, add helper functions for
accessing this struct, and use these helpers from all vhost drivers.

Signed-off-by: Vincent Whitchurch 
---
 drivers/vhost/net.c   | 13 ++--
 drivers/vhost/scsi.c  | 30 +--
 drivers/vhost/test.c  |  2 +-
 drivers/vhost/vhost.c | 25 +++---
 drivers/vhost/vhost.h | 48 +--
 drivers/vhost/vsock.c |  8 
 6 files changed, 81 insertions(+), 45 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 28ef323882fb..8f82b646d4af 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -607,9 +607,9 @@ static size_t init_iov_iter(struct vhost_virtqueue *vq, 
struct iov_iter *iter,
size_t hdr_size, int out)
 {
/* Skip header. TODO: support TSO. */
-   size_t len = iov_length(vq->iov, out);
+   size_t len = vhost_iov_length(vq, vq->iov, out);
 
-   iov_iter_init(iter, WRITE, vq->iov, out, len);
+   vhost_iov_iter_init(vq, iter, WRITE, vq->iov, out, len);
iov_iter_advance(iter, hdr_size);
 
return iov_iter_count(iter);
@@ -1080,7 +1080,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
log += *log_num;
}
heads[headcount].id = cpu_to_vhost32(vq, d);
-   len = iov_length(vq->iov + seg, in);
+   len = vhost_iov_length(vq, vq->iov + seg, in);
heads[headcount].len = cpu_to_vhost32(vq, len);
datalen -= len;
++headcount;
@@ -1182,14 +1182,14 @@ static void handle_rx(struct vhost_net *net)
msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
/* On overrun, truncate and discard */
if (unlikely(headcount > UIO_MAXIOV)) {
-   iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
+   vhost_iov_iter_init(vq, &msg.msg_iter, READ, vq->iov, 
1, 1);
err = sock->ops->recvmsg(sock, &msg,
 1, MSG_DONTWAIT | MSG_TRUNC);
pr_debug("Discarded rx packet: len %zd\n", sock_len);
continue;
}
/* We don't need to be notified again. */
-   iov_iter_init(&msg.msg_iter, READ, vq->iov, in, vhost_len);
+   vhost_iov_iter_init(vq, &msg.msg_iter, READ, vq->iov, in, 
vhost_len);
fixup = msg.msg_iter;
if (unlikely((vhost_hlen))) {
/* We will supply the header ourselves
@@ -1212,8 +1212,7 @@ static void handle_rx(struct vhost_net *net)
if (unlikely(vhost_hlen)) {
if (copy_to_iter(&hdr, sizeof(hdr),
 &fixup) != sizeof(hdr)) {
-   vq_err(vq, "Unable to write vnet_hdr "
-  "at addr %p\n", vq->iov->iov_base);
+   vq_err(vq, "Unable to write vnet_hdr");
goto out;
}
} else {
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index bcf53685439d..22a372b52165 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -80,7 +80,7 @@ struct vhost_scsi_cmd {
struct scatterlist *tvc_prot_sgl;
struct page **tvc_upages;
/* Pointer to response header iovec */
-   struct iovec tvc_resp_iov;
+   struct vhost_iov tvc_resp_iov;
/* Pointer to vhost_scsi for our device */
struct vhost_scsi *tvc_vhost;
/* Pointer to vhost_virtqueue for the cmd */
@@ -208,7 +208,7 @@ struct vhost_scsi_tmf {
struct se_cmd se_cmd;
u8 scsi_resp;
struct vhost_scsi_inflight *inflight;
-   struct iovec resp_iov;
+   struct vhost_iov resp_iov;
int in_iovs;
int vq_desc;
 };
@@ -487,9 +487,9 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct 
vhost_scsi_evt *evt)
return;
}
 
-   if ((vq->iov[out].iov_len != sizeof(struct virtio_scsi_event))) {
+   if (vhost_iov_len(vq, &vq->iov[out]) != sizeof(struct 
virtio_scsi_event)) {
vq_err(vq, "Expecting virtio_scsi_event, got %zu bytes\n",
-   vq->iov[out].iov_len);
+   vhost_iov_len(vq, &vq->iov[out]));
vs->vs_events_missed = true;
return;
}
@@ -499,7 +499,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct 
vhost_scsi_evt *evt)
vs->vs_events_missed = false;
}
 
-   iov_iter_init(&iov_iter, READ, &vq->iov[out], in, sizeof(*event));
+   vhost_iov_iter_init(vq, &iov_iter, READ, &vq->iov[out], in, 
sizeof(*event));
 
ret = copy_to_iter(event, sizeof(*event),

[RFC PATCH 01/10] vhost: scsi: use copy_to_iter()

2021-09-29 Thread Vincent Whitchurch
Use copy_to_iter() instead of __copy_to_user() when accessing the virtio
buffers as a preparation for supporting kernel-space buffers in vhost.

It also makes the code consistent since the driver is already using
copy_to_iter() in the other places it accesses the queued buffers.

Signed-off-by: Vincent Whitchurch 
---
 drivers/vhost/scsi.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 532e204f2b1b..bcf53685439d 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -462,7 +462,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct 
vhost_scsi_evt *evt)
 {
struct vhost_virtqueue *vq = &vs->vqs[VHOST_SCSI_VQ_EVT].vq;
struct virtio_scsi_event *event = &evt->event;
-   struct virtio_scsi_event __user *eventp;
+   struct iov_iter iov_iter;
unsigned out, in;
int head, ret;
 
@@ -499,9 +499,10 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct 
vhost_scsi_evt *evt)
vs->vs_events_missed = false;
}
 
-   eventp = vq->iov[out].iov_base;
-   ret = __copy_to_user(eventp, event, sizeof(*event));
-   if (!ret)
+   iov_iter_init(&iov_iter, READ, &vq->iov[out], in, sizeof(*event));
+
+   ret = copy_to_iter(event, sizeof(*event), &iov_iter);
+   if (ret == sizeof(*event))
vhost_add_used_and_signal(&vs->dev, vq, head, 0);
else
vq_err(vq, "Faulted on vhost_scsi_send_event\n");
@@ -802,17 +803,18 @@ static void vhost_scsi_target_queue_cmd(struct 
vhost_scsi_cmd *cmd)
 static void
 vhost_scsi_send_bad_target(struct vhost_scsi *vs,
   struct vhost_virtqueue *vq,
-  int head, unsigned out)
+  int head, unsigned out, unsigned in)
 {
-   struct virtio_scsi_cmd_resp __user *resp;
struct virtio_scsi_cmd_resp rsp;
+   struct iov_iter iov_iter;
int ret;
 
+   iov_iter_init(&iov_iter, READ, &vq->iov[out], in, sizeof(rsp));
+
memset(&rsp, 0, sizeof(rsp));
rsp.response = VIRTIO_SCSI_S_BAD_TARGET;
-   resp = vq->iov[out].iov_base;
-   ret = __copy_to_user(resp, &rsp, sizeof(rsp));
-   if (!ret)
+   ret = copy_to_iter(&rsp, sizeof(rsp), &iov_iter);
+   if (ret == sizeof(rsp))
vhost_add_used_and_signal(&vs->dev, vq, head, 0);
else
pr_err("Faulted on virtio_scsi_cmd_resp\n");
@@ -1124,7 +1126,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct 
vhost_virtqueue *vq)
if (ret == -ENXIO)
break;
else if (ret == -EIO)
-   vhost_scsi_send_bad_target(vs, vq, vc.head, vc.out);
+   vhost_scsi_send_bad_target(vs, vq, vc.head, vc.out, 
vc.in);
} while (likely(!vhost_exceeds_weight(vq, ++c, 0)));
 out:
mutex_unlock(&vq->mutex);
@@ -1347,7 +1349,7 @@ vhost_scsi_ctl_handle_vq(struct vhost_scsi *vs, struct 
vhost_virtqueue *vq)
if (ret == -ENXIO)
break;
else if (ret == -EIO)
-   vhost_scsi_send_bad_target(vs, vq, vc.head, vc.out);
+   vhost_scsi_send_bad_target(vs, vq, vc.head, vc.out, 
vc.in);
} while (likely(!vhost_exceeds_weight(vq, ++c, 0)));
 out:
mutex_unlock(&vq->mutex);
-- 
2.28.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
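
The success checks in patch 01 change from "if (!ret)" to
"if (ret == sizeof(...))" because the two helpers report results in
opposite ways: __copy_to_user() returns the number of bytes that could
not be copied, while copy_to_iter() returns the number of bytes that were
copied. A userspace model of the two conventions follows; struct dst_buf
and the model_* functions are hypothetical and only imitate the return
values, not the kernel's fault handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical destination with limited room, standing in for guest memory. */
struct dst_buf {
	char *base;
	size_t room;
};

/* Models the __copy_to_user() convention: returns bytes NOT copied. */
size_t model_copy_to_user(struct dst_buf *dst, const void *src, size_t len)
{
	size_t n;

	assert(dst && src);
	n = len < dst->room ? len : dst->room;
	memcpy(dst->base, src, n);
	return len - n;
}

/* Models the copy_to_iter() convention: returns bytes copied and advances. */
size_t model_copy_to_iter(const void *src, size_t len, struct dst_buf *dst)
{
	size_t n;

	assert(dst && src);
	n = len < dst->room ? len : dst->room;
	memcpy(dst->base, src, n);
	dst->base += n;
	dst->room -= n;
	return n;
}

/* The success test each convention requires. */
int old_style_ok(size_t ret)
{
	return !ret;
}

int new_style_ok(size_t ret, size_t len)
{
	return ret == len;
}
```

Keeping the old "!ret" test after switching helpers would invert the
logic, treating a complete copy as a fault and a zero-byte copy as
success.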


[RFC PATCH 10/10] selftests: add vhost_kernel tests

2021-09-29 Thread Vincent Whitchurch
Add a test which uses the vhost_kernel_test driver to test the vhost
kernel buffers support.

The test uses virtio-net and vhost-net to set up a loopback network and
then tests that ping works between the interfaces.  It also checks that
unbinding/rebinding of devices and closing the involved file descriptors
in different sequences during active use works correctly.

Signed-off-by: Vincent Whitchurch 
---
 tools/testing/selftests/Makefile  |   1 +
 .../vhost_kernel/vhost_kernel_test.c  | 287 ++
 .../vhost_kernel/vhost_kernel_test.sh | 125 
 3 files changed, 413 insertions(+)
 create mode 100644 tools/testing/selftests/vhost_kernel/vhost_kernel_test.c
 create mode 100755 tools/testing/selftests/vhost_kernel/vhost_kernel_test.sh

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index c852eb40c4f7..14a8349e3dc1 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -73,6 +73,7 @@ TARGETS += tmpfs
 TARGETS += tpm2
 TARGETS += user
 TARGETS += vDSO
+TARGETS += vhost_kernel
 TARGETS += vm
 TARGETS += x86
 TARGETS += zram
diff --git a/tools/testing/selftests/vhost_kernel/vhost_kernel_test.c 
b/tools/testing/selftests/vhost_kernel/vhost_kernel_test.c
new file mode 100644
index ..b0f889bd2f72
--- /dev/null
+++ b/tools/testing/selftests/vhost_kernel/vhost_kernel_test.c
@@ -0,0 +1,287 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifndef VIRTIO_F_ACCESS_PLATFORM
+#define VIRTIO_F_ACCESS_PLATFORM 33
+#endif
+
+#ifndef VKTEST_ATTACH_VHOST
+#define VKTEST_ATTACH_VHOST _IOW(0xbf, 0x31, int)
+#endif
+
+static int vktest;
+static const int num_vqs = 2;
+
+static int tun_alloc(char *dev)
+{
+   int hdrsize = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+   struct ifreq ifr = {
+   .ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR,
+   };
+   int fd, ret;
+
+   fd = open("/dev/net/tun", O_RDWR);
+   if (fd < 0)
+   err(1, "open /dev/net/tun");
+
+   strncpy(ifr.ifr_name, dev, IFNAMSIZ);
+
+   ret = ioctl(fd, TUNSETIFF, &ifr);
+   if (ret < 0)
+   err(1, "TUNSETIFF");
+
+   ret = ioctl(fd, TUNSETOFFLOAD,
+   TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 | TUN_F_TSO_ECN);
+   if (ret < 0)
+   err(1, "TUNSETOFFLOAD");
+
+   ret = ioctl(fd, TUNSETVNETHDRSZ, &hdrsize);
+   if (ret < 0)
+   err(1, "TUNSETVNETHDRSZ");
+
+   strncpy(dev, ifr.ifr_name, IFNAMSIZ);
+
+   return fd;
+}
+
+static void handle_signal(int signum)
+{
+   if (signum == SIGUSR1)
+   close(vktest);
+}
+
+static void vhost_net_set_backend(int vhost)
+{
+   char if_name[IFNAMSIZ];
+   int tap_fd;
+
+   snprintf(if_name, IFNAMSIZ, "vhostkernel%d", 0);
+
+   tap_fd = tun_alloc(if_name);
+
+   for (int i = 0; i < num_vqs; i++) {
+   struct vhost_vring_file txbackend = {
+   .index = i,
+   .fd = tap_fd,
+   };
+   int ret;
+
+   ret = ioctl(vhost, VHOST_NET_SET_BACKEND, &txbackend);
+   if (ret < 0)
+   err(1, "VHOST_NET_SET_BACKEND");
+   }
+}
+
+static void prepare_vhost_vktest(int vhost, int vktest)
+{
+   uint64_t features = 1llu << VIRTIO_F_ACCESS_PLATFORM | 1llu << 
VIRTIO_F_VERSION_1;
+   int ret;
+
+   for (int i = 0; i < num_vqs; i++) {
+   int kickfd = eventfd(0, EFD_CLOEXEC);
+
+   if (kickfd < 0)
+   err(1, "eventfd");
+
+   struct vhost_vring_file kick = {
+   .index = i,
+   .fd = kickfd,
+   };
+
+   ret = ioctl(vktest, VHOST_SET_VRING_KICK, &kick);
+   if (ret < 0)
+   err(1, "VHOST_SET_VRING_KICK");
+
+   ret = ioctl(vhost, VHOST_SET_VRING_KICK, &kick);
+   if (ret < 0)
+   err(1, "VHOST_SET_VRING_KICK");
+   }
+
+   for (int i = 0; i < num_vqs; i++) {
+   int callfd = eventfd(0, EFD_CLOEXEC);
+
+   if (callfd < 0)
+   err(1, "eventfd");
+
+   struct vhost_vring_file call = {
+   .index = i,
+   .fd = callfd,
+   };
+
+   ret = ioctl(vktest, VHOST_SET_VRING_CALL, &call);
+   if (ret < 0)
+   err(1, "VHOST_SET_VRING_CALL");
+
+   ret = ioctl(vhost, VHOST_SET_VRING_CALL, &call);
+   if (ret < 0)
+   err(1, "VHOST_SET_VRING_CALL");
+   }
+
+   ret = ioctl(vhost, VHOST_SET_FEATURES, &features);
+

[RFC PATCH 04/10] vhost: add support for kernel buffers

2021-09-29 Thread Vincent Whitchurch
Handle the virtio rings and buffers being placed in kernel memory
instead of userspace memory.  The software IOTLB support is used to
ensure that only permitted regions are accessed.

Note that this patch does not provide a way to actually request that
kernel memory be used; an API for that is added in a later patch.

Signed-off-by: Vincent Whitchurch 
---
 drivers/vhost/Kconfig |   6 ++
 drivers/vhost/vhost.c | 222 +-
 drivers/vhost/vhost.h |  34 +++
 3 files changed, 257 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 587fbae06182..9e76ed485fe1 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -20,6 +20,12 @@ config VHOST
  This option is selected by any driver which needs to access
  the core of vhost.
 
+config VHOST_KERNEL
+   tristate
+   help
+ This option is selected by any driver which needs to access the
+ support for kernel buffers in vhost.
+
 menuconfig VHOST_MENU
bool "VHOST drivers"
default y
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index ce81eee2a3fa..9354061ce75e 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -49,6 +49,9 @@ enum {
 #define vhost_used_event(vq) ((__virtio16 __user 
*)&vq->user.avail->ring[vq->num])
 #define vhost_avail_event(vq) ((__virtio16 __user 
*)&vq->user.used->ring[vq->num])
 
+#define vhost_used_event_kern(vq) ((__virtio16 
*)&vq->kern.avail->ring[vq->num])
+#define vhost_avail_event_kern(vq) ((__virtio16 
*)&vq->kern.used->ring[vq->num])
+
 #ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY
 static void vhost_disable_cross_endian(struct vhost_virtqueue *vq)
 {
@@ -482,6 +485,7 @@ void vhost_dev_init(struct vhost_dev *dev,
dev->iotlb = NULL;
dev->mm = NULL;
dev->worker = NULL;
+   dev->kernel = false;
dev->iov_limit = iov_limit;
dev->weight = weight;
dev->byte_weight = byte_weight;
@@ -785,6 +789,18 @@ static inline void __user *vhost_vq_meta_fetch(struct 
vhost_virtqueue *vq,
return (void __user *)(uintptr_t)(map->addr + addr - map->start);
 }
 
+static inline void *vhost_vq_meta_fetch_kern(struct vhost_virtqueue *vq,
+  u64 addr, unsigned int size,
+  int type)
+{
+   const struct vhost_iotlb_map *map = vq->meta_iotlb[type];
+
+   if (!map)
+   return NULL;
+
+   return (void *)(uintptr_t)(map->addr + addr - map->start);
+}
+
 /* Can we switch to this memory table? */
 /* Caller should have device mutex but not vq mutex */
 static bool memory_access_ok(struct vhost_dev *d, struct vhost_iotlb *umem,
@@ -849,6 +865,40 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, 
void __user *to,
return ret;
 }
 
+static int vhost_copy_to_kern(struct vhost_virtqueue *vq, void *to,
+ const void *from, unsigned int size)
+{
+   int ret;
+
+   /* This function should be called after iotlb
+* prefetch, which means we're sure that all vq
+* could be access through iotlb. So -EAGAIN should
+* not happen in this case.
+*/
+   struct iov_iter t;
+   void *kaddr = vhost_vq_meta_fetch_kern(vq,
+(u64)(uintptr_t)to, size,
+VHOST_ADDR_USED);
+
+   if (kaddr) {
+   memcpy(kaddr, from, size);
+   return 0;
+   }
+
+   ret = translate_desc(vq, (u64)(uintptr_t)to, size, vq->iotlb_iov,
+ARRAY_SIZE(vq->iotlb_iov),
+VHOST_ACCESS_WO);
+   if (ret < 0)
+   goto out;
+   iov_iter_kvec(&t, WRITE, &vq->iotlb_iov->kvec, ret, size);
+   ret = copy_to_iter(from, size, &t);
+   if (ret == size)
+   ret = 0;
+
+out:
+   return ret;
+}
+
 static int vhost_copy_from_user(struct vhost_virtqueue *vq, void *to,
void __user *from, unsigned size)
 {
@@ -889,6 +939,43 @@ static int vhost_copy_from_user(struct vhost_virtqueue 
*vq, void *to,
return ret;
 }
 
+static int vhost_copy_from_kern(struct vhost_virtqueue *vq, void *to,
+   void *from, unsigned int size)
+{
+   int ret;
+
+   /* This function should be called after iotlb
+* prefetch, which means we're sure that vq
+* could be access through iotlb. So -EAGAIN should
+* not happen in this case.
+*/
+   void *kaddr = vhost_vq_meta_fetch_kern(vq,
+(u64)(uintptr_t)from, size,
+VHOST_ADDR_DESC);
+   struct iov_iter f;
+
+   if (kaddr) {
+   memcpy(to, kaddr, size);
+   return 0;
+   }
+
+   ret = translate_desc(vq, (u64)(uintptr_t)from, size, vq->iotlb_iov,
+ARRAY_SIZE(vq->iotlb_iov),
+  
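
The fast path in vhost_copy_to_kern() and vhost_copy_from_kern() relies on
the address arithmetic in vhost_vq_meta_fetch_kern(): a cached map entry
turns a ring address into a kernel pointer as map->addr + addr -
map->start. A userspace sketch of that arithmetic is below. struct
demo_map and the demo_* functions are hypothetical, and the explicit
bounds check is an addition made for this sketch; the helper in the patch
performs no such check itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the fields of struct vhost_iotlb_map used here. */
struct demo_map {
	uint64_t start;		/* first I/O address covered by the map */
	uint64_t size;		/* number of bytes covered */
	uintptr_t addr;		/* backing address in this process */
};

/* Translate an I/O address to a pointer, or NULL if there is no usable map. */
void *demo_meta_fetch(const struct demo_map *map, uint64_t addr, size_t size)
{
	if (!map)
		return NULL;

	/* Bounds check added for the sketch; not present in the patch helper. */
	if (addr < map->start || size > map->size ||
	    addr - map->start > map->size - size)
		return NULL;

	return (void *)(map->addr + (uintptr_t)(addr - map->start));
}

/* Fast path only: direct memcpy when the cached map covers the target. */
int demo_copy_to_kern(const struct demo_map *map, uint64_t to,
		      const void *from, size_t size)
{
	void *kaddr;

	assert(from != NULL);

	kaddr = demo_meta_fetch(map, to, size);
	if (!kaddr)
		return -1;	/* the patch falls back to translate_desc() here */

	memcpy(kaddr, from, size);
	return 0;
}
```

When the cached map is missing, the real functions do not fail outright
but go through translate_desc() and an iov_iter, as the diff shows.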

[RFC PATCH 00/10] Support kernel buffers in vhost

2021-09-29 Thread Vincent Whitchurch
vhost currently expects that the virtqueues and the queued buffers are
accessible from a userspace process' address space.  However, when using vhost
to communicate between two Linux systems running on two physical CPUs in an AMP
configuration (on a single SoC or via something like PCIe), it is undesirable
from a security perspective to make the entire kernel memory of the other Linux
system accessible from userspace.

To remedy this, this series adds support to vhost for placing the virtqueues
and queued buffers in kernel memory.  Since userspace should not be allowed to
control the placement and attributes of these virtqueues, a mechanism to do
this from kernel space is added.  A vDPA-based test driver is added which uses
this support to allow virtio-net and vhost-net to communicate with each other
on the same system without exposing kernel memory to userspace via /dev/mem or
similar.

This vDPA-based test driver is intended to be used as the basis for the
implementation of a driver which will allow Linux-Linux communication between
physical CPUs on SoCs using virtio and vhost, for instance by using information
from the device tree to indicate the location of shared memory, and the mailbox
API to trigger interrupts between the CPUs.

This patchset is also available at:

 https://github.com/vwax/linux/tree/vhost/rfc

Vincent Whitchurch (10):
  vhost: scsi: use copy_to_iter()
  vhost: push virtqueue area pointers into a user struct
  vhost: add iov wrapper
  vhost: add support for kernel buffers
  vhost: extract common code for file_operations handling
  vhost: extract ioctl locking to common code
  vhost: add support for kernel control
  vhost: net: add support for kernel control
  vdpa: add test driver for kernel buffers in vhost
  selftests: add vhost_kernel tests

 drivers/vdpa/Kconfig                      |   8 +
 drivers/vdpa/Makefile                     |   1 +
 drivers/vdpa/vhost_kernel_test/Makefile   |   2 +
 .../vhost_kernel_test/vhost_kernel_test.c | 575 ++
 drivers/vhost/Kconfig                     |   6 +
 drivers/vhost/Makefile                    |   3 +
 drivers/vhost/common.c                    | 340 +++
 drivers/vhost/net.c                       | 212 ---
 drivers/vhost/scsi.c                      |  50 +-
 drivers/vhost/test.c                      |   2 +-
 drivers/vhost/vdpa.c                      |   6 +-
 drivers/vhost/vhost.c                     | 437 ++---
 drivers/vhost/vhost.h                     | 109 +++-
 drivers/vhost/vsock.c                     |  95 +--
 include/linux/vhost.h                     |  23 +
 tools/testing/selftests/Makefile          |   1 +
 .../vhost_kernel/vhost_kernel_test.c      | 287 +
 .../vhost_kernel/vhost_kernel_test.sh     | 125
 18 files changed, 2020 insertions(+), 262 deletions(-)
 create mode 100644 drivers/vdpa/vhost_kernel_test/Makefile
 create mode 100644 drivers/vdpa/vhost_kernel_test/vhost_kernel_test.c
 create mode 100644 drivers/vhost/common.c
 create mode 100644 include/linux/vhost.h
 create mode 100644 tools/testing/selftests/vhost_kernel/vhost_kernel_test.c
 create mode 100755 tools/testing/selftests/vhost_kernel/vhost_kernel_test.sh

-- 
2.28.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v1 2/8] x86/xen: simplify xen_oldmem_pfn_is_ram()

2021-09-29 Thread David Hildenbrand

On 29.09.21 16:22, Boris Ostrovsky wrote:
>
> On 9/29/21 5:03 AM, David Hildenbrand wrote:
>> On 29.09.21 10:45, David Hildenbrand wrote:
>>>>
>>> Can we go one step further and do
>>>
>>>
>>> @@ -20,24 +20,11 @@ static int xen_oldmem_pfn_is_ram(unsigned long pfn)
>>>   struct xen_hvm_get_mem_type a = {
>>>   .domid = DOMID_SELF,
>>>   .pfn = pfn,
>>> +   .mem_type = HVMMEM_ram_rw,
>>>   };
>>> -   int ram;
>>>
>>> -   if (HYPERVISOR_hvm_op(HVMOP_get_mem_type, &a))
>>> -   return -ENXIO;
>>> -
>>> -   switch (a.mem_type) {
>>> -   case HVMMEM_mmio_dm:
>>> -   ram = 0;
>>> -   break;
>>> -   case HVMMEM_ram_rw:
>>> -   case HVMMEM_ram_ro:
>>> -   default:
>>> -   ram = 1;
>>> -   break;
>>> -   }
>>> -
>>> -   return ram;
>>> +   HYPERVISOR_hvm_op(HVMOP_get_mem_type, &a);
>>> +   return a.mem_type != HVMMEM_mmio_dm;
>
>
> I was actually thinking of asking you to add another patch with pr_warn_once()
> here (and print the error code as well). This call failing is an indication of
> something going quite wrong and it would be good to know about this.

Will include a patch in v2, thanks!


--
Thanks,

David / dhildenb


Re: [PATCH v1 2/8] x86/xen: simplify xen_oldmem_pfn_is_ram()

2021-09-29 Thread Boris Ostrovsky

On 9/29/21 5:03 AM, David Hildenbrand wrote:
> On 29.09.21 10:45, David Hildenbrand wrote:
>>>
>> Can we go one step further and do
>>
>>
>> @@ -20,24 +20,11 @@ static int xen_oldmem_pfn_is_ram(unsigned long pfn)
>>   struct xen_hvm_get_mem_type a = {
>>   .domid = DOMID_SELF,
>>   .pfn = pfn,
>> +   .mem_type = HVMMEM_ram_rw,
>>   };
>> -   int ram;
>>
>> -   if (HYPERVISOR_hvm_op(HVMOP_get_mem_type, &a))
>> -   return -ENXIO;
>> -
>> -   switch (a.mem_type) {
>> -   case HVMMEM_mmio_dm:
>> -   ram = 0;
>> -   break;
>> -   case HVMMEM_ram_rw:
>> -   case HVMMEM_ram_ro:
>> -   default:
>> -   ram = 1;
>> -   break;
>> -   }
>> -
>> -   return ram;
>> +   HYPERVISOR_hvm_op(HVMOP_get_mem_type, &a);
>> +   return a.mem_type != HVMMEM_mmio_dm;


I was actually thinking of asking you to add another patch with pr_warn_once()
here (and print the error code as well). This call failing is an indication of
something going quite wrong and it would be good to know about this.


>>    }
>>    #endif
>>
>>
>> Assuming that if HYPERVISOR_hvm_op() fails that
>> .mem_type is not set to HVMMEM_mmio_dm.


I don't think we can assume that an argument described as OUT in the ABI will
not be clobbered in case of error.


>>
>
> Okay we can't, due to "__must_check" ...


so this is a good thing ;-)


-boris
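
[Editorial sketch, not part of the thread.] The point made above, that an
OUT argument must not be trusted when the call fails, can be illustrated in
plain user-space C. Everything here is a hypothetical stand-in:
mock_get_mem_type() plays the role of the hypercall, the enum values are
invented, and fprintf() stands in for pr_warn_once(). The mock deliberately
clobbers the OUT field on failure to show why pre-initializing .mem_type is
not enough.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the Xen definitions discussed above. */
enum mem_type { MEM_RAM_RW = 0, MEM_RAM_RO = 1, MEM_MMIO_DM = 2 };

struct get_mem_type_arg {
	unsigned long pfn;
	enum mem_type mem_type;	/* OUT: only meaningful on success */
};

/*
 * Mock hypercall: fails for pfn 0 and deliberately clobbers the OUT
 * field on failure, which an ABI is allowed to do.
 */
static int mock_get_mem_type(struct get_mem_type_arg *a)
{
	assert(a != NULL);
	if (a->pfn == 0) {
		a->mem_type = MEM_MMIO_DM;	/* garbage left behind */
		return -1;
	}
	a->mem_type = (a->pfn >= 0x1000) ? MEM_MMIO_DM : MEM_RAM_RW;
	return 0;
}

/* Returns 1 for RAM, 0 for device MMIO, negative on call failure. */
static int pfn_is_ram(unsigned long pfn)
{
	struct get_mem_type_arg a = { .pfn = pfn, .mem_type = MEM_RAM_RW };
	int rc = mock_get_mem_type(&a);

	if (rc) {
		/* Report the error code instead of trusting a.mem_type. */
		fprintf(stderr, "get_mem_type failed: %d\n", rc);
		return rc;
	}
	return a.mem_type != MEM_MMIO_DM;
}
```

With this mock, pfn_is_ram(0) reports the failure rather than silently
answering "not RAM" from the clobbered field.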


[PATCH v1 6/6] x86: remove memory hotplug support on X86_32

2021-09-29 Thread David Hildenbrand
CONFIG_MEMORY_HOTPLUG was marked BROKEN on 32 bit over one year ago and we
just restricted it to 64 bit. Let's remove the unused x86 32bit implementation
and simplify the Kconfig.

Signed-off-by: David Hildenbrand 
---
 arch/x86/Kconfig  |  6 +++---
 arch/x86/mm/init_32.c | 31 ---
 2 files changed, 3 insertions(+), 34 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ab83c22d274e..85f4762429f1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -62,7 +62,7 @@ config X86
select ARCH_32BIT_OFF_T if X86_32
select ARCH_CLOCKSOURCE_INIT
select ARCH_ENABLE_HUGEPAGE_MIGRATION if X86_64 && HUGETLB_PAGE && MIGRATION
-   select ARCH_ENABLE_MEMORY_HOTPLUG if X86_64 || (X86_32 && HIGHMEM)
+   select ARCH_ENABLE_MEMORY_HOTPLUG if X86_64
select ARCH_ENABLE_MEMORY_HOTREMOVE if MEMORY_HOTPLUG
select ARCH_ENABLE_SPLIT_PMD_PTLOCK if (PGTABLE_LEVELS > 2) && (X86_64 || X86_PAE)
select ARCH_ENABLE_THP_MIGRATION if X86_64 && TRANSPARENT_HUGEPAGE
@@ -1615,7 +1615,7 @@ config ARCH_SELECT_MEMORY_MODEL
 
 config ARCH_MEMORY_PROBE
bool "Enable sysfs memory/probe interface"
-   depends on X86_64 && MEMORY_HOTPLUG
+   depends on MEMORY_HOTPLUG
help
  This option enables a sysfs memory/probe interface for testing.
  See Documentation/admin-guide/mm/memory-hotplug.rst for more information.
@@ -2395,7 +2395,7 @@ endmenu
 
 config ARCH_HAS_ADD_PAGES
def_bool y
-   depends on X86_64 && ARCH_ENABLE_MEMORY_HOTPLUG
+   depends on ARCH_ENABLE_MEMORY_HOTPLUG
 
 config ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
def_bool y
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index bd90b8fe81e4..5cd7ea6d645c 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -779,37 +779,6 @@ void __init mem_init(void)
test_wp_bit();
 }
 
-#ifdef CONFIG_MEMORY_HOTPLUG
-int arch_add_memory(int nid, u64 start, u64 size,
-   struct mhp_params *params)
-{
-   unsigned long start_pfn = start >> PAGE_SHIFT;
-   unsigned long nr_pages = size >> PAGE_SHIFT;
-   int ret;
-
-   /*
-* The page tables were already mapped at boot so if the caller
-* requests a different mapping type then we must change all the
-* pages with __set_memory_prot().
-*/
-   if (params->pgprot.pgprot != PAGE_KERNEL.pgprot) {
-   ret = __set_memory_prot(start, nr_pages, params->pgprot);
-   if (ret)
-   return ret;
-   }
-
-   return __add_pages(nid, start_pfn, nr_pages, params);
-}
-
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
-{
-   unsigned long start_pfn = start >> PAGE_SHIFT;
-   unsigned long nr_pages = size >> PAGE_SHIFT;
-
-   __remove_pages(start_pfn, nr_pages, altmap);
-}
-#endif
-
 int kernel_set_to_readonly __read_mostly;
 
 static void mark_nxdata_nx(void)
-- 
2.31.1



[PATCH v1 5/6] mm/memory_hotplug: remove stale function declarations

2021-09-29 Thread David Hildenbrand
These functions no longer exist.

Signed-off-by: David Hildenbrand 
---
 include/linux/memory_hotplug.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index e5a867c950b2..be48e003a518 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -98,9 +98,6 @@ static inline void zone_seqlock_init(struct zone *zone)
 {
seqlock_init(&zone->span_seqlock);
 }
-extern int zone_grow_free_lists(struct zone *zone, unsigned long new_nr_pages);
-extern int zone_grow_waitqueues(struct zone *zone, unsigned long nr_pages);
-extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
 extern void adjust_present_page_count(struct page *page,
  struct memory_group *group,
  long nr_pages);
-- 
2.31.1



[PATCH v1 4/6] mm/memory_hotplug: remove HIGHMEM leftovers

2021-09-29 Thread David Hildenbrand
We don't support CONFIG_MEMORY_HOTPLUG on 32 bit and consequently not with
HIGHMEM. Let's remove any leftover code -- including the unused
"status_change_nid_high" field of the memory notifier.

Signed-off-by: David Hildenbrand 
---
 Documentation/core-api/memory-hotplug.rst |  3 --
 .../zh_CN/core-api/memory-hotplug.rst     |  4 ---
 include/linux/memory.h                    |  1 -
 mm/memory_hotplug.c                       | 36 ++-
 4 files changed, 2 insertions(+), 42 deletions(-)

diff --git a/Documentation/core-api/memory-hotplug.rst b/Documentation/core-api/memory-hotplug.rst
index de7467e48067..682259ee633a 100644
--- a/Documentation/core-api/memory-hotplug.rst
+++ b/Documentation/core-api/memory-hotplug.rst
@@ -57,7 +57,6 @@ The third argument (arg) passes a pointer of struct memory_notify::
unsigned long start_pfn;
unsigned long nr_pages;
int status_change_nid_normal;
-   int status_change_nid_high;
int status_change_nid;
}
 
@@ -65,8 +64,6 @@ The third argument (arg) passes a pointer of struct memory_notify::
 - nr_pages is # of pages of online/offline memory.
 - status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
   is (will be) set/clear, if this is -1, then nodemask status is not changed.
-- status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask
-  is (will be) set/clear, if this is -1, then nodemask status is not changed.
 - status_change_nid is set node id when N_MEMORY of nodemask is (will be)
   set/clear. It means a new(memoryless) node gets new memory by online and a
   node loses all memory. If this is -1, then nodemask status is not changed.
diff --git a/Documentation/translations/zh_CN/core-api/memory-hotplug.rst b/Documentation/translations/zh_CN/core-api/memory-hotplug.rst
index 161f4d2c18cc..9a204eb196f2 100644
--- a/Documentation/translations/zh_CN/core-api/memory-hotplug.rst
+++ b/Documentation/translations/zh_CN/core-api/memory-hotplug.rst
@@ -63,7 +63,6 @@ memory_notify结构体的指针::
unsigned long start_pfn;
unsigned long nr_pages;
int status_change_nid_normal;
-   int status_change_nid_high;
int status_change_nid;
}
 
@@ -74,9 +73,6 @@ memory_notify结构体的指针::
 - status_change_nid_normal是当nodemask的N_NORMAL_MEMORY被设置/清除时设置节
   点id,如果是-1,则nodemask状态不改变。
 
-- status_change_nid_high是当nodemask的N_HIGH_MEMORY被设置/清除时设置的节点
-  id,如果这个值为-1,那么nodemask状态不会改变。
-
 - status_change_nid是当nodemask的N_MEMORY被(将)设置/清除时设置的节点id。这
   意味着一个新的(没上线的)节点通过联机获得新的内存,而一个节点失去了所有的内
   存。如果这个值为-1,那么nodemask的状态就不会改变。
diff --git a/include/linux/memory.h b/include/linux/memory.h
index dd6e608c3e0b..c46ff374d48d 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -96,7 +96,6 @@ struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
int status_change_nid_normal;
-   int status_change_nid_high;
int status_change_nid;
 };
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 8d7b2b593a26..95c927c8bfb8 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -21,7 +21,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -585,10 +584,6 @@ void generic_online_page(struct page *page, unsigned int order)
debug_pagealloc_map_pages(page, 1 << order);
__free_pages_core(page, order);
totalram_pages_add(1UL << order);
-#ifdef CONFIG_HIGHMEM
-   if (PageHighMem(page))
-   totalhigh_pages_add(1UL << order);
-#endif
 }
 EXPORT_SYMBOL_GPL(generic_online_page);
 
@@ -625,16 +620,11 @@ static void node_states_check_changes_online(unsigned long nr_pages,
 
arg->status_change_nid = NUMA_NO_NODE;
arg->status_change_nid_normal = NUMA_NO_NODE;
-   arg->status_change_nid_high = NUMA_NO_NODE;
 
if (!node_state(nid, N_MEMORY))
arg->status_change_nid = nid;
if (zone_idx(zone) <= ZONE_NORMAL && !node_state(nid, N_NORMAL_MEMORY))
arg->status_change_nid_normal = nid;
-#ifdef CONFIG_HIGHMEM
-   if (zone_idx(zone) <= ZONE_HIGHMEM && !node_state(nid, N_HIGH_MEMORY))
-   arg->status_change_nid_high = nid;
-#endif
 }
 
 static void node_states_set_node(int node, struct memory_notify *arg)
@@ -642,9 +632,6 @@ static void node_states_set_node(int node, struct memory_notify *arg)
if (arg->status_change_nid_normal >= 0)
node_set_state(node, N_NORMAL_MEMORY);
 
-   if (arg->status_change_nid_high >= 0)
-   node_set_state(node, N_HIGH_MEMORY);
-
if (arg->status_change_nid >= 0)
node_set_state(node, N_MEMORY);
 }
@@ -1801,7 +1788,6 @@ static void node_states_check_changes_offline(unsigned long nr_pages,
 
arg->status_change_nid = NUMA_NO_NODE;
arg->status_change_nid_normal = NUMA_NO_NODE;
-   arg->status_change_

[PATCH v1 3/6] mm/memory_hotplug: restrict CONFIG_MEMORY_HOTPLUG to 64 bit

2021-09-29 Thread David Hildenbrand
32 bit support is broken in various ways: for example, we can online
memory that should actually go to ZONE_HIGHMEM to ZONE_MOVABLE or in
some cases even to one of the other kernel zones.

We marked it BROKEN in commit b59d02ed0869 ("mm/memory_hotplug: disable the
functionality for 32b") almost one year ago. According to that commit
it might be broken at least since 2017. Further, there is hardly a sane use
case nowadays.

Let's just depend completely on 64bit, dropping the "BROKEN" dependency to
make clear that we are not going to support it again. Next, we'll remove
some HIGHMEM leftovers from memory hotplug code to clean up.

Signed-off-by: David Hildenbrand 
---
 mm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index ea8762cd8e1e..88273dd5c6d6 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -125,7 +125,7 @@ config MEMORY_HOTPLUG
select MEMORY_ISOLATION
depends on SPARSEMEM
depends on ARCH_ENABLE_MEMORY_HOTPLUG
-   depends on 64BIT || BROKEN
+   depends on 64BIT
select NUMA_KEEP_MEMINFO if NUMA
 
 config MEMORY_HOTPLUG_DEFAULT_ONLINE
-- 
2.31.1



[PATCH v1 2/6] mm/memory_hotplug: remove CONFIG_MEMORY_HOTPLUG_SPARSE

2021-09-29 Thread David Hildenbrand
CONFIG_MEMORY_HOTPLUG depends on CONFIG_SPARSEMEM, so there is no need for
CONFIG_MEMORY_HOTPLUG_SPARSE anymore; adjust all instances to use
CONFIG_MEMORY_HOTPLUG and remove CONFIG_MEMORY_HOTPLUG_SPARSE.

Signed-off-by: David Hildenbrand 
---
 arch/powerpc/include/asm/machdep.h            |  2 +-
 arch/powerpc/kernel/setup_64.c                |  2 +-
 arch/powerpc/platforms/powernv/setup.c        |  4 ++--
 arch/powerpc/platforms/pseries/setup.c        |  2 +-
 drivers/base/Makefile                         |  2 +-
 drivers/base/node.c                           |  9 -
 drivers/virtio/Kconfig                        |  2 +-
 include/linux/memory.h                        | 18 +++---
 include/linux/node.h                          |  4 ++--
 lib/Kconfig.debug                             |  2 +-
 mm/Kconfig                                    |  4 
 mm/memory_hotplug.c                           |  2 --
 tools/testing/selftests/memory-hotplug/config |  1 -
 13 files changed, 21 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 764f2732a821..d8a2ca007082 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -32,7 +32,7 @@ struct machdep_calls {
void(*iommu_save)(void);
void(*iommu_restore)(void);
 #endif
-#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
+#ifdef CONFIG_MEMORY_HOTPLUG
unsigned long   (*memory_block_size)(void);
 #endif
 #endif /* CONFIG_PPC64 */
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index eaa79a0996d1..21f15d82f062 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -912,7 +912,7 @@ void __init setup_per_cpu_areas(void)
 }
 #endif
 
-#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
+#ifdef CONFIG_MEMORY_HOTPLUG
 unsigned long memory_block_size_bytes(void)
 {
if (ppc_md.memory_block_size)
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index a8db3f153063..ad56a54ac9c5 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -440,7 +440,7 @@ static void pnv_kexec_cpu_down(int crash_shutdown, int secondary)
 }
 #endif /* CONFIG_KEXEC_CORE */
 
-#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
+#ifdef CONFIG_MEMORY_HOTPLUG
 static unsigned long pnv_memory_block_size(void)
 {
/*
@@ -553,7 +553,7 @@ define_machine(powernv) {
 #ifdef CONFIG_KEXEC_CORE
.kexec_cpu_down = pnv_kexec_cpu_down,
 #endif
-#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
+#ifdef CONFIG_MEMORY_HOTPLUG
.memory_block_size  = pnv_memory_block_size,
 #endif
 };
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index f79126f16258..d29f6f1f7f37 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -1089,7 +1089,7 @@ define_machine(pseries) {
.machine_kexec  = pSeries_machine_kexec,
.kexec_cpu_down = pseries_kexec_cpu_down,
 #endif
-#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
+#ifdef CONFIG_MEMORY_HOTPLUG
.memory_block_size  = pseries_memory_block_size,
 #endif
 };
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index ef8e44a7d288..02f7f1358e86 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -13,7 +13,7 @@ obj-y += power/
 obj-$(CONFIG_ISA_BUS_API)  += isa.o
 obj-y  += firmware_loader/
 obj-$(CONFIG_NUMA) += node.o
-obj-$(CONFIG_MEMORY_HOTPLUG_SPARSE) += memory.o
+obj-$(CONFIG_MEMORY_HOTPLUG) += memory.o
 ifeq ($(CONFIG_SYSFS),y)
 obj-$(CONFIG_MODULES)  += module.o
 endif
diff --git a/drivers/base/node.c b/drivers/base/node.c
index c56d34f8158f..b5a4ba18f9f9 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -629,7 +629,7 @@ static void node_device_release(struct device *dev)
 {
struct node *node = to_node(dev);
 
-#if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HUGETLBFS)
+#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_HUGETLBFS)
/*
 * We schedule the work only when a memory section is
 * onlined/offlined on this node. When we come here,
@@ -782,7 +782,7 @@ int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
return 0;
 }
 
-#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
+#ifdef CONFIG_MEMORY_HOTPLUG
 static int __ref get_nid_for_pfn(unsigned long pfn)
 {
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
@@ -958,10 +958,9 @@ static int node_memory_callback(struct notifier_block *self,
return NOTIFY_OK;
 }
 #endif /* CONFIG_HUGETLBFS */
-#endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
+#endif /* CONFIG_MEMORY_HOTPLUG */
 
-#if !defined(CONFIG_MEMORY_HOTPLUG_SPARSE) || \
-!defined(CONFIG_HUGETLBFS)
+#if !defined(CONFIG_MEMORY_HOTPLUG) || !defined(CONFIG_HUGETLBFS)
 static inline int node_memory_callback(struct notifier_block *self,

[PATCH v1 1/6] mm/memory_hotplug: remove CONFIG_X86_64_ACPI_NUMA dependency from CONFIG_MEMORY_HOTPLUG

2021-09-29 Thread David Hildenbrand
SPARSEMEM is the only possible memory model for x86-64, FLATMEM is not
possible:
config ARCH_FLATMEM_ENABLE
def_bool y
depends on X86_32 && !NUMA

And X86_64_ACPI_NUMA (obviously) only supports x86-64:
config X86_64_ACPI_NUMA
def_bool y
depends on X86_64 && NUMA && ACPI && PCI

Let's just remove the CONFIG_X86_64_ACPI_NUMA dependency, as it no longer
makes sense.

Signed-off-by: David Hildenbrand 
---
 mm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index d16ba9249bc5..b7fb3f0b485e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -123,7 +123,7 @@ config ARCH_ENABLE_MEMORY_HOTPLUG
 config MEMORY_HOTPLUG
bool "Allow for memory hot-add"
select MEMORY_ISOLATION
-   depends on SPARSEMEM || X86_64_ACPI_NUMA
+   depends on SPARSEMEM
depends on ARCH_ENABLE_MEMORY_HOTPLUG
depends on 64BIT || BROKEN
select NUMA_KEEP_MEMINFO if NUMA
-- 
2.31.1
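
[Editorial sketch, not part of the patch.] The commit message argues that
"SPARSEMEM || X86_64_ACPI_NUMA" reduces to "SPARSEMEM" once the two quoted
constraints hold. A brute-force check over all combinations can make that
reasoning concrete. The three booleans are simplified stand-ins for the
Kconfig symbols, and the only constraints modelled are the two stated in the
commit message (X86_64_ACPI_NUMA requires X86_64; x86-64 always uses
SPARSEMEM).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Count configurations, among those allowed by the two constraints quoted
 * in the commit message, where the old and new dependency disagree.
 */
static int count_mismatches(void)
{
	int mismatches = 0;

	for (int bits = 0; bits < 8; bits++) {
		bool x86_64 = bits & 1;
		bool sparsemem = bits & 2;
		bool acpi_numa = bits & 4;	/* X86_64_ACPI_NUMA */

		/* X86_64_ACPI_NUMA depends on X86_64. */
		if (acpi_numa && !x86_64)
			continue;
		/* On x86-64, SPARSEMEM is the only memory model. */
		if (x86_64 && !sparsemem)
			continue;

		bool old_dep = sparsemem || acpi_numa;
		bool new_dep = sparsemem;

		if (old_dep != new_dep)
			mismatches++;
	}
	return mismatches;
}
```

The two expressions could only differ when X86_64_ACPI_NUMA is set without
SPARSEMEM, and the constraints rule that combination out.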



[PATCH v1 0/6] mm/memory_hotplug: Kconfig and 32 bit cleanups

2021-09-29 Thread David Hildenbrand
Some cleanups around CONFIG_MEMORY_HOTPLUG, including removing 32 bit
leftovers of memory hotplug support.

Compile-tested on various architectures, quickly tested memory hotplug
on x86-64.

Cc: Andrew Morton 
Cc: Jonathan Corbet 
Cc: Alex Shi 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: Greg Kroah-Hartman 
Cc: "Rafael J. Wysocki" 
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Shuah Khan 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Mike Rapoport 
Cc: x...@kernel.org
Cc: linux...@kvack.org
Cc: linux-kselft...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: virtualization@lists.linux-foundation.org

David Hildenbrand (6):
  mm/memory_hotplug: remove CONFIG_X86_64_ACPI_NUMA dependency from
CONFIG_MEMORY_HOTPLUG
  mm/memory_hotplug: remove CONFIG_MEMORY_HOTPLUG_SPARSE
  mm/memory_hotplug: restrict CONFIG_MEMORY_HOTPLUG to 64 bit
  mm/memory_hotplug: remove HIGHMEM leftovers
  mm/memory_hotplug: remove stale function declarations
  x86: remove memory hotplug support on X86_32

 Documentation/core-api/memory-hotplug.rst     |  3 --
 .../zh_CN/core-api/memory-hotplug.rst         |  4 --
 arch/powerpc/include/asm/machdep.h            |  2 +-
 arch/powerpc/kernel/setup_64.c                |  2 +-
 arch/powerpc/platforms/powernv/setup.c        |  4 +-
 arch/powerpc/platforms/pseries/setup.c        |  2 +-
 arch/x86/Kconfig                              |  6 +--
 arch/x86/mm/init_32.c                         | 31 ---
 drivers/base/Makefile                         |  2 +-
 drivers/base/node.c                           |  9 ++---
 drivers/virtio/Kconfig                        |  2 +-
 include/linux/memory.h                        | 19 --
 include/linux/memory_hotplug.h                |  3 --
 include/linux/node.h                          |  4 +-
 lib/Kconfig.debug                             |  2 +-
 mm/Kconfig                                    |  8 +---
 mm/memory_hotplug.c                           | 38 +--
 tools/testing/selftests/memory-hotplug/config |  1 -
 18 files changed, 28 insertions(+), 114 deletions(-)


base-commit: 5816b3e6577eaa676ceb00a848f0fd65fe2adc29
-- 
2.31.1



Re: [bug report] vdpa_sim_blk: implement ramdisk behaviour

2021-09-29 Thread Dan Carpenter
On Wed, Sep 29, 2021 at 02:07:12PM +0200, Stefano Garzarella wrote:
> On Wed, Sep 29, 2021 at 02:46:52PM +0300, Dan Carpenter wrote:
> > On Wed, Sep 29, 2021 at 02:37:42PM +0300, Dan Carpenter wrote:
> > > 89 /* The last byte is the status and we checked if the last iov has
> > > 90  * enough room for it.
> > > 91  */
> > > 92 to_push = vringh_kiov_length(&vq->in_iov) - 1;
> > > 
> > > Are you positive that vringh_kiov_length() cannot be zero?  I looked at
> > > the range_check() and there is no check for "if (*len == 0)".
> > > 
> > > 93
> > > 94 to_pull = vringh_kiov_length(&vq->out_iov);
> > > 95
> > > 96 bytes = vringh_iov_pull_iotlb(&vq->vring, &vq->out_iov, &hdr,
> > > 97   sizeof(hdr));
> > > 98 if (bytes != sizeof(hdr)) {
> > > 99 dev_err(&vdpasim->vdpa.dev, "request out header too short\n");
> > > 100 return false;
> > > 101 }
> > > 102
> > > 103 to_pull -= bytes;
> > > 
> > > The same "bytes" is used for both to_pull and to_push.  In this
> > > assignment it would only be used for the default case which prints an
> > > error message.
> > > 
> > 
> > Sorry, no.  This part is wrong.  "bytes" is not used for "to_push"
> > either here or below.  But I still am not sure "*len == 0" or how we
> 
> At line 84 we check that the last `in_iov` has at least one byte, so
> vringh_kiov_length(&vq->in_iov) cannot be zero.
> It will return the sum of all lengths, so at least 1.
> 
> Maybe better to add a comment.
> 
> > know that "to_push >= VIRTIO_BLK_ID_BYTES".
> 
> vringh_iov_push_iotlb() will push at most the bytes available in `in_iov`;
> if there are fewer than VIRTIO_BLK_ID_BYTES, it will copy fewer bytes.
> 
> Maybe here it would be better to add a check because the driver isn't
> following the specification.
> 
> And I'd avoid the subtraction highlighted by Smatch static checker.
> 
> Thanks for reporting. I'll send patches to fix these issues.

Nothing to fix, really.  I've looked at what you've explained and it's
all true so the code is fine as-is.  Thanks so much.

regards,
dan carpenter
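
[Editorial sketch, not part of the thread.] The argument above is that once
the last `in_iov` element is known to hold at least one byte, the summed
length is at least 1, so "length - 1" cannot wrap around. A simplified
user-space model of that invariant follows. struct seg, total_len() and
payload_room() are hypothetical names for illustration only; they are not
the vringh API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one element of a kernel iov. */
struct seg {
	size_t len;
};

/* Sum of all segment lengths, in the spirit of vringh_kiov_length(). */
static size_t total_len(const struct seg *segs, size_t n)
{
	size_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += segs[i].len;
	return sum;
}

/*
 * Compute how many payload bytes may be pushed, reserving the last byte
 * for the status. Fails if the last segment cannot hold the status byte,
 * so the subtraction below can never wrap around.
 */
static bool payload_room(const struct seg *segs, size_t n, size_t *to_push)
{
	assert(to_push != NULL);
	if (n == 0 || segs[n - 1].len < 1)
		return false;
	*to_push = total_len(segs, n) - 1;
	return true;
}
```

Keeping the check and the subtraction in one helper makes the dependency
between them explicit, which is roughly what a clarifying comment in the
driver would document.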



WorldCist'22 - 10th World Conference on Information Systems and Technologies | Montenegro

2021-09-29 Thread IT
* Conference listed in CORE Ranking

** Google Scholar H5-Index = 23

*** Best papers selected for SCI/SSCI journals



--

WorldCIST'22 - 10th World Conference on Information Systems and Technologies

12-14 April 2022, Budva, Montenegro

http://worldcist.org 

---


The WorldCist'22 - 10th World Conference on Information Systems and 
Technologies, to be held in Budva, Montenegro, 12-14 April 2022, is a global 
forum for researchers and practitioners to present and discuss the most recent 
innovations, trends, results, experiences and concerns in the several 
perspectives of Information Systems and Technologies.

We are pleased to invite you to submit your papers to WorldCist'22. All 
submissions will be reviewed on the basis of relevance, originality, importance 
and clarity.



TOPICS

Submitted papers should be related to one or more of the main themes proposed 
for the Conference:

A) Information and Knowledge Management (IKM);

B) Organizational Models and Information Systems (OMIS);

C) Software and Systems Modeling (SSM);

D) Software Systems, Architectures, Applications and Tools (SSAAT);

E) Multimedia Systems and Applications (MSA);

F) Computer Networks, Mobility and Pervasive Systems (CNMPS);

G) Intelligent and Decision Support Systems (IDSS);

H) Big Data Analytics and Applications (BDAA);

I) Human-Computer Interaction (HCI);

J) Ethics, Computers and Security (ECS)

K) Health Informatics (HIS);

L) Information Technologies in Education (ITE);

M) Technologies for Biomedical Applications (TBA)

N) Information Technologies in Radiocommunications (ITR);



TYPES of SUBMISSIONS and DECISIONS

Four types of papers can be submitted:

Full paper: Finished or consolidated R&D works, to be included in one of the 
Conference themes. These papers are assigned a 10-page limit.

Short paper: Ongoing works with relevant preliminary results, open to 
discussion. These papers are assigned a 7-page limit.

Poster paper: Initial work with relevant ideas, open to discussion. These 
papers are assigned a 4-page limit.

Company paper: Companies' papers that show practical experience, R&D, tools, 
etc., focused on some topics of the conference. These papers are assigned a 
4-page limit.

Submitted papers must comply with the format of Advances in Intelligent Systems 
and Computing Series (see Instructions for Authors at Springer Website), be 
written in English, must not have been published before, not be under review 
for any other conference or publication and not include any information leading 
to the authors’ identification. Therefore, the authors’ names, affiliations and 
bibliographic references should not be included in the version for evaluation 
by the Program Committee. This information should only be included in the 
camera-ready version, saved in Word or Latex format and also in PDF format. 
These files must be accompanied by the Consent to Publish form filled out, in a 
ZIP file, and uploaded at the conference management system.

All papers will be subjected to a “double-blind review” by at least two members 
of the Program Committee.

Based on the Program Committee evaluation, a paper can be rejected or accepted 
by the Conference Chairs. In the latter case, it can be accepted as the type 
originally submitted or as another type. Thus, full papers can be accepted as 
short papers or poster papers only. Similarly, short papers can be accepted as 
poster papers only.

Poster papers and Company papers are not published in the Conference 
Proceedings, being only presented and discussed. The authors of accepted poster 
papers should build and print a poster to be exhibited during the Conference. 
This poster must follow an A1 or A2 vertical format. The Conference includes 
Work Sessions where these posters are presented and orally discussed, with a 7 
minute limit per poster.

The authors of accepted Full papers will have 15 minutes to present their work 
in a Conference Work Session; approximately 5 minutes of discussion will follow 
each presentation. The authors of accepted Short papers and Company papers will 
have 11 minutes to present their work in a Conference Work Session; 
approximately 4 minutes of discussion will follow each presentation.



PUBLICATION & INDEXING

To ensure that a full paper or short paper is published, poster paper or 
company paper is presented, at least one of the authors must be fully 
registered by the 8nd of January 2022, and the paper must comply with the 
suggested layout and page-limit. Additionally, all recommended changes must be 
addressed by the authors before they submit the camera-ready version.

No more than one paper per registration will be published. An extra fee must be 
paid for publication of additional papers, with a ma

Re: [PATCH v3 7/7] eni_vdpa: add vDPA driver for Alibaba ENI

2021-09-29 Thread kernel test robot
Hi Wu,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.15-rc3 next-20210922]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Wu-Zongyong/virtio-pci-introduce-legacy-device-module/20210929-115033
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git a4e6f95a891ac08bd09d62e3e6dae239b150f4c1
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/86ed35603fb93a4bc8c8929ff89edd5f6556ca44
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Wu-Zongyong/virtio-pci-introduce-legacy-device-module/20210929-115033
        git checkout 86ed35603fb93a4bc8c8929ff89edd5f6556ca44
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=xtensa

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> drivers/vdpa/alibaba/eni_vdpa.c:446:13: error: 'eni_vdpa_free_irq_vectors' defined but not used [-Werror=unused-function]
     446 | static void eni_vdpa_free_irq_vectors(void *data)
         |             ^~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/vdpa/alibaba/eni_vdpa.c:423:12: error: 'eni_vdpa_get_num_queues' defined but not used [-Werror=unused-function]
     423 | static u16 eni_vdpa_get_num_queues(struct eni_vdpa *eni_vdpa)
         |            ^~~~~~~~~~~~~~~~~~~~~~~
>> drivers/vdpa/alibaba/eni_vdpa.c:396:37: error: 'eni_vdpa_ops' defined but not used [-Werror=unused-const-variable=]
     396 | static const struct vdpa_config_ops eni_vdpa_ops = {
         |                                     ^~~~~~~~~~~~
   cc1: all warnings being treated as errors
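
[Editorial sketch, not part of the report.] The errors above are GCC's
-Wunused-function diagnostic promoted by -Werror: a static function with no
remaining reference in the translation unit. The report does not show why
the callers are absent in this build, so the following only illustrates the
general mechanism. HAVE_PROBE, probe() and helper_queue_count() are
hypothetical names; in kernel code the attribute is normally spelled
__maybe_unused.

```c
#include <assert.h>

/*
 * HAVE_PROBE is a hypothetical switch standing in for whatever
 * configuration removes the only caller; it is left undefined here.
 */

/*
 * Without the attribute, -Wunused-function fires when HAVE_PROBE is
 * undefined, because nothing in this file references the helper.
 */
__attribute__((unused))
static int helper_queue_count(int pairs, int has_ctrl_vq)
{
	/* Two queues per pair, plus one optional control queue. */
	return 2 * pairs + (has_ctrl_vq ? 1 : 0);
}

#ifdef HAVE_PROBE
static int probe(void)
{
	return helper_queue_count(1, 1);
}
#endif
```

Silencing the warning with an attribute is only one option; wiring the
helpers into their intended caller is usually the better fix when the
caller is simply missing.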


vim +/eni_vdpa_free_irq_vectors +446 drivers/vdpa/alibaba/eni_vdpa.c

   395  
 > 396  static const struct vdpa_config_ops eni_vdpa_ops = {
   397          .get_features   = eni_vdpa_get_features,
   398          .set_features   = eni_vdpa_set_features,
   399          .get_status     = eni_vdpa_get_status,
   400          .set_status     = eni_vdpa_set_status,
   401          .reset          = eni_vdpa_reset,
   402          .get_vq_num_max = eni_vdpa_get_vq_num_max,
   403          .get_vq_num_min = eni_vdpa_get_vq_num_min,
   404          .get_vq_state   = eni_vdpa_get_vq_state,
   405          .set_vq_state   = eni_vdpa_set_vq_state,
   406          .set_vq_cb      = eni_vdpa_set_vq_cb,
   407          .set_vq_ready   = eni_vdpa_set_vq_ready,
   408          .get_vq_ready   = eni_vdpa_get_vq_ready,
   409          .set_vq_num     = eni_vdpa_set_vq_num,
   410          .set_vq_address = eni_vdpa_set_vq_address,
   411          .kick_vq        = eni_vdpa_kick_vq,
   412          .get_device_id  = eni_vdpa_get_device_id,
   413          .get_vendor_id  = eni_vdpa_get_vendor_id,
   414          .get_vq_align   = eni_vdpa_get_vq_align,
   415          .get_config_size = eni_vdpa_get_config_size,
   416          .get_config     = eni_vdpa_get_config,
   417          .set_config     = eni_vdpa_set_config,
   418          .set_config_cb  = eni_vdpa_set_config_cb,
   419          .get_vq_irq     = eni_vdpa_get_vq_irq,
   420  };
   421  
   422  
 > 423  static u16 eni_vdpa_get_num_queues(struct eni_vdpa *eni_vdpa)
   424  {
   425          struct virtio_pci_legacy_device *ldev = &eni_vdpa->ldev;
   426          u32 features = vp_legacy_get_features(ldev);
   427          u16 num = 2;
   428  
   429          if (features & BIT_ULL(VIRTIO_NET_F_MQ)) {
   430                  __virtio16 max_virtqueue_pairs;
   431  
   432                  eni_vdpa_get_config(&eni_vdpa->vdpa,
   433                          offsetof(struct virtio_net_config, max_virtqueue_pairs),
   434                          &max_virtqueue_pairs,
   435                          sizeof(max_virtqueue_pairs));
   436                  num = 2 * __virtio16_to_cpu(virtio_legacy_is_little_endian(),
   437                                  max_virtqueue_pairs);
   438          }
   439  
   440          if (features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))
   441                  num += 1;
   442  
   443          return num;
   444  }
   445  
 > 446  static void eni_vdpa_free_irq_vectors(void *data)
   447  {
   448          pci_free_irq_vectors(data);
   449  }
   450  
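
For reference, a minimal sketch of how this class of warning can arise and one common way to silence it. It assumes (the report does not say so) that the only users of these symbols are compiled out in this config. The DEMO_HAVE_PROBE switch and every demo_* name are hypothetical; demo_maybe_unused uses the same attribute that the kernel's __maybe_unused expands to.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a Kconfig option that gates the only caller. */
/* #define DEMO_HAVE_PROBE 1 */

/* Same attribute the kernel's __maybe_unused expands to (GCC/Clang). */
#define demo_maybe_unused __attribute__((__unused__))

/*
 * Queue count in the style of eni_vdpa_get_num_queues(): one rx/tx pair by
 * default, 2 * pairs with multiqueue, plus one for the control queue.
 * Without demo_maybe_unused, a build with -Werror=unused-function fails
 * whenever DEMO_HAVE_PROBE is not defined.
 */
static demo_maybe_unused unsigned int
demo_get_num_queues(bool has_mq, unsigned int pairs, bool has_ctrl_vq)
{
	unsigned int num = 2;

	if (has_mq)
		num = 2 * pairs;
	if (has_ctrl_vq)
		num += 1;
	return num;
}

#ifdef DEMO_HAVE_PROBE
/* The only caller; it disappears when the option is off. */
unsigned int demo_probe(void)
{
	return demo_get_num_queues(true, 4, true);
}
#endif
```

The annotation only hides the warning. If the symbols are meant to be used on this config, wiring up the caller is the better fix.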

---
0-DAY CI Kernel Test Service, Inte

Re: [bug report] vdpa_sim_blk: implement ramdisk behaviour

2021-09-29 Thread Stefano Garzarella

On Wed, Sep 29, 2021 at 02:46:52PM +0300, Dan Carpenter wrote:
> On Wed, Sep 29, 2021 at 02:37:42PM +0300, Dan Carpenter wrote:
> > 89 /* The last byte is the status and we checked if the last iov has
> > 90  * enough room for it.
> > 91  */
> > 92 to_push = vringh_kiov_length(&vq->in_iov) - 1;
> >
> > Are you positive that vringh_kiov_length() cannot be zero?  I looked at
> > the range_check() and there is no check for "if (*len == 0)".
> >
> > 93
> > 94 to_pull = vringh_kiov_length(&vq->out_iov);
> > 95
> > 96 bytes = vringh_iov_pull_iotlb(&vq->vring, &vq->out_iov, &hdr,
> > 97   sizeof(hdr));
> > 98 if (bytes != sizeof(hdr)) {
> > 99 dev_err(&vdpasim->vdpa.dev, "request out header too short\n");
> > 100 return false;
> > 101 }
> > 102
> > 103 to_pull -= bytes;
> >
> > The same "bytes" is used for both to_pull and to_push.  In this
> > assignment it would only be used for the default case which prints an
> > error message.
> >
>
> Sorry, no.  This part is wrong.  "bytes" is not used for "to_push"
> either here or below.  But I still am not sure "*len == 0" or how we

At line 84 we check that the last `in_iov` has at least one byte, so
vringh_kiov_length(&vq->in_iov) cannot be zero.

It will return the sum of all lengths, so at least 1.

Maybe better to add a comment.

> know that "to_push >= VIRTIO_BLK_ID_BYTES".


vringh_iov_push_iotlb() will push at most the bytes available in
`in_iov`; if there are fewer than VIRTIO_BLK_ID_BYTES, it will copy
fewer bytes.


Maybe here it would be better to add a check because the driver isn't 
following the specification.
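
A rough sketch of what such a check could look like, not the actual patch. demo_get_id() is a hypothetical helper; the three constants carry the same values as in the virtio-blk uapi header. A buffer with less than VIRTIO_BLK_ID_BYTES of room is rejected instead of receiving a truncated ID.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_BLK_ID_BYTES 20  /* value from the virtio-blk uapi header */
#define VIRTIO_BLK_S_OK     0
#define VIRTIO_BLK_S_IOERR  1

/*
 * Hypothetical helper: copy the device ID into a driver buffer that has
 * to_push writable bytes (status byte already excluded). Short buffers are
 * rejected and left untouched.
 */
static uint8_t demo_get_id(uint8_t *dst, size_t to_push, const char *id)
{
	uint8_t buf[VIRTIO_BLK_ID_BYTES] = { 0 };
	size_t n = strlen(id);

	if (to_push < VIRTIO_BLK_ID_BYTES)
		return VIRTIO_BLK_S_IOERR;

	/* IDs shorter than 20 bytes are zero padded, longer ones are cut. */
	if (n > sizeof(buf))
		n = sizeof(buf);
	memcpy(buf, id, n);
	memcpy(dst, buf, sizeof(buf));
	return VIRTIO_BLK_S_OK;
}
```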


And I'd avoid the subtraction highlighted by the Smatch static checker.

Thanks for reporting. I'll send patches to fix these issues.

Stefano
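
To illustrate the two points discussed above, here is a small sketch under stated assumptions. Line 179 is not shown in this excerpt; the sketch assumes it compares the result of `to_push - pushed` against zero. Both demo_* helpers are hypothetical names, not driver code.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * total is the summed length of the device-writable buffers. The last byte
 * is reserved for the status, so the payload room is total - 1, but only
 * once total is known to be non-zero: on size_t, 0 - 1 wraps to SIZE_MAX.
 */
static bool demo_payload_room(size_t total, size_t *to_push)
{
	if (total < 1)
		return false;
	*to_push = total - 1;
	return true;
}

/*
 * Bytes still to skip before the status byte. Comparing first avoids a
 * wrapped result if pushed ever exceeded to_push.
 */
static size_t demo_remaining(size_t to_push, size_t pushed)
{
	return pushed < to_push ? to_push - pushed : 0;
}
```

With these, a caller can bail out when demo_payload_room() returns false and advance by demo_remaining() without ever evaluating an unsigned difference that might wrap.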

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [bug report] vdpa_sim_blk: implement ramdisk behaviour

2021-09-29 Thread Dan Carpenter
On Wed, Sep 29, 2021 at 02:37:42PM +0300, Dan Carpenter wrote:
> 89 /* The last byte is the status and we checked if the last iov has
> 90  * enough room for it.
> 91  */
> 92 to_push = vringh_kiov_length(&vq->in_iov) - 1;
> 
> Are you positive that vringh_kiov_length() cannot be zero?  I looked at
> the range_check() and there is no check for "if (*len == 0)".
> 
> 93 
> 94 to_pull = vringh_kiov_length(&vq->out_iov);
> 95 
> 96 bytes = vringh_iov_pull_iotlb(&vq->vring, &vq->out_iov, &hdr,
> 97   sizeof(hdr));
> 98 if (bytes != sizeof(hdr)) {
> 99 dev_err(&vdpasim->vdpa.dev, "request out header too short\n");
> 100 return false;
> 101 }
> 102 
> 103 to_pull -= bytes;
> 
> The same "bytes" is used for both to_pull and to_push.  In this
> assignment it would only be used for the default case which prints an
> error message.
> 

Sorry, no.  This part is wrong.  "bytes" is not used for "to_push"
either here or below.  But I still am not sure "*len == 0" or how we
know that "to_push >= VIRTIO_BLK_ID_BYTES".

regards,
dan carpenter



[bug report] vdpa_sim_blk: implement ramdisk behaviour

2021-09-29 Thread Dan Carpenter
Hello Stefano Garzarella,

The patch 7d189f617f83: "vdpa_sim_blk: implement ramdisk behaviour"
from Mar 15, 2021, leads to the following
Smatch static checker warning:

drivers/vdpa/vdpa_sim/vdpa_sim_blk.c:179 vdpasim_blk_handle_req()
warn: unsigned subtraction: 'to_push - pushed' use '!='

drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
 61 static bool vdpasim_blk_handle_req(struct vdpasim *vdpasim,
 62                                    struct vdpasim_virtqueue *vq)
 63 {
 64         size_t pushed = 0, to_pull, to_push;
 65         struct virtio_blk_outhdr hdr;
 66         ssize_t bytes;
 67         loff_t offset;
 68         u64 sector;
 69         u8 status;
 70         u32 type;
 71         int ret;
 72
 73         ret = vringh_getdesc_iotlb(&vq->vring, &vq->out_iov, &vq->in_iov,
 74                                    &vq->head, GFP_ATOMIC);
 75         if (ret != 1)
 76                 return false;
 77
 78         if (vq->out_iov.used < 1 || vq->in_iov.used < 1) {
 79                 dev_err(&vdpasim->vdpa.dev, "missing headers - out_iov: %u in_iov %u\n",
 80                         vq->out_iov.used, vq->in_iov.used);
 81                 return false;
 82         }
 83
 84         if (vq->in_iov.iov[vq->in_iov.used - 1].iov_len < 1) {
 85                 dev_err(&vdpasim->vdpa.dev, "request in header too short\n");
 86                 return false;
 87         }
 88
 89         /* The last byte is the status and we checked if the last iov has
 90          * enough room for it.
 91          */
 92         to_push = vringh_kiov_length(&vq->in_iov) - 1;

Are you positive that vringh_kiov_length() cannot be zero?  I looked at
the range_check() and there is no check for "if (*len == 0)".

 93
 94         to_pull = vringh_kiov_length(&vq->out_iov);
 95
 96         bytes = vringh_iov_pull_iotlb(&vq->vring, &vq->out_iov, &hdr,
 97                                       sizeof(hdr));
 98         if (bytes != sizeof(hdr)) {
 99                 dev_err(&vdpasim->vdpa.dev, "request out header too short\n");
100                 return false;
101         }
102
103         to_pull -= bytes;

The same "bytes" is used for both to_pull and to_push.  In this
assignment it would only be used for the default case which prints an
error message.

104
105         type = vdpasim32_to_cpu(vdpasim, hdr.type);
106         sector = vdpasim64_to_cpu(vdpasim, hdr.sector);
107         offset = sector << SECTOR_SHIFT;
108         status = VIRTIO_BLK_S_OK;
109
110         switch (type) {
111         case VIRTIO_BLK_T_IN:
112                 if (!vdpasim_blk_check_range(sector, to_push)) {
113                         dev_err(&vdpasim->vdpa.dev,
114                                 "reading over the capacity - offset: 0x%llx len: 0x%zx\n",
115                                 offset, to_push);
116                         status = VIRTIO_BLK_S_IOERR;
117                         break;
118                 }
119
120                 bytes = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov,
121                                               vdpasim->buffer + offset,
122                                               to_push);
123                 if (bytes < 0) {
124                         dev_err(&vdpasim->vdpa.dev,
125                                 "vringh_iov_push_iotlb() error: %zd offset: 0x%llx len: 0x%zx\n",
126                                 bytes, offset, to_push);
127                         status = VIRTIO_BLK_S_IOERR;
128                         break;
129                 }
130
131                 pushed += bytes;
132                 break;
133
134         case VIRTIO_BLK_T_OUT:
135                 if (!vdpasim_blk_check_range(sector, to_pull)) {
136                         dev_err(&vdpasim->vdpa.dev,
137                                 "writing over the capacity - offset: 0x%llx len: 0x%zx\n",
138                                 offset, to_pull);
139                         status = VIRTIO_BLK_S_IOERR;
140                         break;
141                 }
142
143                 bytes = vringh_iov_pull_iotlb(&vq->vring, &vq->out_iov,
144                                               vdpasim->buffer + offset,
145                                               to_pull);

Here "bytes" is to_pull again.

146                 if (bytes < 0) {
147                         dev_err(&vdpasim->vdpa.dev,
148                                 "vringh_iov_pull_iotlb() error: %zd offset: 0x%llx len: 0x%zx\n",
149                                 bytes, offset, to_pull);
150                         status = VIRTIO_BLK_S_IOERR;
151                         break;
152                 }
153

Re: [PATCH v1 2/8] x86/xen: simplify xen_oldmem_pfn_is_ram()

2021-09-29 Thread David Hildenbrand

On 29.09.21 10:45, David Hildenbrand wrote:


How about

      return a.mem_type != HVMMEM_mmio_dm;



Ha, how could I have missed that :)



The result should be promoted to int, and this has the added benefit of not
requiring changes in patch 4.



Can we go one step further and do


@@ -20,24 +20,11 @@ static int xen_oldmem_pfn_is_ram(unsigned long pfn)
        struct xen_hvm_get_mem_type a = {
                .domid = DOMID_SELF,
                .pfn = pfn,
+               .mem_type = HVMMEM_ram_rw,
        };
-       int ram;
 
-       if (HYPERVISOR_hvm_op(HVMOP_get_mem_type, &a))
-               return -ENXIO;
-
-       switch (a.mem_type) {
-       case HVMMEM_mmio_dm:
-               ram = 0;
-               break;
-       case HVMMEM_ram_rw:
-       case HVMMEM_ram_ro:
-       default:
-               ram = 1;
-               break;
-       }
-
-       return ram;
+       HYPERVISOR_hvm_op(HVMOP_get_mem_type, &a);
+       return a.mem_type != HVMMEM_mmio_dm;
 }
 #endif


Assuming that, if HYPERVISOR_hvm_op() fails,
.mem_type is not set to HVMMEM_mmio_dm.



Okay we can't, due to "__must_check" ...
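
For context, a small userspace sketch of why the annotation blocks the shorter form. Everything named demo_* is a hypothetical stand-in (including the fake hypercall); demo_must_check uses the attribute that the kernel's __must_check expands to. Calling demo_hvm_op() as a bare statement draws a compiler warning, so the sketch keeps the error check and only simplifies the switch.

```c
#include <errno.h>

/* Same attribute the kernel's __must_check expands to (GCC/Clang). */
#define demo_must_check __attribute__((__warn_unused_result__))

enum demo_mem_type { DEMO_RAM_RW, DEMO_RAM_RO, DEMO_MMIO_DM };

struct demo_get_mem_type {
	enum demo_mem_type mem_type;
};

/*
 * Hypothetical stand-in for the hypercall: on failure it returns non-zero
 * and leaves *a untouched, otherwise it fills in the reported type.
 */
static demo_must_check int demo_hvm_op(struct demo_get_mem_type *a, int fail,
				       enum demo_mem_type reported)
{
	if (fail)
		return -1;
	a->mem_type = reported;
	return 0;
}

/* Error check kept as in the original; the switch becomes a comparison. */
static int demo_pfn_is_ram(int fail, enum demo_mem_type reported)
{
	struct demo_get_mem_type a = { .mem_type = DEMO_RAM_RW };

	if (demo_hvm_op(&a, fail, reported))
		return -ENXIO;

	return a.mem_type != DEMO_MMIO_DM;
}
```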

--
Thanks,

David / dhildenb


Re: [PATCH v1 2/8] x86/xen: simplify xen_oldmem_pfn_is_ram()

2021-09-29 Thread David Hildenbrand


How about

     return a.mem_type != HVMMEM_mmio_dm;



Ha, how could I have missed that :)



The result should be promoted to int, and this has the added benefit of not
requiring changes in patch 4.



Can we go one step further and do


@@ -20,24 +20,11 @@ static int xen_oldmem_pfn_is_ram(unsigned long pfn)
        struct xen_hvm_get_mem_type a = {
                .domid = DOMID_SELF,
                .pfn = pfn,
+               .mem_type = HVMMEM_ram_rw,
        };
-       int ram;
 
-       if (HYPERVISOR_hvm_op(HVMOP_get_mem_type, &a))
-               return -ENXIO;
-
-       switch (a.mem_type) {
-       case HVMMEM_mmio_dm:
-               ram = 0;
-               break;
-       case HVMMEM_ram_rw:
-       case HVMMEM_ram_ro:
-       default:
-               ram = 1;
-               break;
-       }
-
-       return ram;
+       HYPERVISOR_hvm_op(HVMOP_get_mem_type, &a);
+       return a.mem_type != HVMMEM_mmio_dm;
 }
 #endif


Assuming that, if HYPERVISOR_hvm_op() fails,
.mem_type is not set to HVMMEM_mmio_dm.
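
As a quick check of the simplification, the sketch below places the removed switch next to the single comparison. The enum is a hypothetical stand-in for the HVMMEM_* values, not the Xen definition. In C the result of `!=` already has type int, so no cast is needed for an int return.

```c
/* Hypothetical stand-in for the HVMMEM_* memory types. */
enum demo_hvmmem {
	DEMO_HVMMEM_RAM_RW,
	DEMO_HVMMEM_RAM_RO,
	DEMO_HVMMEM_MMIO_DM,
	DEMO_HVMMEM_OTHER,
};

/* Shape of the code being removed: everything except mmio_dm is RAM. */
static int demo_is_ram_switch(enum demo_hvmmem t)
{
	int ram;

	switch (t) {
	case DEMO_HVMMEM_MMIO_DM:
		ram = 0;
		break;
	case DEMO_HVMMEM_RAM_RW:
	case DEMO_HVMMEM_RAM_RO:
	default:
		ram = 1;
		break;
	}
	return ram;
}

/* Shape of the replacement: the comparison yields 0 or 1 as an int. */
static int demo_is_ram_compare(enum demo_hvmmem t)
{
	return t != DEMO_HVMMEM_MMIO_DM;
}
```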

--
Thanks,

David / dhildenb


Re: [PATCH v2] vduse: Fix race condition between resetting and irq injecting

2021-09-29 Thread Jason Wang
On Wed, Sep 29, 2021 at 4:32 PM Xie Yongji  wrote:
>
> The interrupt might be triggered after a reset since there is
> no synchronization between resetting and irq injecting. And it
> might break something if the interrupt is delayed until a new
> round of device initialization.
>
> Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
> Signed-off-by: Xie Yongji 
> ---
>  drivers/vdpa/vdpa_user/vduse_dev.c | 37 +
>  1 file changed, 25 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index cefb301b2ee4..841667a896dd 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -80,6 +80,7 @@ struct vduse_dev {
> struct vdpa_callback config_cb;
> struct work_struct inject;
> spinlock_t irq_lock;
> +   struct rw_semaphore rwsem;
> int minor;
> bool broken;
> bool connected;
> @@ -410,6 +411,8 @@ static void vduse_dev_reset(struct vduse_dev *dev)
> if (domain->bounce_map)
> vduse_domain_reset_bounce_map(domain);
>
> +   down_write(&dev->rwsem);
> +
> dev->status = 0;
> dev->driver_features = 0;
> dev->generation++;
> @@ -443,6 +446,8 @@ static void vduse_dev_reset(struct vduse_dev *dev)
> flush_work(&vq->inject);
> flush_work(&vq->kick);
> }
> +
> +   up_write(&dev->rwsem);

Thinking about this again, do we need to sync set_status() as well?

Thanks

>  }
>
>  static int vduse_vdpa_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> @@ -885,6 +890,23 @@ static void vduse_vq_irq_inject(struct work_struct *work)
> spin_unlock_irq(&vq->irq_lock);
>  }
>
> +static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
> +   struct work_struct *irq_work)
> +{
> +   int ret = -EINVAL;
> +
> +   down_read(&dev->rwsem);
> +   if (!(dev->status & VIRTIO_CONFIG_S_DRIVER_OK))
> +   goto unlock;
> +
> +   ret = 0;
> +   queue_work(vduse_irq_wq, irq_work);
> +unlock:
> +   up_read(&dev->rwsem);
> +
> +   return ret;
> +}
> +
>  static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> unsigned long arg)
>  {
> @@ -966,12 +988,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> break;
> }
> case VDUSE_DEV_INJECT_CONFIG_IRQ:
> -   ret = -EINVAL;
> -   if (!(dev->status & VIRTIO_CONFIG_S_DRIVER_OK))
> -   break;
> -
> -   ret = 0;
> -   queue_work(vduse_irq_wq, &dev->inject);
> +   ret = vduse_dev_queue_irq_work(dev, &dev->inject);
> break;
> case VDUSE_VQ_SETUP: {
> struct vduse_vq_config config;
> @@ -1049,10 +1066,6 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> case VDUSE_VQ_INJECT_IRQ: {
> u32 index;
>
> -   ret = -EINVAL;
> -   if (!(dev->status & VIRTIO_CONFIG_S_DRIVER_OK))
> -   break;
> -
> ret = -EFAULT;
> if (get_user(index, (u32 __user *)argp))
> break;
> @@ -1061,9 +1074,8 @@ static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
> if (index >= dev->vq_num)
> break;
>
> -   ret = 0;
> index = array_index_nospec(index, dev->vq_num);
> -   queue_work(vduse_irq_wq, &dev->vqs[index].inject);
> +   ret = vduse_dev_queue_irq_work(dev, &dev->vqs[index].inject);
> break;
> }
> default:
> @@ -1144,6 +1156,7 @@ static struct vduse_dev *vduse_dev_create(void)
> INIT_LIST_HEAD(&dev->send_list);
> INIT_LIST_HEAD(&dev->recv_list);
> spin_lock_init(&dev->irq_lock);
> +   init_rwsem(&dev->rwsem);
>
> INIT_WORK(&dev->inject, vduse_dev_irq_inject);
> init_waitqueue_head(&dev->waitq);
> --
> 2.11.0
>



Re: [PATCH v1 8/8] virtio-mem: kdump mode to sanitize /proc/vmcore access

2021-09-29 Thread David Hildenbrand

[...]


+
+static bool virtio_mem_vmcore_pfn_is_ram(struct vmcore_cb *cb,
+                                         unsigned long pfn)
+{
+       struct virtio_mem *vm = container_of(cb, struct virtio_mem,
+                                            vmcore_cb);
+       uint64_t addr = PFN_PHYS(pfn);
+       bool is_ram;
+       int rc;
+
+       if (!virtio_mem_contains_range(vm, addr, addr + PAGE_SIZE))


Some more testing revealed that this has to be

if (!virtio_mem_contains_range(vm, addr, PAGE_SIZE))
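
A small sketch of why the end address breaks the check, assuming (inferred from the corrected call, not from the helper's source) that the third argument is a size rather than an end address. struct demo_region and demo_contains_range() are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 0x1000ULL

/* Hypothetical stand-in for the device-managed region. */
struct demo_region {
	uint64_t addr;
	uint64_t size;
};

/* (start, size) convention: is [start, start + size) inside the region? */
static bool demo_contains_range(const struct demo_region *r, uint64_t start,
				uint64_t size)
{
	return start >= r->addr && start + size <= r->addr + r->size;
}
```

With a region covering 0x1000 to 0x5000 and a page at 0x3000, passing DEMO_PAGE_SIZE reports the page as contained, while passing `addr + DEMO_PAGE_SIZE` makes the helper test up to 0x7000 and wrongly reject it.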


--
Thanks,

David / dhildenb



Re: [PATCH] vhost-vdpa:fix the worng input in config_cb

2021-09-29 Thread Michael S. Tsirkin
On Wed, Sep 29, 2021 at 03:54:37PM +0800, Cindy Lu wrote:
> Fix the worng input in for config_cb,
> in function vhost_vdpa_config_cb, the input
> cb.private was used as struct vhost_vdpa,
> So the inuput was worng here, fix this issue
> 
> Signed-off-by: Cindy Lu 

Maybe add

Fixes: 776f395004d8 ("vhost_vdpa: Support config interrupt in vdpa")

and fix typos in the commit log.

> ---
>  drivers/vhost/vdpa.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 942666425a45..e532cbe3d2f7 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -322,7 +322,7 @@ static long vhost_vdpa_set_config_call(struct vhost_vdpa *v, u32 __user *argp)
>   struct eventfd_ctx *ctx;
>  
>   cb.callback = vhost_vdpa_config_cb;
> - cb.private = v->vdpa;
> + cb.private = v;
>   if (copy_from_user(&fd, argp, sizeof(fd)))
>   return  -EFAULT;
>  
> -- 
> 2.21.3
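
A stripped-down sketch of the mismatch the patch fixes, with hypothetical demo_* types in place of the real structures. The callback casts its private pointer to the wrapper struct, so the registration has to store the wrapper itself and not the inner device pointer; since the field is a void pointer, the compiler cannot catch the wrong choice.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the vdpa device and its vhost wrapper. */
struct demo_vdpa {
	int id;
};

struct demo_vhost_vdpa {
	struct demo_vdpa *vdpa;
	int config_events;
};

struct demo_callback {
	int (*callback)(void *private);
	void *private;
};

/* The callback interprets private as the wrapper, not the inner device. */
static int demo_config_cb(void *private)
{
	struct demo_vhost_vdpa *v = private;

	v->config_events++;
	return 0;
}

static void demo_set_config_call(struct demo_vhost_vdpa *v,
				 struct demo_callback *cb)
{
	cb->callback = demo_config_cb;
	cb->private = v;  /* v->vdpa here would be read as the wrong type */
}
```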
