[bug report] vdpa: introduce virtio pci driver

2021-02-24 Thread Dan Carpenter
Hello Jason Wang,

The patch 010eee82c84e: "vdpa: introduce virtio pci driver" from Feb
23, 2021, leads to the following static checker warning:

drivers/vdpa/virtio_pci/vp_vdpa.c:168 vp_vdpa_request_irq()
warn: inconsistent indenting

drivers/vdpa/virtio_pci/vp_vdpa.c
   154  goto err;
   155  }
   156  vp_modern_queue_vector(mdev, i, i);
   157  vp_vdpa->vring[i].irq = irq;
   158  }
   159  
   160  snprintf(vp_vdpa->msix_name, VP_VDPA_NAME_SIZE, 
"vp-vdpa[%s]-config\n",
   161   pci_name(pdev));
   162  irq = pci_irq_vector(pdev, queues);
   163  ret = devm_request_irq(>dev, irq, vp_vdpa_config_handler, 
0,
   164 vp_vdpa->msix_name, vp_vdpa);
   165  if (ret) {
   166  dev_err(>dev,
   167  "vp_vdpa: fail to request irq for vq %d\n", i);
 
Is this error message right?

   168  goto err;

indented too far

   169  }
   170  vp_modern_config_vector(mdev, queues);
   171  vp_vdpa->config_irq = irq;
   172  
   173  return 0;
   174  err:
   175  vp_vdpa_free_irq(vp_vdpa);
   176  return ret;
   177  }

regards,
dan carpenter
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-24 Thread Jason Wang


On 2021/2/24 7:12 下午, Cornelia Huck wrote:

On Wed, 24 Feb 2021 17:29:07 +0800
Jason Wang  wrote:


On 2021/2/23 6:58 下午, Cornelia Huck wrote:

On Tue, 23 Feb 2021 18:31:07 +0800
Jason Wang  wrote:
  

On 2021/2/23 6:04 下午, Cornelia Huck wrote:

On Tue, 23 Feb 2021 17:46:20 +0800
Jason Wang  wrote:
 

On 2021/2/23 下午5:25, Michael S. Tsirkin wrote:

On Mon, Feb 22, 2021 at 09:09:28AM -0800, Si-Wei Liu wrote:

On 2/21/2021 8:14 PM, Jason Wang wrote:

On 2021/2/19 7:54 下午, Si-Wei Liu wrote:

Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
for legacy") made an exception for legacy guests to reset
features to 0, when config space is accessed before features
are set. We should relieve the verify_min_features() check
and allow features reset to 0 for this case.

It's worth noting that not just legacy guests could access
config space before features are set. For instance, when
feature VIRTIO_NET_F_MTU is advertised some modern driver
will try to access and validate the MTU present in the config
space before virtio features are set.

This looks like a spec violation:

"

The following driver-read-only field, mtu only exists if
VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the
driver to use.
"

Do we really want to workaround this?

Isn't the commit 452639a64ad8 itself is a workaround for legacy guest?

I think the point is, since there's legacy guest we'd have to support, this
host side workaround is unavoidable. Although I agree the violating driver
should be fixed (yes, it's in today's upstream kernel which exists for a
while now).

Oh  you are right:


static int virtnet_validate(struct virtio_device *vdev)
{
if (!vdev->config->get) {
dev_err(>dev, "%s failure: config access disabled\n",
__func__);
return -EINVAL;
}

if (!virtnet_validate_features(vdev))
return -EINVAL;

if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
int mtu = virtio_cread16(vdev,
 offsetof(struct virtio_net_config,
  mtu));
if (mtu < MIN_MTU)
__virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);

I wonder why not simply fail here?

I think both failing or not accepting the feature can be argued to make
sense: "the device presented us with a mtu size that does not make
sense" would point to failing, "we cannot work with the mtu size that
the device presented us" would point to not negotiating the feature.
 
 

}

return 0;
}

And the spec says:


The driver MUST follow this sequence to initialize a device:
1. Reset the device.
2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
3. Set the DRIVER status bit: the guest OS knows how to drive the device.
4. Read device feature bits, and write the subset of feature bits understood by 
the OS and driver to the
device. During this step the driver MAY read (but MUST NOT write) the 
device-specific configuration
fields to check that it can support the device before accepting it.
5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits 
after this step.
6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, 
the device does not
support our subset of features and the device is unusable.
7. Perform device-specific setup, including discovery of virtqueues for the 
device, optional per-bus setup,
reading and possibly writing the device’s virtio configuration space, and 
population of virtqueues.
8. Set the DRIVER_OK status bit. At this point the device is “live”.


Item 4 on the list explicitly allows reading config space before
FEATURES_OK.

I conclude that VIRTIO_NET_F_MTU is set means "set in device features".

So this probably need some clarification. "is set" is used many times in
the spec that has different implications.

Before FEATURES_OK is set by the driver, I guess it means "the device
has offered the feature";

For me this part is ok since it clarify that it's the driver that set
the bit.


  

during normal usage, it means "the feature
has been negotiated".

/?

It looks to me the feature negotiation is done only after device set
FEATURES_OK, or FEATURES_OK could be read from device status?

I'd consider feature negotiation done when the driver reads FEATURES_OK
back from the status.


I agree.


  
  

(This is a bit fuzzy for legacy mode.)

...because legacy does not have FEATURES_OK.
  

The problem is the MTU description for example:

"The following driver-read-only field, mtu only exists if
VIRTIO_NET_F_MTU is set."

It looks to me need to use "if VIRTIO_NET_F_MTU is set by device".

"offered by the device"? I don't think it should 'disappear' from the
config space if the driver won't use it. (Same for other config space
fields that are tied to feature bits.)

Re: [PATCH v8 bpf-next 0/5] xsk: build skb by page (aka generic zerocopy xmit)

2021-02-24 Thread Daniel Borkmann

On 2/18/21 9:49 PM, Alexander Lobakin wrote:

This series introduces XSK generic zerocopy xmit by adding XSK umem
pages as skb frags instead of copying data to linear space.
The only requirement for this for drivers is to be able to xmit skbs
with skb_headlen(skb) == 0, i.e. all data including hard headers
starts from frag 0.
To indicate whether a particular driver supports this, a new netdev
priv flag, IFF_TX_SKB_NO_LINEAR, is added (and declared in virtio_net
as it's already capable of doing it). So consider implementing this
in your drivers to greatly speed-up generic XSK xmit.

[...]

Applied, thanks!
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH] blk-core: remove blk_put_request()

2021-02-24 Thread Christoph Hellwig
On Wed, Feb 24, 2021 at 09:48:21AM -0700, Jens Axboe wrote:
> Would make sense to rename blk_get_request() to blk_mq_alloc_request()
> and then we have API symmetry. The get/put don't make sense when there
> are no references involved.
> 
> But it's a lot of churn for very little reward, which is always kind
> of annoying. Especially for the person that has to carry the patches.

Let's do the following:

 - move the initialize_rq_fn call from blk_get_request into
   blk_mq_alloc_request and make the former a trivial alias for the
   latter
 - migrate to the blk_mq_* versions on a per-driver/subsystem basis.
   The scsi migration depends on the first item above, so it will have
   to go with that or wait for the next merge window
 - don't migrate the legacy ide driver, as it is about to be removed and
   has a huge number of blk_get_request calls
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-24 Thread Si-Wei Liu



On 2/23/2021 9:04 PM, Michael S. Tsirkin wrote:

On Tue, Feb 23, 2021 at 11:35:57AM -0800, Si-Wei Liu wrote:


On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:

On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:

On 2021/2/23 9:12 上午, Si-Wei Liu wrote:

On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:

On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:

On 2021/2/19 7:54 下午, Si-Wei Liu wrote:

Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
for legacy") made an exception for legacy guests to reset
features to 0, when config space is accessed before features
are set. We should relieve the verify_min_features() check
and allow features reset to 0 for this case.

It's worth noting that not just legacy guests could access
config space before features are set. For instance, when
feature VIRTIO_NET_F_MTU is advertised some modern driver
will try to access and validate the MTU present in the config
space before virtio features are set.

This looks like a spec violation:

"

The following driver-read-only field, mtu only exists if
VIRTIO_NET_F_MTU is
set.
This field specifies the maximum MTU for the driver to use.
"

Do we really want to workaround this?

Thanks

And also:

The driver MUST follow this sequence to initialize a device:
1. Reset the device.
2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
3. Set the DRIVER status bit: the guest OS knows how to drive the
device.
4. Read device feature bits, and write the subset of feature bits
understood by the OS and driver to the
device. During this step the driver MAY read (but MUST NOT write)
the device-specific configuration
fields to check that it can support the device before accepting it.
5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
feature bits after this step.
6. Re-read device status to ensure the FEATURES_OK bit is still set:
otherwise, the device does not
support our subset of features and the device is unusable.
7. Perform device-specific setup, including discovery of virtqueues
for the device, optional per-bus setup,
reading and possibly writing the device’s virtio configuration
space, and population of virtqueues.
8. Set the DRIVER_OK status bit. At this point the device is “live”.


so accessing config space before FEATURES_OK is a spec violation, right?

It is, but it's not relevant to what this commit tries to address. I
thought the legacy guest still needs to be supported.

Having said, a separate patch has to be posted to fix the guest driver
issue where this discrepancy is introduced to virtnet_validate() (since
commit fe36cbe067). But it's not technically related to this patch.

-Siwei

I think it's a bug to read config space in validate, we should move it to
virtnet_probe().

Thanks

I take it back, reading but not writing seems to be explicitly allowed by spec.
So our way to detect a legacy guest is bogus, need to think what is
the best way to handle this.

Then maybe revert commit fe36cbe067 and friends, and have QEMU detect legacy
guest? Supposedly only config space write access needs to be guarded before
setting FEATURES_OK.

-Siwie

Detecting it isn't enough though, we will need a new ioctl to notify
the kernel that it's a legacy guest. Ugh :(
Well, although I think adding an ioctl is doable, may I know what the 
use case there will be for kernel to leverage such info directly? Is 
there a case QEMU can't do with dedicate ioctls later if there's indeed 
differentiation (legacy v.s. modern) needed?


One of the reason I asked is if this ioctl becomes a mandate for 
vhost-vdpa kernel. QEMU would reject initialize vhost-vdpa if doesn't 
see this ioctl coming?


If it's optional, suppose the kernel may need it only when it becomes 
necessary?


Thanks,
-Siwei






Rejecting reset to 0
prematurely causes correct MTU and link status unable to load
for the very first config space access, rendering issues like
guest showing inaccurate MTU value, or failure to reject
out-of-range MTU.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for
supported mlx5 devices")
Signed-off-by: Si-Wei Liu 
---
     drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +--
     1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 7c1f789..540dd67 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1490,14 +1490,6 @@ static u64
mlx5_vdpa_get_features(struct vdpa_device *vdev)
     return mvdev->mlx_features;
     }
-static int verify_min_features(struct mlx5_vdpa_dev *mvdev,
u64 features)
-{
-    if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
-    return -EOPNOTSUPP;
-
-    return 0;
-}
-
     static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
     {
     int err;
@@ -1558,18 +1550,13 @@ static int
mlx5_vdpa_set_features(struct vdpa_device *vdev, u64
features)
     {
     struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
     struct 

Re: [RFC PATCH] blk-core: remove blk_put_request()

2021-02-24 Thread Jens Axboe
On 2/24/21 4:53 AM, Stefan Hajnoczi wrote:
> On Mon, Feb 22, 2021 at 01:11:15PM -0800, Chaitanya Kulkarni wrote:
>> The function blk_put_request() is just a wrapper to
>> blk_mq_free_request(), remove the unnecessary wrapper.
>>
>> Any feedback is welcome on this RFC.
>>
>> Signed-off-by: Chaitanya Kulkarni 
>> ---
>>  block/blk-core.c   |  6 --
>>  block/blk-merge.c  |  2 +-
>>  block/bsg-lib.c|  4 ++--
>>  block/bsg.c|  4 ++--
>>  block/scsi_ioctl.c |  6 +++---
>>  drivers/block/paride/pd.c  |  2 +-
>>  drivers/block/pktcdvd.c|  2 +-
>>  drivers/block/virtio_blk.c |  2 +-
>>  drivers/cdrom/cdrom.c  |  4 ++--
>>  drivers/ide/ide-atapi.c|  2 +-
>>  drivers/ide/ide-cd.c   |  4 ++--
>>  drivers/ide/ide-cd_ioctl.c |  2 +-
>>  drivers/ide/ide-devsets.c  |  2 +-
>>  drivers/ide/ide-disk.c |  2 +-
>>  drivers/ide/ide-ioctls.c   |  4 ++--
>>  drivers/ide/ide-park.c |  2 +-
>>  drivers/ide/ide-pm.c   |  4 ++--
>>  drivers/ide/ide-tape.c |  2 +-
>>  drivers/ide/ide-taskfile.c |  2 +-
>>  drivers/md/dm-mpath.c  |  2 +-
>>  drivers/mmc/core/block.c   | 10 +-
>>  drivers/scsi/scsi_error.c  |  2 +-
>>  drivers/scsi/scsi_lib.c|  2 +-
>>  drivers/scsi/sg.c  |  6 +++---
>>  drivers/scsi/st.c  |  4 ++--
>>  drivers/scsi/ufs/ufshcd.c  |  6 +++---
>>  drivers/target/target_core_pscsi.c |  4 ++--
>>  fs/nfsd/blocklayout.c  |  4 ++--
>>  include/linux/blkdev.h |  1 -
>>  29 files changed, 46 insertions(+), 53 deletions(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index fc60ff208497..1754f5e7cc80 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -642,12 +642,6 @@ struct request *blk_get_request(struct request_queue 
>> *q, unsigned int op,
>>  }
>>  EXPORT_SYMBOL(blk_get_request);
>>  
>> -void blk_put_request(struct request *req)
>> -{
>> -blk_mq_free_request(req);
>> -}
>> -EXPORT_SYMBOL(blk_put_request);
> 
> blk_get_request() still exists after this patch. A "get" API usually has
> a corresponding "put" API. I'm not sure this patch helps the consistency
> and clarity of the code.
> 
> If you do go ahead, please update the blk_get_request() doc comment
> explicitly mentioning that blk_mq_free_request() needs to be called.

Would make sense to rename blk_get_request() to blk_mq_alloc_request()
and then we have API symmetry. The get/put don't make sense when there
are no references involved.

But it's a lot of churn for very little reward, which is always kind
of annoying. Especially for the person that has to carry the patches.

-- 
Jens Axboe

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V4 1/3] virtio: don't prompt CONFIG_VIRTIO_PCI_MODERN

2021-02-24 Thread Christian Borntraeger



On 23.02.21 07:19, Jason Wang wrote:
> We used to prompt CONFIG_VIRTIO_PCI_MODERN to user which may bring a
> lot of confusion. E.g it may break various default configs which want
> virtio devices.
> 
> So this patch fixes this by hiding the prompot and documenting the
> dependency. While at it, rename the module to VIRTIO_PCI_LIB.
> 
> Cc: Arnd Bergmann 
> Cc: Anders Roxell 
> Cc: Guenter Roeck 
> Reported-by: Naresh Kamboju 
> Fixes: 86b87c9d858b6 ("virtio-pci: introduce modern device module")
> Signed-off-by: Jason Wang 

Michael,

Can we please add this NOW to next? virtio-pci is broken without it on
many defconfigs including s390. So linux-next is broken for more than 
2 weeks now and it actively breaks several parts of our CI.
I guess there are other CIs that will not run several testcases because
of this. And Naresh reported that on Feb 9th.
There IS value in CI tools on next. Not caring about regressions introduced
by a tree in next is harmful. Especially when we are close or in the
merge window. So please: either fix things quickly OR revert.

Christian



> ---
>  drivers/virtio/Kconfig  | 11 ++-
>  drivers/virtio/Makefile |  2 +-
>  2 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 6b9b81f4b8c2..ce1b3f6ec325 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -12,13 +12,13 @@ config ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS
> This option is selected if the architecture may need to enforce
> VIRTIO_F_ACCESS_PLATFORM
>  
> -config VIRTIO_PCI_MODERN
> - tristate "Modern Virtio PCI Device"
> - depends on PCI
> +config VIRTIO_PCI_LIB
> + tristate
>   help
> Modern PCI device implementation. This module implements the
> basic probe and control for devices which are based on modern
> -   PCI device with possible vendor specific extensions.
> +   PCI device with possible vendor specific extensions. Any
> +   module that selects this module must depend on PCI.
>  
>  menuconfig VIRTIO_MENU
>   bool "Virtio drivers"
> @@ -28,7 +28,8 @@ if VIRTIO_MENU
>  
>  config VIRTIO_PCI
>   tristate "PCI driver for virtio devices"
> - depends on VIRTIO_PCI_MODERN
> + depends on PCI
> + select VIRTIO_PCI_LIB
>   select VIRTIO
>   help
> This driver provides support for virtio based paravirtual device
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index f097578aaa8f..699bbea0465f 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
> -obj-$(CONFIG_VIRTIO_PCI_MODERN) += virtio_pci_modern_dev.o
> +obj-$(CONFIG_VIRTIO_PCI_LIB) += virtio_pci_modern_dev.o
>  obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
>  obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
>  virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH] blk-core: remove blk_put_request()

2021-02-24 Thread Stefan Hajnoczi
On Mon, Feb 22, 2021 at 01:11:15PM -0800, Chaitanya Kulkarni wrote:
> The function blk_put_request() is just a wrapper to
> blk_mq_free_request(), remove the unnecessary wrapper.
> 
> Any feedback is welcome on this RFC.
> 
> Signed-off-by: Chaitanya Kulkarni 
> ---
>  block/blk-core.c   |  6 --
>  block/blk-merge.c  |  2 +-
>  block/bsg-lib.c|  4 ++--
>  block/bsg.c|  4 ++--
>  block/scsi_ioctl.c |  6 +++---
>  drivers/block/paride/pd.c  |  2 +-
>  drivers/block/pktcdvd.c|  2 +-
>  drivers/block/virtio_blk.c |  2 +-
>  drivers/cdrom/cdrom.c  |  4 ++--
>  drivers/ide/ide-atapi.c|  2 +-
>  drivers/ide/ide-cd.c   |  4 ++--
>  drivers/ide/ide-cd_ioctl.c |  2 +-
>  drivers/ide/ide-devsets.c  |  2 +-
>  drivers/ide/ide-disk.c |  2 +-
>  drivers/ide/ide-ioctls.c   |  4 ++--
>  drivers/ide/ide-park.c |  2 +-
>  drivers/ide/ide-pm.c   |  4 ++--
>  drivers/ide/ide-tape.c |  2 +-
>  drivers/ide/ide-taskfile.c |  2 +-
>  drivers/md/dm-mpath.c  |  2 +-
>  drivers/mmc/core/block.c   | 10 +-
>  drivers/scsi/scsi_error.c  |  2 +-
>  drivers/scsi/scsi_lib.c|  2 +-
>  drivers/scsi/sg.c  |  6 +++---
>  drivers/scsi/st.c  |  4 ++--
>  drivers/scsi/ufs/ufshcd.c  |  6 +++---
>  drivers/target/target_core_pscsi.c |  4 ++--
>  fs/nfsd/blocklayout.c  |  4 ++--
>  include/linux/blkdev.h |  1 -
>  29 files changed, 46 insertions(+), 53 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index fc60ff208497..1754f5e7cc80 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -642,12 +642,6 @@ struct request *blk_get_request(struct request_queue *q, 
> unsigned int op,
>  }
>  EXPORT_SYMBOL(blk_get_request);
>  
> -void blk_put_request(struct request *req)
> -{
> - blk_mq_free_request(req);
> -}
> -EXPORT_SYMBOL(blk_put_request);

blk_get_request() still exists after this patch. A "get" API usually has
a corresponding "put" API. I'm not sure this patch helps the consistency
and clarity of the code.

If you do go ahead, please update the blk_get_request() doc comment
explicitly mentioning that blk_mq_free_request() needs to be called.

Stefan


signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v3] vhost_vdpa: fix the missing irq_bypass_unregister_producer() invocation

2021-02-24 Thread Gautam Dawar
Hi Michael,

I've submitted a fresh patch including the ack from Jason and after
testing with git am.

Regards,
Gautam

On Tue, 23 Feb 2021 at 18:20, Michael S. Tsirkin  wrote:
>
> On Tue, Feb 23, 2021 at 06:15:07PM +0530, Gautam Dawar wrote:
> > When qemu with vhost-vdpa netdevice is run for the first time, it works 
> > well.
> > But after the VM is powered off, the next qemu run causes kernel panic due 
> > to a
> > NULL pointer dereference in irq_bypass_register_producer().
> >
> > When the VM is powered off, vhost_vdpa_clean_irq() misses on calling
> > irq_bypass_unregister_producer() for irq 0 because of the existing check.
> >
> > This leaves stale producer nodes, which are reset in 
> > vhost_vring_call_reset()
> > when vhost_dev_init() is invoked during the second qemu run.
> >
> > As the node member of struct irq_bypass_producer is also initialized
> > to zero, traversal on the producers list causes crash due to NULL pointer
> > dereference.
> >
> > Fixes: 2cf1ba9a4d15c ("vhost_vdpa: implement IRQ offloading in vhost_vdpa")
> > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211711
> > Signed-off-by: Gautam Dawar 
>
>
> A bit better but still:
> Applying: vhost_vdpa: fix the missing irq_bypass_unregister_producer() 
> invocation
> error: corrupt patch at line 6
> error: could not build fake ancestor
> Patch failed at 0001 vhost_vdpa: fix the missing 
> irq_bypass_unregister_producer() invocation
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
>
>
> you also want to include acks you got on the previous versions.
>
> > -
> > changelog:
> > v2->v3:
> >  - Re-submitting the patch in plain text format as suggested by Michael
> >  - Added the fixes tag
> >
> > v1->v2:
> >  - Addressed Jason's comment to remove the irq check and use
> >vhost_vdpa_unsetup_vq_irq() to avoid local variable vq
> > -
> >
> >
> > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > index 62a9bb0efc55..e00573b87aba 100644
> > --- a/drivers/vhost/vdpa.c
> > +++ b/drivers/vhost/vdpa.c
> > @@ -844,14 +844,10 @@ static int vhost_vdpa_open(struct inode *inode,
> > struct file *filep)
> >
> >  static void vhost_vdpa_clean_irq(struct vhost_vdpa *v)
> >  {
> > -   struct vhost_virtqueue *vq;
> > int i;
> >
> > -   for (i = 0; i < v->nvqs; i++) {
> > -   vq = >vqs[i];
> > -   if (vq->call_ctx.producer.irq)
> > -   
> > irq_bypass_unregister_producer(>call_ctx.producer);
> > -   }
> > +   for (i = 0; i < v->nvqs; i++)
> > +   vhost_vdpa_unsetup_vq_irq(v, i);
> >  }
> >
> >  static int vhost_vdpa_release(struct inode *inode, struct file *filep)
>
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH] vhost_vdpa: fix the missing irq_bypass_unregister_producer() invocation

2021-02-24 Thread Gautam Dawar
When qemu with vhost-vdpa netdevice is run for the first time,
it works well. But after the VM is powered off, the next qemu run
causes kernel panic due to a NULL pointer dereference in
irq_bypass_register_producer().

When the VM is powered off, vhost_vdpa_clean_irq() misses on calling
irq_bypass_unregister_producer() for irq 0 because of the existing check.

This leaves stale producer nodes, which are reset in
vhost_vring_call_reset() when vhost_dev_init() is invoked during the
second qemu run.

As the node member of struct irq_bypass_producer is also initialized
to zero, traversal on the producers list causes crash due to NULL
pointer dereference.

Fixes: 2cf1ba9a4d15c ("vhost_vdpa: implement IRQ offloading in vhost_vdpa")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211711
Signed-off-by: Gautam Dawar 
Acked-by: Jason Wang 
---
 drivers/vhost/vdpa.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 62a9bb0efc55..e00573b87aba 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -844,14 +844,10 @@ static int vhost_vdpa_open(struct inode *inode, struct 
file *filep)
 
 static void vhost_vdpa_clean_irq(struct vhost_vdpa *v)
 {
-   struct vhost_virtqueue *vq;
int i;
 
-   for (i = 0; i < v->nvqs; i++) {
-   vq = >vqs[i];
-   if (vq->call_ctx.producer.irq)
-   irq_bypass_unregister_producer(>call_ctx.producer);
-   }
+   for (i = 0; i < v->nvqs; i++)
+   vhost_vdpa_unsetup_vq_irq(v, i);
 }
 
 static int vhost_vdpa_release(struct inode *inode, struct file *filep)
-- 
2.30.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH linux-next 2/9] vdpa: Introduce query of device config layout

2021-02-24 Thread Parav Pandit



> From: Michael S. Tsirkin 
> Sent: Wednesday, February 24, 2021 12:33 PM
> 
> > +
> > +   features = vdev->config->get_features(vdev);
> > +   if ((features & (1ULL << VIRTIO_NET_F_MQ)) == 0)
> > +   return 0;
> > +
> > +   if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
> > +   config->max_virtqueue_pairs))
> > +   return -EMSGSIZE;
> > +   if (nla_put_u8(msg,
> VDPA_ATTR_DEV_NET_CFG_RSS_MAX_KEY_LEN,
> > +  config->rss_max_key_size))
> 
> Why is it ok to poke at RSS fields without checking the relevant feature bits?
> 
It should be checked.
I relied on _MQ feature bit but yes, even with MQ, RSS can be off.
Will fix to read RSS feature bit.

> > +   return -EMSGSIZE;
> 
> Did you check this with sparse?
> max_virtqueue_pairs is __virtio16.
I will check and go through the types.

> 
> > +
> > +   rss_field = le16_to_cpu(config->rss_max_key_size);
> > +   if (nla_put_u16(msg,
> VDPA_ATTR_DEV_NET_CFG_RSS_MAX_IT_LEN, rss_field))
> > +   return -EMSGSIZE;
> > +
> > +   hash_types = le32_to_cpu(config->supported_hash_types);
> 
> unused variable
Will remove.

> 
> > +   if (nla_put_u32(msg, VDPA_ATTR_DEV_NET_CFG_RSS_HASH_TYPES,
> > +   config->supported_hash_types))
> > +   return -EMSGSIZE;
> > +   return 0;
> > +}
> > +
> > +static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct
> > +sk_buff *msg) {
> > +   struct virtio_net_config config = {};
> > +
> > +   vdev->config->get_config(vdev, 0, , sizeof(config));
> > +   if (nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
> sizeof(config.mac), config.mac))
> > +   return -EMSGSIZE;
> > +   if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, config.status))
> > +   return -EMSGSIZE;
> > +   if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU,
> config.mtu))
> > +   return -EMSGSIZE;
> > +   if (nla_put_u32(msg, VDPA_ATTR_DEV_NET_CFG_SPEED,
> config.speed))
> > +   return -EMSGSIZE;
> 
> looks like a bunch of endian-ness/sparse errors to me, and a bunch of fields
> checked without checking the feature bits.
Yes, few are missed out.
Fixing them in v2.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-24 Thread Cornelia Huck
On Wed, 24 Feb 2021 17:29:07 +0800
Jason Wang  wrote:

> On 2021/2/23 6:58 下午, Cornelia Huck wrote:
> > On Tue, 23 Feb 2021 18:31:07 +0800
> > Jason Wang  wrote:
> >  
> >> On 2021/2/23 6:04 下午, Cornelia Huck wrote:  
> >>> On Tue, 23 Feb 2021 17:46:20 +0800
> >>> Jason Wang  wrote:
> >>> 
>  On 2021/2/23 下午5:25, Michael S. Tsirkin wrote:  
> > On Mon, Feb 22, 2021 at 09:09:28AM -0800, Si-Wei Liu wrote:  
> >> On 2/21/2021 8:14 PM, Jason Wang wrote:  
> >>> On 2021/2/19 7:54 下午, Si-Wei Liu wrote:  
>  Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
>  for legacy") made an exception for legacy guests to reset
>  features to 0, when config space is accessed before features
>  are set. We should relieve the verify_min_features() check
>  and allow features reset to 0 for this case.
> 
>  It's worth noting that not just legacy guests could access
>  config space before features are set. For instance, when
>  feature VIRTIO_NET_F_MTU is advertised some modern driver
>  will try to access and validate the MTU present in the config
>  space before virtio features are set.  
> >>> This looks like a spec violation:
> >>>
> >>> "
> >>>
> >>> The following driver-read-only field, mtu only exists if
> >>> VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the
> >>> driver to use.
> >>> "
> >>>
> >>> Do we really want to workaround this?  
> >> Isn't the commit 452639a64ad8 itself is a workaround for legacy guest?
> >>
> >> I think the point is, since there's legacy guest we'd have to support, 
> >> this
> >> host side workaround is unavoidable. Although I agree the violating 
> >> driver
> >> should be fixed (yes, it's in today's upstream kernel which exists for 
> >> a
> >> while now).  
> > Oh  you are right:
> >
> >
> > static int virtnet_validate(struct virtio_device *vdev)
> > {
> >if (!vdev->config->get) {
> >dev_err(>dev, "%s failure: config access 
> > disabled\n",
> >__func__);
> >return -EINVAL;
> >}
> >
> >if (!virtnet_validate_features(vdev))
> >return -EINVAL;
> >
> >if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
> >int mtu = virtio_cread16(vdev,
> > offsetof(struct 
> > virtio_net_config,
> >  mtu));
> >if (mtu < MIN_MTU)
> >__virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);  
>  I wonder why not simply fail here?  
> >>> I think both failing or not accepting the feature can be argued to make
> >>> sense: "the device presented us with a mtu size that does not make
> >>> sense" would point to failing, "we cannot work with the mtu size that
> >>> the device presented us" would point to not negotiating the feature.
> >>> 
>  
> >}
> >
> >return 0;
> > }
> >
> > And the spec says:
> >
> >
> > The driver MUST follow this sequence to initialize a device:
> > 1. Reset the device.
> > 2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
> > 3. Set the DRIVER status bit: the guest OS knows how to drive the 
> > device.
> > 4. Read device feature bits, and write the subset of feature bits 
> > understood by the OS and driver to the
> > device. During this step the driver MAY read (but MUST NOT write) the 
> > device-specific configuration
> > fields to check that it can support the device before accepting it.
> > 5. Set the FEATURES_OK status bit. The driver MUST NOT accept new 
> > feature bits after this step.
> > 6. Re-read device status to ensure the FEATURES_OK bit is still set: 
> > otherwise, the device does not
> > support our subset of features and the device is unusable.
> > 7. Perform device-specific setup, including discovery of virtqueues for 
> > the device, optional per-bus setup,
> > reading and possibly writing the device’s virtio configuration space, 
> > and population of virtqueues.
> > 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> >
> >
> > Item 4 on the list explicitly allows reading config space before
> > FEATURES_OK.
> >
> > I conclude that VIRTIO_NET_F_MTU is set means "set in device features". 
> >  
>  So this probably need some clarification. "is set" is used many times in
>  the spec that has different implications.  
> >>> Before FEATURES_OK is set by the driver, I guess it means "the device
> >>> has offered the feature";  
> >>
> >> For me this part is ok since it 

Re: [PATCH 3/7] x86/boot/compressed/64: Setup IDT in startup_32 boot path

2021-02-24 Thread Borislav Petkov
On Wed, Feb 10, 2021 at 11:21:31AM +0100, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> This boot path needs exception handling when it is used with SEV-ES.

For ?

Let's explain pls.

> Setup an IDT and provide a helper function to write IDT entries for
> use in 32-bit protected mode.
> 
> Signed-off-by: Joerg Roedel 
> ---
>  arch/x86/boot/compressed/head_64.S | 73 +-
>  1 file changed, 72 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/boot/compressed/head_64.S 
> b/arch/x86/boot/compressed/head_64.S
> index c59c80ca546d..8deeec78cdb4 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -116,6 +116,11 @@ SYM_FUNC_START(startup_32)
>   lretl
>  1:
>  
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> + /* Setup Exception handling for SEV-ES */
> + callstartup32_load_idt
> +#endif
> +

You can push that ifdeffery out of the main path (diff ontop):

---
diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index 8deeec78cdb4..cb5a6849fb29 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -116,10 +116,7 @@ SYM_FUNC_START(startup_32)
lretl
 1:
 
-#ifdef CONFIG_AMD_MEM_ENCRYPT
-   /* Setup Exception handling for SEV-ES */
callstartup32_load_idt
-#endif
 
/* Make sure cpu supports long mode. */
callverify_cpu
@@ -854,16 +851,18 @@ SYM_FUNC_START(startup32_set_idt_entry)
pop %ebx
ret
 SYM_FUNC_END(startup32_set_idt_entry)
+#endif
 
+/* Setup Exception handling for SEV-ES */
 SYM_FUNC_START(startup32_load_idt)
+#ifdef CONFIG_AMD_MEM_ENCRYPT
/* Load IDT */
lealrva(boot32_idt)(%ebp), %eax
movl%eax, rva(boot32_idt_desc+2)(%ebp)
lidtrva(boot32_idt_desc)(%ebp)
-
+#endif
ret
 SYM_FUNC_END(startup32_load_idt)
-#endif
 /*
  * Stack and heap for uncompression
  */
---

> +SYM_FUNC_START(startup32_set_idt_entry)
> + push%ebx
> + push%ecx
> +
> + /* IDT entry address to %ebx */
> + lealrva(boot32_idt)(%ebp), %ebx
> + shl $3, %edx
> + addl%edx, %ebx
> +
> + /* Build IDT entry, lower 4 bytes */
> + movl%eax, %edx

Let's add some side comments here:

 +  andl$0x, %edx   # Target code segment offset 
[15:0]


 +  movl$__KERNEL32_CS, %ecx# Target code segment selector 
> + shl $16, %ecx
> + orl %ecx, %edx
> +
> + /* Store lower 4 bytes to IDT */
> + movl%edx, (%ebx)
> +
> + /* Build IDT entry, upper 4 bytes */
> + movl%eax, %edx
 +  andl$0x, %edx   # Target code segment offset 
[31:16]
 +  orl $0x8e00, %edx   # Present, Type 32-bit 
Interrupt Gate

so that a reader like me can quickly find interrupt gates in the docs.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-24 Thread Jason Wang


On 2021/2/23 6:17 下午, Jason Wang wrote:


On 2021/2/23 6:01 下午, Michael S. Tsirkin wrote:

On Tue, Feb 23, 2021 at 05:46:20PM +0800, Jason Wang wrote:

On 2021/2/23 下午5:25, Michael S. Tsirkin wrote:

On Mon, Feb 22, 2021 at 09:09:28AM -0800, Si-Wei Liu wrote:

On 2/21/2021 8:14 PM, Jason Wang wrote:

On 2021/2/19 7:54 下午, Si-Wei Liu wrote:

Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
for legacy") made an exception for legacy guests to reset
features to 0, when config space is accessed before features
are set. We should relieve the verify_min_features() check
and allow features reset to 0 for this case.

It's worth noting that not just legacy guests could access
config space before features are set. For instance, when
feature VIRTIO_NET_F_MTU is advertised some modern driver
will try to access and validate the MTU present in the config
space before virtio features are set.

This looks like a spec violation:

"

The following driver-read-only field, mtu only exists if
VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for 
the

driver to use.
"

Do we really want to workaround this?
Isn't the commit 452639a64ad8 itself is a workaround for legacy 
guest?


I think the point is, since there's legacy guest we'd have to 
support, this
host side workaround is unavoidable. Although I agree the 
violating driver
should be fixed (yes, it's in today's upstream kernel which exists 
for a

while now).

Oh  you are right:


static int virtnet_validate(struct virtio_device *vdev)
{
  if (!vdev->config->get) {
  dev_err(>dev, "%s failure: config access 
disabled\n",

  __func__);
  return -EINVAL;
  }

  if (!virtnet_validate_features(vdev))
  return -EINVAL;

  if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
  int mtu = virtio_cread16(vdev,
   offsetof(struct 
virtio_net_config,

    mtu));
  if (mtu < MIN_MTU)
  __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);


I wonder why not simply fail here?

Back in 2016 it went like this:

On Thu, Jun 02, 2016 at 05:10:59PM -0400, Aaron Conole wrote:
> + if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
> + dev->mtu = virtio_cread16(vdev,
> +   offsetof(struct 
virtio_net_config,

> +    mtu));
> + }
> +
>   if (vi->any_header_sg)
>   dev->needed_headroom = vi->hdr_len;
>

One comment though: I think we should validate the mtu.
If it's invalid, clear VIRTIO_NET_F_MTU and ignore.


Too late at this point :)

I guess it's a way to tell device "I can not live with this MTU",
device can fail FEATURES_OK if it wants to. MIN_MTU
is an internal linux thing and at the time I felt it's better to
try to make progress.



What if e.g the device advertise a large MTU. E.g 64K here?



Ok, consider we use add_recvbuf_small() when neither GSO nor mrg_rxbuf. 
This means we should fail the probing if MTU is greater than 1500 in 
this case.


Thanks


In that case, the driver can not live either. Clearing MTU won't help 
here.


Thanks






  }

  return 0;
}

And the spec says:


The driver MUST follow this sequence to initialize a device:
1. Reset the device.
2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the 
device.
3. Set the DRIVER status bit: the guest OS knows how to drive the 
device.
4. Read device feature bits, and write the subset of feature bits 
understood by the OS and driver to the
device. During this step the driver MAY read (but MUST NOT write) 
the device-specific configuration

fields to check that it can support the device before accepting it.
5. Set the FEATURES_OK status bit. The driver MUST NOT accept new 
feature bits after this step.
6. Re-read device status to ensure the FEATURES_OK bit is still 
set: otherwise, the device does not

support our subset of features and the device is unusable.
7. Perform device-specific setup, including discovery of virtqueues 
for the device, optional per-bus setup,
reading and possibly writing the device’s virtio configuration 
space, and population of virtqueues.

8. Set the DRIVER_OK status bit. At this point the device is “live”.


Item 4 on the list explicitly allows reading config space before
FEATURES_OK.

I conclude that VIRTIO_NET_F_MTU is set means "set in device 
features".


So this probably need some clarification. "is set" is used many 
times in the

spec that has different implications.

Thanks



Generally it is worth going over feature dependent config fields
and checking whether they should be present when device feature is set
or when feature bit has been negotiated, and making this clear.





___
Virtualization 

Re: [virtio-comment] [RFC PATCH v1 1/1] virtio-vsock: add SOCK_SEQPACKET description

2021-02-24 Thread Stefano Garzarella

On Thu, Feb 18, 2021 at 09:08:23AM +0300, Arseny Krasnov wrote:

Signed-off-by: Arseny Krasnov 
---
virtio-vsock.tex | 40 +---
1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/virtio-vsock.tex b/virtio-vsock.tex
index da7e641..1ee8f99 100644
--- a/virtio-vsock.tex
+++ b/virtio-vsock.tex
@@ -102,6 +102,11 @@ \subsection{Device Operation}\label{sec:Device Types / 
Socket Device / Device Op
VIRTIO_VSOCK_OP_CREDIT_UPDATE = 6,
/* Request the peer to send the credit info to us */
VIRTIO_VSOCK_OP_CREDIT_REQUEST = 7,
+
+   /* Message begin for SOCK_SEQPACKET */
+   VIRTIO_VSOCK_OP_SEQ_BEGIN = 8,
+   /* Message end for SOCK_SEQPACKET */
+   VIRTIO_VSOCK_OP_SEQ_END = 9,
};
\end{lstlisting}

@@ -140,11 +145,11 @@ \subsubsection{Addressing}\label{sec:Device Types / 
Socket Device / Device Opera
consists of a (cid, port number) tuple. The header fields used for this are
\field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.

-Currently only stream sockets are supported. \field{type} is 1 for stream
-socket types.
+Currently stream and seqpacket sockets are supported. \field{type} is 1 for
+stream socket types. \field{type} is 2 for seqpacket socket types.

Stream sockets provide in-order, guaranteed, connection-oriented delivery
-without message boundaries.
+without message boundaries. Seqpacket sockets also provide message boundaries.

\subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device 
/ Device Operation / Buffer Space Management}
\field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of
@@ -240,6 +245,35 @@ \subsubsection{Stream Sockets}\label{sec:Device Types / 
Socket Device / Device O
destination) address tuple for a new connection while the other peer is still
processing the old connection.

+\subsubsection{Seqpacket Sockets}\label{sec:Device Types / Socket Device / 
Device Operation / Seqpacket Sockets}
+
+Seqpacket sockets differ from stream sockets only in data transmission way: in
+stream sockets all data is sent using only VIRTIO_VSOCK_OP_RW packets. In
+seqpacket sockets, to provide message boundaries, every sequence of
+VIRTIO_VSOCK_OP_RW packets of each message is headed with

 ^
Since this is a spec, I think we should use MUST when something must be 
respected by the peer, for example here we can say "MUST be headed"



+VIRTIO_VSOCK_OP_SEQ_BEGIN and tailed with VIRTIO_VSOCK_OP_SEQ_END packets.
+Both VIRTIO_VSOCK_OP_SEQ_BEGIN and VIRTIO_VSOCK_OP_SEQ_END packets carry

 ^
Same here "MUST carry" and in the rest of the patch.


+special header in payload:
+
+\begin{lstlisting}
+struct virtio_vsock_seq_hdr {
+   __le32  msg_cnt;
+   __le32  msg_len;
+};
+\end{lstlisting}
+
+\field{msg_cnt} is per socket and incremented by 1 on every send of
+VIRTIO_VSOCK_OP_SEQ_BEGIN or VIRTIO_VSOCK_OP_SEQ_END. \field{msg_len} contains
+message length. This header is used to check integrity of each message: message
+is valid if length of data in VIRTIO_VSOCK_OP_RW packets is equal to
+\field{msg_len} of prepending VIRTIO_VSOCK_OP_SEQ_BEGIN and \field{msg_cnt} of
+VIRTIO_VSOCK_OP_SEQ_END differs from \field{msg_cnt} of
+VIRTIO_VSOCK_OP_SEQ_BEGIN by 1.


If we replace msg_cnt with msg_id, I think we should have the same 
'msg_id' for VIRTIO_VSOCK_OP_SEQ_BEGIN or VIRTIO_VSOCK_OP_SEQ_END. If 
you want to use 'msg_cnt' and you increment it between BEGIN and END, we 
should consider the overflow case.



+
+POSIX MSG_EOR flag is supported by special value of \field{flags} in packet's
+header. If this flag is set for message, then all VIRTIO_VSOCK_OP_RW packets
+of this message have bit 0 is set to 1 in \field{flags}.


Maybe we need to define what MSG_EOR means.

Thanks,
Stefano


+
\subsubsection{Device Events}\label{sec:Device Types / Socket Device / Device 
Operation / Device Events}

Certain events are communicated by the device to the driver using the event
--
2.25.1


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscr...@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscr...@lists.oasis-open.org
List help: virtio-comment-h...@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org

Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-24 Thread Jason Wang


On 2021/2/24 4:43 下午, Michael S. Tsirkin wrote:

On Wed, Feb 24, 2021 at 04:26:43PM +0800, Jason Wang wrote:

 Basically on first guest access QEMU would tell kernel whether
 guest is using the legacy or the modern interface.
 E.g. virtio_pci_config_read/virtio_pci_config_write will call 
ioctl(ENABLE_LEGACY, 1)
 while virtio_pci_common_read will call ioctl(ENABLE_LEGACY, 0)


But this trick work only for PCI I think?

Thanks

ccw has a revision it can check. mmio does not have transitional devices
at all.



Ok, then we can do the workaround in the qemu, isn't it?

Thanks


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-24 Thread Jason Wang


On 2021/2/23 6:58 下午, Cornelia Huck wrote:

On Tue, 23 Feb 2021 18:31:07 +0800
Jason Wang  wrote:


On 2021/2/23 6:04 下午, Cornelia Huck wrote:

On Tue, 23 Feb 2021 17:46:20 +0800
Jason Wang  wrote:
  

On 2021/2/23 下午5:25, Michael S. Tsirkin wrote:

On Mon, Feb 22, 2021 at 09:09:28AM -0800, Si-Wei Liu wrote:

On 2/21/2021 8:14 PM, Jason Wang wrote:

On 2021/2/19 7:54 下午, Si-Wei Liu wrote:

Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
for legacy") made an exception for legacy guests to reset
features to 0, when config space is accessed before features
are set. We should relieve the verify_min_features() check
and allow features reset to 0 for this case.

It's worth noting that not just legacy guests could access
config space before features are set. For instance, when
feature VIRTIO_NET_F_MTU is advertised some modern driver
will try to access and validate the MTU present in the config
space before virtio features are set.

This looks like a spec violation:

"

The following driver-read-only field, mtu only exists if
VIRTIO_NET_F_MTU is set. This field specifies the maximum MTU for the
driver to use.
"

Do we really want to workaround this?

Isn't the commit 452639a64ad8 itself is a workaround for legacy guest?

I think the point is, since there's legacy guest we'd have to support, this
host side workaround is unavoidable. Although I agree the violating driver
should be fixed (yes, it's in today's upstream kernel which exists for a
while now).

Oh  you are right:


static int virtnet_validate(struct virtio_device *vdev)
{
   if (!vdev->config->get) {
   dev_err(>dev, "%s failure: config access disabled\n",
   __func__);
   return -EINVAL;
   }

   if (!virtnet_validate_features(vdev))
   return -EINVAL;

   if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
   int mtu = virtio_cread16(vdev,
offsetof(struct virtio_net_config,
 mtu));
   if (mtu < MIN_MTU)
   __virtio_clear_bit(vdev, VIRTIO_NET_F_MTU);

I wonder why not simply fail here?

I think both failing or not accepting the feature can be argued to make
sense: "the device presented us with a mtu size that does not make
sense" would point to failing, "we cannot work with the mtu size that
the device presented us" would point to not negotiating the feature.
  
  

   }

   return 0;
}

And the spec says:


The driver MUST follow this sequence to initialize a device:
1. Reset the device.
2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
3. Set the DRIVER status bit: the guest OS knows how to drive the device.
4. Read device feature bits, and write the subset of feature bits understood by 
the OS and driver to the
device. During this step the driver MAY read (but MUST NOT write) the 
device-specific configuration
fields to check that it can support the device before accepting it.
5. Set the FEATURES_OK status bit. The driver MUST NOT accept new feature bits 
after this step.
6. Re-read device status to ensure the FEATURES_OK bit is still set: otherwise, 
the device does not
support our subset of features and the device is unusable.
7. Perform device-specific setup, including discovery of virtqueues for the 
device, optional per-bus setup,
reading and possibly writing the device’s virtio configuration space, and 
population of virtqueues.
8. Set the DRIVER_OK status bit. At this point the device is “live”.


Item 4 on the list explicitly allows reading config space before
FEATURES_OK.

I conclude that VIRTIO_NET_F_MTU is set means "set in device features".

So this probably need some clarification. "is set" is used many times in
the spec that has different implications.

Before FEATURES_OK is set by the driver, I guess it means "the device
has offered the feature";


For me this part is ok since it clarify that it's the driver that set
the bit.




during normal usage, it means "the feature
has been negotiated".

/?

It looks to me the feature negotiation is done only after device set
FEATURES_OK, or FEATURES_OK could be read from device status?

I'd consider feature negotiation done when the driver reads FEATURES_OK
back from the status.



I agree.







   (This is a bit fuzzy for legacy mode.)

...because legacy does not have FEATURES_OK.
   


The problem is the MTU description for example:

"The following driver-read-only field, mtu only exists if
VIRTIO_NET_F_MTU is set."

It looks to me need to use "if VIRTIO_NET_F_MTU is set by device".

"offered by the device"? I don't think it should 'disappear' from the
config space if the driver won't use it. (Same for other config space
fields that are tied to feature bits.)



But what happens if e.g device doesn't offer VIRTIO_NET_F_MTU? It looks 
to according to the spec there will be no mtu field.



Re: [PATCH linux-next 9/9] vdpa/mlx5: Forward only packets with allowed MAC address

2021-02-24 Thread Jason Wang


On 2021/2/24 2:18 下午, Parav Pandit wrote:

From: Eli Cohen 

Add rules to forward packets to the net device's TIR only if the
destination MAC is equal to the configured MAC. This is required to
prevent the netdevice from receiving traffic not destined to its
configured MAC.



Just to confirm, this will only apply to unicast packet and we will 
still let multicast/broadcast to go?


Thanks




Signed-off-by: Eli Cohen 
Reviewed-by: Parav Pandit 
---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 84 +++
  1 file changed, 64 insertions(+), 20 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 51a3fc4cde4d..9b580c67acda 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -147,7 +147,8 @@ struct mlx5_vdpa_net {
struct mutex reslock;
struct mlx5_flow_table *rxft;
struct mlx5_fc *rx_counter;
-   struct mlx5_flow_handle *rx_rule;
+   struct mlx5_flow_handle *rx_rule_ucast;
+   struct mlx5_flow_handle *rx_rule_mcast;
bool setup;
u16 mtu;
  };
@@ -1294,21 +1295,34 @@ static int add_fwd_to_tir(struct mlx5_vdpa_net *ndev)
struct mlx5_flow_table_attr ft_attr = {};
struct mlx5_flow_act flow_act = {};
struct mlx5_flow_namespace *ns;
+   struct mlx5_flow_spec *spec;
+   void *headers_c;
+   void *headers_v;
+   u8 *dmac_c;
+   u8 *dmac_v;
int err;
  
-	/* for now, one entry, match all, forward to tir */

-   ft_attr.max_fte = 1;
-   ft_attr.autogroup.max_num_groups = 1;
+   spec = kvzalloc(sizeof(*spec), GFP_KERNEL);
+   if (!spec)
+   return -ENOMEM;
+
+   spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
+   ft_attr.max_fte = 2;
+   ft_attr.autogroup.max_num_groups = 2;
  
-	ns = mlx5_get_flow_namespace(ndev->mvdev.mdev, MLX5_FLOW_NAMESPACE_BYPASS);

+   ns = mlx5_get_flow_namespace(ndev->mvdev.mdev,
+MLX5_FLOW_NAMESPACE_BYPASS);
if (!ns) {
-   mlx5_vdpa_warn(>mvdev, "get flow namespace\n");
-   return -EOPNOTSUPP;
+   mlx5_vdpa_warn(>mvdev, "failed to get flow namespace\n");
+   err = -EOPNOTSUPP;
+   goto err_ns;
}
  
  	ndev->rxft = mlx5_create_auto_grouped_flow_table(ns, _attr);

-   if (IS_ERR(ndev->rxft))
-   return PTR_ERR(ndev->rxft);
+   if (IS_ERR(ndev->rxft)) {
+   err = PTR_ERR(ndev->rxft);
+   goto err_ns;
+   }
  
  	ndev->rx_counter = mlx5_fc_create(ndev->mvdev.mdev, false);

if (IS_ERR(ndev->rx_counter)) {
@@ -1316,37 +1330,67 @@ static int add_fwd_to_tir(struct mlx5_vdpa_net *ndev)
goto err_fc;
}
  
-	flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | MLX5_FLOW_CONTEXT_ACTION_COUNT;

+   headers_c = MLX5_ADDR_OF(fte_match_param, spec->match_criteria, 
outer_headers);
+   dmac_c = MLX5_ADDR_OF(fte_match_param, headers_c, 
outer_headers.dmac_47_16);
+   memset(dmac_c, 0xff, ETH_ALEN);
+   headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value, 
outer_headers);
+   dmac_v = MLX5_ADDR_OF(fte_match_param, headers_v, 
outer_headers.dmac_47_16);
+   ether_addr_copy(dmac_v, ndev->config.mac);
+
+   flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST |
+ MLX5_FLOW_CONTEXT_ACTION_COUNT;
dest[0].type = MLX5_FLOW_DESTINATION_TYPE_TIR;
dest[0].tir_num = ndev->res.tirn;
dest[1].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
dest[1].counter_id = mlx5_fc_id(ndev->rx_counter);
-   ndev->rx_rule = mlx5_add_flow_rules(ndev->rxft, NULL, _act, dest, 
2);
-   if (IS_ERR(ndev->rx_rule)) {
-   err = PTR_ERR(ndev->rx_rule);
-   ndev->rx_rule = NULL;
-   goto err_rule;
+   ndev->rx_rule_ucast = mlx5_add_flow_rules(ndev->rxft, spec, _act,
+ dest, 2);
+
+   if (IS_ERR(ndev->rx_rule_ucast)) {
+   err = PTR_ERR(ndev->rx_rule_ucast);
+   ndev->rx_rule_ucast = NULL;
+   goto err_rule_ucast;
+   }
+
+   memset(dmac_c, 0, ETH_ALEN);
+   memset(dmac_v, 0, ETH_ALEN);
+   dmac_c[0] = 1;
+   dmac_v[0] = 1;
+   flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
+   ndev->rx_rule_mcast = mlx5_add_flow_rules(ndev->rxft, spec, _act,
+ dest, 1);
+   if (IS_ERR(ndev->rx_rule_mcast)) {
+   err = PTR_ERR(ndev->rx_rule_mcast);
+   ndev->rx_rule_mcast = NULL;
+   goto err_rule_mcast;
}
  
+	kvfree(spec);

return 0;
  
-err_rule:

+err_rule_mcast:
+   mlx5_del_flow_rules(ndev->rx_rule_ucast);
+   ndev->rx_rule_ucast = NULL;
+err_rule_ucast:
mlx5_fc_destroy(ndev->mvdev.mdev, ndev->rx_counter);
  err_fc:

Re: [PATCH linux-next 7/9] vdpa/mlx5: Provide device generated random MAC address

2021-02-24 Thread Jason Wang


On 2021/2/24 2:18 下午, Parav Pandit wrote:

From: Eli Cohen 

Use a randomly generated MAC address to be applied in case it is not
configured by management tool.

The value queried through mlx5_query_nic_vport_mac_address() is not
relelavnt to vdpa since it is the mac that should be used by the regular
NIC driver.

Signed-off-by: Eli Cohen 
Reviewed-by: Parav Pandit 



Acked-by: Jason Wang 



---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 5 +
  1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index b67bba581dfd..ece2183e7b20 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2005,10 +2005,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
*v_mdev, const char *name)
if (err)
goto err_mtu;
  
-	err = mlx5_query_nic_vport_mac_address(mdev, 0, 0, config->mac);

-   if (err)
-   goto err_mtu;
-
+   eth_random_addr(config->mac);
mvdev->vdev.dma_dev = mdev->device;
err = mlx5_vdpa_alloc_resources(>mvdev);
if (err)


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH linux-next 8/9] vdpa/mlx5: Support configuration of MAC

2021-02-24 Thread Jason Wang


On 2021/2/24 2:18 下午, Parav Pandit wrote:

From: Eli Cohen 

Add code to accept MAC configuration through vdpa tool. The MAC is
written into the config struct and later can be retrieved through
get_config().

Examples:
1. Configure MAC:
$ vdpa dev config set vdpa0 mac 00:11:22:33:44:55

2. Show configured params:
$ vdpa dev config show
vdpa0: mac 00:11:22:33:44:55 link down link_announce false mtu 0 speed 0 duplex 0

Signed-off-by: Eli Cohen 
Reviewed-by: Parav Pandit 
---



Acked-by: Jason Wang 



  drivers/vdpa/mlx5/net/mlx5_vnet.c | 17 +
  1 file changed, 17 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index ece2183e7b20..51a3fc4cde4d 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1972,6 +1972,22 @@ struct mlx5_vdpa_mgmtdev {
struct mlx5_vdpa_net *ndev;
  };
  
+static int mlx5_vdpa_net_dev_config_set(struct vdpa_mgmt_dev *v_mdev,

+   struct vdpa_device *vdev,
+   const struct vdpa_dev_config_set_attr 
*attrs)
+{
+   struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct 
mlx5_vdpa_mgmtdev, mgtdev);
+   struct mlx5_vdpa_net *ndev = mgtdev->ndev;
+
+   if (attrs->mask.mtu_valid)
+   return -EOPNOTSUPP;
+
+   if (attrs->mask.mac_valid)
+   memcpy(ndev->config.mac, attrs->cfg.mac, ETH_ALEN);
+
+   return 0;
+}
+
  static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name)
  {
struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct 
mlx5_vdpa_mgmtdev, mgtdev);
@@ -2044,6 +2060,7 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev 
*v_mdev, struct vdpa_device *
  static const struct vdpa_mgmtdev_ops mdev_ops = {
.dev_add = mlx5_vdpa_dev_add,
.dev_del = mlx5_vdpa_dev_del,
+   .dev_config_set = mlx5_vdpa_net_dev_config_set,
  };
  
  static struct virtio_device_id id_table[] = {


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH linux-next 2/9] vdpa: Introduce query of device config layout

2021-02-24 Thread Jason Wang


On 2021/2/24 2:18 下午, Parav Pandit wrote:

Introduce a command to query a device config layout.

An example query of network vdpa device:

$ vdpa dev add name bar mgmtdev vdpasim_net

$ vdpa dev config show
bar: mac 00:35:09:19:48:05 link up link_announce false mtu 1500 speed 0 duplex 0

$ vdpa dev config show -jp
{
 "config": {
 "bar": {
 "mac": "00:35:09:19:48:05",
 "link ": "up",
 "link_announce ": false,
 "mtu": 1500,
 "speed": 0,
 "duplex": 0
 }
 }
}

Signed-off-by: Parav Pandit 
Reviewed-by: Eli Cohen 
---
changelog:
v1->v2:
  - read whole net config layout instead of individual fields
  - added error extack for unmanaged vdpa device
---
  drivers/vdpa/vdpa.c   | 181 ++
  include/uapi/linux/vdpa.h |  11 +++
  2 files changed, 192 insertions(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index 3d997b389345..cebbba500638 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -14,6 +14,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 
  
  static LIST_HEAD(mdev_head);

  /* A global mutex that protects vdpa management device and device level 
operations. */
@@ -603,6 +605,179 @@ static int vdpa_nl_cmd_dev_get_dumpit(struct sk_buff 
*msg, struct netlink_callba
return msg->len;
  }
  
+static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev,

+  struct sk_buff *msg,
+  const struct virtio_net_config *config)
+{
+   u32 hash_types;
+   u16 rss_field;
+   u64 features;
+
+   features = vdev->config->get_features(vdev);
+   if ((features & (1ULL << VIRTIO_NET_F_MQ)) == 0)
+   return 0;
+
+   if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
+   config->max_virtqueue_pairs))
+   return -EMSGSIZE;



We probably need to check VIRTIO_NET_F_RSS here.



+   if (nla_put_u8(msg, VDPA_ATTR_DEV_NET_CFG_RSS_MAX_KEY_LEN,
+  config->rss_max_key_size))
+   return -EMSGSIZE;
+
+   rss_field = le16_to_cpu(config->rss_max_key_size);
+   if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_RSS_MAX_IT_LEN, rss_field))
+   return -EMSGSIZE;
+
+   hash_types = le32_to_cpu(config->supported_hash_types);
+   if (nla_put_u32(msg, VDPA_ATTR_DEV_NET_CFG_RSS_HASH_TYPES,
+   config->supported_hash_types))
+   return -EMSGSIZE;
+   return 0;
+}
+
+static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff 
*msg)
+{
+   struct virtio_net_config config = {};
+
+   vdev->config->get_config(vdev, 0, , sizeof(config));



Do we need to sync with other possible get_config calls here?



+   if (nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR, sizeof(config.mac), 
config.mac))
+   return -EMSGSIZE;
+   if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, config.status))
+   return -EMSGSIZE;
+   if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, config.mtu))
+   return -EMSGSIZE;



And check VIRTIO_NET_F_SPEED_DUPLEX.



+   if (nla_put_u32(msg, VDPA_ATTR_DEV_NET_CFG_SPEED, config.speed))
+   return -EMSGSIZE;
+   if (nla_put_u8(msg, VDPA_ATTR_DEV_NET_CFG_DUPLEX, config.duplex))
+   return -EMSGSIZE;
+
+   return vdpa_dev_net_mq_config_fill(vdev, msg, );
+}
+
+static int
+vdpa_dev_config_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 
portid, u32 seq,
+int flags, struct netlink_ext_ack *extack)
+{
+   u32 device_id;
+   void *hdr;
+   int err;
+
+   hdr = genlmsg_put(msg, portid, seq, _nl_family, flags,
+ VDPA_CMD_DEV_CONFIG_GET);
+   if (!hdr)
+   return -EMSGSIZE;
+
+   if (nla_put_string(msg, VDPA_ATTR_DEV_NAME, dev_name(>dev))) {
+   err = -EMSGSIZE;
+   goto msg_err;
+   }
+
+   device_id = vdev->config->get_device_id(vdev);
+   if (nla_put_u32(msg, VDPA_ATTR_DEV_ID, device_id)) {
+   err = -EMSGSIZE;
+   goto msg_err;
+   }
+
+   switch (device_id) {
+   case VIRTIO_ID_NET:
+   err = vdpa_dev_net_config_fill(vdev, msg);
+   break;
+   default:
+   err = -EOPNOTSUPP;
+   break;
+   }
+   if (err)
+   goto msg_err;
+
+   genlmsg_end(msg, hdr);
+   return 0;
+
+msg_err:
+   genlmsg_cancel(msg, hdr);
+   return err;
+}
+
+static int vdpa_nl_cmd_dev_config_get_doit(struct sk_buff *skb, struct 
genl_info *info)
+{
+   struct vdpa_device *vdev;
+   struct sk_buff *msg;
+   const char *devname;
+   struct device *dev;
+   int err;
+
+   if (!info->attrs[VDPA_ATTR_DEV_NAME])
+   return -EINVAL;
+   devname = nla_data(info->attrs[VDPA_ATTR_DEV_NAME]);
+

Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-24 Thread Michael S. Tsirkin
On Wed, Feb 24, 2021 at 04:26:43PM +0800, Jason Wang wrote:
> Basically on first guest access QEMU would tell kernel whether
> guest is using the legacy or the modern interface.
> E.g. virtio_pci_config_read/virtio_pci_config_write will call 
> ioctl(ENABLE_LEGACY, 1)
> while virtio_pci_common_read will call ioctl(ENABLE_LEGACY, 0)
> 
> 
> But this trick work only for PCI I think?
> 
> Thanks

ccw has a revision it can check. mmio does not have transitional devices
at all.

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH v5 00/19] virtio/vsock: introduce SOCK_SEQPACKET support

2021-02-24 Thread Stefano Garzarella

On Wed, Feb 24, 2021 at 11:28:50AM +0300, Arseny Krasnov wrote:


On 24.02.2021 11:23, Stefano Garzarella wrote:

On Wed, Feb 24, 2021 at 07:29:25AM +0300, Arseny Krasnov wrote:

On 23.02.2021 17:50, Stefano Garzarella wrote:

On Mon, Feb 22, 2021 at 03:23:11PM +0100, Stefano Garzarella wrote:

Hi Arseny,

On Thu, Feb 18, 2021 at 08:33:44AM +0300, Arseny Krasnov wrote:

This patchset impelements support of SOCK_SEQPACKET for virtio
transport.
As SOCK_SEQPACKET guarantees to save record boundaries, so to
do it, two new packet operations were added: first for start of record
and second to mark end of record(SEQ_BEGIN and SEQ_END later). Also,
both operations carries metadata - to maintain boundaries and payload
integrity. Metadata is introduced by adding special header with two
fields - message count and message length:

struct virtio_vsock_seq_hdr {
__le32  msg_cnt;
__le32  msg_len;
} __attribute__((packed));

This header is transmitted as payload of SEQ_BEGIN and SEQ_END
packets(buffer of second virtio descriptor in chain) in the same way as
data transmitted in RW packets. Payload was chosen as buffer for this
header to avoid touching first virtio buffer which carries header of
packet, because someone could check that size of this buffer is equal
to size of packet header. To send record, packet with start marker is
sent first(it's header contains length of record and counter), then
counter is incremented and all data is sent as usual 'RW' packets and
finally SEQ_END is sent(it also carries counter of message, which is
counter of SEQ_BEGIN + 1), also after sedning SEQ_END counter is
incremented again. On receiver's side, length of record is known from
packet with start record marker. To check that no packets were dropped
by transport, counters of two sequential SEQ_BEGIN and SEQ_END are
checked(counter of SEQ_END must be bigger that counter of SEQ_BEGIN by
1) and length of data between two markers is compared to length in
SEQ_BEGIN header.
Now as  packets of one socket are not reordered neither on
vsock nor on vhost transport layers, such markers allows to restore
original record on receiver's side. If user's buffer is smaller that
record length, when all out of size data is dropped.
Maximum length of datagram is not limited as in stream socket,
because same credit logic is used. Difference with stream socket is
that user is not woken up until whole record is received or error
occurred. Implementation also supports 'MSG_EOR' and 'MSG_TRUNC' flags.
Tests also implemented.

I reviewed the first part (af_vsock.c changes), tomorrow I'll review
the rest. That part looks great to me, only found a few minor issues.

I revieiwed the rest of it as well, left a few minor comments, but I
think we're well on track.

I'll take a better look at the specification patch tomorrow.

Great, Thank You

Thanks,
Stefano


In the meantime, however, I'm getting a doubt, especially with regard
to other transports besides virtio.

Should we hide the begin/end marker sending in the transport?

I mean, should the transport just provide a seqpacket_enqueue()
callbacl?
Inside it then the transport will send the markers. This is because
some transports might not need to send markers.

But thinking about it more, they could actually implement stubs for
that calls, if they don't need to send markers.

So I think for now it's fine since it allows us to reuse a lot of
code, unless someone has some objection.

I thought about that, I'll try to implement it in next version. Let's see...

If you want to discuss it first, write down the idea you want to
implement, I wouldn't want to make you do unnecessary work. :-)


Idea is simple, in iov iterator of 'struct msghdr' which is passed to

enqueue callback we have two fields: 'iov_offset' which is byte

offset inside io vector where next data must be picked and 'count'

which is rest of unprocessed bytes in io vector. So in seqpacket

enqueue callback if 'iov_offset' is 0 i'll send SEQBEGIN, and if

'count' is 0 i'll send SEQEND.



Got it, make sense and it's defently more transparent for the vsock 
core!
Go head, maybe adding a comment in the vsock core explaining this, so 
other developers can understand better if they want to support SEPACKET 
in other transports.


Thanks,
Stefano

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH v5 11/19] virtio/vsock: dequeue callback for SOCK_SEQPACKET

2021-02-24 Thread Stefano Garzarella

On Wed, Feb 24, 2021 at 01:41:56AM -0500, Michael S. Tsirkin wrote:

On Wed, Feb 24, 2021 at 08:07:48AM +0300, Arseny Krasnov wrote:


On 23.02.2021 17:17, Michael S. Tsirkin wrote:
> On Thu, Feb 18, 2021 at 08:39:37AM +0300, Arseny Krasnov wrote:
>> This adds transport callback and it's logic for SEQPACKET dequeue.
>> Callback fetches RW packets from rx queue of socket until whole record
>> is copied(if user's buffer is full, user is not woken up). This is done
>> to not stall sender, because if we wake up user and it leaves syscall,
>> nobody will send credit update for rest of record, and sender will wait
>> for next enter of read syscall at receiver's side. So if user buffer is
>> full, we just send credit update and drop data. If during copy SEQ_BEGIN
>> was found(and not all data was copied), copying is restarted by reset
>> user's iov iterator(previous unfinished data is dropped).
>>
>> Signed-off-by: Arseny Krasnov 
>> ---
>>  include/linux/virtio_vsock.h|  10 +++
>>  include/uapi/linux/virtio_vsock.h   |  16 
>>  net/vmw_vsock/virtio_transport_common.c | 114 
>>  3 files changed, 140 insertions(+)
>>
>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> index dc636b727179..003d06ae4a85 100644
>> --- a/include/linux/virtio_vsock.h
>> +++ b/include/linux/virtio_vsock.h
>> @@ -36,6 +36,11 @@ struct virtio_vsock_sock {
>>u32 rx_bytes;
>>u32 buf_alloc;
>>struct list_head rx_queue;
>> +
>> +  /* For SOCK_SEQPACKET */
>> +  u32 user_read_seq_len;
>> +  u32 user_read_copied;
>> +  u32 curr_rx_msg_cnt;
>
> wrap these in a struct to make it's clearer they
> are related?
Ack
>
>>  };
>>
>>  struct virtio_vsock_pkt {
>> @@ -80,6 +85,11 @@ virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
>>   struct msghdr *msg,
>>   size_t len, int flags);
>>
>> +int
>> +virtio_transport_seqpacket_dequeue(struct vsock_sock *vsk,
>> + struct msghdr *msg,
>> + int flags,
>> + bool *msg_ready);
>>  s64 virtio_transport_stream_has_data(struct vsock_sock *vsk);
>>  s64 virtio_transport_stream_has_space(struct vsock_sock *vsk);
>>
>> diff --git a/include/uapi/linux/virtio_vsock.h 
b/include/uapi/linux/virtio_vsock.h
>> index 1d57ed3d84d2..cf9c165e5cca 100644
>> --- a/include/uapi/linux/virtio_vsock.h
>> +++ b/include/uapi/linux/virtio_vsock.h
>> @@ -63,8 +63,14 @@ struct virtio_vsock_hdr {
>>__le32  fwd_cnt;
>>  } __attribute__((packed));
>>
>> +struct virtio_vsock_seq_hdr {
>> +  __le32  msg_cnt;
>> +  __le32  msg_len;
>> +} __attribute__((packed));
>> +
>>  enum virtio_vsock_type {
>>VIRTIO_VSOCK_TYPE_STREAM = 1,
>> +  VIRTIO_VSOCK_TYPE_SEQPACKET = 2,
>>  };
>>
>>  enum virtio_vsock_op {
>> @@ -83,6 +89,11 @@ enum virtio_vsock_op {
>>VIRTIO_VSOCK_OP_CREDIT_UPDATE = 6,
>>/* Request the peer to send the credit info to us */
>>VIRTIO_VSOCK_OP_CREDIT_REQUEST = 7,
>> +
>> +  /* Record begin for SOCK_SEQPACKET */
>> +  VIRTIO_VSOCK_OP_SEQ_BEGIN = 8,
>> +  /* Record end for SOCK_SEQPACKET */
>> +  VIRTIO_VSOCK_OP_SEQ_END = 9,
>>  };
>>
>>  /* VIRTIO_VSOCK_OP_SHUTDOWN flags values */
>> @@ -91,4 +102,9 @@ enum virtio_vsock_shutdown {
>>VIRTIO_VSOCK_SHUTDOWN_SEND = 2,
>>  };
>>
>> +/* VIRTIO_VSOCK_OP_RW flags values */
>> +enum virtio_vsock_rw {
>> +  VIRTIO_VSOCK_RW_EOR = 1,
>> +};
>> +
>>  #endif /* _UAPI_LINUX_VIRTIO_VSOCK_H */
> Probably a good idea to also have a feature bit gating
> this functionality.

IIUC this also requires some qemu patch, because in current

implementation of vsock device in qemu, there is no 'set_features'

callback for such device. This callback will handle guest's write

to feature register, by calling vhost kernel backend, where this

bit will be processed by host.


Well patching userspace to make use of a kernel feature
is par for the course, isn't it?



IMHO I'm not sure that SEQPACKET support needs feature

bit - it is just two new ops for virtio vsock protocol, and from point

of view of virtio device it is same as STREAM. May be it is needed

for cases when client tries to connect to server which doesn't support

SEQPACKET, so without bit result will be "Connection reset by peer",

and with such bit client will know that server doesn't support it and

'socket(SOCK_SEQPACKET)' will return error?


Yes, a better error handling would be one reason to do it like this.


Agree, in this way we could implement a 'seqpacket_allow' callback 
(similar to 'stream_allow'), and we can return 'true' if the feature is 
negotiated.


So instead of checking all the seqpacket callbacks, we can use only this 
callback to understand if the transport support it.
We can implement it also for other transports (vmci, hyperv) and return 
always false for now.


Thanks,
Stefano


Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-02-24 Thread Jason Wang


On 2021/2/24 3:17 下午, Michael S. Tsirkin wrote:

On Wed, Feb 24, 2021 at 02:53:08PM +0800, Jason Wang wrote:

On 2021/2/24 2:46 下午, Michael S. Tsirkin wrote:

On Wed, Feb 24, 2021 at 02:04:36PM +0800, Jason Wang wrote:

On 2021/2/24 1:04 下午, Michael S. Tsirkin wrote:

On Tue, Feb 23, 2021 at 11:35:57AM -0800, Si-Wei Liu wrote:

On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:

On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:

On 2021/2/23 9:12 上午, Si-Wei Liu wrote:

On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:

On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang wrote:

On 2021/2/19 7:54 下午, Si-Wei Liu wrote:

Commit 452639a64ad8 ("vdpa: make sure set_features is invoked
for legacy") made an exception for legacy guests to reset
features to 0, when config space is accessed before features
are set. We should relieve the verify_min_features() check
and allow features reset to 0 for this case.

It's worth noting that not just legacy guests could access
config space before features are set. For instance, when
feature VIRTIO_NET_F_MTU is advertised some modern driver
will try to access and validate the MTU present in the config
space before virtio features are set.

This looks like a spec violation:

"

The following driver-read-only field, mtu only exists if
VIRTIO_NET_F_MTU is
set.
This field specifies the maximum MTU for the driver to use.
"

Do we really want to workaround this?

Thanks

And also:

The driver MUST follow this sequence to initialize a device:
1. Reset the device.
2. Set the ACKNOWLEDGE status bit: the guest OS has noticed the device.
3. Set the DRIVER status bit: the guest OS knows how to drive the
device.
4. Read device feature bits, and write the subset of feature bits
understood by the OS and driver to the
device. During this step the driver MAY read (but MUST NOT write)
the device-specific configuration
fields to check that it can support the device before accepting it.
5. Set the FEATURES_OK status bit. The driver MUST NOT accept new
feature bits after this step.
6. Re-read device status to ensure the FEATURES_OK bit is still set:
otherwise, the device does not
support our subset of features and the device is unusable.
7. Perform device-specific setup, including discovery of virtqueues
for the device, optional per-bus setup,
reading and possibly writing the device’s virtio configuration
space, and population of virtqueues.
8. Set the DRIVER_OK status bit. At this point the device is “live”.


so accessing config space before FEATURES_OK is a spec violation, right?

It is, but it's not relevant to what this commit tries to address. I
thought the legacy guest still needs to be supported.

Having said, a separate patch has to be posted to fix the guest driver
issue where this discrepancy is introduced to virtnet_validate() (since
commit fe36cbe067). But it's not technically related to this patch.

-Siwei

I think it's a bug to read config space in validate, we should move it to
virtnet_probe().

Thanks

I take it back, reading but not writing seems to be explicitly allowed by spec.
So our way to detect a legacy guest is bogus, need to think what is
the best way to handle this.

Then maybe revert commit fe36cbe067 and friends, and have QEMU detect legacy
guest? Supposedly only config space write access needs to be guarded before
setting FEATURES_OK.

-Siwie

Detecting it isn't enough though, we will need a new ioctl to notify
the kernel that it's a legacy guest. Ugh:(

I'm not sure I get this, how can we know if there's a legacy driver before
set_features()?

qemu knows for sure. It does not communicate this information to the
kernel right now unfortunately.

I may miss something, but I still don't get how the new ioctl is supposed to
work.

Thanks


Basically on first guest access QEMU would tell kernel whether
guest is using the legacy or the modern interface.
E.g. virtio_pci_config_read/virtio_pci_config_write will call 
ioctl(ENABLE_LEGACY, 1)
while virtio_pci_common_read will call ioctl(ENABLE_LEGACY, 0)



But this trick work only for PCI I think?

Thanks




Or maybe we just add GET_CONFIG_MODERN and GET_CONFIG_LEGACY and
call the correct ioctl ... there are many ways to build this API.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH v5 00/19] virtio/vsock: introduce SOCK_SEQPACKET support

2021-02-24 Thread Stefano Garzarella

On Wed, Feb 24, 2021 at 07:29:25AM +0300, Arseny Krasnov wrote:


On 23.02.2021 17:50, Stefano Garzarella wrote:

On Mon, Feb 22, 2021 at 03:23:11PM +0100, Stefano Garzarella wrote:

Hi Arseny,

On Thu, Feb 18, 2021 at 08:33:44AM +0300, Arseny Krasnov wrote:

This patchset impelements support of SOCK_SEQPACKET for virtio
transport.
As SOCK_SEQPACKET guarantees to save record boundaries, so to
do it, two new packet operations were added: first for start of record
and second to mark end of record(SEQ_BEGIN and SEQ_END later). Also,
both operations carries metadata - to maintain boundaries and payload
integrity. Metadata is introduced by adding special header with two
fields - message count and message length:

struct virtio_vsock_seq_hdr {
__le32  msg_cnt;
__le32  msg_len;
} __attribute__((packed));

This header is transmitted as payload of SEQ_BEGIN and SEQ_END
packets(buffer of second virtio descriptor in chain) in the same way as
data transmitted in RW packets. Payload was chosen as buffer for this
header to avoid touching first virtio buffer which carries header of
packet, because someone could check that size of this buffer is equal
to size of packet header. To send record, packet with start marker is
sent first(it's header contains length of record and counter), then
counter is incremented and all data is sent as usual 'RW' packets and
finally SEQ_END is sent(it also carries counter of message, which is
counter of SEQ_BEGIN + 1), also after sedning SEQ_END counter is
incremented again. On receiver's side, length of record is known from
packet with start record marker. To check that no packets were dropped
by transport, counters of two sequential SEQ_BEGIN and SEQ_END are
checked(counter of SEQ_END must be bigger that counter of SEQ_BEGIN by
1) and length of data between two markers is compared to length in
SEQ_BEGIN header.
Now as  packets of one socket are not reordered neither on
vsock nor on vhost transport layers, such markers allows to restore
original record on receiver's side. If user's buffer is smaller that
record length, when all out of size data is dropped.
Maximum length of datagram is not limited as in stream socket,
because same credit logic is used. Difference with stream socket is
that user is not woken up until whole record is received or error
occurred. Implementation also supports 'MSG_EOR' and 'MSG_TRUNC' flags.
Tests also implemented.

I reviewed the first part (af_vsock.c changes), tomorrow I'll review
the rest. That part looks great to me, only found a few minor issues.

I revieiwed the rest of it as well, left a few minor comments, but I
think we're well on track.

I'll take a better look at the specification patch tomorrow.

Great, Thank You


Thanks,
Stefano


In the meantime, however, I'm getting a doubt, especially with regard
to other transports besides virtio.

Should we hide the begin/end marker sending in the transport?

I mean, should the transport just provide a seqpacket_enqueue()
callbacl?
Inside it then the transport will send the markers. This is because
some transports might not need to send markers.

But thinking about it more, they could actually implement stubs for
that calls, if they don't need to send markers.

So I think for now it's fine since it allows us to reuse a lot of
code, unless someone has some objection.


I thought about that, I'll try to implement it in next version. Let's see...


If you want to discuss it first, write down the idea you want to 
implement, I wouldn't want to make you do unnecessary work. :-)


Cheers,
Stefano

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH linux-next 0/9] vdpa: Enable user to set mac address,

2021-02-24 Thread Parav Pandit



> From: Michael S. Tsirkin 
> Sent: Wednesday, February 24, 2021 12:21 PM
> 
> On Wed, Feb 24, 2021 at 08:18:35AM +0200, Parav Pandit wrote:
> > Currently user cannot set the mac address and mtu of the vdpa device.
> > This patchset enables users to set the mac address and mtu of the vdpa
[..]

> >
> > Patch summary:
> > Patch-1 uses read only features bit to detect endianness
> > Patch-2 implements new config layout query command
> > Patch-3 implements callback for setting vdpa net config fields
> > Patch-4 extends vdpa_sim_net driver to implement mac, mtu setting
> > Patch-5 removed redundant get_config callback
> > Patch-6 mlx5 vdpa driver migrates to user created vdpa device
> > Patch-7 mlx5 vdpa driver uses random mac address when not configured
> > Patch-8 mlx5 vdpa driver supports user provided mac config
> > Patch-9 mlx5 vdpa driver uses user provided mac during rx flow
> > steering
> 
> which tree is this for? does not seem to apply to linux-next branch of vhost 
> ...

It is for linux-next branch of vhost. However I missed to rebase before sending.
My bad.
Preparing to send v2 after addressing other comments and after rebase.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization