[PATCH v3 1/1] virtio: write back F_VERSION_1 before validate

2021-10-10 Thread Halil Pasic
The virtio specification virtio-v1.1-cs01 states: "Transitional devices
MUST detect Legacy drivers by detecting that VIRTIO_F_VERSION_1 has not
been acknowledged by the driver."  This is exactly what QEMU as of 6.1
has done relying solely on VIRTIO_F_VERSION_1 for detecting that.

However, the specification also says: "... the driver MAY read (but MUST
NOT write) the device-specific configuration fields to check that it can
support the device ..." before setting FEATURES_OK.

In that case, any transitional device relying solely on
VIRTIO_F_VERSION_1 for detecting legacy drivers will return data in
legacy format.  In particular, this implies that it is in big endian
format for big endian guests. This naturally confuses the driver which
expects little endian in the modern mode.

It is probably a good idea to amend the spec to clarify that
VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation
is complete. Before validate callback existed, config space was only
read after FEATURES_OK. However, we already have two regressions, so
let's address this here as well.

The regressions affect the VIRTIO_NET_F_MTU feature of virtio-net and
the VIRTIO_BLK_F_BLK_SIZE feature of virtio-blk for BE guests when
virtio 1.0 is used on both sides. The latter renders virtio-blk unusable
with DASD backing, because things simply don't work with the default.
See Fixes tags for relevant commits.

For QEMU, we can work around the issue by writing out the feature bits
with VIRTIO_F_VERSION_1 bit set.  We (ab)use the finalize_features
config op for this. This isn't enough to address all vhost devices since
these do not get the features until FEATURES_OK, however it looks like
the affected devices actually never handled the endianness for legacy
mode correctly, so at least that's not a regression.

No devices except virtio net and virtio blk seem to be affected.

Long term the right thing to do is to fix the hypervisors.

Cc:  #v4.11
Signed-off-by: Halil Pasic 
Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config 
space")
Fixes: fe36cbe0671e ("virtio_net: clear MTU when out of range")
Reported-by: mark...@us.ibm.com
Reviewed-by: Cornelia Huck 
---

@Connie: I made some more commit message changes to accommodate Michael's
requests. I just assumed these will work or you as well and kept your
r-b. Please shout at me if it needs to be dropped :)
---
 drivers/virtio/virtio.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 0a5b54034d4b..236081afe9a2 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -239,6 +239,17 @@ static int virtio_dev_probe(struct device *_d)
driver_features_legacy = driver_features;
}
 
+   /*
+* Some devices detect legacy solely via F_VERSION_1. Write
+* F_VERSION_1 to force LE config space accesses before FEATURES_OK for
+* these when needed.
+*/
+   if (drv->validate && !virtio_legacy_is_little_endian()
+ && device_features & BIT_ULL(VIRTIO_F_VERSION_1)) {
+   dev->features = BIT_ULL(VIRTIO_F_VERSION_1);
+   dev->config->finalize_features(dev);
+   }
+
if (device_features & (1ULL << VIRTIO_F_VERSION_1))
dev->features = driver_features & device_features;
else

base-commit: 60a9483534ed0d99090a2ee1d4bb0b8179195f51
-- 
2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] VDUSE: fix documentation underline warning

2021-10-10 Thread Jason Wang
On Thu, Oct 7, 2021 at 4:29 AM Randy Dunlap  wrote:
>
> Fix a VDUSE documentation build warning:
>
> Documentation/userspace-api/vduse.rst:21: WARNING: Title underline too short.
>
> Fixes: 7bc7f61897b6 ("Documentation: Add documentation for VDUSE")
> Signed-off-by: Randy Dunlap 
> Cc: Xie Yongji 
> Cc: Jason Wang 
> Cc: Michael S. Tsirkin 
> Cc: virtualization@lists.linux-foundation.org
> Cc: Jonathan Corbet 
> Cc: linux-...@vger.kernel.org
> ---

Acked-by: Jason Wang 

>  Documentation/userspace-api/vduse.rst |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- lnx-515-rc4.orig/Documentation/userspace-api/vduse.rst
> +++ lnx-515-rc4/Documentation/userspace-api/vduse.rst
> @@ -18,7 +18,7 @@ types can be added after the security is
>  is clarified or fixed in the future.
>
>  Create/Destroy VDUSE devices
> -
> +
>
>  VDUSE devices are created as follows:
>
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 7/7] eni_vdpa: add vDPA driver for Alibaba ENI

2021-10-10 Thread Jason Wang


在 2021/9/29 下午2:11, Wu Zongyong 写道:

This patch adds a new vDPA driver for Alibaba ENI(Elastic Network
Interface) which is build upon virtio 0.9.5 specification.
And this driver doesn't support to run on BE host.



If this is true, I think it's still better to exclude this driver via 
Kconfig.





Signed-off-by: Wu Zongyong 
---
  drivers/vdpa/Kconfig|   8 +
  drivers/vdpa/Makefile   |   1 +
  drivers/vdpa/alibaba/Makefile   |   3 +
  drivers/vdpa/alibaba/eni_vdpa.c | 553 
  4 files changed, 565 insertions(+)
  create mode 100644 drivers/vdpa/alibaba/Makefile
  create mode 100644 drivers/vdpa/alibaba/eni_vdpa.c

diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index 3d91982d8371..9587b9177b05 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -78,4 +78,12 @@ config VP_VDPA
help
  This kernel module bridges virtio PCI device to vDPA bus.
  
+config ALIBABA_ENI_VDPA

+   tristate "vDPA driver for Alibaba ENI"
+   select VIRTIO_PCI_LEGACY_LIB
+   depends on PCI_MSI
+   help
+ VDPA driver for Alibaba ENI(Elastic Network Interface) which is build 
upon
+ virtio 0.9.5 specification.
+
  endif # VDPA
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index f02ebed33f19..15665563a7f4 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -5,3 +5,4 @@ obj-$(CONFIG_VDPA_USER) += vdpa_user/
  obj-$(CONFIG_IFCVF)+= ifcvf/
  obj-$(CONFIG_MLX5_VDPA) += mlx5/
  obj-$(CONFIG_VP_VDPA)+= virtio_pci/
+obj-$(CONFIG_ALIBABA_ENI_VDPA) += alibaba/
diff --git a/drivers/vdpa/alibaba/Makefile b/drivers/vdpa/alibaba/Makefile
new file mode 100644
index ..ef4aae69f87a
--- /dev/null
+++ b/drivers/vdpa/alibaba/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_ALIBABA_ENI_VDPA) += eni_vdpa.o
+
diff --git a/drivers/vdpa/alibaba/eni_vdpa.c b/drivers/vdpa/alibaba/eni_vdpa.c
new file mode 100644
index ..6a09f157d810
--- /dev/null
+++ b/drivers/vdpa/alibaba/eni_vdpa.c
@@ -0,0 +1,553 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * vDPA bridge driver for Alibaba ENI(Elastic Network Interface)
+ *
+ * Copyright (c) 2021, Alibaba Inc. All rights reserved.
+ * Author: Wu Zongyong 
+ *
+ */
+
+#include "linux/bits.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define ENI_MSIX_NAME_SIZE 256
+
+#define ENI_ERR(pdev, fmt, ...)\
+   dev_err(&pdev->dev, "%s"fmt, "eni_vdpa: ", ##__VA_ARGS__)
+#define ENI_DBG(pdev, fmt, ...)\
+   dev_dbg(&pdev->dev, "%s"fmt, "eni_vdpa: ", ##__VA_ARGS__)
+#define ENI_INFO(pdev, fmt, ...) \
+   dev_info(&pdev->dev, "%s"fmt, "eni_vdpa: ", ##__VA_ARGS__)
+
+struct eni_vring {
+   void __iomem *notify;
+   char msix_name[ENI_MSIX_NAME_SIZE];
+   struct vdpa_callback cb;
+   int irq;
+};
+
+struct eni_vdpa {
+   struct vdpa_device vdpa;
+   struct virtio_pci_legacy_device ldev;
+   struct eni_vring *vring;
+   struct vdpa_callback config_cb;
+   char msix_name[ENI_MSIX_NAME_SIZE];
+   int config_irq;
+   int queues;
+   int vectors;
+};
+
+static struct eni_vdpa *vdpa_to_eni(struct vdpa_device *vdpa)
+{
+   return container_of(vdpa, struct eni_vdpa, vdpa);
+}
+
+static struct virtio_pci_legacy_device *vdpa_to_ldev(struct vdpa_device *vdpa)
+{
+   struct eni_vdpa *eni_vdpa = vdpa_to_eni(vdpa);
+
+   return &eni_vdpa->ldev;
+}
+
+static u64 eni_vdpa_get_features(struct vdpa_device *vdpa)
+{
+   struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa);
+   u64 features = vp_legacy_get_features(ldev);
+
+   features |= BIT_ULL(VIRTIO_F_ACCESS_PLATFORM);
+   features |= BIT_ULL(VIRTIO_F_ORDER_PLATFORM);



VERSION_1 is also needed?



+
+   return features;
+}
+
+static int eni_vdpa_set_features(struct vdpa_device *vdpa, u64 features)
+{
+   struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa);
+
+   if (!(features & BIT_ULL(VIRTIO_NET_F_MRG_RXBUF)) && features) {
+   ENI_ERR(ldev->pci_dev,
+   "VIRTIO_NET_F_MRG_RXBUF is not negotiated\n");
+   return -EINVAL;



Do we need to make sure FEATURE_OK is not set in this case or the ENI 
can do this for us?


Other looks good.

Thanks



+   }
+
+   vp_legacy_set_features(ldev, (u32)features);
+
+   return 0;
+}
+
+static u8 eni_vdpa_get_status(struct vdpa_device *vdpa)
+{
+   struct virtio_pci_legacy_device *ldev = vdpa_to_ldev(vdpa);
+
+   return vp_legacy_get_status(ldev);
+}
+
+static int eni_vdpa_get_vq_irq(struct vdpa_device *vdpa, u16 idx)
+{
+   struct eni_vdpa *eni_vdpa = vdpa_to_eni(vdpa);
+   int irq = eni_vdpa->vring[idx].irq;
+
+   if (irq == VIRTIO_MSI_NO_VECTOR)
+   return -EINVAL;
+
+   return irq;
+}
+
+static void eni_vdpa_free_irq(struct eni_vdpa *eni_vdpa)
+{
+   struct virtio_pci_

Re: [PATCH v4 6/7] vdpa: add new attribute VDPA_ATTR_DEV_MIN_VQ_SIZE

2021-10-10 Thread Jason Wang


在 2021/9/29 下午2:11, Wu Zongyong 写道:

This attribute advertises the min value of virtqueue size. The value is
0 by default.



I think 0 is not a correct value. If I understand the spec correctly, it 
should be 1.


Thanks




Signed-off-by: Wu Zongyong 
---
  drivers/vdpa/vdpa.c   | 5 +
  include/uapi/linux/vdpa.h | 1 +
  2 files changed, 6 insertions(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index 1dc121a07a93..6ed79fba33e4 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -492,6 +492,7 @@ vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff 
*msg, u32 portid, u32 seq
  int flags, struct netlink_ext_ack *extack)
  {
u16 max_vq_size;
+   u16 min_vq_size = 0;
u32 device_id;
u32 vendor_id;
void *hdr;
@@ -508,6 +509,8 @@ vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff 
*msg, u32 portid, u32 seq
device_id = vdev->config->get_device_id(vdev);
vendor_id = vdev->config->get_vendor_id(vdev);
max_vq_size = vdev->config->get_vq_num_max(vdev);
+   if (vdev->config->get_vq_num_min)
+   min_vq_size = vdev->config->get_vq_num_min(vdev);
  
  	err = -EMSGSIZE;

if (nla_put_string(msg, VDPA_ATTR_DEV_NAME, dev_name(&vdev->dev)))
@@ -520,6 +523,8 @@ vdpa_dev_fill(struct vdpa_device *vdev, struct sk_buff 
*msg, u32 portid, u32 seq
goto msg_err;
if (nla_put_u16(msg, VDPA_ATTR_DEV_MAX_VQ_SIZE, max_vq_size))
goto msg_err;
+   if (nla_put_u16(msg, VDPA_ATTR_DEV_MIN_VQ_SIZE, min_vq_size))
+   goto msg_err;
  
  	genlmsg_end(msg, hdr);

return 0;
diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h
index 66a41e4ec163..e3b87879514c 100644
--- a/include/uapi/linux/vdpa.h
+++ b/include/uapi/linux/vdpa.h
@@ -32,6 +32,7 @@ enum vdpa_attr {
VDPA_ATTR_DEV_VENDOR_ID,/* u32 */
VDPA_ATTR_DEV_MAX_VQS,  /* u32 */
VDPA_ATTR_DEV_MAX_VQ_SIZE,  /* u16 */
+   VDPA_ATTR_DEV_MIN_VQ_SIZE,  /* u16 */
  
  	/* new attributes must be added above here */

VDPA_ATTR_MAX,


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v4 5/7] virtio_vdpa: setup correct vq size with callbacks get_vq_num_{max,min}

2021-10-10 Thread Jason Wang


在 2021/9/29 下午2:11, Wu Zongyong 写道:


Signed-off-by: Wu Zongyong 



Commit log please.



---
  drivers/virtio/virtio_vdpa.c | 25 -
  1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
index 72eaef2caeb1..8aa4ebe2a2a2 100644
--- a/drivers/virtio/virtio_vdpa.c
+++ b/drivers/virtio/virtio_vdpa.c
@@ -145,7 +145,8 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned 
int index,
/* Assume split virtqueue, switch to packed if necessary */
struct vdpa_vq_state state = {0};
unsigned long flags;
-   u32 align, num;
+   u32 align, max_num, min_num = 0;
+   bool may_reduce_num = true;
int err;
  
  	if (!name)

@@ -163,22 +164,36 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned 
int index,
if (!info)
return ERR_PTR(-ENOMEM);
  
-	num = ops->get_vq_num_max(vdpa);

-   if (num == 0) {
+   max_num = ops->get_vq_num_max(vdpa);
+   if (max_num == 0) {
err = -ENOENT;
goto error_new_virtqueue;
}
  
+	if (ops->get_vq_num_min)

+   min_num = ops->get_vq_num_min(vdpa);
+   if (min_num > max_num) {
+   err = -ENOENT;
+   goto error_new_virtqueue;
+   }



If we really want to do this, let's move this to vdpa core during device 
probing.


Or just leave it as is (device risk itself).

Thanks



+
+   may_reduce_num = (max_num == min_num) ? false : true;
+
/* Create the vring */
align = ops->get_vq_align(vdpa);
-   vq = vring_create_virtqueue(index, num, align, vdev,
-   true, true, ctx,
+   vq = vring_create_virtqueue(index, max_num, align, vdev,
+   true, may_reduce_num, ctx,
virtio_vdpa_notify, callback, name);
if (!vq) {
err = -ENOMEM;
goto error_new_virtqueue;
}
  
+	if (virtqueue_get_vring_size(vq) < min_num) {

+   err = -EINVAL;
+   goto err_vq;
+   }
+
/* Setup virtqueue callback */
cb.callback = virtio_vdpa_virtqueue_cb;
cb.private = info;


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v4 4/7] vdpa: add new callback get_vq_num_min in vdpa_config_ops

2021-10-10 Thread Jason Wang


在 2021/9/29 下午2:11, Wu Zongyong 写道:

This callback is optional. For vdpa devices that not support to change
virtqueue size, get_vq_num_min and get_vq_num_max will return the same
value, so that users can choose a correct value for that device.

Suggested-by: Jason Wang 
Signed-off-by: Wu Zongyong 



Acked-by: Jason Wang 



---
  include/linux/vdpa.h | 4 
  1 file changed, 4 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index a896ee021e5f..30864848950b 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -171,6 +171,9 @@ struct vdpa_map_file {
   * @get_vq_num_max:   Get the max size of virtqueue
   *@vdev: vdpa device
   *Returns u16: max size of virtqueue
+ * @get_vq_num_min:Get the min size of virtqueue (optional)
+ * @vdev: vdpa device
+ * Returns u16: min size of virtqueue
   * @get_device_id:Get virtio device id
   *@vdev: vdpa device
   *Returns u32: virtio device id
@@ -266,6 +269,7 @@ struct vdpa_config_ops {
void (*set_config_cb)(struct vdpa_device *vdev,
  struct vdpa_callback *cb);
u16 (*get_vq_num_max)(struct vdpa_device *vdev);
+   u16 (*get_vq_num_min)(struct vdpa_device *vdev);
u32 (*get_device_id)(struct vdpa_device *vdev);
u32 (*get_vendor_id)(struct vdpa_device *vdev);
u8 (*get_status)(struct vdpa_device *vdev);


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v4 2/7] vdpa: fix typo

2021-10-10 Thread Jason Wang


在 2021/9/29 下午2:11, Wu Zongyong 写道:

Signed-off-by: Wu Zongyong 



Acked-by: Jason Wang 



---
  include/linux/vdpa.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 3972ab765de1..a896ee021e5f 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -257,7 +257,7 @@ struct vdpa_config_ops {
struct vdpa_notification_area
(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
/* vq irq is not expected to be changed once DRIVER_OK is set */
-   int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
+   int (*get_vq_irq)(struct vdpa_device *vdev, u16 idx);
  
  	/* Device ops */

u32 (*get_vq_align)(struct vdpa_device *vdev);


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v4 1/7] virtio-pci: introduce legacy device module

2021-10-10 Thread Jason Wang


在 2021/9/29 下午2:11, Wu Zongyong 写道:

Split common codes from virtio-pci-legacy so vDPA driver can reuse it
later.

Signed-off-by: Wu Zongyong 
---
  drivers/virtio/Kconfig |  10 ++
  drivers/virtio/Makefile|   1 +
  drivers/virtio/virtio_pci_common.c |  10 +-
  drivers/virtio/virtio_pci_common.h |   9 +-
  drivers/virtio/virtio_pci_legacy.c | 101 +++-
  drivers/virtio/virtio_pci_legacy_dev.c | 220 +
  include/linux/virtio_pci_legacy.h  |  44 +
  7 files changed, 312 insertions(+), 83 deletions(-)
  create mode 100644 drivers/virtio/virtio_pci_legacy_dev.c
  create mode 100644 include/linux/virtio_pci_legacy.h

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index ce1b3f6ec325..8fcf94cd2c96 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -20,6 +20,15 @@ config VIRTIO_PCI_LIB
  PCI device with possible vendor specific extensions. Any
  module that selects this module must depend on PCI.
  
+config VIRTIO_PCI_LIB_LEGACY

+   tristate
+   help
+ Legacy PCI device (Virtio PCI Card 0.9.x Draft and older device)
+ implementation.
+ This module implements the basic probe and control for devices
+ which are based on legacy PCI device. Any module that selects this
+ module must depend on PCI.
+
  menuconfig VIRTIO_MENU
bool "Virtio drivers"
default y
@@ -43,6 +52,7 @@ config VIRTIO_PCI_LEGACY
bool "Support for legacy virtio draft 0.9.X and older devices"
default y
depends on VIRTIO_PCI
+   select VIRTIO_PCI_LIB_LEGACY
help
Virtio PCI Card 0.9.X Draft (circa 2014) and older device support.
  
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile

index 699bbea0465f..0a82d0873248 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -1,6 +1,7 @@
  # SPDX-License-Identifier: GPL-2.0
  obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
  obj-$(CONFIG_VIRTIO_PCI_LIB) += virtio_pci_modern_dev.o
+obj-$(CONFIG_VIRTIO_PCI_LIB_LEGACY) += virtio_pci_legacy_dev.o
  obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
  obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
  virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index b35bb2d57f62..d724f676608b 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -549,6 +549,8 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
  
  	pci_set_master(pci_dev);
  
+	vp_dev->is_legacy = vp_dev->ldev.ioaddr ? true : false;

+
rc = register_virtio_device(&vp_dev->vdev);
reg_dev = vp_dev;
if (rc)
@@ -557,10 +559,10 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
return 0;
  
  err_register:

-   if (vp_dev->ioaddr)
-virtio_pci_legacy_remove(vp_dev);
+   if (vp_dev->is_legacy)
+   virtio_pci_legacy_remove(vp_dev);
else
-virtio_pci_modern_remove(vp_dev);
+   virtio_pci_modern_remove(vp_dev);
  err_probe:
pci_disable_device(pci_dev);
  err_enable_device:
@@ -587,7 +589,7 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
  
  	unregister_virtio_device(&vp_dev->vdev);
  
-	if (vp_dev->ioaddr)

+   if (vp_dev->is_legacy)
virtio_pci_legacy_remove(vp_dev);
else
virtio_pci_modern_remove(vp_dev);
diff --git a/drivers/virtio/virtio_pci_common.h 
b/drivers/virtio/virtio_pci_common.h
index beec047a8f8d..eb17a29fc7ef 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -25,6 +25,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -44,16 +45,14 @@ struct virtio_pci_vq_info {
  struct virtio_pci_device {
struct virtio_device vdev;
struct pci_dev *pci_dev;
+   struct virtio_pci_legacy_device ldev;
struct virtio_pci_modern_device mdev;
  
-	/* In legacy mode, these two point to within ->legacy. */

+   bool is_legacy;
+
/* Where to read and clear interrupt */
u8 __iomem *isr;
  
-	/* Legacy only field */

-   /* the IO mapping for the PCI config space */
-   void __iomem *ioaddr;
-
/* a list of queues so we can dispatch IRQs */
spinlock_t lock;
struct list_head virtqueues;
diff --git a/drivers/virtio/virtio_pci_legacy.c 
b/drivers/virtio/virtio_pci_legacy.c
index d62e9835aeec..82eb437ad920 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -14,6 +14,7 @@
   *  Michael S. Tsirkin 
   */
  
+#include "linux/virtio_pci_legacy.h"

  #include "virtio_pci_common.h"
  
  /* virtio config->get_features() implementation */

@@ -23,7 +24,7 @@ static u64 vp_get_features(struct virtio_device *vdev)
  
  	/* When someone needs more than 32 feature bits, we'll need to

 * steal a bit t

Re: [PATCH v4 1/7] virtio-pci: introduce legacy device module

2021-10-10 Thread Jason Wang


在 2021/9/29 下午2:11, Wu Zongyong 写道:

Split common codes from virtio-pci-legacy so vDPA driver can reuse it
later.

Signed-off-by: Wu Zongyong 



Acked-by: Jason Wang 



---
  drivers/virtio/Kconfig |  10 ++
  drivers/virtio/Makefile|   1 +
  drivers/virtio/virtio_pci_common.c |  10 +-
  drivers/virtio/virtio_pci_common.h |   9 +-
  drivers/virtio/virtio_pci_legacy.c | 101 +++-
  drivers/virtio/virtio_pci_legacy_dev.c | 220 +
  include/linux/virtio_pci_legacy.h  |  44 +
  7 files changed, 312 insertions(+), 83 deletions(-)
  create mode 100644 drivers/virtio/virtio_pci_legacy_dev.c
  create mode 100644 include/linux/virtio_pci_legacy.h

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index ce1b3f6ec325..8fcf94cd2c96 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -20,6 +20,15 @@ config VIRTIO_PCI_LIB
  PCI device with possible vendor specific extensions. Any
  module that selects this module must depend on PCI.
  
+config VIRTIO_PCI_LIB_LEGACY

+   tristate
+   help
+ Legacy PCI device (Virtio PCI Card 0.9.x Draft and older device)
+ implementation.
+ This module implements the basic probe and control for devices
+ which are based on legacy PCI device. Any module that selects this
+ module must depend on PCI.
+
  menuconfig VIRTIO_MENU
bool "Virtio drivers"
default y
@@ -43,6 +52,7 @@ config VIRTIO_PCI_LEGACY
bool "Support for legacy virtio draft 0.9.X and older devices"
default y
depends on VIRTIO_PCI
+   select VIRTIO_PCI_LIB_LEGACY
help
Virtio PCI Card 0.9.X Draft (circa 2014) and older device support.
  
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile

index 699bbea0465f..0a82d0873248 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -1,6 +1,7 @@
  # SPDX-License-Identifier: GPL-2.0
  obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
  obj-$(CONFIG_VIRTIO_PCI_LIB) += virtio_pci_modern_dev.o
+obj-$(CONFIG_VIRTIO_PCI_LIB_LEGACY) += virtio_pci_legacy_dev.o
  obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
  obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
  virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index b35bb2d57f62..d724f676608b 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -549,6 +549,8 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
  
  	pci_set_master(pci_dev);
  
+	vp_dev->is_legacy = vp_dev->ldev.ioaddr ? true : false;

+
rc = register_virtio_device(&vp_dev->vdev);
reg_dev = vp_dev;
if (rc)
@@ -557,10 +559,10 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
return 0;
  
  err_register:

-   if (vp_dev->ioaddr)
-virtio_pci_legacy_remove(vp_dev);
+   if (vp_dev->is_legacy)
+   virtio_pci_legacy_remove(vp_dev);
else
-virtio_pci_modern_remove(vp_dev);
+   virtio_pci_modern_remove(vp_dev);
  err_probe:
pci_disable_device(pci_dev);
  err_enable_device:
@@ -587,7 +589,7 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
  
  	unregister_virtio_device(&vp_dev->vdev);
  
-	if (vp_dev->ioaddr)

+   if (vp_dev->is_legacy)
virtio_pci_legacy_remove(vp_dev);
else
virtio_pci_modern_remove(vp_dev);
diff --git a/drivers/virtio/virtio_pci_common.h 
b/drivers/virtio/virtio_pci_common.h
index beec047a8f8d..eb17a29fc7ef 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -25,6 +25,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -44,16 +45,14 @@ struct virtio_pci_vq_info {
  struct virtio_pci_device {
struct virtio_device vdev;
struct pci_dev *pci_dev;
+   struct virtio_pci_legacy_device ldev;
struct virtio_pci_modern_device mdev;
  
-	/* In legacy mode, these two point to within ->legacy. */

+   bool is_legacy;
+
/* Where to read and clear interrupt */
u8 __iomem *isr;
  
-	/* Legacy only field */

-   /* the IO mapping for the PCI config space */
-   void __iomem *ioaddr;
-
/* a list of queues so we can dispatch IRQs */
spinlock_t lock;
struct list_head virtqueues;
diff --git a/drivers/virtio/virtio_pci_legacy.c 
b/drivers/virtio/virtio_pci_legacy.c
index d62e9835aeec..82eb437ad920 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -14,6 +14,7 @@
   *  Michael S. Tsirkin 
   */
  
+#include "linux/virtio_pci_legacy.h"

  #include "virtio_pci_common.h"
  
  /* virtio config->get_features() implementation */

@@ -23,7 +24,7 @@ static u64 vp_get_features(struct virtio_device *vdev)
  
  	/* When someone needs more than 32 feature bits, we'll need 

Re: [PATCH v5 16/16] x86/tdx: Add cmdline option to force use of ioremap_host_shared

2021-10-10 Thread Andi Kleen




The connection is quite unfortunate IMHO.
Can't there be an option
that unbreaks drivers *without* opening up security holes by
making BIOS shared?


That would require new low level APIs that distinguish both cases, and a 
tree sweep.



-Andi

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/5] iova: Move fast alloc size roundup into alloc_iova_fast()

2021-10-10 Thread Jason Wang
On Fri, Sep 24, 2021 at 6:07 PM John Garry  wrote:
>
> It really is a property of the IOVA rcache code that we need to alloc a
> power-of-2 size, so relocate the functionality to resize into
> alloc_iova_fast(), rather than the callsites.
>
> Signed-off-by: John Garry 

Acked-by: Jason Wang 

> ---
>  drivers/iommu/dma-iommu.c| 8 
>  drivers/iommu/iova.c | 9 +
>  drivers/vdpa/vdpa_user/iova_domain.c | 8 
>  3 files changed, 9 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 896bea04c347..a99b3445fef8 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -444,14 +444,6 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
> iommu_domain *domain,
>
> shift = iova_shift(iovad);
> iova_len = size >> shift;
> -   /*
> -* Freeing non-power-of-two-sized allocations back into the IOVA 
> caches
> -* will come back to bite us badly, so we have to waste a bit of space
> -* rounding up anything cacheable to make sure that can't happen. The
> -* order of the unadjusted size will still match upon freeing.
> -*/
> -   if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
> -   iova_len = roundup_pow_of_two(iova_len);
>
> dma_limit = min_not_zero(dma_limit, dev->bus_dma_limit);
>
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 9e8bc802ac05..ff567cbc42f7 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -497,6 +497,15 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long 
> size,
> unsigned long iova_pfn;
> struct iova *new_iova;
>
> +   /*
> +* Freeing non-power-of-two-sized allocations back into the IOVA 
> caches
> +* will come back to bite us badly, so we have to waste a bit of space
> +* rounding up anything cacheable to make sure that can't happen. The
> +* order of the unadjusted size will still match upon freeing.
> +*/
> +   if (size < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
> +   size = roundup_pow_of_two(size);
> +
> iova_pfn = iova_rcache_get(iovad, size, limit_pfn + 1);
> if (iova_pfn)
> return iova_pfn;
> diff --git a/drivers/vdpa/vdpa_user/iova_domain.c 
> b/drivers/vdpa/vdpa_user/iova_domain.c
> index 1daae2608860..2b1143f11d8f 100644
> --- a/drivers/vdpa/vdpa_user/iova_domain.c
> +++ b/drivers/vdpa/vdpa_user/iova_domain.c
> @@ -292,14 +292,6 @@ vduse_domain_alloc_iova(struct iova_domain *iovad,
> unsigned long iova_len = iova_align(iovad, size) >> shift;
> unsigned long iova_pfn;
>
> -   /*
> -* Freeing non-power-of-two-sized allocations back into the IOVA 
> caches
> -* will come back to bite us badly, so we have to waste a bit of space
> -* rounding up anything cacheable to make sure that can't happen. The
> -* order of the unadjusted size will still match upon freeing.
> -*/
> -   if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
> -   iova_len = roundup_pow_of_two(iova_len);
> iova_pfn = alloc_iova_fast(iovad, iova_len, limit >> shift, true);
>
> return iova_pfn << shift;
> --
> 2.26.2
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix for skb_over_panic inside big mode

2021-10-10 Thread Jason Wang
On Sat, Oct 9, 2021 at 5:18 PM Michael S. Tsirkin  wrote:
>
> From: Xuan Zhuo 
>
> commit 126285651b7f ("Merge 
> ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net")
> accidentally reverted the effect of
> commit 1a8024239da ("virtio-net: fix for skb_over_panic inside big mode")
> on drivers/net/virtio_net.c
>
> As a result, users of crosvm (which is using large packet mode)
> are experiencing crashes with 5.14-rc1 and above that do not
> occur with 5.13.
>
> Crash trace:
>
> [   61.346677] skbuff: skb_over_panic: text:881ae2c7 len:3762 
> put:3762 head:8a5ec8c22000 data:8a5ec8c22010 tail:0xec2 end:0xec0 
> dev:
> [   61.369192] kernel BUG at net/core/skbuff.c:111!
> [   61.372840] invalid opcode:  [#1] SMP PTI
> [   61.374892] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.14.0-rc1 
> linux-v5.14-rc1-for-mesa-ci.tar.bz2 #1
> [   61.376450] Hardware name: ChromiumOS crosvm, BIOS 0
>
> ..
>
> [   61.393635] Call Trace:
> [   61.394127]  
> [   61.394488]  skb_put.cold+0x10/0x10
> [   61.395095]  page_to_skb+0xf7/0x410
> [   61.395689]  receive_buf+0x81/0x1660
> [   61.396228]  ? netif_receive_skb_list_internal+0x1ad/0x2b0
> [   61.397180]  ? napi_gro_flush+0x97/0xe0
> [   61.397896]  ? detach_buf_split+0x67/0x120
> [   61.398573]  virtnet_poll+0x2cf/0x420
> [   61.399197]  __napi_poll+0x25/0x150
> [   61.399764]  net_rx_action+0x22f/0x280
> [   61.400394]  __do_softirq+0xba/0x257
> [   61.401012]  irq_exit_rcu+0x8e/0xb0
> [   61.401618]  common_interrupt+0x7b/0xa0
> [   61.402270]  
>
> See
> https://lore.kernel.org/r/5edaa2b7c2fe4abd0347b8454b2ac032b6694e2c.camel%40collabora.com
> for the report.
>
> Apply the original 1a8024239da ("virtio-net: fix for skb_over_panic inside 
> big mode")
> again, the original logic still holds:
>
> In virtio-net's large packet mode, there is a hole in the space behind
> buf.
>
> hdr_padded_len - hdr_len
>
> We must take this into account when calculating tailroom.
>
> Cc: Greg KH 
> Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's 
> sufficient tailroom")
> Fixes: 126285651b7f ("Merge 
> ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net")
> Signed-off-by: Xuan Zhuo 
> Reported-by: Corentin Noël 
> Tested-by: Corentin Noël 
> Signed-off-by: Michael S. Tsirkin 
> ---

Acked-by: Jason Wang 

>
> David, I think we need this in stable, too.
>
>  drivers/net/virtio_net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 096c2ac6b7a6..6b0812f44bbf 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -406,7 +406,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
> *vi,
>  * add_recvbuf_mergeable() + get_mergeable_buf_len()
>  */
> truesize = headroom ? PAGE_SIZE : truesize;
> -   tailroom = truesize - len - headroom;
> +   tailroom = truesize - len - headroom - (hdr_padded_len - hdr_len);
> buf = p - headroom;
>
> len -= hdr_len;
> --
> MST
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()

2021-10-10 Thread Andi Kleen




To which Andi replied
One problem with removing the ioremap opt-in is that
it's still possible for drivers to get at devices without going through 
probe.

To which Greg replied:
https://lore.kernel.org/all/yvxbnj431yiww...@kroah.com/
If there are in-kernel PCI drivers that do not do this, they need to be
fixed today.

Can you guys resolve the differences here?



I addressed this in my other mail, but we may need more discussion.




And once they are resolved, mention this in the commit log so
I don't get to re-read the series just to find out nothing
changed in this respect?

I frankly do not believe we are anywhere near being able to harden
an arbitrary kernel config against attack.


Why not? Device filter and the opt-ins together are a fairly strong 
mechanism.


And it's not that they're a lot of code or super complicated either.

You're essentially objecting to a single line change in your subsystem here.



How about creating a defconfig that makes sense for TDX then?


TDX can be used in many different ways, I don't think a defconfig is 
practical.


In theory you could do some Kconfig dependency (at the pain point of 
having separate kernel binariees), but why not just do it at run time 
then if you maintain the list anyways. That's much easier and saner for 
everyone. In the past we usually always ended up with runtime mechanism 
for similar things anyways.


Also it turns out that the filter mechanisms are needed for some arch 
drivers which are not even configurable, so alone it's probably not enough,




Anyone deviating from that better know what they are doing,
this API tweaking is just putting policy into the kernel  ...


Hardening drivers is kernel policy. It cannot be done anywhere else.


-Andi
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()

2021-10-10 Thread Andi Kleen



On 10/9/2021 1:39 PM, Dan Williams wrote:

On Sat, Oct 9, 2021 at 2:53 AM Michael S. Tsirkin  wrote:

On Fri, Oct 08, 2021 at 05:37:07PM -0700, Kuppuswamy Sathyanarayanan wrote:

From: Andi Kleen 

For Confidential VM guests like TDX, the host is untrusted and hence
the devices emulated by the host or any data coming from the host
cannot be trusted. So the drivers that interact with the outside world
have to be hardened by sharing memory with host on need basis
with proper hardening fixes.

For the PCI driver case, to share the memory with the host add
pci_iomap_host_shared() and pci_iomap_host_shared_range() APIs.

Signed-off-by: Andi Kleen 
Signed-off-by: Kuppuswamy Sathyanarayanan 


So I proposed to make all pci mappings shared, eliminating the need
to patch drivers.

To which Andi replied
 One problem with removing the ioremap opt-in is that
 it's still possible for drivers to get at devices without going 
through probe.

To which Greg replied:
https://lore.kernel.org/all/yvxbnj431yiww...@kroah.com/
 If there are in-kernel PCI drivers that do not do this, they need to be
 fixed today.

Can you guys resolve the differences here?

I agree with you and Greg here. If a driver is accessing hardware
resources outside of the bind lifetime of one of the devices it
supports, and in a way that neither modrobe-policy nor
device-authorization -policy infrastructure can block, that sounds
like a bug report.


The 5.15 tree has something like ~2.4k IO accesses (including MMIO and 
others) in init functions that also register drivers (thanks Elena for 
the number)


Some are probably old drivers that could be fixed, but it's quite a few 
legitimate cases. For example for platform or ISA drivers that's the 
only way they can be implemented because they often have no other 
enumeration mechanism. For PCI drivers it's rarer, but also still can 
happen. One example that comes to mind here is the x86 Intel uncore 
drivers, which support a mix of MSR, ioremap and PCI config space 
accesses all from the same driver. This particular example can (and 
should be) fixed in other ways, but similar things also happen in other 
drivers, and they're not all broken. Even for the broken ones they're 
usually for some crufty old devices that has very few users, so it's 
likely untestable in practice.


My point is just that the ecosystem of devices that Linux supports is 
messy enough that there are legitimate exceptions from the "First IO 
only in probe call only" rule.


And we can't just fix them all. Even if we could it would be hard to 
maintain.


Using a "firewall model" hooking into a few strategic points like we're 
proposing here is much saner for everyone.


Now we can argue about the details. Right now what we're proposing has 
some redundancies: it has both a device model filter and low level 
filter for ioremap (this patch and some others). The low level filter is 
for catching issues that don't clearly fit into the 
"enumeration<->probe" model. You could call that redundant, but I would 
call it defense in depth or better safe than sorry. In theory it would 
be enough to have the low level opt-in only, but that would have the 
drawback that is something gets enumerated after all you would have all 
kind of weird device driver failures and in some cases even killed 
guests. So I think it makes sense to have




Fix those drivers instead of sprinkling
ioremap_shared in select places and with unclear rules about when a
driver is allowed to do "shared" mappings.


Only add it when the driver has been audited and hardened.

But I agree we need on a documented process for this. I will work on 
some documentation for a proposal. But essentially I think it should be 
some variant of what Elena has outlined in her talk at Security Summit.


https://static.sched.com/hosted_files/lssna2021/b6/LSS-HardeningLinuxGuestForCCC.pdf

That is using extra auditing/scrutiny at review time, supported with 
some static code analysis that points to the interaction points, and 
code needs to be fuzzed explicitly.


However short term it's only three virtio drivers, so this is not a 
urgent problem.



Let the new
device-authorization mechanism (with policy in userspace)



Default policy in user space just seems to be a bad idea here. Who 
should know if a driver is hardened other than the kernel? Maintaining 
the list somewhere else just doesn't make sense to me.


Also there is the more practical problem that some devices are needed 
for booting. For example in TDX we can't print something to the console 
with this mechanism, so you would never get any output before the 
initrd. Just seems like a nightmare for debugging anything. There really 
needs to be an authorization mechanism that works reasonably early.


I can see a point of having user space overrides though, but we need to 
have a sane kernel default that works early.


-Andi
___
Virtualization mailing l