Re: [RFC v4 10/11] vduse: Introduce a workqueue for irq injection

2021-03-04 Thread Jason Wang


On 2021/3/5 3:27 下午, Yongji Xie wrote:

On Fri, Mar 5, 2021 at 3:01 PM Jason Wang  wrote:


On 2021/3/5 2:36 下午, Yongji Xie wrote:

On Fri, Mar 5, 2021 at 11:42 AM Jason Wang  wrote:

On 2021/3/5 11:30 上午, Yongji Xie wrote:

On Fri, Mar 5, 2021 at 11:05 AM Jason Wang  wrote:

On 2021/3/4 4:58 下午, Yongji Xie wrote:

On Thu, Mar 4, 2021 at 2:59 PM Jason Wang  wrote:

On 2021/2/23 7:50 下午, Xie Yongji wrote:

This patch introduces a workqueue to support injecting
a virtqueue's interrupt asynchronously. This is mainly
for performance reasons: it makes sure the push()
and pop() on the used vring can run asynchronously.

Do you have perf numbers for this patch?


No, I can do some tests for it if needed.

Another problem is the VIRTIO_RING_F_EVENT_IDX feature will be useless
if we call irq callback in ioctl context. Something like:

virtqueue_push();
virtio_notify();
ioctl()
-
irq_cb()
virtqueue_get_buf()

The used vring is always empty each time we call virtqueue_push() in
userspace. Not sure if it is what we expected.

I'm not sure I get the issue.

The used ring should be filled by virtqueue_push(), which is done by
userspace before?


After userspace calls virtqueue_push(), it always calls virtio_notify()
immediately. In traditional VM (vhost-vdpa) cases, virtio_notify()
will inject an irq into the VM and return, then the vcpu thread will call
the interrupt handler. But in container (virtio-vdpa) cases,
virtio_notify() will call the interrupt handler directly. So it looks like
we have to optimize the virtio-vdpa case. But one problem is we don't
know whether we are in the VM use case or the container use case.

Yes, but I still don't get why the used ring is empty after the ioctl()?
The used ring does not use a bounce page, so it should be visible to the kernel
driver. What did I miss :) ?


Sorry, I'm not saying the kernel can't see the correct used vring. I
mean the kernel will consume the used vring in the ioctl context
directly in the virtio-vdpa case. In userspace's view, that means
virtqueue_push() is the used vring's producer and virtio_notify() is the used
vring's consumer. They will be called one after another in one thread rather
than in different threads, which looks odd and has a bad effect on
performance.


Yes, that's why we need a workqueue (WQ_UNBOUND you used). Or do you
want to squash this patch into patch 8?

So I think we can see obvious difference when virtio-vdpa is used.


But it looks like we don't need this workqueue in vhost-vdpa cases.
Any suggestions?



I haven't thought this through deeply yet, but I feel we can solve this by using the
irq bypass manager (or something similar). Then we don't need it to be
relayed via the workqueue and vdpa. But I'm not sure how hard it will be.


Do you see any obvious performance regression by using the workqueue? Or 
we can optimize it in the future.
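
For reference, the kind of relay this patch adds boils down to something like the
sketch below (the inject_work field and the unbound workqueue name are assumptions
based on this series, not the exact patch):

	static void vduse_vq_irq_inject(struct work_struct *work)
	{
		struct vduse_virtqueue *vq =
			container_of(work, struct vduse_virtqueue, inject_work);

		spin_lock_irq(&vq->irq_lock);
		if (vq->ready && vq->cb.callback)
			vq->cb.callback(vq->cb.private);
		spin_unlock_irq(&vq->irq_lock);
	}

	/* VDUSE_INJECT_VQ_IRQ ioctl path: return to userspace immediately
	 * and let the unbound workqueue run the virtio irq callback. */
	queue_work(vduse_irq_wq, &vq->inject_work);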


Thanks




Thanks,
Yongji




Re: [RFC v4 06/11] vduse: Implement an MMU-based IOMMU driver

2021-03-04 Thread Jason Wang


On 2021/3/5 3:13 下午, Yongji Xie wrote:

On Fri, Mar 5, 2021 at 2:52 PM Jason Wang  wrote:


On 2021/3/5 2:15 下午, Yongji Xie wrote:

Sorry if I've asked this before.

But what's the reason for maintaining a dedicated IOTLB here? I think we
could reuse vduse_dev->iommu, since the device cannot be used by both
virtio and vhost at the same time, or use vduse_iova_domain->iotlb for
set_map().

The main difference between domain->iotlb and dev->iotlb is the way they
deal with the bounce buffer. In the domain->iotlb case, the bounce buffer
needs to be mapped on each DMA transfer because we need to get the bounce
pages by an IOVA during DMA unmapping. In the dev->iotlb case, the bounce
buffer only needs to be mapped once during initialization, which will
be used to tell userspace how to do mmap().

Also, since the vhost IOTLB supports a per-mapping token (opaque), can we use
that instead of the bounce_pages *?

Sorry, I didn't get you here. Which value do you mean to store in the
opaque pointer?

So I would like to have a way to use a single IOTLB for managing all kinds
of mappings. Two possible ideas:

1) map bounce pages one by one in vduse_dev_map_page(); in
VDUSE_IOTLB_GET_FD, try to merge the results if they share the same fd. Then
for bounce pages, userspace still only needs to map them once and we can
maintain the actual mapping by storing the page or pa in the opaque
field of the IOTLB entry.

Looks like userspace still needs to unmap the old region and map a new
region (size is changed) with the fd in each VDUSE_IOTLB_GET_FD ioctl.


I don't get it here. Can you give an example?


For example, userspace needs to process two I/O requests (one page per
request). To process the first request, userspace uses
VDUSE_IOTLB_GET_FD ioctl to query the iova region (0 ~ 4096) and mmap
it.



I think in this case we should let VDUSE_IOTLB_GET_FD return the maximum
range as long as it is backed by the same fd.
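
As a rough illustration of that merging (the helper name and the use of the
opaque field as the backing fd are assumptions, not code from this series):

	static void vduse_iotlb_expand(struct vhost_iotlb *iotlb,
				       u64 *start, u64 *last, void *file)
	{
		struct vhost_iotlb_map *map;

		/* Extend [*start, *last] over contiguous entries backed by 'file'. */
		for (map = vhost_iotlb_itree_first(iotlb, *start, ULLONG_MAX); map;
		     map = vhost_iotlb_itree_next(map, *start, ULLONG_MAX)) {
			if (map->opaque != file || map->start > *last + 1)
				break;
			*last = map->last;
		}
	}

Userspace would then see one large region per fd and only needs a single mmap()
for it.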


In the case of bounce pages, it's basically the whole range of the bounce buffer?

Thanks



To process the second request, userspace uses VDUSE_IOTLB_GET_FD
ioctl to query the new iova region and map a new region (0 ~ 8192).
Then userspace needs to traverse the list of iova regions and unmap
the old region (0 ~ 4096). Looks like this is a little complex.

Thanks,
Yongji




Re: [PATCH v6] i2c: virtio: add a virtio i2c frontend driver

2021-03-04 Thread Jason Wang


On 2021/3/5 1:47 下午, Jie Deng wrote:


On 2021/3/4 17:15, Jason Wang wrote:


On 2021/3/4 9:59 上午, Jie Deng wrote:

Add an I2C bus driver for virtio para-virtualization.

The controller can be emulated by the backend driver in
any device model software by following the virtio protocol.

The device specification can be found on
https://lists.oasis-open.org/archives/virtio-comment/202101/msg8.html. 



By following the specification, people may implement different
backend drivers to emulate different controllers according to
their needs.

Co-developed-by: Conghui Chen 
Signed-off-by: Conghui Chen 
Signed-off-by: Jie Deng 
---
  drivers/i2c/busses/Kconfig  |  11 ++
  drivers/i2c/busses/Makefile |   3 +
  drivers/i2c/busses/i2c-virtio.c | 289 


  include/uapi/linux/virtio_i2c.h |  42 ++
  include/uapi/linux/virtio_ids.h |   1 +
  5 files changed, 346 insertions(+)
  create mode 100644 drivers/i2c/busses/i2c-virtio.c
  create mode 100644 include/uapi/linux/virtio_i2c.h

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 05ebf75..0860395 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -21,6 +21,17 @@ config I2C_ALI1535
    This driver can also be built as a module.  If so, the module
    will be called i2c-ali1535.
  +config I2C_VIRTIO
+    tristate "Virtio I2C Adapter"
+    depends on VIRTIO



Please use select here. There's no prompt for VIRTIO.


OK.



+    help
+  If you say yes to this option, support will be included for 
the virtio
+  I2C adapter driver. The hardware can be emulated by any 
device model

+  software according to the virtio protocol.
+
+  This driver can also be built as a module. If so, the module
+  will be called i2c-virtio.
+
  config I2C_ALI1563
  tristate "ALI 1563"
  depends on PCI
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 615f35e..b88da08 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -6,6 +6,9 @@
  # ACPI drivers
  obj-$(CONFIG_I2C_SCMI)    += i2c-scmi.o
  +# VIRTIO I2C host controller driver
+obj-$(CONFIG_I2C_VIRTIO)    += i2c-virtio.o
+
  # PC SMBus host controller drivers
  obj-$(CONFIG_I2C_ALI1535)    += i2c-ali1535.o
  obj-$(CONFIG_I2C_ALI1563)    += i2c-ali1563.o
diff --git a/drivers/i2c/busses/i2c-virtio.c 
b/drivers/i2c/busses/i2c-virtio.c

new file mode 100644
index 000..98c0fcc
--- /dev/null
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -0,0 +1,289 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtio I2C Bus Driver
+ *
+ * The Virtio I2C Specification:
+ * 
https://raw.githubusercontent.com/oasis-tcs/virtio-spec/master/virtio-i2c.tex

+ *
+ * Copyright (c) 2021 Intel Corporation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+/**
+ * struct virtio_i2c - virtio I2C data
+ * @vdev: virtio device for this controller
+ * @completion: completion of virtio I2C message
+ * @adap: I2C adapter for this controller
+ * @i2c_lock: lock for virtqueue processing
+ * @vq: the virtio virtqueue for communication
+ */
+struct virtio_i2c {
+    struct virtio_device *vdev;
+    struct completion completion;
+    struct i2c_adapter adap;
+    struct mutex i2c_lock;
+    struct virtqueue *vq;
+};
+
+/**
+ * struct virtio_i2c_req - the virtio I2C request structure
+ * @out_hdr: the OUT header of the virtio I2C message
+ * @buf: the buffer into which data is read, or from which it's 
written

+ * @in_hdr: the IN header of the virtio I2C message
+ */
+struct virtio_i2c_req {
+    struct virtio_i2c_out_hdr out_hdr;
+    u8 *buf;
+    struct virtio_i2c_in_hdr in_hdr;
+};
+
+static void virtio_i2c_msg_done(struct virtqueue *vq)
+{
+    struct virtio_i2c *vi = vq->vdev->priv;
+
+    complete(&vi->completion);
+}
+
+static int virtio_i2c_send_reqs(struct virtqueue *vq,
+    struct virtio_i2c_req *reqs,
+    struct i2c_msg *msgs, int nr)
+{
+    struct scatterlist *sgs[3], out_hdr, msg_buf, in_hdr;
+    int i, outcnt, incnt, err = 0;
+    u8 *buf;
+
+    for (i = 0; i < nr; i++) {
+    if (!msgs[i].len)
+    break;
+
+    /* Only 7-bit mode supported for this moment. For the 
address format,

+ * Please check the Virtio I2C Specification.
+ */
+    reqs[i].out_hdr.addr = cpu_to_le16(msgs[i].addr << 1);
+
+    if (i != nr - 1)
+    reqs[i].out_hdr.flags |= VIRTIO_I2C_FLAGS_FAIL_NEXT;
+
+    outcnt = incnt = 0;
+    sg_init_one(&out_hdr, &reqs[i].out_hdr, 
sizeof(reqs[i].out_hdr));

+    sgs[outcnt++] = &out_hdr;
+
+    buf = kzalloc(msgs[i].len, GFP_KERNEL);
+    if (!buf)
+    break;
+
+    reqs[i].buf = buf;
+    sg_init_one(&msg_buf, reqs[i].buf, msgs[i].len);
+
+    if (msgs[i].flags & I2C_M_RD) {
+    sgs[outcnt + incnt++] = &msg_buf;
+   

Re: [PATCH v5 net-next] virtio-net: support XDP when not more queues

2021-03-04 Thread Jason Wang


On 2021/3/1 11:22 上午, Xuan Zhuo wrote:

The number of queues implemented by many virtio backends is limited,
especially on machines with a large number of CPUs. In this case, it
is often impossible to allocate a separate queue for
XDP_TX/XDP_REDIRECT, so xdp cannot be loaded at all, even if xdp does
not use XDP_TX/XDP_REDIRECT.

This patch allows XDP_TX/XDP_REDIRECT to run by reusing the existing SQ
with __netif_tx_lock() held when there are not enough queues.

Signed-off-by: Xuan Zhuo 
Reviewed-by: Dust Li 
---
v5: change subject from 'support XDP_TX when not more queues'

v4: make sparse happy
 suggested by Jakub Kicinski

v3: add warning when no more queues
 suggested by Jesper Dangaard Brouer

  drivers/net/virtio_net.c | 53 
  1 file changed, 44 insertions(+), 9 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ba8e637..55f1dd1 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -195,6 +195,9 @@ struct virtnet_info {
/* # of XDP queue pairs currently used by the driver */
u16 xdp_queue_pairs;

+   /* xdp_queue_pairs may be 0, when xdp is already loaded. So add this. */
+   bool xdp_enabled;
+
/* I like... big packets and I cannot lie! */
bool big_packets;

@@ -481,14 +484,42 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
return 0;
  }

-static struct send_queue *virtnet_xdp_sq(struct virtnet_info *vi)
+static struct send_queue *virtnet_get_xdp_sq(struct virtnet_info *vi)
+   __acquires(lock)
  {
+   struct netdev_queue *txq;
unsigned int qp;

-   qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
+   if (vi->curr_queue_pairs > nr_cpu_ids) {
+   qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + 
smp_processor_id();
+
+   /* tell sparse we took the lock, but don't really take it */
+   __acquire(lock);



The code can explain itself, but you need to explain why we don't need to
hold the tx lock here.


And it looks to me we should use __netif_tx_acquire()/__netif_tx_release()?

Btw, is it better to refactor the code so that we can annotate it with
something like __acquire(txq->xmit_lock)?
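
For illustration, the refactor could look roughly like the sketch below (not the
actual patch); __netif_tx_acquire()/__netif_tx_release() are the existing no-op
helpers that only tell sparse about txq->_xmit_lock:

	static struct send_queue *virtnet_get_xdp_sq(struct virtnet_info *vi)
	{
		struct netdev_queue *txq;
		unsigned int qp;

		if (vi->curr_queue_pairs > nr_cpu_ids) {
			/* Dedicated XDP queue per CPU: no contention, annotate only. */
			qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
			txq = netdev_get_tx_queue(vi->dev, qp);
			__netif_tx_acquire(txq);
		} else {
			/* Shared queue: really take the tx lock. */
			qp = smp_processor_id() % vi->curr_queue_pairs;
			txq = netdev_get_tx_queue(vi->dev, qp);
			__netif_tx_lock(txq, raw_smp_processor_id());
		}

		return &vi->sq[qp];
	}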


Thanks




+   } else {
+   qp = smp_processor_id() % vi->curr_queue_pairs;
+   txq = netdev_get_tx_queue(vi->dev, qp);
+   __netif_tx_lock(txq, raw_smp_processor_id());
+   }
+
return &vi->sq[qp];
  }

+static void virtnet_put_xdp_sq(struct virtnet_info *vi, struct send_queue *sq)
+   __releases(lock)
+{
+   struct netdev_queue *txq;
+   unsigned int qp;
+
+   if (vi->curr_queue_pairs <= nr_cpu_ids) {
+   qp = sq - vi->sq;
+   txq = netdev_get_tx_queue(vi->dev, qp);
+   __netif_tx_unlock(txq);
+   } else {
+   /* make sparse happy */
+   __release(lock);
+   }
+}
+
  static int virtnet_xdp_xmit(struct net_device *dev,
int n, struct xdp_frame **frames, u32 flags)
  {
@@ -512,7 +543,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
if (!xdp_prog)
return -ENXIO;

-   sq = virtnet_xdp_sq(vi);
+   sq = virtnet_get_xdp_sq(vi);

if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) {
ret = -EINVAL;
@@ -560,12 +591,13 @@ static int virtnet_xdp_xmit(struct net_device *dev,
sq->stats.kicks += kicks;
u64_stats_update_end(&sq->stats.syncp);

+   virtnet_put_xdp_sq(vi, sq);
return ret;
  }

  static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
  {
-   return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
+   return vi->xdp_enabled ? VIRTIO_XDP_HEADROOM : 0;
  }

  /* We copy the packet for XDP in the following cases:
@@ -1457,12 +1489,13 @@ static int virtnet_poll(struct napi_struct *napi, int 
budget)
xdp_do_flush();

if (xdp_xmit & VIRTIO_XDP_TX) {
-   sq = virtnet_xdp_sq(vi);
+   sq = virtnet_get_xdp_sq(vi);
if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) 
{
u64_stats_update_begin(&sq->stats.syncp);
sq->stats.kicks++;
u64_stats_update_end(&sq->stats.syncp);
}
+   virtnet_put_xdp_sq(vi, sq);
}

return received;
@@ -2417,10 +2450,9 @@ static int virtnet_xdp_set(struct net_device *dev, 
struct bpf_prog *prog,

/* XDP requires extra queues for XDP_TX */
if (curr_qp + xdp_qp > vi->max_queue_pairs) {
-   NL_SET_ERR_MSG_MOD(extack, "Too few free TX rings available");
-   netdev_warn(dev, "request %i queues but max is %i\n",
+   netdev_warn(dev, "XDP request %i queues but max is %i. XDP_TX and 
XDP_REDIRECT will operate in a slower locked tx mode.\n",
 

Re: [RFC v4 10/11] vduse: Introduce a workqueue for irq injection

2021-03-04 Thread Jason Wang


On 2021/3/5 2:36 下午, Yongji Xie wrote:

On Fri, Mar 5, 2021 at 11:42 AM Jason Wang  wrote:


On 2021/3/5 11:30 上午, Yongji Xie wrote:

On Fri, Mar 5, 2021 at 11:05 AM Jason Wang  wrote:

On 2021/3/4 4:58 下午, Yongji Xie wrote:

On Thu, Mar 4, 2021 at 2:59 PM Jason Wang  wrote:

On 2021/2/23 7:50 下午, Xie Yongji wrote:

This patch introduces a workqueue to support injecting
a virtqueue's interrupt asynchronously. This is mainly
for performance reasons: it makes sure the push()
and pop() on the used vring can run asynchronously.

Do you have perf numbers for this patch?


No, I can do some tests for it if needed.

Another problem is the VIRTIO_RING_F_EVENT_IDX feature will be useless
if we call irq callback in ioctl context. Something like:

virtqueue_push();
virtio_notify();
   ioctl()
-
   irq_cb()
   virtqueue_get_buf()

The used vring is always empty each time we call virtqueue_push() in
userspace. Not sure if it is what we expected.

I'm not sure I get the issue.

The used ring should be filled by virtqueue_push(), which is done by
userspace before?


After userspace calls virtqueue_push(), it always calls virtio_notify()
immediately. In traditional VM (vhost-vdpa) cases, virtio_notify()
will inject an irq into the VM and return, then the vcpu thread will call
the interrupt handler. But in container (virtio-vdpa) cases,
virtio_notify() will call the interrupt handler directly. So it looks like
we have to optimize the virtio-vdpa case. But one problem is we don't
know whether we are in the VM use case or the container use case.


Yes, but I still don't get why the used ring is empty after the ioctl()?
The used ring does not use a bounce page, so it should be visible to the kernel
driver. What did I miss :) ?


Sorry, I'm not saying the kernel can't see the correct used vring. I
mean the kernel will consume the used vring in the ioctl context
directly in the virtio-vdpa case. In userspace's view, that means
virtqueue_push() is the used vring's producer and virtio_notify() is the used
vring's consumer. They will be called one after another in one thread rather
than in different threads, which looks odd and has a bad effect on
performance.



Yes, that's why we need a workqueue (WQ_UNBOUND you used). Or do you 
want to squash this patch into patch 8?


So I think we can see obvious difference when virtio-vdpa is used.

Thanks




Thanks,
Yongji




Re: [PATCH v6] i2c: virtio: add a virtio i2c frontend driver

2021-03-04 Thread Jie Deng



On 2021/3/5 11:09, Viresh Kumar wrote:

On 05-03-21, 09:46, Jie Deng wrote:

On 2021/3/4 14:06, Viresh Kumar wrote:

depends on I2C as well ?

No need for that. The dependency on I2C is included in the Kconfig in its parent
directory.

Sorry about that, I should have figured that out myself.

(Though a note on the way we reply to messages: please leave an empty line
before and after your messages, otherwise it gets difficult to find the inline
replies.)


+   if (!(req && req == &reqs[i])) {

I find this less readable compared to:
if (!req || req != &reqs[i]) {

Different people may have different tastes.

Please check Andy's comment in this link.

https://lists.linuxfoundation.org/pipermail/virtualization/2020-September/049933.html

Heh, everyone wants you to do it differently :)

If we leave compiler optimizations aside (they will generate the exact
same code for both cases, I tested it as well to be doubly sure), the
original statement used in this patch has 3 conditional statements in it and the
way I suggested has only two.

Andy, thoughts ?

And anyway, this isn't the biggest of my worries, just that I had to notice it
somehow :)


For all the above errors where you simply break out, you still need to free the
memory for buf, right ?

Will try to use reqs[i].buf = msgs[i].buf to avoid allocation.

I think it would be better to have all such deallocations done at a single
place, i.e. after the completion callback is finished. Trying to solve this
everywhere is going to make it more messy.



I think there is no need to have allocations/deallocations/memcpy for
reqs[i].buf any more if we use msgs[i].buf directly.
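
A rough sketch of that direction (not the actual follow-up patch):

	/* Point the data sg straight at the i2c_msg buffer instead of a bounce copy. */
	reqs[i].buf = msgs[i].buf;
	sg_init_one(&msg_buf, reqs[i].buf, msgs[i].len);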



+   mutex_lock(&vi->i2c_lock);

I have never worked with i2c stuff before, but I don't think you need a lock
here. The transactions seem to be serialized by the i2c-core itself (look at
i2c_transfer() in i2c-core-base.c); there is another exported version,
__i2c_transfer(), but the comment over it says callers must take the adapter
lock before calling it.

A lock is needed since no "lock_ops" is registered in this i2c_adapter.

drivers/i2c/i2c-core-base.c:

static int i2c_register_adapter(struct i2c_adapter *adap)
{
 ...

 if (!adap->lock_ops)
 adap->lock_ops = &i2c_adapter_lock_ops;

 ...
}

This should take care of it ?



The problem is that you can't guarantee that adap->algo->master_xfer is only
called from i2c_transfer(). Any function that holds the adapter can call
adap->algo->master_xfer directly. So I think it is safer to have a lock in
virtio_i2c_xfer().
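
For context, the serialization path being discussed looks roughly like this
(paraphrased from i2c-core-base.c, not verbatim kernel code):

	int i2c_transfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num)
	{
		int ret;

		i2c_lock_bus(adap, I2C_LOCK_SEGMENT);	/* adap->lock_ops->lock_bus() */
		ret = __i2c_transfer(adap, msgs, num);	/* ends up in adap->algo->master_xfer */
		i2c_unlock_bus(adap, I2C_LOCK_SEGMENT);

		return ret;
	}

Callers that go through i2c_transfer() are serialized by the adapter lock; only
a caller invoking adap->algo->master_xfer directly would bypass it.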



+
+   ret = virtio_i2c_send_reqs(vq, reqs, msgs, num);
+   if (ret == 0)
+   goto err_unlock_free;
+
+   nr = ret;
+
+   virtqueue_kick(vq);
+
+   time_left = wait_for_completion_timeout(&vi->completion, adap->timeout);
+   if (!time_left) {
+   dev_err(&adap->dev, "virtio i2c backend timeout.\n");
+   ret = -ETIMEDOUT;

You need to free the bufs of the requests here as well.

Just want to make sure you didn't miss this comment.



Will try to use msgs[i].buf directly, so there should be no need to free bufs any more.



+static struct i2c_adapter virtio_adapter = {
+   .owner = THIS_MODULE,
+   .name = "Virtio I2C Adapter",
+   .class = I2C_CLASS_DEPRECATED,

Why are we using something that is deprecated here ?

Warn users that the adapter doesn't support classes anymore.

So is this the right thing to do? Or is this what we expect from new drivers?
Sorry, I am just new to this stuff and so...



https://patchwork.ozlabs.org/project/linux-i2c/patch/20170729121143.3980-1-...@the-dreams.de/



+struct virtio_i2c_out_hdr {
+   __le16 addr;
+   __le16 padding;
+   __le32 flags;
+};

It might be worth setting __packed for the structures here, even when we have
taken care of padding ourselves, for both structures.

Please check Michael's comment https://lkml.org/lkml/2020/9/3/339.
I agreed to remove "__packed".

When Michael commented the structure looked like this:

Actually it can be ignored, as the compiler isn't going to add any padding by
itself in this case (since you already took care of it): the structure will be
aligned to the size of the biggest element here. I generally consider it to be
good coding style to make sure we don't add any unwanted stuff in there by
mistake.

Anyway, we can see it in future if this is going to be required or not, if and
when we add new fields here.
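
For reference, one way to document the no-padding assumption without __packed is
a build-time check (a sketch, not part of the patch):

	/* 2 + 2 + 4 bytes, no implicit padding expected. */
	static_assert(sizeof(struct virtio_i2c_out_hdr) == 8);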




Re: [RFC v4 06/11] vduse: Implement an MMU-based IOMMU driver

2021-03-04 Thread Jason Wang


On 2021/3/5 2:15 下午, Yongji Xie wrote:

Sorry if I've asked this before.

But what's the reason for maintaining a dedicated IOTLB here? I think we
could reuse vduse_dev->iommu, since the device cannot be used by both
virtio and vhost at the same time, or use vduse_iova_domain->iotlb for
set_map().


The main difference between domain->iotlb and dev->iotlb is the way they
deal with the bounce buffer. In the domain->iotlb case, the bounce buffer
needs to be mapped on each DMA transfer because we need to get the bounce
pages by an IOVA during DMA unmapping. In the dev->iotlb case, the bounce
buffer only needs to be mapped once during initialization, which will
be used to tell userspace how to do mmap().


Also, since the vhost IOTLB supports a per-mapping token (opaque), can we use
that instead of the bounce_pages *?


Sorry, I didn't get you here. Which value do you mean to store in the
opaque pointer?

So I would like to have a way to use a single IOTLB for managing all kinds
of mappings. Two possible ideas:

1) map bounce pages one by one in vduse_dev_map_page(); in
VDUSE_IOTLB_GET_FD, try to merge the results if they share the same fd. Then
for bounce pages, userspace still only needs to map them once and we can
maintain the actual mapping by storing the page or pa in the opaque
field of the IOTLB entry.

Looks like userspace still needs to unmap the old region and map a new
region (size is changed) with the fd in each VDUSE_IOTLB_GET_FD ioctl.



I don't get it here. Can you give an example?





2) map bounce page once in vduse_dev_map_page() and store struct page
**bounce_pages in the opaque field of this single IOTLB entry.


We can get the struct page **bounce_pages from vduse_iova_domain. Why
do we need to store it in the opaque field?  Should the opaque field
be used to store vdpa_map_file?



Oh yes, you're right.




And I think it works. One problem is we need to find a place to store
the original DMA buffer's address and size. I think we can modify the
array of bounce_pages for this purpose.
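
One way to do that (a sketch; the struct and field names are assumptions) is to
turn the plain page array into a small per-page record:

	struct vduse_bounce_map {
		struct page *bounce_page;
		u64 orig_phys;		/* original DMA buffer address, 0 if unused */
	};

	/* domain->bounce_maps[iova >> PAGE_SHIFT] would replace bounce_pages[]. */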

Thanks,
Yongji



Yes.

Thanks





Re: [PATCH v6] i2c: virtio: add a virtio i2c frontend driver

2021-03-04 Thread Jie Deng


On 2021/3/4 17:15, Jason Wang wrote:


On 2021/3/4 9:59 上午, Jie Deng wrote:

Add an I2C bus driver for virtio para-virtualization.

The controller can be emulated by the backend driver in
any device model software by following the virtio protocol.

The device specification can be found on
https://lists.oasis-open.org/archives/virtio-comment/202101/msg8.html. 



By following the specification, people may implement different
backend drivers to emulate different controllers according to
their needs.

Co-developed-by: Conghui Chen 
Signed-off-by: Conghui Chen 
Signed-off-by: Jie Deng 
---
  drivers/i2c/busses/Kconfig  |  11 ++
  drivers/i2c/busses/Makefile |   3 +
  drivers/i2c/busses/i2c-virtio.c | 289 


  include/uapi/linux/virtio_i2c.h |  42 ++
  include/uapi/linux/virtio_ids.h |   1 +
  5 files changed, 346 insertions(+)
  create mode 100644 drivers/i2c/busses/i2c-virtio.c
  create mode 100644 include/uapi/linux/virtio_i2c.h

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 05ebf75..0860395 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -21,6 +21,17 @@ config I2C_ALI1535
    This driver can also be built as a module.  If so, the module
    will be called i2c-ali1535.
  +config I2C_VIRTIO
+    tristate "Virtio I2C Adapter"
+    depends on VIRTIO



Please use select here. There's no prompt for VIRTIO.


OK.



+    help
+  If you say yes to this option, support will be included for 
the virtio
+  I2C adapter driver. The hardware can be emulated by any device 
model

+  software according to the virtio protocol.
+
+  This driver can also be built as a module. If so, the module
+  will be called i2c-virtio.
+
  config I2C_ALI1563
  tristate "ALI 1563"
  depends on PCI
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 615f35e..b88da08 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -6,6 +6,9 @@
  # ACPI drivers
  obj-$(CONFIG_I2C_SCMI)    += i2c-scmi.o
  +# VIRTIO I2C host controller driver
+obj-$(CONFIG_I2C_VIRTIO)    += i2c-virtio.o
+
  # PC SMBus host controller drivers
  obj-$(CONFIG_I2C_ALI1535)    += i2c-ali1535.o
  obj-$(CONFIG_I2C_ALI1563)    += i2c-ali1563.o
diff --git a/drivers/i2c/busses/i2c-virtio.c 
b/drivers/i2c/busses/i2c-virtio.c

new file mode 100644
index 000..98c0fcc
--- /dev/null
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -0,0 +1,289 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtio I2C Bus Driver
+ *
+ * The Virtio I2C Specification:
+ * 
https://raw.githubusercontent.com/oasis-tcs/virtio-spec/master/virtio-i2c.tex

+ *
+ * Copyright (c) 2021 Intel Corporation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+/**
+ * struct virtio_i2c - virtio I2C data
+ * @vdev: virtio device for this controller
+ * @completion: completion of virtio I2C message
+ * @adap: I2C adapter for this controller
+ * @i2c_lock: lock for virtqueue processing
+ * @vq: the virtio virtqueue for communication
+ */
+struct virtio_i2c {
+    struct virtio_device *vdev;
+    struct completion completion;
+    struct i2c_adapter adap;
+    struct mutex i2c_lock;
+    struct virtqueue *vq;
+};
+
+/**
+ * struct virtio_i2c_req - the virtio I2C request structure
+ * @out_hdr: the OUT header of the virtio I2C message
+ * @buf: the buffer into which data is read, or from which it's written
+ * @in_hdr: the IN header of the virtio I2C message
+ */
+struct virtio_i2c_req {
+    struct virtio_i2c_out_hdr out_hdr;
+    u8 *buf;
+    struct virtio_i2c_in_hdr in_hdr;
+};
+
+static void virtio_i2c_msg_done(struct virtqueue *vq)
+{
+    struct virtio_i2c *vi = vq->vdev->priv;
+
+    complete(&vi->completion);
+}
+
+static int virtio_i2c_send_reqs(struct virtqueue *vq,
+    struct virtio_i2c_req *reqs,
+    struct i2c_msg *msgs, int nr)
+{
+    struct scatterlist *sgs[3], out_hdr, msg_buf, in_hdr;
+    int i, outcnt, incnt, err = 0;
+    u8 *buf;
+
+    for (i = 0; i < nr; i++) {
+    if (!msgs[i].len)
+    break;
+
+    /* Only 7-bit mode supported for this moment. For the 
address format,

+ * Please check the Virtio I2C Specification.
+ */
+    reqs[i].out_hdr.addr = cpu_to_le16(msgs[i].addr << 1);
+
+    if (i != nr - 1)
+    reqs[i].out_hdr.flags |= VIRTIO_I2C_FLAGS_FAIL_NEXT;
+
+    outcnt = incnt = 0;
+    sg_init_one(&out_hdr, &reqs[i].out_hdr, 
sizeof(reqs[i].out_hdr));

+    sgs[outcnt++] = &out_hdr;
+
+    buf = kzalloc(msgs[i].len, GFP_KERNEL);
+    if (!buf)
+    break;
+
+    reqs[i].buf = buf;
+    sg_init_one(&msg_buf, reqs[i].buf, msgs[i].len);
+
+    if (msgs[i].flags & I2C_M_RD) {
+    sgs[outcnt + incnt++] = &msg_buf;
+    } else {
+    memcpy(reqs[i]

Re: [RFC v4 11/11] vduse: Support binding irq to the specified cpu

2021-03-04 Thread Jason Wang


On 2021/3/5 11:37 上午, Yongji Xie wrote:

On Fri, Mar 5, 2021 at 11:11 AM Jason Wang  wrote:


On 2021/3/4 4:19 下午, Yongji Xie wrote:

On Thu, Mar 4, 2021 at 3:30 PM Jason Wang  wrote:

On 2021/2/23 7:50 下午, Xie Yongji wrote:

Add a parameter for the ioctl VDUSE_INJECT_VQ_IRQ to support
injecting a virtqueue's interrupt on the specified CPU.

How does userspace know which CPU this irq is for? It looks to me we need to
do it at a different level.

E.g. introduce some API in sysfs to allow the admin to tune that.

But I think we can do that in another patch on top of this series.


OK. I will think more about it.


It should be something like
/sys/class/vduse/$dev_name/vq/0/irq_affinity. Also we need to make sure
the eventfd cannot be reused.
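
For illustration only (this interface is not part of the series yet; the
vduse_virtqueue fields and the per-vq kobject layout are assumptions), the store
side of such an attribute could look roughly like:

	static ssize_t irq_affinity_store(struct kobject *kobj,
					  struct kobj_attribute *attr,
					  const char *buf, size_t count)
	{
		struct vduse_virtqueue *vq =
			container_of(kobj, struct vduse_virtqueue, kobj);
		cpumask_var_t mask;
		int err;

		if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
			return -ENOMEM;

		err = cpumask_parse(buf, mask);
		if (!err)
			cpumask_copy(&vq->irq_affinity, mask);	/* consulted when queueing the irq work */

		free_cpumask_var(mask);
		return err ?: count;
	}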


Looks like we don't use eventfd now. Do you mean we need to use
eventfd in this case?



No, I meant if we're using eventfd, do we allow a single eventfd to be 
used for injecting irq for more than one virtqueue? (If not, I guess it 
should be ok).


Thanks




Thanks,
Yongji




Re: [RFC v4 10/11] vduse: Introduce a workqueue for irq injection

2021-03-04 Thread Jason Wang


On 2021/3/5 11:30 上午, Yongji Xie wrote:

On Fri, Mar 5, 2021 at 11:05 AM Jason Wang  wrote:


On 2021/3/4 4:58 下午, Yongji Xie wrote:

On Thu, Mar 4, 2021 at 2:59 PM Jason Wang  wrote:

On 2021/2/23 7:50 下午, Xie Yongji wrote:

This patch introduces a workqueue to support injecting
a virtqueue's interrupt asynchronously. This is mainly
for performance reasons: it makes sure the push()
and pop() on the used vring can run asynchronously.

Do you have perf numbers for this patch?


No, I can do some tests for it if needed.

Another problem is the VIRTIO_RING_F_EVENT_IDX feature will be useless
if we call irq callback in ioctl context. Something like:

virtqueue_push();
virtio_notify();
  ioctl()
-
  irq_cb()
  virtqueue_get_buf()

The used vring is always empty each time we call virtqueue_push() in
userspace. Not sure if it is what we expected.


I'm not sure I get the issue.

The used ring should be filled by virtqueue_push(), which is done by
userspace before?


After userspace calls virtqueue_push(), it always calls virtio_notify()
immediately. In traditional VM (vhost-vdpa) cases, virtio_notify()
will inject an irq into the VM and return, then the vcpu thread will call
the interrupt handler. But in container (virtio-vdpa) cases,
virtio_notify() will call the interrupt handler directly. So it looks like
we have to optimize the virtio-vdpa case. But one problem is we don't
know whether we are in the VM use case or the container use case.



Yes, but I still don't get why the used ring is empty after the ioctl()?
The used ring does not use a bounce page, so it should be visible to the kernel
driver. What did I miss :) ?


Thanks





Thanks,
Yongji




Re: [RFC v4 06/11] vduse: Implement an MMU-based IOMMU driver

2021-03-04 Thread Jason Wang


On 2021/3/4 1:12 下午, Yongji Xie wrote:

On Thu, Mar 4, 2021 at 12:21 PM Jason Wang  wrote:


On 2021/2/23 7:50 下午, Xie Yongji wrote:

This implements an MMU-based IOMMU driver to support mapping
kernel DMA buffers into userspace. The basic idea behind it is
treating the MMU (VA->PA) as an IOMMU (IOVA->PA). The driver will set
up MMU mappings instead of IOMMU mappings for the DMA transfer so
that the userspace process is able to use its virtual address to
access the DMA buffer in the kernel.

And to avoid security issues, a bounce-buffering mechanism is
introduced to prevent userspace from accessing the original buffer
directly.

Signed-off-by: Xie Yongji 
---
   drivers/vdpa/vdpa_user/iova_domain.c | 486 
+++
   drivers/vdpa/vdpa_user/iova_domain.h |  61 +
   2 files changed, 547 insertions(+)
   create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
   create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h

diff --git a/drivers/vdpa/vdpa_user/iova_domain.c 
b/drivers/vdpa/vdpa_user/iova_domain.c
new file mode 100644
index ..9285d430d486
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -0,0 +1,486 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * MMU-based IOMMU implementation
+ *
+ * Copyright (C) 2020 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *
+ * Author: Xie Yongji 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "iova_domain.h"
+
+#define IOVA_START_PFN 1
+#define IOVA_ALLOC_ORDER 12
+#define IOVA_ALLOC_SIZE (1 << IOVA_ALLOC_ORDER)
+
+static inline struct page *
+vduse_domain_get_bounce_page(struct vduse_iova_domain *domain, u64 iova)
+{
+ u64 index = iova >> PAGE_SHIFT;
+
+ return domain->bounce_pages[index];
+}
+
+static inline void
+vduse_domain_set_bounce_page(struct vduse_iova_domain *domain,
+ u64 iova, struct page *page)
+{
+ u64 index = iova >> PAGE_SHIFT;
+
+ domain->bounce_pages[index] = page;
+}
+
+static enum dma_data_direction perm_to_dir(int perm)
+{
+ enum dma_data_direction dir;
+
+ switch (perm) {
+ case VHOST_MAP_WO:
+ dir = DMA_FROM_DEVICE;
+ break;
+ case VHOST_MAP_RO:
+ dir = DMA_TO_DEVICE;
+ break;
+ case VHOST_MAP_RW:
+ dir = DMA_BIDIRECTIONAL;
+ break;
+ default:
+ break;
+ }
+
+ return dir;
+}
+
+static int dir_to_perm(enum dma_data_direction dir)
+{
+ int perm = -EFAULT;
+
+ switch (dir) {
+ case DMA_FROM_DEVICE:
+ perm = VHOST_MAP_WO;
+ break;
+ case DMA_TO_DEVICE:
+ perm = VHOST_MAP_RO;
+ break;
+ case DMA_BIDIRECTIONAL:
+ perm = VHOST_MAP_RW;
+ break;
+ default:
+ break;
+ }
+
+ return perm;
+}


Let's move the above two helpers to vhost_iotlb.h so they can be used
by other drivers, e.g. vdpa_sim.


Sure.


+
+static void do_bounce(phys_addr_t orig, void *addr, size_t size,
+ enum dma_data_direction dir)
+{
+ unsigned long pfn = PFN_DOWN(orig);
+
+ if (PageHighMem(pfn_to_page(pfn))) {
+ unsigned int offset = offset_in_page(orig);
+ char *buffer;
+ unsigned int sz = 0;
+ unsigned long flags;
+
+ while (size) {
+ sz = min_t(size_t, PAGE_SIZE - offset, size);
+
+ local_irq_save(flags);
+ buffer = kmap_atomic(pfn_to_page(pfn));
+ if (dir == DMA_TO_DEVICE)
+ memcpy(addr, buffer + offset, sz);
+ else
+ memcpy(buffer + offset, addr, sz);
+ kunmap_atomic(buffer);
+ local_irq_restore(flags);


I wonder why we need to deal with highmem and irq flags explicitly like
this. Doesn't kmap_atomic() take care of all of that?


Yes, the irq flags handling is useless here. Will remove it.
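
I.e. that block can then shrink to just (a sketch):

	buffer = kmap_atomic(pfn_to_page(pfn));
	if (dir == DMA_TO_DEVICE)
		memcpy(addr, buffer + offset, sz);
	else
		memcpy(buffer + offset, addr, sz);
	kunmap_atomic(buffer);

since kmap_atomic() already disables pagefaults and preemption itself.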


+
+ size -= sz;
+ pfn++;
+ addr += sz;
+ offset = 0;
+ }
+ } else if (dir == DMA_TO_DEVICE) {
+ memcpy(addr, phys_to_virt(orig), size);
+ } else {
+ memcpy(phys_to_virt(orig), addr, size);
+ }
+}
+
+static struct page *
+vduse_domain_get_mapping_page(struct vduse_iova_domain *domain, u64 iova)
+{
+ u64 start = iova & PAGE_MASK;
+ u64 last = start + PAGE_SIZE - 1;
+ struct vhost_iotlb_map *map;
+ struct page *page = NULL;
+
+ spin_lock(&domain->iotlb_lock);
+ map = vhost_iotlb_itree_first(domain->iotlb, start, last);
+ if (!map)
+ goto out;
+
+ page = pfn_to_page((map->addr + iova - map->start) >> PAGE_SHIFT);
+ get_page(page);
+out:
+ spin_unlock(&domain->iotlb_lock);
+
+ return page;
+}
+
+static struct page *
+vduse_domain_alloc_bounce_page(struct vduse_iova_domain *domain, u64 iova)
+{
+ u64 star

Re: [RFC v4 07/11] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-03-04 Thread Jason Wang


On 2021/3/4 4:05 下午, Yongji Xie wrote:

On Thu, Mar 4, 2021 at 2:27 PM Jason Wang  wrote:


On 2021/2/23 7:50 下午, Xie Yongji wrote:

This VDUSE driver enables implementing vDPA devices in userspace.
Both the control path and the data path of vDPA devices can
be handled in userspace.

In the control path, the VDUSE driver will make use of a message
mechanism to forward the config operations from the vdpa bus driver
to userspace. Userspace can use read()/write() to receive/reply to
those control messages.

In the data path, the VDUSE_IOTLB_GET_FD ioctl will be used to get
the file descriptors referring to the vDPA device's iova regions. Then
userspace can use mmap() to access those iova regions. Besides,
userspace can use ioctl() to inject interrupts and use the eventfd
mechanism to receive virtqueue kicks.

Signed-off-by: Xie Yongji 
---
   Documentation/userspace-api/ioctl/ioctl-number.rst |1 +
   drivers/vdpa/Kconfig   |   10 +
   drivers/vdpa/Makefile  |1 +
   drivers/vdpa/vdpa_user/Makefile|5 +
   drivers/vdpa/vdpa_user/vduse_dev.c | 1348 

   include/uapi/linux/vduse.h |  136 ++
   6 files changed, 1501 insertions(+)
   create mode 100644 drivers/vdpa/vdpa_user/Makefile
   create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
   create mode 100644 include/uapi/linux/vduse.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
b/Documentation/userspace-api/ioctl/ioctl-number.rst
index a4c75a28c839..71722e6f8f23 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -300,6 +300,7 @@ Code  Seq#Include File  
 Comments
   'z'   10-4F  drivers/s390/crypto/zcrypt_api.h
conflict!
   '|'   00-7F  linux/media.h
   0x80  00-1F  linux/fb.h
+0x81  00-1F  linux/vduse.h
   0x89  00-06  arch/x86/include/asm/sockios.h
   0x89  0B-DF  linux/sockios.h
   0x89  E0-EF  linux/sockios.h 
SIOCPROTOPRIVATE range
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index ffd1e098bfd2..92f07715e3b6 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -25,6 +25,16 @@ config VDPA_SIM_NET
   help
 vDPA networking device simulator which loops TX traffic back to RX.

+config VDPA_USER
+ tristate "VDUSE (vDPA Device in Userspace) support"
+ depends on EVENTFD && MMU && HAS_DMA
+ select DMA_OPS
+ select VHOST_IOTLB
+ select IOMMU_IOVA
+ help
+   With VDUSE it is possible to emulate a vDPA Device
+   in a userspace program.
+
   config IFCVF
   tristate "Intel IFC VF vDPA driver"
   depends on PCI_MSI
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index d160e9b63a66..66e97778ad03 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -1,5 +1,6 @@
   # SPDX-License-Identifier: GPL-2.0
   obj-$(CONFIG_VDPA) += vdpa.o
   obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
+obj-$(CONFIG_VDPA_USER) += vdpa_user/
   obj-$(CONFIG_IFCVF)+= ifcvf/
   obj-$(CONFIG_MLX5_VDPA) += mlx5/
diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile
new file mode 100644
index ..260e0b26af99
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+vduse-y := vduse_dev.o iova_domain.o
+
+obj-$(CONFIG_VDPA_USER) += vduse.o
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
b/drivers/vdpa/vdpa_user/vduse_dev.c
new file mode 100644
index ..393bf99c48be
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -0,0 +1,1348 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VDUSE: vDPA Device in Userspace
+ *
+ * Copyright (C) 2020 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *
+ * Author: Xie Yongji 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "iova_domain.h"
+
+#define DRV_VERSION  "1.0"
+#define DRV_AUTHOR   "Yongji Xie "
+#define DRV_DESC "vDPA Device in Userspace"
+#define DRV_LICENSE  "GPL v2"
+
+#define VDUSE_DEV_MAX (1U << MINORBITS)
+
+struct vduse_virtqueue {
+ u16 index;
+ bool ready;
+ spinlock_t kick_lock;
+ spinlock_t irq_lock;
+ struct eventfd_ctx *kickfd;
+ struct vdpa_callback cb;
+};
+
+struct vduse_dev;
+
+struct vduse_vdpa {
+ struct vdpa_device vdpa;
+ struct vduse_dev *dev;
+};
+
+struct vduse_dev {
+ struct vduse_vdpa *vdev;
+ struct device dev;
+ struct cdev cdev;
+ struct vduse_virtqueue *vqs;
+ struct vduse_iova_domain *domain;
+ struct vhost_iotlb *iommu;
+ spinlock_t iommu_lock;
+ atomic_t bounce_map;
+ struct mutex msg_lock;
+ atomic64_t msg_unique;


"next_request_id" should be better.


OK

Re: [RFC v4 11/11] vduse: Support binding irq to the specified cpu

2021-03-04 Thread Jason Wang


On 2021/3/4 4:19 下午, Yongji Xie wrote:

On Thu, Mar 4, 2021 at 3:30 PM Jason Wang  wrote:


On 2021/2/23 7:50 下午, Xie Yongji wrote:

Add a parameter for the ioctl VDUSE_INJECT_VQ_IRQ to support
injecting a virtqueue's interrupt on the specified CPU.


How does userspace know which CPU this irq is for? It looks to me we need to
do it at a different level.

E.g. introduce some API in sysfs to allow the admin to tune that.

But I think we can do that in another patch on top of this series.


OK. I will think more about it.



It should be something like
/sys/class/vduse/$dev_name/vq/0/irq_affinity. Also we need to make sure
the eventfd cannot be reused.


Thanks




Thanks,
Yongji




Re: [RFC v4 10/11] vduse: Introduce a workqueue for irq injection

2021-03-04 Thread Jason Wang


On 2021/3/4 4:58 下午, Yongji Xie wrote:

On Thu, Mar 4, 2021 at 2:59 PM Jason Wang  wrote:


On 2021/2/23 7:50 下午, Xie Yongji wrote:

This patch introduces a workqueue to support injecting
a virtqueue's interrupt asynchronously. This is mainly
for performance reasons: it makes sure the push()
and pop() on the used vring can run asynchronously.


Do you have perf numbers for this patch?


No, I can do some tests for it if needed.

Another problem is the VIRTIO_RING_F_EVENT_IDX feature will be useless
if we call irq callback in ioctl context. Something like:

virtqueue_push();
virtio_notify();
 ioctl()
-
 irq_cb()
 virtqueue_get_buf()

The used vring is always empty each time we call virtqueue_push() in
userspace. Not sure if it is what we expected.



I'm not sure I get the issue.

The used ring should be filled by virtqueue_push(), which is done by
userspace before?


Thanks




Thanks,
Yongji




Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-03-04 Thread Jason Wang


On 2021/3/4 9:50 下午, Cornelia Huck wrote:

On Thu, 4 Mar 2021 16:24:16 +0800
Jason Wang  wrote:


On 2021/3/3 4:29 下午, Cornelia Huck wrote:

On Wed, 3 Mar 2021 12:01:01 +0800
Jason Wang  wrote:
  

On 2021/3/2 8:08 下午, Cornelia Huck wrote:

On Mon, 1 Mar 2021 11:51:08 +0800
Jason Wang  wrote:
 

On 2021/3/1 5:25 上午, Michael S. Tsirkin wrote:

On Fri, Feb 26, 2021 at 04:19:16PM +0800, Jason Wang wrote:

On 2021/2/26 2:53 上午, Michael S. Tsirkin wrote:

Confused. What is wrong with the above? It never reads the
field unless the feature has been offered by device.

So the spec said:

"

The following driver-read-only field, max_virtqueue_pairs only exists if
VIRTIO_NET_F_MQ is set.

"

If I read this correctly, there will be no max_virtqueue_pairs field if
VIRTIO_NET_F_MQ is not offered by the device? If yes, the offsetof() violates
what the spec says.

Thanks

I think that's a misunderstanding. This text was never intended to
imply that field offsets change based on feature bits.
We had this pain with legacy and we never wanted to go back there.

This merely implies that without VIRTIO_NET_F_MQ the field
should not be accessed. Exists in the sense "is accessible to driver".

Let's just clarify that in the spec, job done.

Ok, agree. That will make things easier.

Yes, that makes much more sense.

What about adding the following to the "Basic Facilities of a Virtio
Device/Device Configuration Space" section of the spec:

"If an optional configuration field does not exist, the corresponding
space is still present, but reserved."

This became interesting after re-reading some of the qemu code.

E.g. in virtio-net.c we have:

static VirtIOFeature feature_sizes[] = {
    {.flags = 1ULL << VIRTIO_NET_F_MAC,
     .end = endof(struct virtio_net_config, mac)},
    {.flags = 1ULL << VIRTIO_NET_F_STATUS,
     .end = endof(struct virtio_net_config, status)},
    {.flags = 1ULL << VIRTIO_NET_F_MQ,
     .end = endof(struct virtio_net_config, max_virtqueue_pairs)},
    {.flags = 1ULL << VIRTIO_NET_F_MTU,
     .end = endof(struct virtio_net_config, mtu)},
    {.flags = 1ULL << VIRTIO_NET_F_SPEED_DUPLEX,
     .end = endof(struct virtio_net_config, duplex)},
    {.flags = (1ULL << VIRTIO_NET_F_RSS) | (1ULL << VIRTIO_NET_F_HASH_REPORT),
     .end = endof(struct virtio_net_config, supported_hash_types)},
    {}
};

It has an implicit dependency chain. E.g. MTU isn't present if
DUPLEX/RSS is not offered ...

But I think it covers everything up to the relevant field, no? So MTU
is included if we have the feature bit, even if we don't have
DUPLEX/RSS.

Given that a config space may be shorter (but must not collapse
non-existing fields), maybe a better wording would be:

"If an optional configuration field does not exist, the corresponding
space will still be present if it is not at the end of the
configuration space (i.e., further configuration fields exist.)


This should work but I think we need to define the end of configuration
space first?

What about sidestepping this:

"...the corresponding space will still be present, unless no further
configuration fields exist."

?



It might work. (I wonder whether we can give some example in the spec.)
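
For instance (an illustration against the uapi struct virtio_net_config layout,
not proposed spec text): a device that offers VIRTIO_NET_F_MTU but not
VIRTIO_NET_F_MQ still presents config space up to and including mtu;
max_virtqueue_pairs keeps its fixed offset but is reserved and must not be
accessed by the driver:

	/* Field offsets are fixed regardless of which features are negotiated. */
	static_assert(offsetof(struct virtio_net_config, status) == 6);
	static_assert(offsetof(struct virtio_net_config, max_virtqueue_pairs) == 8);
	static_assert(offsetof(struct virtio_net_config, mtu) == 10);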

Thanks





This
implies that a given field, if it exists, is always at the same offset
from the beginning of the configuration space."



Re: [PATCH v6] i2c: virtio: add a virtio i2c frontend driver

2021-03-04 Thread Jie Deng



On 2021/3/4 14:06, Viresh Kumar wrote:

Please always supply version history, it makes it difficult to review otherwise.

I will add the history.

  drivers/i2c/busses/Kconfig  |  11 ++
  drivers/i2c/busses/Makefile |   3 +
  drivers/i2c/busses/i2c-virtio.c | 289 
  include/uapi/linux/virtio_i2c.h |  42 ++
  include/uapi/linux/virtio_ids.h |   1 +
  5 files changed, 346 insertions(+)
  create mode 100644 drivers/i2c/busses/i2c-virtio.c
  create mode 100644 include/uapi/linux/virtio_i2c.h

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 05ebf75..0860395 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -21,6 +21,17 @@ config I2C_ALI1535
  This driver can also be built as a module.  If so, the module
  will be called i2c-ali1535.
  
+config I2C_VIRTIO

+   tristate "Virtio I2C Adapter"
+   depends on VIRTIO

depends on I2C as well ?
No need for that. The dependency on I2C is included in the Kconfig in its
parent directory.



+   help
+ If you say yes to this option, support will be included for the virtio
+ I2C adapter driver. The hardware can be emulated by any device model
+ software according to the virtio protocol.
+
+ This driver can also be built as a module. If so, the module
+ will be called i2c-virtio.
+
  config I2C_ALI1563
tristate "ALI 1563"
depends on PCI
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 615f35e..b88da08 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -6,6 +6,9 @@
  # ACPI drivers
  obj-$(CONFIG_I2C_SCMI)+= i2c-scmi.o
  
+# VIRTIO I2C host controller driver

+obj-$(CONFIG_I2C_VIRTIO)   += i2c-virtio.o
+

Not that it is important but I would have added it towards the end instead of at
the top of the file..

I'm OK to add it to the end.



  # PC SMBus host controller drivers
  obj-$(CONFIG_I2C_ALI1535) += i2c-ali1535.o
  obj-$(CONFIG_I2C_ALI1563) += i2c-ali1563.o
diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
new file mode 100644
index 000..98c0fcc
--- /dev/null
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -0,0 +1,289 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtio I2C Bus Driver
+ *
+ * The Virtio I2C Specification:
+ * 
https://raw.githubusercontent.com/oasis-tcs/virtio-spec/master/virtio-i2c.tex
+ *
+ * Copyright (c) 2021 Intel Corporation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 

I don't think the above two are required here.


+#include 
+#include 
+#include 

same here.


+#include 

same here.

Will confirm and remove them if they are not required. Thank you.

+

And this blank line as well, since all are standard linux headers.

+#include 
+#include 
+
+/**
+ * struct virtio_i2c - virtio I2C data
+ * @vdev: virtio device for this controller
+ * @completion: completion of virtio I2C message
+ * @adap: I2C adapter for this controller
+ * @i2c_lock: lock for virtqueue processing
+ * @vq: the virtio virtqueue for communication
+ */
+struct virtio_i2c {
+   struct virtio_device *vdev;
+   struct completion completion;
+   struct i2c_adapter adap;
+   struct mutex i2c_lock;

i2c_ is redundant here. "lock" sounds good enough.

OK.

+   struct virtqueue *vq;
+};
+
+/**
+ * struct virtio_i2c_req - the virtio I2C request structure
+ * @out_hdr: the OUT header of the virtio I2C message
+ * @buf: the buffer into which data is read, or from which it's written
+ * @in_hdr: the IN header of the virtio I2C message
+ */
+struct virtio_i2c_req {
+   struct virtio_i2c_out_hdr out_hdr;
+   u8 *buf;
+   struct virtio_i2c_in_hdr in_hdr;
+};
+
+static void virtio_i2c_msg_done(struct virtqueue *vq)
+{
+   struct virtio_i2c *vi = vq->vdev->priv;
+
+   complete(&vi->completion);
+}
+
+static int virtio_i2c_send_reqs(struct virtqueue *vq,
+   struct virtio_i2c_req *reqs,
+   struct i2c_msg *msgs, int nr)
+{
+   struct scatterlist *sgs[3], out_hdr, msg_buf, in_hdr;
+   int i, outcnt, incnt, err = 0;
+   u8 *buf;
+
+   for (i = 0; i < nr; i++) {
+   if (!msgs[i].len)
+   break;
+
+   /* Only 7-bit mode supported for this moment. For the address 
format,
+* Please check the Virtio I2C Specification.
+*/

Please use proper comment style.

will fix it.


 /*
  * Only 7-bit mode supported for now, check Virtio I2C
  * specification for format of "addr" field.
  */


+   reqs[i].out_hdr.addr = cpu_to_le16(msgs[i].addr << 1);
+
+   if (i != nr - 1)
+   reqs[i].out_hdr.flags |= VIRTIO_I2C_FLAGS_FAIL_NEXT;

Since flags field hasn't been touched anywhe

RE: [RFC PATCH 06/18] virt/mshv: create, initialize, finalize, delete partition hypercalls

2021-03-04 Thread Michael Kelley via Virtualization
From: Nuno Das Neves  Sent: Thursday, March 
4, 2021 3:49 PM
> 
> On 2/8/2021 11:42 AM, Michael Kelley wrote:
> > From: Nuno Das Neves  Sent: Friday, 
> > November
> 20, 2020 4:30 PM
> >>

[snip]

> >> +
> >> +static int
> >> +hv_call_create_partition(
> >> +  u64 flags,
> >> +  struct hv_partition_creation_properties creation_properties,
> >> +  u64 *partition_id)
> >> +{
> >> +  struct hv_create_partition_in *input;
> >> +  struct hv_create_partition_out *output;
> >> +  int status;
> >> +  int ret;
> >> +  unsigned long irq_flags;
> >> +  int i;
> >> +
> >> +  do {
> >> +  local_irq_save(irq_flags);
> >> +  input = (struct hv_create_partition_in *)(*this_cpu_ptr(
> >> +  hyperv_pcpu_input_arg));
> >> +  output = (struct hv_create_partition_out *)(*this_cpu_ptr(
> >> +  hyperv_pcpu_output_arg));
> >> +
> >> +  input->flags = flags;
> >> +  input->proximity_domain_info.as_uint64 = 0;
> >> +  input->compatibility_version = HV_COMPATIBILITY_MANGANESE;
> >> +  for (i = 0; i < HV_PARTITION_PROCESSOR_FEATURE_BANKS; ++i)
> >> +  input->partition_creation_properties
> >> +  .disabled_processor_features.as_uint64[i] = 0;
> >> +  input->partition_creation_properties
> >> +  .disabled_processor_xsave_features.as_uint64 = 0;
> >> +  input->isolation_properties.as_uint64 = 0;
> >> +
> >> +  status = hv_do_hypercall(HVCALL_CREATE_PARTITION,
> >> +   input, output);
> >
> > hv_do_hypercall returns a u64, which should then be masked with
> > HV_HYPERCALL_RESULT_MASK before checking the result.
> >
> 
> Yes, I'll fix this everywhere.
> 
> >> +  if (status != HV_STATUS_INSUFFICIENT_MEMORY) {
> >> +  if (status == HV_STATUS_SUCCESS)
> >> +  *partition_id = output->partition_id;
> >> +  else
> >> +  pr_err("%s: %s\n",
> >> + __func__, hv_status_to_string(status));
> >> +  local_irq_restore(irq_flags);
> >> +  ret = -hv_status_to_errno(status);
> >> +  break;
> >> +  }
> >> +  local_irq_restore(irq_flags);
> >> +  ret = hv_call_deposit_pages(NUMA_NO_NODE,
> >> +  hv_current_partition_id, 1);
> >> +  } while (!ret);
> >> +
> >> +  return ret;
> >> +}
> >> +

I had a separate thread on the linux-hyperv mailing list about the
inconsistency in how we check hypercall status in current upstream
code, and proposed some helper functions to make it easier and
more consistent.  Joe Salisbury has started work on a patch to
provide those helper functions and to start using them in current
upstream code.  You could coordinate with Joe to get the helper
functions as well and use them as discussed in that thread.  Then
later on we won't have to come back and fix up the uses in this
patch series.

Michael
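
For reference, the masking discussed above amounts to something like this sketch
(HV_HYPERCALL_RESULT_MASK is the existing mask from the Hyper-V TLFS definitions,
and status would become a u64):

	status = hv_do_hypercall(HVCALL_CREATE_PARTITION, input, output) &
		 HV_HYPERCALL_RESULT_MASK;
	if (status == HV_STATUS_SUCCESS)
		*partition_id = output->partition_id;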


Re: [Freedreno] [PATCH 16/17] iommu: remove DOMAIN_ATTR_IO_PGTABLE_CFG

2021-03-04 Thread Rob Clark
On Thu, Mar 4, 2021 at 7:48 AM Robin Murphy  wrote:
>
> On 2021-03-01 08:42, Christoph Hellwig wrote:
> > Signed-off-by: Christoph Hellwig 
>
> More so than the previous patch, where the feature is at least relatively
> generic (note that there's a bunch of in-flight development around
> DOMAIN_ATTR_NESTING), I'm really not convinced that it's beneficial to
> bloat the generic iommu_ops structure with private driver-specific
> interfaces. The attribute interface is a great compromise for these
> kinds of things, and you can easily add type-checked wrappers around it
> for external callers (maybe even make the actual attributes internal
> between the IOMMU core and drivers) if that's your concern.

I suppose if this is *just* for the GPU we could move it into adreno_smmu_priv..

But one thing I'm not sure about is whether
IO_PGTABLE_QUIRK_ARM_OUTER_WBWA is something that other devices
*should* be using as well, but just haven't gotten around to yet.

BR,
-R

> Robin.
>
> > ---
> >   drivers/gpu/drm/msm/adreno/adreno_gpu.c |  2 +-
> >   drivers/iommu/arm/arm-smmu/arm-smmu.c   | 40 +++--
> >   drivers/iommu/iommu.c   |  9 ++
> >   include/linux/iommu.h   |  9 +-
> >   4 files changed, 29 insertions(+), 31 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
> > b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > index 0f184c3dd9d9ec..78d98ab2ee3a68 100644
> > --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > @@ -191,7 +191,7 @@ void adreno_set_llc_attributes(struct iommu_domain 
> > *iommu)
> >   struct io_pgtable_domain_attr pgtbl_cfg;
> >
> >   pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;
> > - iommu_domain_set_attr(iommu, DOMAIN_ATTR_IO_PGTABLE_CFG, &pgtbl_cfg);
> > + iommu_domain_set_pgtable_attr(iommu, &pgtbl_cfg);
> >   }
> >
> >   struct msm_gem_address_space *
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
> > b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > index 2e17d990d04481..2858999c86dfd1 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > @@ -1515,40 +1515,22 @@ static int arm_smmu_domain_enable_nesting(struct 
> > iommu_domain *domain)
> >   return ret;
> >   }
> >
> > -static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
> > - enum iommu_attr attr, void *data)
> > +static int arm_smmu_domain_set_pgtable_attr(struct iommu_domain *domain,
> > + struct io_pgtable_domain_attr *pgtbl_cfg)
> >   {
> > - int ret = 0;
> >   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> > + int ret = -EPERM;
> >
> > - mutex_lock(&smmu_domain->init_mutex);
> > -
> > - switch(domain->type) {
> > - case IOMMU_DOMAIN_UNMANAGED:
> > - switch (attr) {
> > - case DOMAIN_ATTR_IO_PGTABLE_CFG: {
> > - struct io_pgtable_domain_attr *pgtbl_cfg = data;
> > -
> > - if (smmu_domain->smmu) {
> > - ret = -EPERM;
> > - goto out_unlock;
> > - }
> > + if (domain->type != IOMMU_DOMAIN_UNMANAGED)
> > + return -EINVAL;
> >
> > - smmu_domain->pgtbl_cfg = *pgtbl_cfg;
> > - break;
> > - }
> > - default:
> > - ret = -ENODEV;
> > - }
> > - break;
> > - case IOMMU_DOMAIN_DMA:
> > - ret = -ENODEV;
> > - break;
> > - default:
> > - ret = -EINVAL;
> > + mutex_lock(&smmu_domain->init_mutex);
> > + if (!smmu_domain->smmu) {
> > + smmu_domain->pgtbl_cfg = *pgtbl_cfg;
> > + ret = 0;
> >   }
> > -out_unlock:
> >   mutex_unlock(&smmu_domain->init_mutex);
> > +
> >   return ret;
> >   }
> >
> > @@ -1609,7 +1591,7 @@ static struct iommu_ops arm_smmu_ops = {
> >   .device_group   = arm_smmu_device_group,
> >   .dma_use_flush_queue= arm_smmu_dma_use_flush_queue,
> >   .dma_enable_flush_queue = arm_smmu_dma_enable_flush_queue,
> > - .domain_set_attr= arm_smmu_domain_set_attr,
> > + .domain_set_pgtable_attr = arm_smmu_domain_set_pgtable_attr,
> >   .domain_enable_nesting  = arm_smmu_domain_enable_nesting,
> >   .of_xlate   = arm_smmu_of_xlate,
> >   .get_resv_regions   = arm_smmu_get_resv_regions,
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 2e9e058501a953..8490aefd4b41f8 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -2693,6 +2693,15 @@ int iommu_domain_enable_nesting(struct iommu_domain 
> > *domain)
> >   }
> >   EXPORT_SYMBOL_GPL(iommu_domain_enable_nesting);
> >
> > +int iommu_domain_set_pgtable_attr(struct iommu_domain *domain,
> > + struct io_pgtable_domain_attr *pgtbl_

Re: [patch 1/7] drm/ttm: Replace kmap_atomic() usage

2021-03-04 Thread Christian König



Am 03.03.21 um 14:20 schrieb Thomas Gleixner:

From: Thomas Gleixner 

There is no reason to disable pagefaults and preemption as a side effect of
kmap_atomic_prot().

Use kmap_local_page_prot() instead and document the reasoning for the
mapping usage with the given pgprot.

Remove the NULL pointer check for the map. These functions return a valid
address for valid pages and the return was bogus anyway as it would have
left preemption and pagefaults disabled.

Signed-off-by: Thomas Gleixner 
Cc: Christian Koenig 
Cc: Huang Rui 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-de...@lists.freedesktop.org
---
  drivers/gpu/drm/ttm/ttm_bo_util.c |   20 
  1 file changed, 12 insertions(+), 8 deletions(-)

--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -181,13 +181,15 @@ static int ttm_copy_io_ttm_page(struct t
return -ENOMEM;
  
  	src = (void *)((unsigned long)src + (page << PAGE_SHIFT));

-   dst = kmap_atomic_prot(d, prot);
-   if (!dst)
-   return -ENOMEM;
+   /*
+* Ensure that a highmem page is mapped with the correct
+* pgprot. For non highmem the mapping is already there.
+*/


I find the comment a bit misleading. Maybe write:

/*
 * Locally map highmem pages with the correct pgprot.
 * Normal memory should already have the correct pgprot in the linear
 * mapping.
 */

Apart from that looks good to me.

Regards,
Christian.


+   dst = kmap_local_page_prot(d, prot);
  
  	memcpy_fromio(dst, src, PAGE_SIZE);
  
-	kunmap_atomic(dst);

+   kunmap_local(dst);
  
  	return 0;

  }
@@ -203,13 +205,15 @@ static int ttm_copy_ttm_io_page(struct t
return -ENOMEM;
  
  	dst = (void *)((unsigned long)dst + (page << PAGE_SHIFT));

-   src = kmap_atomic_prot(s, prot);
-   if (!src)
-   return -ENOMEM;
+   /*
+* Ensure that a highmem page is mapped with the correct
+* pgprot. For non highmem the mapping is already there.
+*/
+   src = kmap_local_page_prot(s, prot);
  
  	memcpy_toio(dst, src, PAGE_SIZE);
  
-	kunmap_atomic(src);

+   kunmap_local(src);
  
  	return 0;

  }





Re: [PATCH] scsi: target: vhost-scsi: remove redundant initialization of variable ret

2021-03-04 Thread Stefan Hajnoczi
On Wed, Mar 03, 2021 at 01:43:39PM +, Colin King wrote:
> From: Colin Ian King 
> 
> The variable ret is being initialized with a value that is never read
> and it is being updated later with a new value.  The initialization is
> redundant and can be removed.
> 
> Addresses-Coverity: ("Unused value")
> Signed-off-by: Colin Ian King 
> ---
>  drivers/vhost/scsi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Which kernel version is this patch based on?

If it's a fix for a patch that hasn't landed yet, please indicate this.
A "Fixes: ..." tag should be added to this patch as well.

I looked at linux.git/master commit f69d02e37a85645aa90d18cacfff36dba370f797 
and see this:

  static int __init vhost_scsi_init(void)
  {
  int ret = -ENOMEM;

  pr_debug("TCM_VHOST fabric module %s on %s/%s"
  " on "UTS_RELEASE"\n", VHOST_SCSI_VERSION, utsname()->sysname,
  utsname()->machine);

  /*
   * Use our own dedicated workqueue for submitting I/O into
   * target core to avoid contention within system_wq.
   */
  vhost_scsi_workqueue = alloc_workqueue("vhost_scsi", 0, 0);
  if (!vhost_scsi_workqueue)
  goto out;

We need ret's initialization value here ^

  ret = vhost_scsi_register();
  if (ret < 0)
  goto out_destroy_workqueue;

  ret = target_register_template(&vhost_scsi_ops);
  if (ret < 0)
  goto out_vhost_scsi_deregister;

  return 0;

  out_vhost_scsi_deregister:
  vhost_scsi_deregister();
  out_destroy_workqueue:
  destroy_workqueue(vhost_scsi_workqueue);
  out:
  return ret;
  };


> 
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index d16c04dcc144..9129ab8187fd 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -2465,7 +2465,7 @@ static const struct target_core_fabric_ops 
> vhost_scsi_ops = {
>  
>  static int __init vhost_scsi_init(void)
>  {
> - int ret = -ENOMEM;
> + int ret;
>  
>   pr_debug("TCM_VHOST fabric module %s on %s/%s"
>   " on "UTS_RELEASE"\n", VHOST_SCSI_VERSION, utsname()->sysname,
> -- 
> 2.30.0
> 



Re: [PATCH 16/17] iommu: remove DOMAIN_ATTR_IO_PGTABLE_CFG

2021-03-04 Thread Robin Murphy

On 2021-03-01 08:42, Christoph Hellwig wrote:

Signed-off-by: Christoph Hellwig 


Moreso than the previous patch, where the feature is at least relatively 
generic (note that there's a bunch of in-flight development around 
DOMAIN_ATTR_NESTING), I'm really not convinced that it's beneficial to 
bloat the generic iommu_ops structure with private driver-specific 
interfaces. The attribute interface is a great compromise for these 
kinds of things, and you can easily add type-checked wrappers around it 
for external callers (maybe even make the actual attributes internal 
between the IOMMU core and drivers) if that's your concern.
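
As a concrete illustration of that kind of wrapper (sketch only, keeping the
attribute plumbing between the IOMMU core and the driver), an external caller
such as the msm GPU code could go through something like:

  static inline int
  iommu_domain_set_io_pgtable_cfg(struct iommu_domain *domain,
                                  struct io_pgtable_domain_attr *pgtbl_cfg)
  {
          return iommu_domain_set_attr(domain, DOMAIN_ATTR_IO_PGTABLE_CFG,
                                       pgtbl_cfg);
  }

so the type checking happens at the wrapper boundary while the generic
iommu_ops structure stays untouched.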


Robin.


---
  drivers/gpu/drm/msm/adreno/adreno_gpu.c |  2 +-
  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 40 +++--
  drivers/iommu/iommu.c   |  9 ++
  include/linux/iommu.h   |  9 +-
  4 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 0f184c3dd9d9ec..78d98ab2ee3a68 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -191,7 +191,7 @@ void adreno_set_llc_attributes(struct iommu_domain *iommu)
struct io_pgtable_domain_attr pgtbl_cfg;
  
  	pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;

-   iommu_domain_set_attr(iommu, DOMAIN_ATTR_IO_PGTABLE_CFG, &pgtbl_cfg);
+   iommu_domain_set_pgtable_attr(iommu, &pgtbl_cfg);
  }
  
  struct msm_gem_address_space *

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 2e17d990d04481..2858999c86dfd1 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1515,40 +1515,22 @@ static int arm_smmu_domain_enable_nesting(struct 
iommu_domain *domain)
return ret;
  }
  
-static int arm_smmu_domain_set_attr(struct iommu_domain *domain,

-   enum iommu_attr attr, void *data)
+static int arm_smmu_domain_set_pgtable_attr(struct iommu_domain *domain,
+   struct io_pgtable_domain_attr *pgtbl_cfg)
  {
-   int ret = 0;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   int ret = -EPERM;
  
-	mutex_lock(&smmu_domain->init_mutex);

-
-   switch(domain->type) {
-   case IOMMU_DOMAIN_UNMANAGED:
-   switch (attr) {
-   case DOMAIN_ATTR_IO_PGTABLE_CFG: {
-   struct io_pgtable_domain_attr *pgtbl_cfg = data;
-
-   if (smmu_domain->smmu) {
-   ret = -EPERM;
-   goto out_unlock;
-   }
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return -EINVAL;
  
-			smmu_domain->pgtbl_cfg = *pgtbl_cfg;

-   break;
-   }
-   default:
-   ret = -ENODEV;
-   }
-   break;
-   case IOMMU_DOMAIN_DMA:
-   ret = -ENODEV;
-   break;
-   default:
-   ret = -EINVAL;
+   mutex_lock(&smmu_domain->init_mutex);
+   if (!smmu_domain->smmu) {
+   smmu_domain->pgtbl_cfg = *pgtbl_cfg;
+   ret = 0;
}
-out_unlock:
mutex_unlock(&smmu_domain->init_mutex);
+
return ret;
  }
  
@@ -1609,7 +1591,7 @@ static struct iommu_ops arm_smmu_ops = {

.device_group   = arm_smmu_device_group,
.dma_use_flush_queue= arm_smmu_dma_use_flush_queue,
.dma_enable_flush_queue = arm_smmu_dma_enable_flush_queue,
-   .domain_set_attr= arm_smmu_domain_set_attr,
+   .domain_set_pgtable_attr = arm_smmu_domain_set_pgtable_attr,
.domain_enable_nesting  = arm_smmu_domain_enable_nesting,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2e9e058501a953..8490aefd4b41f8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2693,6 +2693,15 @@ int iommu_domain_enable_nesting(struct iommu_domain 
*domain)
  }
  EXPORT_SYMBOL_GPL(iommu_domain_enable_nesting);
  
+int iommu_domain_set_pgtable_attr(struct iommu_domain *domain,

+   struct io_pgtable_domain_attr *pgtbl_cfg)
+{
+   if (!domain->ops->domain_set_pgtable_attr)
+   return -EINVAL;
+   return domain->ops->domain_set_pgtable_attr(domain, pgtbl_cfg);
+}
+EXPORT_SYMBOL_GPL(iommu_domain_set_pgtable_attr);
+
  void iommu_get_resv_regions(struct device *dev, struct list_head *list)
  {
const struct iommu_ops *ops = dev->bus->iommu_ops;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index aed88aa3bd3edf..39d3ed4d2700ac 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -40,6 +40,7 @@ struct iommu_domain;
  struct notifier_block;
  struct iommu_sva;
  struct iommu_fault_event;
+stru

Re: [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE

2021-03-04 Thread Robin Murphy

On 2021-03-01 08:42, Christoph Hellwig wrote:

Use explicit methods for setting and querying the information instead.


Now that everyone's using iommu-dma, is there any point in bouncing this 
through the drivers at all? Seems like it would make more sense for the 
x86 drivers to reflect their private options back to iommu_dma_strict 
(and allow Intel's caching mode to override it as well), then have 
iommu_dma_init_domain just test !iommu_dma_strict && 
domain->ops->flush_iotlb_all.
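
For reference, that test would amount to something like the following inside
iommu_dma_init_domain() (illustrative sketch only, and it assumes
iommu_dma_strict is made visible to dma-iommu.c):

  if (!cookie->fq_domain && !iommu_dma_strict &&
      domain->ops->flush_iotlb_all) {
          if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all,
                                    iommu_dma_entry_dtor))
                  pr_warn("iova flush queue initialization failed\n");
          else
                  cookie->fq_domain = domain;
  }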


Robin.


Also remove the now unused iommu_domain_get_attr functionality.

Signed-off-by: Christoph Hellwig 
---
  drivers/iommu/amd/iommu.c   | 23 ++---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 47 ++---
  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 56 +
  drivers/iommu/dma-iommu.c   |  8 ++-
  drivers/iommu/intel/iommu.c | 27 ++
  drivers/iommu/iommu.c   | 19 +++
  include/linux/iommu.h   | 17 ++-
  7 files changed, 51 insertions(+), 146 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a69a8b573e40d0..37a8e51db17656 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1771,24 +1771,11 @@ static struct iommu_group 
*amd_iommu_device_group(struct device *dev)
return acpihid_device_group(dev);
  }
  
-static int amd_iommu_domain_get_attr(struct iommu_domain *domain,

-   enum iommu_attr attr, void *data)
+static bool amd_iommu_dma_use_flush_queue(struct iommu_domain *domain)
  {
-   switch (domain->type) {
-   case IOMMU_DOMAIN_UNMANAGED:
-   return -ENODEV;
-   case IOMMU_DOMAIN_DMA:
-   switch (attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   *(int *)data = !amd_iommu_unmap_flush;
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   default:
-   return -EINVAL;
-   }
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return false;
+   return !amd_iommu_unmap_flush;
  }
  
  /*

@@ -2257,7 +2244,7 @@ const struct iommu_ops amd_iommu_ops = {
.release_device = amd_iommu_release_device,
.probe_finalize = amd_iommu_probe_finalize,
.device_group = amd_iommu_device_group,
-   .domain_get_attr = amd_iommu_domain_get_attr,
+   .dma_use_flush_queue = amd_iommu_dma_use_flush_queue,
.get_resv_regions = amd_iommu_get_resv_regions,
.put_resv_regions = generic_iommu_put_resv_regions,
.is_attach_deferred = amd_iommu_is_attach_deferred,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 8594b4a8304375..bf96172e8c1f71 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2449,33 +2449,21 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
return group;
  }
  
-static int arm_smmu_domain_get_attr(struct iommu_domain *domain,

-   enum iommu_attr attr, void *data)
+static bool arm_smmu_dma_use_flush_queue(struct iommu_domain *domain)
  {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
  
-	switch (domain->type) {

-   case IOMMU_DOMAIN_UNMANAGED:
-   switch (attr) {
-   case DOMAIN_ATTR_NESTING:
-   *(int *)data = (smmu_domain->stage == 
ARM_SMMU_DOMAIN_NESTED);
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   case IOMMU_DOMAIN_DMA:
-   switch (attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   *(int *)data = smmu_domain->non_strict;
-   return 0;
-   default:
-   return -ENODEV;
-   }
-   break;
-   default:
-   return -EINVAL;
-   }
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return false;
+   return smmu_domain->non_strict;
+}
+
+
+static void arm_smmu_dma_enable_flush_queue(struct iommu_domain *domain)
+{
+   if (domain->type != IOMMU_DOMAIN_DMA)
+   return;
+   to_smmu_domain(domain)->non_strict = true;
  }
  
  static int arm_smmu_domain_set_attr(struct iommu_domain *domain,

@@ -2505,13 +2493,7 @@ static int arm_smmu_domain_set_attr(struct iommu_domain 
*domain,
}
break;
case IOMMU_DOMAIN_DMA:
-   switch(attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   smmu_domain->non_strict = *(int *)data;
-   break;
-   default:
-   ret = -EN

[patch 5/7] drm/nouveau/device: Replace io_mapping_map_atomic_wc()

2021-03-04 Thread Thomas Gleixner
From: Thomas Gleixner 

Neither fbmem_peek() nor fbmem_poke() require to disable pagefaults and
preemption as a side effect of io_mapping_map_atomic_wc().

Use io_mapping_map_local_wc() instead.

Signed-off-by: Thomas Gleixner 
Cc: Ben Skeggs 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-de...@lists.freedesktop.org
Cc: nouv...@lists.freedesktop.org
---
 drivers/gpu/drm/nouveau/nvkm/subdev/devinit/fbmem.h |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/gpu/drm/nouveau/nvkm/subdev/devinit/fbmem.h
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/devinit/fbmem.h
@@ -60,19 +60,19 @@ fbmem_fini(struct io_mapping *fb)
 static inline u32
 fbmem_peek(struct io_mapping *fb, u32 off)
 {
-   u8 __iomem *p = io_mapping_map_atomic_wc(fb, off & PAGE_MASK);
+   u8 __iomem *p = io_mapping_map_local_wc(fb, off & PAGE_MASK);
u32 val = ioread32(p + (off & ~PAGE_MASK));
-   io_mapping_unmap_atomic(p);
+   io_mapping_unmap_local(p);
return val;
 }
 
 static inline void
 fbmem_poke(struct io_mapping *fb, u32 off, u32 val)
 {
-   u8 __iomem *p = io_mapping_map_atomic_wc(fb, off & PAGE_MASK);
+   u8 __iomem *p = io_mapping_map_local_wc(fb, off & PAGE_MASK);
iowrite32(val, p + (off & ~PAGE_MASK));
wmb();
-   io_mapping_unmap_atomic(p);
+   io_mapping_unmap_local(p);
 }
 
 static inline bool




[patch 2/7] drm/vmgfx: Replace kmap_atomic()

2021-03-04 Thread Thomas Gleixner
From: Thomas Gleixner 

There is no reason to disable pagefaults and preemption as a side effect of
kmap_atomic_prot().

Use kmap_local_page_prot() instead and document the reasoning for the
mapping usage with the given pgprot.

Remove the NULL pointer check for the map. These functions return a valid
address for valid pages and the return was bogus anyway as it would have
left preemption and pagefaults disabled.

Signed-off-by: Thomas Gleixner 
Cc: VMware Graphics 
Cc: Roland Scheidegger 
Cc: Zack Rusin 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-de...@lists.freedesktop.org
---
 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c |   30 --
 1 file changed, 12 insertions(+), 18 deletions(-)

--- a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c
@@ -375,12 +375,12 @@ static int vmw_bo_cpu_blit_line(struct v
copy_size = min_t(u32, copy_size, PAGE_SIZE - src_page_offset);
 
if (unmap_src) {
-   kunmap_atomic(d->src_addr);
+   kunmap_local(d->src_addr);
d->src_addr = NULL;
}
 
if (unmap_dst) {
-   kunmap_atomic(d->dst_addr);
+   kunmap_local(d->dst_addr);
d->dst_addr = NULL;
}
 
@@ -388,12 +388,8 @@ static int vmw_bo_cpu_blit_line(struct v
if (WARN_ON_ONCE(dst_page >= d->dst_num_pages))
return -EINVAL;
 
-   d->dst_addr =
-   kmap_atomic_prot(d->dst_pages[dst_page],
-d->dst_prot);
-   if (!d->dst_addr)
-   return -ENOMEM;
-
+   d->dst_addr = kmap_local_page_prot(d->dst_pages[dst_page],
+  d->dst_prot);
d->mapped_dst = dst_page;
}
 
@@ -401,12 +397,8 @@ static int vmw_bo_cpu_blit_line(struct v
if (WARN_ON_ONCE(src_page >= d->src_num_pages))
return -EINVAL;
 
-   d->src_addr =
-   kmap_atomic_prot(d->src_pages[src_page],
-d->src_prot);
-   if (!d->src_addr)
-   return -ENOMEM;
-
+   d->src_addr = kmap_local_page_prot(d->src_pages[src_page],
+  d->src_prot);
d->mapped_src = src_page;
}
diff->do_cpy(diff, d->dst_addr + dst_page_offset,
@@ -436,8 +428,10 @@ static int vmw_bo_cpu_blit_line(struct v
  *
  * Performs a CPU blit from one buffer object to another avoiding a full
  * bo vmap which may exhaust- or fragment vmalloc space.
- * On supported architectures (x86), we're using kmap_atomic which avoids
- * cross-processor TLB- and cache flushes and may, on non-HIGHMEM systems
+ *
+ * On supported architectures (x86), we're using kmap_local_prot() which
+ * avoids cross-processor TLB- and cache flushes. kmap_local_prot() will
+ * either map a highmem page with the proper pgprot on HIGHMEM=y systems or
  * reference already set-up mappings.
  *
  * Neither of the buffer objects may be placed in PCI memory
@@ -500,9 +494,9 @@ int vmw_bo_cpu_blit(struct ttm_buffer_ob
}
 out:
if (d.src_addr)
-   kunmap_atomic(d.src_addr);
+   kunmap_local(d.src_addr);
if (d.dst_addr)
-   kunmap_atomic(d.dst_addr);
+   kunmap_local(d.dst_addr);
 
return ret;
 }




[patch 6/7] drm/i915: Replace io_mapping_map_atomic_wc()

2021-03-04 Thread Thomas Gleixner
From: Thomas Gleixner 

None of these mapping requires the side effect of disabling pagefaults and
preemption.

Use io_mapping_map_local_wc() instead, and clean up gtt_user_read() and
gtt_user_write() to use a plain copy_from_user() as the local maps are not
disabling pagefaults.

Signed-off-by: Thomas Gleixner 
Cc: Jani Nikula 
Cc: Joonas Lahtinen 
Cc: Rodrigo Vivi 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Chris Wilson 
Cc: intel-...@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c |7 +---
 drivers/gpu/drm/i915/i915_gem.c|   40 -
 drivers/gpu/drm/i915/selftests/i915_gem.c  |4 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c  |8 ++---
 4 files changed, 22 insertions(+), 37 deletions(-)

--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1080,7 +1080,7 @@ static void reloc_cache_reset(struct rel
struct i915_ggtt *ggtt = cache_to_ggtt(cache);
 
intel_gt_flush_ggtt_writes(ggtt->vm.gt);
-   io_mapping_unmap_atomic((void __iomem *)vaddr);
+   io_mapping_unmap_local((void __iomem *)vaddr);
 
if (drm_mm_node_allocated(&cache->node)) {
ggtt->vm.clear_range(&ggtt->vm,
@@ -1146,7 +1146,7 @@ static void *reloc_iomap(struct drm_i915
 
if (cache->vaddr) {
intel_gt_flush_ggtt_writes(ggtt->vm.gt);
-   io_mapping_unmap_atomic((void __force __iomem *) unmask_page(cache->vaddr));
+   io_mapping_unmap_local((void __force __iomem *) unmask_page(cache->vaddr));
} else {
struct i915_vma *vma;
int err;
@@ -1194,8 +1194,7 @@ static void *reloc_iomap(struct drm_i915
offset += page << PAGE_SHIFT;
}
 
-   vaddr = (void __force *)io_mapping_map_atomic_wc(&ggtt->iomap,
-offset);
+   vaddr = (void __force *)io_mapping_map_local_wc(&ggtt->iomap, offset);
cache->page = page;
cache->vaddr = (unsigned long)vaddr;
 
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -253,22 +253,15 @@ gtt_user_read(struct io_mapping *mapping
  char __user *user_data, int length)
 {
void __iomem *vaddr;
-   unsigned long unwritten;
+   bool fail = false;
 
/* We can use the cpu mem copy function because this is X86. */
-   vaddr = io_mapping_map_atomic_wc(mapping, base);
-   unwritten = __copy_to_user_inatomic(user_data,
-   (void __force *)vaddr + offset,
-   length);
-   io_mapping_unmap_atomic(vaddr);
-   if (unwritten) {
-   vaddr = io_mapping_map_wc(mapping, base, PAGE_SIZE);
-   unwritten = copy_to_user(user_data,
-(void __force *)vaddr + offset,
-length);
-   io_mapping_unmap(vaddr);
-   }
-   return unwritten;
+   vaddr = io_mapping_map_local_wc(mapping, base);
+   if (copy_to_user(user_data, (void __force *)vaddr + offset, length))
+   fail = true;
+   io_mapping_unmap_local(vaddr);
+
+   return fail;
 }
 
 static int
@@ -437,21 +430,14 @@ ggtt_write(struct io_mapping *mapping,
   char __user *user_data, int length)
 {
void __iomem *vaddr;
-   unsigned long unwritten;
+   bool fail = false;
 
/* We can use the cpu mem copy function because this is X86. */
-   vaddr = io_mapping_map_atomic_wc(mapping, base);
-   unwritten = __copy_from_user_inatomic_nocache((void __force *)vaddr + offset,
- user_data, length);
-   io_mapping_unmap_atomic(vaddr);
-   if (unwritten) {
-   vaddr = io_mapping_map_wc(mapping, base, PAGE_SIZE);
-   unwritten = copy_from_user((void __force *)vaddr + offset,
-  user_data, length);
-   io_mapping_unmap(vaddr);
-   }
-
-   return unwritten;
+   vaddr = io_mapping_map_local_wc(mapping, base);
+   if (copy_from_user((void __force *)vaddr + offset, user_data, length))
+   fail = true;
+   io_mapping_unmap_local(vaddr);
+   return fail;
 }
 
 /**
--- a/drivers/gpu/drm/i915/selftests/i915_gem.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem.c
@@ -58,12 +58,12 @@ static void trash_stolen(struct drm_i915
 
ggtt->vm.insert_page(&ggtt->vm, dma, slot, I915_CACHE_NONE, 0);
 
-   s = io_mapping_map_atomic_wc(&ggtt->iomap, slot);
+   s = io_mapping_map_local_wc(&ggtt->iomap, slot);
for (x = 0; x < PAGE_SIZE / sizeof(u32); x++) {
prng = next_pseudo_random32(prng);
io

[patch 7/7] io-mapping: Remove io_mapping_map_atomic_wc()

2021-03-04 Thread Thomas Gleixner
From: Thomas Gleixner 

No more users. Get rid of it and remove the traces in documentation.

Signed-off-by: Thomas Gleixner 
Cc: Andrew Morton 
Cc: linux...@kvack.org
---
 Documentation/driver-api/io-mapping.rst |   22 +---
 include/linux/io-mapping.h  |   42 +---
 2 files changed, 9 insertions(+), 55 deletions(-)

--- a/Documentation/driver-api/io-mapping.rst
+++ b/Documentation/driver-api/io-mapping.rst
@@ -21,19 +21,15 @@ mappable, while 'size' indicates how lar
 enable. Both are in bytes.
 
 This _wc variant provides a mapping which may only be used with
-io_mapping_map_atomic_wc(), io_mapping_map_local_wc() or
-io_mapping_map_wc().
+io_mapping_map_local_wc() or io_mapping_map_wc().
 
 With this mapping object, individual pages can be mapped either temporarily
 or long term, depending on the requirements. Of course, temporary maps are
-more efficient. They come in two flavours::
+more efficient.
 
void *io_mapping_map_local_wc(struct io_mapping *mapping,
  unsigned long offset)
 
-   void *io_mapping_map_atomic_wc(struct io_mapping *mapping,
-  unsigned long offset)
-
 'offset' is the offset within the defined mapping region.  Accessing
 addresses beyond the region specified in the creation function yields
 undefined results. Using an offset which is not page aligned yields an
@@ -50,9 +46,6 @@ io_mapping_map_local_wc() has a side eff
 migration to make the mapping code work. No caller can rely on this side
 effect.
 
-io_mapping_map_atomic_wc() has the side effect of disabling preemption and
-pagefaults. Don't use in new code. Use io_mapping_map_local_wc() instead.
-
 Nested mappings need to be undone in reverse order because the mapping
 code uses a stack for keeping track of them::
 
@@ -65,11 +58,10 @@ Nested mappings need to be undone in rev
 The mappings are released with::
 
void io_mapping_unmap_local(void *vaddr)
-   void io_mapping_unmap_atomic(void *vaddr)
 
-'vaddr' must be the value returned by the last io_mapping_map_local_wc() or
-io_mapping_map_atomic_wc() call. This unmaps the specified mapping and
-undoes the side effects of the mapping functions.
+'vaddr' must be the value returned by the last io_mapping_map_local_wc()
+call. This unmaps the specified mapping and undoes eventual side effects of
+the mapping function.
 
 If you need to sleep while holding a mapping, you can use the regular
 variant, although this may be significantly slower::
@@ -77,8 +69,8 @@ If you need to sleep while holding a map
void *io_mapping_map_wc(struct io_mapping *mapping,
unsigned long offset)
 
-This works like io_mapping_map_atomic/local_wc() except it has no side
-effects and the pointer is globaly visible.
+This works like io_mapping_map_local_wc() except it has no side effects and
+the pointer is globaly visible.
 
 The mappings are released with::
 
--- a/include/linux/io-mapping.h
+++ b/include/linux/io-mapping.h
@@ -60,28 +60,7 @@ io_mapping_fini(struct io_mapping *mappi
iomap_free(mapping->base, mapping->size);
 }
 
-/* Atomic map/unmap */
-static inline void __iomem *
-io_mapping_map_atomic_wc(struct io_mapping *mapping,
-unsigned long offset)
-{
-   resource_size_t phys_addr;
-
-   BUG_ON(offset >= mapping->size);
-   phys_addr = mapping->base + offset;
-   preempt_disable();
-   pagefault_disable();
-   return __iomap_local_pfn_prot(PHYS_PFN(phys_addr), mapping->prot);
-}
-
-static inline void
-io_mapping_unmap_atomic(void __iomem *vaddr)
-{
-   kunmap_local_indexed((void __force *)vaddr);
-   pagefault_enable();
-   preempt_enable();
-}
-
+/* Temporary mappings which are only valid in the current context */
 static inline void __iomem *
 io_mapping_map_local_wc(struct io_mapping *mapping, unsigned long offset)
 {
@@ -163,24 +142,7 @@ io_mapping_unmap(void __iomem *vaddr)
 {
 }
 
-/* Atomic map/unmap */
-static inline void __iomem *
-io_mapping_map_atomic_wc(struct io_mapping *mapping,
-unsigned long offset)
-{
-   preempt_disable();
-   pagefault_disable();
-   return io_mapping_map_wc(mapping, offset, PAGE_SIZE);
-}
-
-static inline void
-io_mapping_unmap_atomic(void __iomem *vaddr)
-{
-   io_mapping_unmap(vaddr);
-   pagefault_enable();
-   preempt_enable();
-}
-
+/* Temporary mappings which are only valid in the current context */
 static inline void __iomem *
 io_mapping_map_local_wc(struct io_mapping *mapping, unsigned long offset)
 {



[patch 4/7] drm/qxl: Replace io_mapping_map_atomic_wc()

2021-03-04 Thread Thomas Gleixner
From: Thomas Gleixner 

None of these mapping requires the side effect of disabling pagefaults and
preemption.

Use io_mapping_map_local_wc() instead, rename the related functions
accordingly and clean up qxl_process_single_command() to use a plain
copy_from_user() as the local maps are not disabling pagefaults.

Signed-off-by: Thomas Gleixner 
Cc: David Airlie 
Cc: Gerd Hoffmann 
Cc: Daniel Vetter 
Cc: virtualization@lists.linux-foundation.org
Cc: spice-de...@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
---
 drivers/gpu/drm/qxl/qxl_image.c   |   18 +-
 drivers/gpu/drm/qxl/qxl_ioctl.c   |   27 +--
 drivers/gpu/drm/qxl/qxl_object.c  |   12 ++--
 drivers/gpu/drm/qxl/qxl_object.h  |4 ++--
 drivers/gpu/drm/qxl/qxl_release.c |4 ++--
 5 files changed, 32 insertions(+), 33 deletions(-)

--- a/drivers/gpu/drm/qxl/qxl_image.c
+++ b/drivers/gpu/drm/qxl/qxl_image.c
@@ -124,12 +124,12 @@ qxl_image_init_helper(struct qxl_device
  wrong (check the bitmaps are sent correctly
  first) */
 
-   ptr = qxl_bo_kmap_atomic_page(qdev, chunk_bo, 0);
+   ptr = qxl_bo_kmap_local_page(qdev, chunk_bo, 0);
chunk = ptr;
chunk->data_size = height * chunk_stride;
chunk->prev_chunk = 0;
chunk->next_chunk = 0;
-   qxl_bo_kunmap_atomic_page(qdev, chunk_bo, ptr);
+   qxl_bo_kunmap_local_page(qdev, chunk_bo, ptr);
 
{
void *k_data, *i_data;
@@ -143,7 +143,7 @@ qxl_image_init_helper(struct qxl_device
i_data = (void *)data;
 
while (remain > 0) {
-   ptr = qxl_bo_kmap_atomic_page(qdev, chunk_bo, page << PAGE_SHIFT);
+   ptr = qxl_bo_kmap_local_page(qdev, chunk_bo, page << PAGE_SHIFT);
 
if (page == 0) {
chunk = ptr;
@@ -157,7 +157,7 @@ qxl_image_init_helper(struct qxl_device
 
memcpy(k_data, i_data, size);
 
-   qxl_bo_kunmap_atomic_page(qdev, chunk_bo, ptr);
+   qxl_bo_kunmap_local_page(qdev, chunk_bo, ptr);
i_data += size;
remain -= size;
page++;
@@ -175,10 +175,10 @@ qxl_image_init_helper(struct qxl_device
page_offset = offset_in_page(out_offset);
size = min((int)(PAGE_SIZE - page_offset), remain);
 
-   ptr = qxl_bo_kmap_atomic_page(qdev, chunk_bo, page_base);
+   ptr = qxl_bo_kmap_local_page(qdev, chunk_bo, page_base);
k_data = ptr + page_offset;
memcpy(k_data, i_data, size);
-   qxl_bo_kunmap_atomic_page(qdev, chunk_bo, ptr);
+   qxl_bo_kunmap_local_page(qdev, chunk_bo, ptr);
remain -= size;
i_data += size;
out_offset += size;
@@ -189,7 +189,7 @@ qxl_image_init_helper(struct qxl_device
qxl_bo_kunmap(chunk_bo);
 
image_bo = dimage->bo;
-   ptr = qxl_bo_kmap_atomic_page(qdev, image_bo, 0);
+   ptr = qxl_bo_kmap_local_page(qdev, image_bo, 0);
image = ptr;
 
image->descriptor.id = 0;
@@ -212,7 +212,7 @@ qxl_image_init_helper(struct qxl_device
break;
default:
DRM_ERROR("unsupported image bit depth\n");
-   qxl_bo_kunmap_atomic_page(qdev, image_bo, ptr);
+   qxl_bo_kunmap_local_page(qdev, image_bo, ptr);
return -EINVAL;
}
image->u.bitmap.flags = QXL_BITMAP_TOP_DOWN;
@@ -222,7 +222,7 @@ qxl_image_init_helper(struct qxl_device
image->u.bitmap.palette = 0;
image->u.bitmap.data = qxl_bo_physical_address(qdev, chunk_bo, 0);
 
-   qxl_bo_kunmap_atomic_page(qdev, image_bo, ptr);
+   qxl_bo_kunmap_local_page(qdev, image_bo, ptr);
 
return 0;
 }
--- a/drivers/gpu/drm/qxl/qxl_ioctl.c
+++ b/drivers/gpu/drm/qxl/qxl_ioctl.c
@@ -89,11 +89,11 @@ apply_reloc(struct qxl_device *qdev, str
 {
void *reloc_page;
 
-   reloc_page = qxl_bo_kmap_atomic_page(qdev, info->dst_bo, 
info->dst_offset & PAGE_MASK);
+   reloc_page = qxl_bo_kmap_local_page(qdev, info->dst_bo, 
info->dst_offset & PAGE_MASK);
*(uint64_t *)(reloc_page + (info->dst_offset & ~PAGE_MASK)) = 
qxl_bo_physical_address(qdev,

  info->src_bo,

  info->src_o

[patch 1/7] drm/ttm: Replace kmap_atomic() usage

2021-03-04 Thread Thomas Gleixner
From: Thomas Gleixner 

There is no reason to disable pagefaults and preemption as a side effect of
kmap_atomic_prot().

Use kmap_local_page_prot() instead and document the reasoning for the
mapping usage with the given pgprot.

Remove the NULL pointer check for the map. These functions return a valid
address for valid pages and the return was bogus anyway as it would have
left preemption and pagefaults disabled.

Signed-off-by: Thomas Gleixner 
Cc: Christian Koenig 
Cc: Huang Rui 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-de...@lists.freedesktop.org
---
 drivers/gpu/drm/ttm/ttm_bo_util.c |   20 
 1 file changed, 12 insertions(+), 8 deletions(-)

--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -181,13 +181,15 @@ static int ttm_copy_io_ttm_page(struct t
return -ENOMEM;
 
src = (void *)((unsigned long)src + (page << PAGE_SHIFT));
-   dst = kmap_atomic_prot(d, prot);
-   if (!dst)
-   return -ENOMEM;
+   /*
+* Ensure that a highmem page is mapped with the correct
+* pgprot. For non highmem the mapping is already there.
+*/
+   dst = kmap_local_page_prot(d, prot);
 
memcpy_fromio(dst, src, PAGE_SIZE);
 
-   kunmap_atomic(dst);
+   kunmap_local(dst);
 
return 0;
 }
@@ -203,13 +205,15 @@ static int ttm_copy_ttm_io_page(struct t
return -ENOMEM;
 
dst = (void *)((unsigned long)dst + (page << PAGE_SHIFT));
-   src = kmap_atomic_prot(s, prot);
-   if (!src)
-   return -ENOMEM;
+   /*
+* Ensure that a highmem page is mapped with the correct
+* pgprot. For non highmem the mapping is already there.
+*/
+   src = kmap_local_page_prot(s, prot);
 
memcpy_toio(dst, src, PAGE_SIZE);
 
-   kunmap_atomic(src);
+   kunmap_local(src);
 
return 0;
 }




[patch 3/7] highmem: Remove kmap_atomic_prot()

2021-03-04 Thread Thomas Gleixner
From: Thomas Gleixner 

No more users.

Signed-off-by: Thomas Gleixner 
Cc: Andrew Morton 
Cc: linux...@kvack.org
---
 include/linux/highmem-internal.h |   14 ++
 1 file changed, 2 insertions(+), 12 deletions(-)

--- a/include/linux/highmem-internal.h
+++ b/include/linux/highmem-internal.h
@@ -88,16 +88,11 @@ static inline void __kunmap_local(void *
kunmap_local_indexed(vaddr);
 }
 
-static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
+static inline void *kmap_atomic(struct page *page)
 {
preempt_disable();
pagefault_disable();
-   return __kmap_local_page_prot(page, prot);
-}
-
-static inline void *kmap_atomic(struct page *page)
-{
-   return kmap_atomic_prot(page, kmap_prot);
+   return __kmap_local_page_prot(page, kmap_prot);
 }
 
 static inline void *kmap_atomic_pfn(unsigned long pfn)
@@ -184,11 +179,6 @@ static inline void *kmap_atomic(struct p
return page_address(page);
 }
 
-static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
-{
-   return kmap_atomic(page);
-}
-
 static inline void *kmap_atomic_pfn(unsigned long pfn)
 {
return kmap_atomic(pfn_to_page(pfn));



[patch 0/7] drm, highmem: Cleanup io/kmap_atomic*() usage

2021-03-04 Thread Thomas Gleixner
None of the DRM usage sites of temporary mappings requires the side
effects of io/kmap_atomic(), i.e. preemption and pagefault disable.

Replace them with the io/kmap_local() variants, simplify the
copy_to/from_user() error handling and remove the atomic variants.
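
The conversion in each patch follows the same basic shape, roughly
(illustrative, not lifted from any single patch):

  /* before: mapping implicitly disables preemption and pagefaults */
  vaddr = kmap_atomic(page);
  memcpy(vaddr, src, len);
  kunmap_atomic(vaddr);

  /* after: mapping is only valid in this context, no implicit side
   * effects, so e.g. a plain copy_from_user() can be used while the
   * page is mapped */
  vaddr = kmap_local_page(page);
  memcpy(vaddr, src, len);
  kunmap_local(vaddr);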

Thanks,

tglx
---
 Documentation/driver-api/io-mapping.rst |   22 +++---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c  |7 +--
 drivers/gpu/drm/i915/i915_gem.c |   40 ++-
 drivers/gpu/drm/i915/selftests/i915_gem.c   |4 -
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c   |8 +--
 drivers/gpu/drm/nouveau/nvkm/subdev/devinit/fbmem.h |8 +--
 drivers/gpu/drm/qxl/qxl_image.c |   18 
 drivers/gpu/drm/qxl/qxl_ioctl.c |   27 ++--
 drivers/gpu/drm/qxl/qxl_object.c|   12 ++---
 drivers/gpu/drm/qxl/qxl_object.h|4 -
 drivers/gpu/drm/qxl/qxl_release.c   |4 -
 drivers/gpu/drm/ttm/ttm_bo_util.c   |   20 +
 drivers/gpu/drm/vmwgfx/vmwgfx_blit.c|   30 +-
 include/linux/highmem-internal.h|   14 --
 include/linux/io-mapping.h  |   42 
 15 files changed, 93 insertions(+), 167 deletions(-)


Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-03-04 Thread Cornelia Huck
On Thu, 4 Mar 2021 16:24:16 +0800
Jason Wang  wrote:

> On 2021/3/3 4:29 下午, Cornelia Huck wrote:
> > On Wed, 3 Mar 2021 12:01:01 +0800
> > Jason Wang  wrote:
> >  
> >> On 2021/3/2 8:08 下午, Cornelia Huck wrote:  
> >>> On Mon, 1 Mar 2021 11:51:08 +0800
> >>> Jason Wang  wrote:
> >>> 
>  On 2021/3/1 5:25 上午, Michael S. Tsirkin wrote:  
> > On Fri, Feb 26, 2021 at 04:19:16PM +0800, Jason Wang wrote:  
> >> On 2021/2/26 2:53 上午, Michael S. Tsirkin wrote:  
> >>> Confused. What is wrong with the above? It never reads the
> >>> field unless the feature has been offered by device.  
> >> So the spec said:
> >>
> >> "
> >>
> >> The following driver-read-only field, max_virtqueue_pairs only exists 
> >> if
> >> VIRTIO_NET_F_MQ is set.
> >>
> >> "
> >>
> >> If I read this correctly, there will be no max_virtqueue_pairs field 
> >> if the
> >> VIRTIO_NET_F_MQ is not offered by device? If yes the offsetof() 
> >> violates
> >> what spec said.
> >>
> >> Thanks  
> > I think that's a misunderstanding. This text was never intended to
> > imply that field offsets change beased on feature bits.
> > We had this pain with legacy and we never wanted to go back there.
> >
> > This merely implies that without VIRTIO_NET_F_MQ the field
> > should not be accessed. Exists in the sense "is accessible to driver".
> >
> > Let's just clarify that in the spec, job done.  
>  Ok, agree. That will make things more eaiser.  
> >>> Yes, that makes much more sense.
> >>>
> >>> What about adding the following to the "Basic Facilities of a Virtio
> >>> Device/Device Configuration Space" section of the spec:
> >>>
> >>> "If an optional configuration field does not exist, the corresponding
> >>> space is still present, but reserved."  
> >>
> >> This became interesting after re-reading some of the qemu codes.
> >>
> >> E.g in virtio-net.c we had:
> >>
> >> *static VirtIOFeature feature_sizes[] = {
> >>       {.flags = 1ULL << VIRTIO_NET_F_MAC,
> >>    .end = endof(struct virtio_net_config, mac)},
> >>       {.flags = 1ULL << VIRTIO_NET_F_STATUS,
> >>    .end = endof(struct virtio_net_config, status)},
> >>       {.flags = 1ULL << VIRTIO_NET_F_MQ,
> >>    .end = endof(struct virtio_net_config, max_virtqueue_pairs)},
> >>       {.flags = 1ULL << VIRTIO_NET_F_MTU,
> >>    .end = endof(struct virtio_net_config, mtu)},
> >>       {.flags = 1ULL << VIRTIO_NET_F_SPEED_DUPLEX,
> >>    .end = endof(struct virtio_net_config, duplex)},
> >>       {.flags = (1ULL << VIRTIO_NET_F_RSS) | (1ULL <<
> >> VIRTIO_NET_F_HASH_REPORT),
> >>    .end = endof(struct virtio_net_config, supported_hash_types)},
> >>       {}
> >> };*
> >>
> >> *It has a implict dependency chain. E.g MTU doesn't presnet if
> >> DUPLEX/RSS is not offered ...
> >> *  
> > But I think it covers everything up to the relevant field, no? So MTU
> > is included if we have the feature bit, even if we don't have
> > DUPLEX/RSS.
> >
> > Given that a config space may be shorter (but must not collapse
> > non-existing fields), maybe a better wording would be:
> >
> > "If an optional configuration field does not exist, the corresponding
> > space will still be present if it is not at the end of the
> > configuration space (i.e., further configuration fields exist.)  
> 
> 
> This should work but I think we need to define the end of configuration 
> space first?

What about sidestepping this:

"...the corresponding space will still be present, unless no further
configuration fields exist."

?

> 
> > This
> > implies that a given field, if it exists, is always at the same offset
> > from the beginning of the configuration space."
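
For context, the feature_sizes[] table quoted earlier in the thread is
consumed roughly like the sketch below (written from memory, so treat it as
illustrative rather than a verbatim quote of the QEMU code), which is why
every field below the highest offered feature keeps its fixed offset even
when its own feature bit is not negotiated:

  size_t config_size = 0;

  for (i = 0; feature_sizes[i].flags != 0; i++) {
          if (host_features & feature_sizes[i].flags)
                  config_size = MAX(config_size, feature_sizes[i].end);
  }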


[PATCH RFC 0/2] virtio-ccw: allow to disable legacy virtio

2021-03-04 Thread Cornelia Huck
Unlike virtio-pci, virtio-ccw is currently always a transitional
driver (i.e. it always includes support for legacy devices.) The
differences between legacy and virtio-1+ virtio-ccw devices are not
that big (the most interesting things are in common virtio code
anyway.)

It might be beneficial to make support for legacy virtio generally
configurable, in case we want to remove it completely in a future
where we all have flying cars. As a prereq, we need to make it
configurable for virtio-ccw.

Patch 1 introduces a parameter; now that I look at it, it's probably
not that useful (not even for testing), so I'm inclined to drop it
again.

Patch 2 adds a new config symbol for generic legacy virtio support,
which currently does not do anything but being selected by the
legacy options for virtio-pci and virtio-ccw. A virtio-ccw driver
without legacy support will require a revision of 1 or higher to
be supported by the device.

A virtio-ccw driver with legacy turned off works well for me with
transitional devices and fails onlining gracefully for legacy devices
(max_revision=0 in QEMU).

(I also have some code that allows to make devices non-transitional
in QEMU, but I haven't yet found time to polish the patches.)

Cornelia Huck (2):
  virtio/s390: add parameter for minimum revision
  virtio/s390: make legacy support configurable

 arch/s390/Kconfig   |  11 ++
 drivers/s390/virtio/Makefile|   1 +
 drivers/s390/virtio/virtio_ccw.c| 179 
 drivers/s390/virtio/virtio_ccw_common.h | 113 +++
 drivers/s390/virtio/virtio_ccw_legacy.c | 138 ++
 drivers/virtio/Kconfig  |   8 ++
 6 files changed, 330 insertions(+), 120 deletions(-)
 create mode 100644 drivers/s390/virtio/virtio_ccw_common.h
 create mode 100644 drivers/s390/virtio/virtio_ccw_legacy.c


base-commit: cf6acb8bdb1d829b85a4daa2944bf9e71c93f4b9
-- 
2.26.2



[PATCH RFC 2/2] virtio/s390: make legacy support configurable

2021-03-04 Thread Cornelia Huck
We may want to remove legacy virtio support in the future. In order
to prepare virtio-ccw for that, add an option to disable legacy
support there. This means raising the minimum transport revision
to 1.

Signed-off-by: Cornelia Huck 
---
 arch/s390/Kconfig   |  11 ++
 drivers/s390/virtio/Makefile|   1 +
 drivers/s390/virtio/virtio_ccw.c| 160 ++--
 drivers/s390/virtio/virtio_ccw_common.h | 113 +
 drivers/s390/virtio/virtio_ccw_legacy.c | 138 
 drivers/virtio/Kconfig  |   8 ++
 6 files changed, 312 insertions(+), 119 deletions(-)
 create mode 100644 drivers/s390/virtio/virtio_ccw_common.h
 create mode 100644 drivers/s390/virtio/virtio_ccw_legacy.c

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index e8f7216f6c63..b2743dcf5b36 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -941,6 +941,17 @@ config S390_GUEST
  Select this option if you want to run the kernel as a guest under
  the KVM hypervisor.
 
+config VIRTIO_CCW_LEGACY
+   def_bool y
+   prompt "Support for legacy virtio-ccw devices"
+   depends on S390_GUEST
+   select VIRTIO_LEGACY
+   help
+ Enabling this option adds support for virtio-ccw paravirtual device
+ drivers with legacy support, i.e. it enables a transitional
+ virtio-ccw driver.
+
+ If unsure, say Y.
 endmenu
 
 menu "Selftests"
diff --git a/drivers/s390/virtio/Makefile b/drivers/s390/virtio/Makefile
index 2dc4d9aab634..96dd68411d64 100644
--- a/drivers/s390/virtio/Makefile
+++ b/drivers/s390/virtio/Makefile
@@ -4,3 +4,4 @@
 # Copyright IBM Corp. 2008
 
 obj-$(CONFIG_S390_GUEST) += virtio_ccw.o
+obj-$(CONFIG_VIRTIO_CCW_LEGACY) += virtio_ccw_legacy.o
diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index 0d3971dbc109..32234b6b9074 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -11,11 +11,8 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -30,16 +27,16 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
+#include "virtio_ccw_common.h"
 
 /*
  * Provide a knob to turn off support for older revisions. This is useful
  * if we want to act as a non-transitional virtio device driver: requiring
  * a minimum revision of 1 turns off support for legacy devices.
  */
-static int min_revision;
+int min_revision = VIRTIO_CCW_REV_MIN;
 
 module_param(min_revision, int, 0444);
 MODULE_PARM_DESC(min_revision, "minimum transport revision to accept");
@@ -53,9 +50,6 @@ struct vq_config_block {
__u16 num;
 } __packed;
 
-#define VIRTIO_CCW_CONFIG_SIZE 0x100
-/* same as PCI config space size, should be enough for all drivers */
-
 struct vcdev_dma_area {
unsigned long indicators;
unsigned long indicators2;
@@ -63,25 +57,6 @@ struct vcdev_dma_area {
__u8 status;
 };
 
-struct virtio_ccw_device {
-   struct virtio_device vdev;
-   __u8 config[VIRTIO_CCW_CONFIG_SIZE];
-   struct ccw_device *cdev;
-   __u32 curr_io;
-   int err;
-   unsigned int revision; /* Transport revision */
-   wait_queue_head_t wait_q;
-   spinlock_t lock;
-   struct mutex io_lock; /* Serializes I/O requests */
-   struct list_head virtqueues;
-   bool is_thinint;
-   bool going_away;
-   bool device_lost;
-   unsigned int config_ready;
-   void *airq_info;
-   struct vcdev_dma_area *dma_area;
-};
-
 static inline unsigned long *indicators(struct virtio_ccw_device *vcdev)
 {
return &vcdev->dma_area->indicators;
@@ -92,13 +67,6 @@ static inline unsigned long *indicators2(struct 
virtio_ccw_device *vcdev)
return &vcdev->dma_area->indicators2;
 }
 
-struct vq_info_block_legacy {
-   __u64 queue;
-   __u32 align;
-   __u16 index;
-   __u16 num;
-} __packed;
-
 struct vq_info_block {
__u64 desc;
__u32 res0;
@@ -129,18 +97,6 @@ struct virtio_rev_info {
 /* the highest virtio-ccw revision we support */
 #define VIRTIO_CCW_REV_MAX 2
 
-struct virtio_ccw_vq_info {
-   struct virtqueue *vq;
-   int num;
-   union {
-   struct vq_info_block s;
-   struct vq_info_block_legacy l;
-   } *info_block;
-   int bit_nr;
-   struct list_head node;
-   long cookie;
-};
-
 #define VIRTIO_AIRQ_ISC IO_SCH_ISC /* inherit from subchannel */
 
 #define VIRTIO_IV_BITS (L1_CACHE_BYTES * 8)
@@ -164,40 +120,6 @@ static inline u8 *get_summary_indicator(struct airq_info 
*info)
return summary_indicators + info->summary_indicator_idx;
 }
 
-#define CCW_CMD_SET_VQ 0x13
-#define CCW_CMD_VDEV_RESET 0x33
-#define CCW_CMD_SET_IND 0x43
-#define CCW_CMD_SET_CONF_IND 0x53
-#define CCW_CMD_READ_FEAT 0x12
-#define CCW_CMD_WRITE_FEAT 0x11
-#define CCW_CMD_READ_CONF 0x22
-#define CCW_CMD_WRITE_CONF 0x21
-#define CCW

[PATCH RFC 1/2] virtio/s390: add parameter for minimum revision

2021-03-04 Thread Cornelia Huck
We use transport revisions in virtio-ccw for introducing new commands
etc.; revision 1 denotes operating according to the standard. Legacy
devices do not understand the command to set a revision; for those, we
presume to operate at revision 0.

Add a parameter min_revision to be able to actively restrict use of
old transport revisions. In particular, setting a minimum revision
of 1 makes our driver act as a non-transitional driver.

With the default min_revision of 0, we continue to act as a
transitional driver.

Signed-off-by: Cornelia Huck 
---
 drivers/s390/virtio/virtio_ccw.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index 54e686dca6de..0d3971dbc109 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -34,6 +34,16 @@
 #include 
 #include 
 
+/*
+ * Provide a knob to turn off support for older revisions. This is useful
+ * if we want to act as a non-transitional virtio device driver: requiring
+ * a minimum revision of 1 turns off support for legacy devices.
+ */
+static int min_revision;
+
+module_param(min_revision, int, 0444);
+MODULE_PARM_DESC(min_revision, "minimum transport revision to accept");
+
 /*
  * virtio related functions
  */
@@ -1271,7 +1281,10 @@ static int virtio_ccw_set_transport_rev(struct 
virtio_ccw_device *vcdev)
else
vcdev->revision--;
}
-   } while (ret == -EOPNOTSUPP);
+   } while (vcdev->revision >= min_revision && ret == -EOPNOTSUPP);
+
+   if (ret == -EOPNOTSUPP && vcdev->revision < min_revision)
+   ret = -EINVAL;
 
ccw_device_dma_free(vcdev->cdev, ccw, sizeof(*ccw));
ccw_device_dma_free(vcdev->cdev, rev, sizeof(*rev));
@@ -1315,8 +1328,12 @@ static int virtio_ccw_online(struct ccw_device *cdev)
vcdev->vdev.id.device = cdev->id.cu_model;
 
ret = virtio_ccw_set_transport_rev(vcdev);
-   if (ret)
+   if (ret) {
+   dev_warn(&cdev->dev,
+"Could not set a supported transport revision: %d\n",
+ret);
goto out_free;
+   }
 
ret = register_virtio_device(&vcdev->vdev);
if (ret) {
-- 
2.26.2



Re: cleanup unused or almost unused IOMMU APIs and the FSL PAMU driver

2021-03-04 Thread Joerg Roedel
On Mon, Mar 01, 2021 at 09:42:40AM +0100, Christoph Hellwig wrote:
> Diffstat:
>  arch/powerpc/include/asm/fsl_pamu_stash.h   |   12 
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c |2 
>  drivers/iommu/amd/iommu.c   |   23 
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |   85 ---
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   |  122 +---
>  drivers/iommu/dma-iommu.c   |8 
>  drivers/iommu/fsl_pamu.c|  264 --
>  drivers/iommu/fsl_pamu.h|   10 
>  drivers/iommu/fsl_pamu_domain.c |  694 
> ++--
>  drivers/iommu/fsl_pamu_domain.h |   46 -
>  drivers/iommu/intel/iommu.c |   55 --
>  drivers/iommu/iommu.c   |   75 ---
>  drivers/soc/fsl/qbman/qman_portal.c |   56 --
>  drivers/vfio/vfio_iommu_type1.c |   31 -
>  drivers/vhost/vdpa.c|   10 
>  include/linux/iommu.h   |   81 ---
>  16 files changed, 214 insertions(+), 1360 deletions(-)

Nice cleanup, thanks. The fsl_pamu driver and interface have always been
a little bit of an alien compared to other IOMMU drivers. I am inclined
to merge this after -rc3 is out, given some reviews. Can you also please
add changelogs to the last three patches?

Thanks,

Joerg


[PATCH] qxl: Fix uninitialised struct field head.surface_id

2021-03-04 Thread Colin King
From: Colin Ian King 

The surface_id struct field in head is not being initialized and
static analysis warns that this is being passed through to
dev->monitors_config->heads[i] on an assignment. Clear up this
warning by initializing it to zero.

Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: a6d3c4d79822 ("qxl: hook monitors_config updates into crtc, not encoder.")
Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/qxl/qxl_display.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/qxl/qxl_display.c 
b/drivers/gpu/drm/qxl/qxl_display.c
index 012bce0cdb65..10738e04c09b 100644
--- a/drivers/gpu/drm/qxl/qxl_display.c
+++ b/drivers/gpu/drm/qxl/qxl_display.c
@@ -328,6 +328,7 @@ static void qxl_crtc_update_monitors_config(struct drm_crtc 
*crtc,
 
head.id = i;
head.flags = 0;
+   head.surface_id = 0;
oldcount = qdev->monitors_config->count;
if (crtc->state->active) {
struct drm_display_mode *mode = &crtc->mode;
-- 
2.30.0



Re: [PATCH v6] i2c: virtio: add a virtio i2c frontend driver

2021-03-04 Thread Jason Wang


On 2021/3/4 9:59 上午, Jie Deng wrote:

Add an I2C bus driver for virtio para-virtualization.

The controller can be emulated by the backend driver in
any device model software by following the virtio protocol.

The device specification can be found on
https://lists.oasis-open.org/archives/virtio-comment/202101/msg8.html.

By following the specification, people may implement different
backend drivers to emulate different controllers according to
their needs.

Co-developed-by: Conghui Chen 
Signed-off-by: Conghui Chen 
Signed-off-by: Jie Deng 
---
  drivers/i2c/busses/Kconfig  |  11 ++
  drivers/i2c/busses/Makefile |   3 +
  drivers/i2c/busses/i2c-virtio.c | 289 
  include/uapi/linux/virtio_i2c.h |  42 ++
  include/uapi/linux/virtio_ids.h |   1 +
  5 files changed, 346 insertions(+)
  create mode 100644 drivers/i2c/busses/i2c-virtio.c
  create mode 100644 include/uapi/linux/virtio_i2c.h

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 05ebf75..0860395 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -21,6 +21,17 @@ config I2C_ALI1535
  This driver can also be built as a module.  If so, the module
  will be called i2c-ali1535.
  
+config I2C_VIRTIO

+   tristate "Virtio I2C Adapter"
+   depends on VIRTIO



Please use select here. There's no prompt for VIRTIO.



+   help
+ If you say yes to this option, support will be included for the virtio
+ I2C adapter driver. The hardware can be emulated by any device model
+ software according to the virtio protocol.
+
+ This driver can also be built as a module. If so, the module
+ will be called i2c-virtio.
+
  config I2C_ALI1563
tristate "ALI 1563"
depends on PCI
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 615f35e..b88da08 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -6,6 +6,9 @@
  # ACPI drivers
  obj-$(CONFIG_I2C_SCMI)+= i2c-scmi.o
  
+# VIRTIO I2C host controller driver

+obj-$(CONFIG_I2C_VIRTIO)   += i2c-virtio.o
+
  # PC SMBus host controller drivers
  obj-$(CONFIG_I2C_ALI1535) += i2c-ali1535.o
  obj-$(CONFIG_I2C_ALI1563) += i2c-ali1563.o
diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
new file mode 100644
index 000..98c0fcc
--- /dev/null
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -0,0 +1,289 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtio I2C Bus Driver
+ *
+ * The Virtio I2C Specification:
+ * https://raw.githubusercontent.com/oasis-tcs/virtio-spec/master/virtio-i2c.tex
+ *
+ * Copyright (c) 2021 Intel Corporation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+/**
+ * struct virtio_i2c - virtio I2C data
+ * @vdev: virtio device for this controller
+ * @completion: completion of virtio I2C message
+ * @adap: I2C adapter for this controller
+ * @i2c_lock: lock for virtqueue processing
+ * @vq: the virtio virtqueue for communication
+ */
+struct virtio_i2c {
+   struct virtio_device *vdev;
+   struct completion completion;
+   struct i2c_adapter adap;
+   struct mutex i2c_lock;
+   struct virtqueue *vq;
+};
+
+/**
+ * struct virtio_i2c_req - the virtio I2C request structure
+ * @out_hdr: the OUT header of the virtio I2C message
+ * @buf: the buffer into which data is read, or from which it's written
+ * @in_hdr: the IN header of the virtio I2C message
+ */
+struct virtio_i2c_req {
+   struct virtio_i2c_out_hdr out_hdr;
+   u8 *buf;
+   struct virtio_i2c_in_hdr in_hdr;
+};
+
+static void virtio_i2c_msg_done(struct virtqueue *vq)
+{
+   struct virtio_i2c *vi = vq->vdev->priv;
+
+   complete(&vi->completion);
+}
+
+static int virtio_i2c_send_reqs(struct virtqueue *vq,
+   struct virtio_i2c_req *reqs,
+   struct i2c_msg *msgs, int nr)
+{
+   struct scatterlist *sgs[3], out_hdr, msg_buf, in_hdr;
+   int i, outcnt, incnt, err = 0;
+   u8 *buf;
+
+   for (i = 0; i < nr; i++) {
+   if (!msgs[i].len)
+   break;
+
+   /* Only 7-bit mode supported for this moment. For the address format,
+* Please check the Virtio I2C Specification.
+*/
+   reqs[i].out_hdr.addr = cpu_to_le16(msgs[i].addr << 1);
+
+   if (i != nr - 1)
+   reqs[i].out_hdr.flags |= VIRTIO_I2C_FLAGS_FAIL_NEXT;
+
+   outcnt = incnt = 0;
+   sg_init_one(&out_hdr, &reqs[i].out_hdr, sizeof(reqs[i].out_hdr));
+   sgs[outcnt++] = &out_hdr;
+
+   buf = kzalloc(msgs[i].len, GFP_KERNEL);
+   if (!buf)
+   break;
+
+   reqs[i].buf = buf;
+

Re: [RFC PATCH 01/10] vdpa: add get_config_size callback in vdpa_config_ops

2021-03-04 Thread Jason Wang


On 2021/3/2 10:15 下午, Stefano Garzarella wrote:

On Tue, Mar 02, 2021 at 12:14:13PM +0800, Jason Wang wrote:


On 2021/2/16 5:44 下午, Stefano Garzarella wrote:

This new callback is used to get the size of the configuration space
of vDPA devices.

Signed-off-by: Stefano Garzarella 
---
 include/linux/vdpa.h  | 4 
 drivers/vdpa/ifcvf/ifcvf_main.c   | 6 ++
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 6 ++
 drivers/vdpa/vdpa_sim/vdpa_sim.c  | 9 +
 4 files changed, 25 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 4ab5494503a8..fddf42b17573 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -150,6 +150,9 @@ struct vdpa_iova_range {
  * @set_status:    Set the device status
  *    @vdev: vdpa device
  *    @status: virtio device status
+ * @get_config_size:    Get the size of the configuration space
+ *    @vdev: vdpa device
+ *    Returns size_t: configuration size



Rethinking about this: how much would we gain by introducing a dedicated
op here? E.g. would it be simpler to simply introduce a max_config_size
in the vdpa device?


Mainly because in this way we don't have to add new parameters to the 
vdpa_alloc_device() function.


We do the same, for example, for 'get_device_id', 'get_vendor_id',
'get_vq_num_max'. All of these are usually static, but we have ops for them.

I think that's because it's easier to extend.

I don't know if it's worth adding a new structure for these static 
values at this point, like 'struct vdpa_config_params'.



Yes, that's the point. I think any static value should be set during
device allocation.


I'm fine with both.

Thanks
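For concreteness, a rough sketch of the two shapes being compared; the struct and member names below are illustrative only, not the actual vDPA API:

#include <stddef.h>

struct vdpa_device_sketch;

/* Option A: a dedicated op, as in this patch; the size is fetched
 * through an indirect call whenever it is needed. */
struct vdpa_config_ops_sketch {
	size_t (*get_config_size)(struct vdpa_device_sketch *vdev);
	/* ... other ops ... */
};

/* Option B: a static value captured once at allocation time and read
 * directly from the device structure afterwards. */
struct vdpa_device_sketch {
	size_t config_size;	/* filled in by the parent driver */
	/* ... other members ... */
};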




Thanks,
Stefano




Re: [RFC PATCH 10/10] vhost/vdpa: return configuration bytes read and written to user space

2021-03-04 Thread Jason Wang


On 2021/3/2 10:06 下午, Stefano Garzarella wrote:

On Tue, Mar 02, 2021 at 12:05:35PM +0800, Jason Wang wrote:


On 2021/2/16 5:44 下午, Stefano Garzarella wrote:

vdpa_get_config() and vdpa_set_config() now return the number of bytes
read and written, so let's return them to user space.

We also modify vhost_vdpa_config_validate() to return 0 (bytes read
or written) instead of an error, when the buffer length is 0.

Signed-off-by: Stefano Garzarella 
---
 drivers/vhost/vdpa.c | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 21eea2be5afa..b754c53171a7 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -191,9 +191,6 @@ static ssize_t vhost_vdpa_config_validate(struct vhost_vdpa *v,

 struct vdpa_device *vdpa = v->vdpa;
 u32 size = vdpa->config->get_config_size(vdpa);
-    if (c->len == 0)
-    return -EINVAL;
-
 return min(c->len, size);
 }
@@ -204,6 +201,7 @@ static long vhost_vdpa_get_config(struct vhost_vdpa *v,

 struct vhost_vdpa_config config;
 unsigned long size = offsetof(struct vhost_vdpa_config, buf);
 ssize_t config_size;
+    long ret;
 u8 *buf;
 if (copy_from_user(&config, c, size))
@@ -217,15 +215,18 @@ static long vhost_vdpa_get_config(struct vhost_vdpa *v,

 if (!buf)
 return -ENOMEM;
-    vdpa_get_config(vdpa, config.off, buf, config_size);
-
-    if (copy_to_user(c->buf, buf, config_size)) {
-    kvfree(buf);
-    return -EFAULT;
+    ret = vdpa_get_config(vdpa, config.off, buf, config_size);
+    if (ret < 0) {
+    ret = -EFAULT;
+    goto out;
 }
+    if (copy_to_user(c->buf, buf, config_size))
+    ret = -EFAULT;
+
+out:
 kvfree(buf);
-    return 0;
+    return ret;
 }
 static long vhost_vdpa_set_config(struct vhost_vdpa *v,
@@ -235,6 +236,7 @@ static long vhost_vdpa_set_config(struct vhost_vdpa *v,

 struct vhost_vdpa_config config;
 unsigned long size = offsetof(struct vhost_vdpa_config, buf);
 ssize_t config_size;
+    long ret;
 u8 *buf;
 if (copy_from_user(&config, c, size))
@@ -248,10 +250,12 @@ static long vhost_vdpa_set_config(struct vhost_vdpa *v,

 if (IS_ERR(buf))
 return PTR_ERR(buf);
-    vdpa_set_config(vdpa, config.off, buf, config_size);
+    ret = vdpa_set_config(vdpa, config.off, buf, config_size);
+    if (ret < 0)
+    ret = -EFAULT;
 kvfree(buf);
-    return 0;
+    return ret;
 }



So I wonder whether it's worth returning the number of bytes, since we
can't propagate the result to the driver, or the driver doesn't care
about it.


Okay, but IIUC a user-space application that issues the
VHOST_VDPA_GET_CONFIG ioctl can use the return value.



Yes, but it looks like it's too late to change, since it's a
userspace-noticeable behaviour.
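For reference, a minimal user-space sketch of a caller that consumes such a return value. It assumes the semantics proposed in this series, i.e. that the ioctl returns the number of bytes actually read; VHOST_VDPA_GET_CONFIG and struct vhost_vdpa_config come from <linux/vhost.h>, and error handling is trimmed:

#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Read `len` bytes of device config starting at `off`; returns the byte
 * count reported by the kernel, or -1 with errno set on failure. */
static long read_vdpa_config(int fd, void *dst, __u32 off, __u32 len)
{
	struct vhost_vdpa_config *cfg;
	long ret;

	cfg = calloc(1, sizeof(*cfg) + len);
	if (!cfg)
		return -1;

	cfg->off = off;
	cfg->len = len;

	ret = ioctl(fd, VHOST_VDPA_GET_CONFIG, cfg);
	if (ret > 0)
		memcpy(dst, cfg->buf, ret);

	free(cfg);
	return ret;
}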





Should we also change 'struct virtio_config_ops' to propagate this
value to virtio drivers?



I think not; the reason is that the driver doesn't expect get()/set() to
fail...


Thanks




Thanks,
Stefano




Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero

2021-03-04 Thread Jason Wang


On 2021/3/3 4:29 下午, Cornelia Huck wrote:

On Wed, 3 Mar 2021 12:01:01 +0800
Jason Wang  wrote:


On 2021/3/2 8:08 下午, Cornelia Huck wrote:

On Mon, 1 Mar 2021 11:51:08 +0800
Jason Wang  wrote:
  

On 2021/3/1 5:25 上午, Michael S. Tsirkin wrote:

On Fri, Feb 26, 2021 at 04:19:16PM +0800, Jason Wang wrote:

On 2021/2/26 2:53 上午, Michael S. Tsirkin wrote:

Confused. What is wrong with the above? It never reads the
field unless the feature has been offered by device.

So the spec said:

"

The following driver-read-only field, max_virtqueue_pairs only exists if
VIRTIO_NET_F_MQ is set.

"

If I read this correctly, there will be no max_virtqueue_pairs field if
VIRTIO_NET_F_MQ is not offered by the device? If so, the offsetof()
violates what the spec says.

Thanks

I think that's a misunderstanding. This text was never intended to
imply that field offsets change based on feature bits.
We had this pain with legacy and we never wanted to go back there.

This merely implies that without VIRTIO_NET_F_MQ the field
should not be accessed. Exists in the sense "is accessible to driver".

Let's just clarify that in the spec, job done.
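In driver terms the rule is simply that the offset is fixed and the access is gated on the negotiated bit. A minimal sketch of that pattern, using the stock virtio_has_feature()/virtio_cread() helpers:

#include <linux/virtio_config.h>
#include <linux/virtio_net.h>

/* Sketch: the field's offset never moves; the feature bit only gates
 * whether we touch it. */
static u16 read_max_vq_pairs(struct virtio_device *vdev)
{
	u16 max_pairs = 1;	/* default when VIRTIO_NET_F_MQ is not offered */

	if (virtio_has_feature(vdev, VIRTIO_NET_F_MQ))
		virtio_cread(vdev, struct virtio_net_config,
			     max_virtqueue_pairs, &max_pairs);
	return max_pairs;
}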

Ok, agree. That will make things easier.

Yes, that makes much more sense.

What about adding the following to the "Basic Facilities of a Virtio
Device/Device Configuration Space" section of the spec:

"If an optional configuration field does not exist, the corresponding
space is still present, but reserved."


This became interesting after re-reading some of the QEMU code.

E.g in virtio-net.c we had:

static VirtIOFeature feature_sizes[] = {
    {.flags = 1ULL << VIRTIO_NET_F_MAC,
     .end = endof(struct virtio_net_config, mac)},
    {.flags = 1ULL << VIRTIO_NET_F_STATUS,
     .end = endof(struct virtio_net_config, status)},
    {.flags = 1ULL << VIRTIO_NET_F_MQ,
     .end = endof(struct virtio_net_config, max_virtqueue_pairs)},
    {.flags = 1ULL << VIRTIO_NET_F_MTU,
     .end = endof(struct virtio_net_config, mtu)},
    {.flags = 1ULL << VIRTIO_NET_F_SPEED_DUPLEX,
     .end = endof(struct virtio_net_config, duplex)},
    {.flags = (1ULL << VIRTIO_NET_F_RSS) | (1ULL << VIRTIO_NET_F_HASH_REPORT),
     .end = endof(struct virtio_net_config, supported_hash_types)},
    {}
};

It has an implicit dependency chain. E.g. MTU isn't present if
DUPLEX/RSS is not offered ...

But I think it covers everything up to the relevant field, no? So MTU
is included if we have the feature bit, even if we don't have
DUPLEX/RSS.
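(Roughly speaking, what QEMU derives from that table is the maximum .end over the negotiated feature bits. The sketch below is modelled on virtio_feature_get_config_size(), not a verbatim copy, and reuses the VirtIOFeature type from the table above.)

/* Config size is the largest .end whose feature bit is set, so e.g. MTU
 * is covered whenever VIRTIO_NET_F_MTU is set, regardless of DUPLEX/RSS. */
static size_t feature_config_size(const VirtIOFeature *sizes,
				  uint64_t host_features)
{
	size_t config_size = 0;

	for (; sizes->flags; sizes++) {
		if ((host_features & sizes->flags) && sizes->end > config_size)
			config_size = sizes->end;
	}
	return config_size;
}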

Given that a config space may be shorter (but must not collapse
non-existing fields), maybe a better wording would be:

"If an optional configuration field does not exist, the corresponding
space will still be present if it is not at the end of the
configuration space (i.e., further configuration fields exist.)



This should work but I think we need to define the end of configuration 
space first?




This
implies that a given field, if it exists, is always at the same offset
from the beginning of the configuration space."



(Do we need to specify what a device needs to do if the driver tries to
access a non-existing field? We cannot really make assumptions about
config space accesses; virtio-ccw can just copy a chunk of config space
that contains non-existing fields...


Right, it looks to me like ccw doesn't depend on the config len, which is
kind of interesting. I think the answer depends on the implementation of
both the transport and the device:

(virtio-ccw is a bit odd, because channel I/O does not have the concept
of a config space and I needed to come up with something)



Ok.





Let's take virtio-net-pci as an example: it fills status unconditionally
in virtio_net_set_config() even if VIRTIO_NET_F_STATUS is not
negotiated. So with the above feature_sizes:

1) if one of MQ, MTU, DUPLEX or RSS is implemented, the driver can still
read the status in this case
2) if none of the above four is negotiated, the driver can only read
0xFF (virtio_config_readb())
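(Case 2 above corresponds roughly to what the QEMU config accessors do when a read runs past config_len. A simplified sketch, not verbatim QEMU code:)

#include <stddef.h>
#include <stdint.h>

/* Simplified sketch of case 2) above: a config read beyond the device's
 * config_len is not an error, it simply yields all-ones. */
static uint8_t config_readb_sketch(const uint8_t *config,
				   size_t config_len, size_t addr)
{
	if (addr + sizeof(uint8_t) > config_len)
		return 0xff;
	return config[addr];
}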



I guess the device could ignore
writes and return zeroes on read?)


So considering the above, it might be too late to fix/clarify in the spec
the expected behaviour of reading/writing non-existent fields.

We could still advise behaviour via SHOULD, though I'm not sure it adds
much at this point in time.

It really feels a bit odd that a driver can still read and write fields
for features that it did not negotiate, but I fear we're stuck with it.



Yes, since the device (at least virtio-pci) implements things like this.

Thanks





Thanks



I've opened https://github.com/oasis-tcs/virtio-spec/issues/98 for the
spec issues.
  

