Re: [RFC v4 10/11] vduse: Introduce a workqueue for irq injection
On 2021/3/5 3:27 下午, Yongji Xie wrote:
On Fri, Mar 5, 2021 at 3:01 PM Jason Wang wrote:
On 2021/3/5 2:36 下午, Yongji Xie wrote:
On Fri, Mar 5, 2021 at 11:42 AM Jason Wang wrote:
On 2021/3/5 11:30 上午, Yongji Xie wrote:
On Fri, Mar 5, 2021 at 11:05 AM Jason Wang wrote:
On 2021/3/4 4:58 下午, Yongji Xie wrote:
On Thu, Mar 4, 2021 at 2:59 PM Jason Wang wrote:
On 2021/2/23 7:50 下午, Xie Yongji wrote:

Xie Yongji: This patch introduces a workqueue to support injecting a virtqueue's interrupt asynchronously. This is mainly for performance considerations: it makes sure the push() and pop() on the used vring can be asynchronous.

Jason Wang: Do you have perf numbers for this patch?

Yongji Xie: No, I can do some tests for it if needed. Another problem is that the VIRTIO_RING_F_EVENT_IDX feature will be useless if we call the irq callback in ioctl context. Something like:

    virtqueue_push();
    virtio_notify();
        ioctl()
        -> irq_cb()
               virtqueue_get_buf()

The used vring is always empty each time we call virtqueue_push() in userspace. Not sure if it is what we expected.

Jason Wang: I'm not sure I get the issue. The used ring should be filled by virtqueue_push(), which is done by userspace before?

Yongji Xie: After userspace calls virtqueue_push(), it always calls virtio_notify() immediately. In traditional VM (vhost-vdpa) cases, virtio_notify() will inject an irq into the VM and return, then the vcpu thread will call the interrupt handler. But in container (virtio-vdpa) cases, virtio_notify() will call the interrupt handler directly. So it looks like we have to optimize the virtio-vdpa case. But one problem is we don't know whether we are in the VM use case or the container use case.

Jason Wang: Yes, but I still don't get why the used ring is empty after the ioctl()? The used ring does not use a bounce page, so it should be visible to the kernel driver. What did I miss :) ?

Yongji Xie: Sorry, I'm not saying the kernel can't see the correct used vring. I mean the kernel will consume the used vring in the ioctl context directly in the virtio-vdpa case. In userspace's view, that means virtqueue_push() is the used vring's producer and virtio_notify() is the used vring's consumer. They will be called one after another in one thread rather than in different threads, which looks odd and has a bad effect on performance.

Jason Wang: Yes, that's why we need a workqueue (the WQ_UNBOUND you used). Or do you want to squash this patch into patch 8?

Yongji Xie: So I think we can see an obvious difference when virtio-vdpa is used. But it looks like we don't need this workqueue in vhost-vdpa cases. Any suggestions?

Jason Wang: I haven't had a deep thought about it. But I feel we can solve this by using the irq bypass manager (or something similar). Then we don't need it to be relayed via workqueue and vdpa. But I'm not sure how hard it will be. Do you see any obvious performance regression by using the workqueue? Or we can optimize it in the future.

Thanks

Thanks,
Yongji

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
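The producer/consumer split discussed above (virtqueue_push() filling the used ring in one context, the irq callback draining it in another) can be sketched in userspace with a worker thread standing in for the kernel workqueue. This is a hypothetical analogy, not VDUSE code: the struct and function names are made up, and a pthread plays the role of queue_work() on a WQ_UNBOUND workqueue.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the used ring: a counter of pending used buffers. */
struct used_ring {
	pthread_mutex_t lock;
	pthread_cond_t kick;
	int pending;     /* buffers pushed but not yet consumed */
	int consumed;    /* buffers seen by the "irq callback" side */
	bool stop;
};

/* Producer side: what userspace does in virtqueue_push(). */
static void virtqueue_push(struct used_ring *r)
{
	pthread_mutex_lock(&r->lock);
	r->pending++;
	pthread_mutex_unlock(&r->lock);
}

/* "virtio_notify()": instead of running irq_cb() synchronously in the
 * ioctl context, just kick the worker (the queue_work() analogue). */
static void virtio_notify(struct used_ring *r)
{
	pthread_mutex_lock(&r->lock);
	pthread_cond_signal(&r->kick);
	pthread_mutex_unlock(&r->lock);
}

/* Consumer side: the deferred irq callback draining the used ring,
 * like virtqueue_get_buf() in the guest driver. */
static void *irq_worker(void *arg)
{
	struct used_ring *r = arg;

	pthread_mutex_lock(&r->lock);
	for (;;) {
		while (r->pending == 0 && !r->stop)
			pthread_cond_wait(&r->kick, &r->lock);
		if (r->pending == 0 && r->stop)
			break;
		r->consumed += r->pending;   /* drain the whole batch */
		r->pending = 0;
	}
	pthread_mutex_unlock(&r->lock);
	return NULL;
}
```

Because the push happens on one thread and the drain on another, the producer can keep pushing while the consumer drains a batch; that asynchrony is what the workqueue buys, and it is what gives EVENT_IDX a chance to see a non-empty used ring and suppress further notifications.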
Re: [RFC v4 06/11] vduse: Implement an MMU-based IOMMU driver
On 2021/3/5 3:13 下午, Yongji Xie wrote:
On Fri, Mar 5, 2021 at 2:52 PM Jason Wang wrote:
On 2021/3/5 2:15 下午, Yongji Xie wrote:

Jason Wang: Sorry if I've asked this before. But what's the reason for maintaining a dedicated IOTLB here? I think we could reuse vduse_dev->iommu, since the device can not be used by both virtio and vhost at the same time, or use vduse_iova_domain->iotlb for set_map().

Yongji Xie: The main difference between domain->iotlb and dev->iotlb is the way to deal with the bounce buffer. In the domain->iotlb case, the bounce buffer needs to be mapped on each DMA transfer because we need to get the bounce pages by an IOVA during DMA unmapping. In the dev->iotlb case, the bounce buffer only needs to be mapped once during initialization, which will be used to tell userspace how to do mmap().

Jason Wang: Also, since the vhost IOTLB supports a per-mapping token (opaque), can we use that instead of the bounce_pages *?

Yongji Xie: Sorry, I didn't get you here. Which value do you mean to store in the opaque pointer?

Jason Wang: So I would like to have a way to use a single IOTLB for managing all kinds of mappings. Two possible ideas: 1) map bounce pages one by one in vduse_dev_map_page(); in VDUSE_IOTLB_GET_FD, try to merge the result if we had the same fd. Then for bounce pages, userspace still only needs to map once, and we can maintain the actual mapping by storing the page or pa in the opaque field of the IOTLB entry.

Yongji Xie: Looks like userspace still needs to unmap the old region and map a new region (size is changed) with the fd in each VDUSE_IOTLB_GET_FD ioctl.

Jason Wang: I don't get it here. Can you give an example?

Yongji Xie: For example, userspace needs to process two I/O requests (one page per request). To process the first request, userspace uses the VDUSE_IOTLB_GET_FD ioctl to query the iova region (0 ~ 4096) and mmap it.

Jason Wang: I think in this case we should let VDUSE_IOTLB_GET_FD return the maximum range as far as it is backed by the same fd. In the case of a bounce page, that's basically the whole range of the bounce buffer?

Thanks

Yongji Xie: To process the second request, userspace uses the VDUSE_IOTLB_GET_FD ioctl to query the new iova region and map a new region (0 ~ 8192). Then userspace needs to traverse the list of iova regions and unmap the old region (0 ~ 4096). Looks like this is a little complex.

Thanks,
Yongji
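Jason's suggestion (have VDUSE_IOTLB_GET_FD return the maximum range backed by a single fd, so userspace maps it once) amounts to coalescing adjacent IOTLB entries that share a backing fd. A minimal sketch of that merge, with a made-up entry struct; the real vhost_iotlb_map layout differs:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of IOTLB entries, sorted by start. */
struct iotlb_entry {
	uint64_t start;
	uint64_t last;   /* inclusive, as in vhost_iotlb_map */
	int fd;          /* backing fd userspace would mmap */
};

/*
 * Find the entry containing @iova, then expand to the maximal contiguous
 * range whose neighbours are backed by the same fd. Returns the index of
 * the containing entry, or -1 if @iova is unmapped.
 */
static int iotlb_max_range(const struct iotlb_entry *e, size_t n, uint64_t iova,
			   uint64_t *start, uint64_t *last)
{
	size_t i, lo, hi;

	for (i = 0; i < n; i++)
		if (iova >= e[i].start && iova <= e[i].last)
			break;
	if (i == n)
		return -1;

	/* walk left while the previous entry is adjacent and same fd */
	for (lo = i; lo > 0; lo--)
		if (e[lo - 1].fd != e[i].fd || e[lo - 1].last + 1 != e[lo].start)
			break;
	/* walk right likewise */
	for (hi = i; hi + 1 < n; hi++)
		if (e[hi + 1].fd != e[i].fd || e[hi].last + 1 != e[hi + 1].start)
			break;

	*start = e[lo].start;
	*last = e[hi].last;
	return (int)i;
}
```

With this behaviour, both GET_FD calls in the two-request example would return the same maximal bounce-buffer range, so userspace never needs to unmap and remap as the mapped size grows.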
Re: [PATCH v6] i2c: virtio: add a virtio i2c frontend driver
On 2021/3/5 1:47 下午, Jie Deng wrote:
On 2021/3/4 17:15, Jason Wang wrote:
On 2021/3/4 9:59 上午, Jie Deng wrote:

Jie Deng: Add an I2C bus driver for virtio para-virtualization. The controller can be emulated by the backend driver in any device model software by following the virtio protocol. The device specification can be found on https://lists.oasis-open.org/archives/virtio-comment/202101/msg8.html. By following the specification, people may implement different backend drivers to emulate different controllers according to their needs.

Co-developed-by: Conghui Chen
Signed-off-by: Conghui Chen
Signed-off-by: Jie Deng
---
 drivers/i2c/busses/Kconfig      |  11
 drivers/i2c/busses/Makefile     |   3
 drivers/i2c/busses/i2c-virtio.c | 289
 include/uapi/linux/virtio_i2c.h |  42
 include/uapi/linux/virtio_ids.h |   1
 5 files changed, 346 insertions(+)
 create mode 100644 drivers/i2c/busses/i2c-virtio.c
 create mode 100644 include/uapi/linux/virtio_i2c.h

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 05ebf75..0860395 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -21,6 +21,17 @@ config I2C_ALI1535
 	  This driver can also be built as a module.  If so, the module
 	  will be called i2c-ali1535.

+config I2C_VIRTIO
+	tristate "Virtio I2C Adapter"
+	depends on VIRTIO

Jason Wang: Please use select here. There's no prompt for VIRTIO.

Jie Deng: OK.

+	help
+	  If you say yes to this option, support will be included for the virtio
+	  I2C adapter driver. The hardware can be emulated by any device model
+	  software according to the virtio protocol.
+
+	  This driver can also be built as a module. If so, the module
+	  will be called i2c-virtio.
+
 config I2C_ALI1563
 	tristate "ALI 1563"
 	depends on PCI

diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 615f35e..b88da08 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -6,6 +6,9 @@
 # ACPI drivers
 obj-$(CONFIG_I2C_SCMI)		+= i2c-scmi.o

+# VIRTIO I2C host controller driver
+obj-$(CONFIG_I2C_VIRTIO)	+= i2c-virtio.o
+
 # PC SMBus host controller drivers
 obj-$(CONFIG_I2C_ALI1535)	+= i2c-ali1535.o
 obj-$(CONFIG_I2C_ALI1563)	+= i2c-ali1563.o

diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
new file mode 100644
index 000..98c0fcc
--- /dev/null
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -0,0 +1,289 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtio I2C Bus Driver
+ *
+ * The Virtio I2C Specification:
+ * https://raw.githubusercontent.com/oasis-tcs/virtio-spec/master/virtio-i2c.tex
+ *
+ * Copyright (c) 2021 Intel Corporation. All rights reserved.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+
+/**
+ * struct virtio_i2c - virtio I2C data
+ * @vdev: virtio device for this controller
+ * @completion: completion of virtio I2C message
+ * @adap: I2C adapter for this controller
+ * @i2c_lock: lock for virtqueue processing
+ * @vq: the virtio virtqueue for communication
+ */
+struct virtio_i2c {
+	struct virtio_device *vdev;
+	struct completion completion;
+	struct i2c_adapter adap;
+	struct mutex i2c_lock;
+	struct virtqueue *vq;
+};
+
+/**
+ * struct virtio_i2c_req - the virtio I2C request structure
+ * @out_hdr: the OUT header of the virtio I2C message
+ * @buf: the buffer into which data is read, or from which it's written
+ * @in_hdr: the IN header of the virtio I2C message
+ */
+struct virtio_i2c_req {
+	struct virtio_i2c_out_hdr out_hdr;
+	u8 *buf;
+	struct virtio_i2c_in_hdr in_hdr;
+};
+
+static void virtio_i2c_msg_done(struct virtqueue *vq)
+{
+	struct virtio_i2c *vi = vq->vdev->priv;
+
+	complete(&vi->completion);
+}
+
+static int virtio_i2c_send_reqs(struct virtqueue *vq,
+				struct virtio_i2c_req *reqs,
+				struct i2c_msg *msgs, int nr)
+{
+	struct scatterlist *sgs[3], out_hdr, msg_buf, in_hdr;
+	int i, outcnt, incnt, err = 0;
+	u8 *buf;
+
+	for (i = 0; i < nr; i++) {
+		if (!msgs[i].len)
+			break;
+
+		/* Only 7-bit mode supported for this moment. For the address format,
+		 * Please check the Virtio I2C Specification.
+		 */
+		reqs[i].out_hdr.addr = cpu_to_le16(msgs[i].addr << 1);
+
+		if (i != nr - 1)
+			reqs[i].out_hdr.flags |= VIRTIO_I2C_FLAGS_FAIL_NEXT;
+
+		outcnt = incnt = 0;
+		sg_init_one(&out_hdr, &reqs[i].out_hdr, sizeof(reqs[i].out_hdr));
+		sgs[outcnt++] = &out_hdr;
+
+		buf = kzalloc(msgs[i].len, GFP_KERNEL);
+		if (!buf)
+			break;
+
+		reqs[i].buf = buf;
+		sg_init_one(&msg_buf, reqs[i].buf, msgs[i].len);
+
+		if (msgs[i].flags & I2C_M_RD) {
+			sgs[outcnt + incnt++] = &msg_buf;
Re: [PATCH v5 net-next] virtio-net: support XDP when not more queues
On 2021/3/1 11:22 上午, Xuan Zhuo wrote:

Xuan Zhuo: The number of queues implemented by many virtio backends is limited, especially on machines with a large number of CPUs. In this case, it is often impossible to allocate a separate queue for XDP_TX/XDP_REDIRECT, so XDP cannot be loaded, even if the XDP program does not use XDP_TX/XDP_REDIRECT. This patch allows XDP_TX/XDP_REDIRECT to run by reusing an existing SQ with __netif_tx_lock() held when there are not enough queues.

Signed-off-by: Xuan Zhuo
Reviewed-by: Dust Li
---
v5: change subject from 'support XDP_TX when not more queues'
v4: make sparse happy, suggested by Jakub Kicinski
v3: add warning when no more queues, suggested by Jesper Dangaard Brouer

 drivers/net/virtio_net.c | 53
 1 file changed, 44 insertions(+), 9 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ba8e637..55f1dd1 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -195,6 +195,9 @@ struct virtnet_info {
 	/* # of XDP queue pairs currently used by the driver */
 	u16 xdp_queue_pairs;

+	/* xdp_queue_pairs may be 0, when xdp is already loaded. So add this. */
+	bool xdp_enabled;
+
 	/* I like... big packets and I cannot lie! */
 	bool big_packets;

@@ -481,14 +484,42 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
 	return 0;
 }

-static struct send_queue *virtnet_xdp_sq(struct virtnet_info *vi)
+static struct send_queue *virtnet_get_xdp_sq(struct virtnet_info *vi)
+	__acquires(lock)
 {
+	struct netdev_queue *txq;
 	unsigned int qp;

-	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
+	if (vi->curr_queue_pairs > nr_cpu_ids) {
+		qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
+
+		/* tell sparse we took the lock, but don't really take it */
+		__acquire(lock);

Jason Wang: The code can explain itself, but you need to explain why we don't need to hold the tx lock here. And it looks to me we should use __netif_tx_acquire()/__netif_tx_release()? Btw, is it better to refactor the code so that we can annotate it with something like __acquire(txq->xmit_lock)?

Thanks

+	} else {
+		qp = smp_processor_id() % vi->curr_queue_pairs;
+		txq = netdev_get_tx_queue(vi->dev, qp);
+		__netif_tx_lock(txq, raw_smp_processor_id());
+	}
+
 	return &vi->sq[qp];
 }

+static void virtnet_put_xdp_sq(struct virtnet_info *vi, struct send_queue *sq)
+	__releases(lock)
+{
+	struct netdev_queue *txq;
+	unsigned int qp;
+
+	if (vi->curr_queue_pairs <= nr_cpu_ids) {
+		qp = sq - vi->sq;
+		txq = netdev_get_tx_queue(vi->dev, qp);
+		__netif_tx_unlock(txq);
+	} else {
+		/* make sparse happy */
+		__release(lock);
+	}
+}
+
 static int virtnet_xdp_xmit(struct net_device *dev,
 			    int n, struct xdp_frame **frames, u32 flags)
 {
@@ -512,7 +543,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	if (!xdp_prog)
 		return -ENXIO;

-	sq = virtnet_xdp_sq(vi);
+	sq = virtnet_get_xdp_sq(vi);

 	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) {
 		ret = -EINVAL;
@@ -560,12 +591,13 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	sq->stats.kicks += kicks;
 	u64_stats_update_end(&sq->stats.syncp);

+	virtnet_put_xdp_sq(vi, sq);
 	return ret;
 }

 static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
 {
-	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
+	return vi->xdp_enabled ? VIRTIO_XDP_HEADROOM : 0;
 }

 /* We copy the packet for XDP in the following cases:
@@ -1457,12 +1489,13 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 		xdp_do_flush();

 	if (xdp_xmit & VIRTIO_XDP_TX) {
-		sq = virtnet_xdp_sq(vi);
+		sq = virtnet_get_xdp_sq(vi);
 		if (virtqueue_kick_prepare(sq->vq) && virtqueue_notify(sq->vq)) {
 			u64_stats_update_begin(&sq->stats.syncp);
 			sq->stats.kicks++;
 			u64_stats_update_end(&sq->stats.syncp);
 		}
+		virtnet_put_xdp_sq(vi, sq);
 	}

 	return received;
@@ -2417,10 +2450,9 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 	/* XDP requires extra queues for XDP_TX */
 	if (curr_qp + xdp_qp > vi->max_queue_pairs) {
-		NL_SET_ERR_MSG_MOD(extack, "Too few free TX rings available");
-		netdev_warn(dev, "request %i queues but max is %i\n",
+		netdev_warn(dev, "XDP request %i queues but max is %i. XDP_TX and XDP_REDIRECT will operate in a slower locked tx mode.\n",
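The selection logic in virtnet_get_xdp_sq() above boils down to: if there are more queue pairs than CPUs, each CPU gets a dedicated XDP TX queue and no lock is needed; otherwise the XDP path shares the existing queues, picking by cpu % curr_queue_pairs and taking the tx lock. A hypothetical pure-function model of just that index math (not driver code):

```c
#include <stdbool.h>

/* Hypothetical model of the queue pick in virtnet_get_xdp_sq():
 * returns the sq index and whether the caller must take the tx lock. */
static unsigned int pick_xdp_sq(unsigned int curr_queue_pairs,
				unsigned int xdp_queue_pairs,
				unsigned int nr_cpu_ids,
				unsigned int cpu,
				bool *need_lock)
{
	if (curr_queue_pairs > nr_cpu_ids) {
		/* one dedicated XDP queue per CPU: lock-free */
		*need_lock = false;
		return curr_queue_pairs - xdp_queue_pairs + cpu;
	}
	/* not enough queues: share with the stack, serialized by the tx lock */
	*need_lock = true;
	return cpu % curr_queue_pairs;
}
```

The modulo path is why two CPUs can land on the same queue, and hence why __netif_tx_lock() becomes mandatory in the fallback mode the warning message describes.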
Re: [PATCH v6] i2c: virtio: add a virtio i2c frontend driver
On 2021/3/5 11:09, Viresh Kumar wrote:
On 05-03-21, 09:46, Jie Deng wrote:
On 2021/3/4 14:06, Viresh Kumar wrote:

Viresh Kumar: depends on I2C as well?

Jie Deng: No need for that. The dependency on I2C is included in the Kconfig in its parent directory.

Viresh Kumar: Sorry about that, I should have figured that out myself. (Though a note on the way we reply to messages: please leave an empty line before and after your messages, it gets difficult to find the inline replies otherwise.)

Viresh Kumar: "+ if (!(req && req == &reqs[i])) {" -- I find this less readable compared to: "if (!req || req != &reqs[i]) {"

Jie Deng: Different people may have different tastes. Please check Andy's comment at this link: https://lists.linuxfoundation.org/pipermail/virtualization/2020-September/049933.html

Viresh Kumar: Heh, everyone wants you to do it differently :) If we leave compiler optimizations aside (the compiler will generate the exact same code for both cases; I tested it as well to be doubly sure), the original statement used in this patch has three conditional operations in it and the way I suggested has only two. Andy, thoughts? And anyway, this isn't the biggest of my worries, just that I had to notice it somehow :)

Viresh Kumar: For all the above errors where you simply break out, you still need to free the memory for buf, right?

Jie Deng: Will try to use reqs[i].buf = msgs[i].buf to avoid the allocation.

Viresh Kumar: I think it would be better to have all such deallocations done in a single place, i.e. after the completion callback is finished. Trying to solve this everywhere is going to make this more messy.

Jie Deng: I think there is no need for allocations/deallocations/memcpy of reqs[i].buf any more if we use msgs[i].buf directly.

Viresh Kumar: "+ mutex_lock(&vi->i2c_lock);" -- I have never worked with i2c stuff earlier, but I don't think you need a lock here. The transactions seem to be serialized by the i2c-core itself (look at i2c_transfer() in i2c-core-base.c); though there is another exported version, __i2c_transfer(), the comment over it says the callers must take the adapter lock before calling it.

Jie Deng: The lock is needed since no "lock_ops" is registered in this i2c_adapter.

Viresh Kumar: drivers/i2c/i2c-core-base.c:

static int i2c_register_adapter(struct i2c_adapter *adap)
{
	...
	if (!adap->lock_ops)
		adap->lock_ops = &i2c_adapter_lock_ops;
	...
}

This should take care of it?

Jie Deng: The problem is that you can't guarantee that adap->algo->master_xfer is only called from i2c_transfer(). Any function that holds the adapter can call adap->algo->master_xfer directly. So I think it is safer to have a lock in virtio_i2c_xfer().

Viresh Kumar (on the patch hunk):

+	ret = virtio_i2c_send_reqs(vq, reqs, msgs, num);
+	if (ret == 0)
+		goto err_unlock_free;
+
+	nr = ret;
+
+	virtqueue_kick(vq);
+
+	time_left = wait_for_completion_timeout(&vi->completion, adap->timeout);
+	if (!time_left) {
+		dev_err(&adap->dev, "virtio i2c backend timeout.\n");
+		ret = -ETIMEDOUT;

You need to free the bufs of the requests here as well. Just want to make sure you didn't miss this comment.

Jie Deng: Will try to use msgs[i].buf directly, so there should be no freeing of bufs any more.

Viresh Kumar (on the patch hunk):

+static struct i2c_adapter virtio_adapter = {
+	.owner = THIS_MODULE,
+	.name = "Virtio I2C Adapter",
+	.class = I2C_CLASS_DEPRECATED,

Why are we using something that is deprecated here?

Jie Deng: To warn users that the adapter doesn't support classes anymore.

Viresh Kumar: So this is the right thing to do? Or this is what we expect from new drivers? Sorry, I am just new to this stuff and so...

Jie Deng: https://patchwork.ozlabs.org/project/linux-i2c/patch/20170729121143.3980-1-...@the-dreams.de/

Viresh Kumar (on the patch hunk):

+struct virtio_i2c_out_hdr {
+	__le16 addr;
+	__le16 padding;
+	__le32 flags;
+};

It might be worth setting __packed for the structures here, even when we have taken care of the padding ourselves, for both structures.

Jie Deng: Please check Michael's comment: https://lkml.org/lkml/2020/9/3/339. I agreed to remove "__packed".

Viresh Kumar: When Michael commented, the structure looked like this: Actually it can be ignored, as the compiler isn't going to add any padding by itself in this case (since you already took care of it): the structure will be aligned to the size of the biggest element here. I generally consider it good coding style to make sure we don't add any unwanted stuff in there by mistake. Anyway, we can see in the future whether this is going to be required or not, if and when we add new fields here.
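The two spellings of the check being debated above are equivalent by De Morgan's law: !(req && req == &reqs[i]) is exactly !req || req != &reqs[i]; the disagreement is purely about readability. A quick exhaustive check over NULL/non-NULL and match/mismatch combinations:

```c
#include <stdbool.h>
#include <stddef.h>

/* Both forms of the condition discussed above; they must agree for
 * every combination of a NULL/non-NULL pointer and match/mismatch. */
static bool form_a(void *req, void *expected)
{
	return !(req && req == expected);
}

static bool form_b(void *req, void *expected)
{
	return !req || req != expected;
}
```

Both also short-circuit safely when req is NULL, so neither form is more defensive than the other; as Viresh notes, the compiler emits the same code for both.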
Re: [RFC v4 06/11] vduse: Implement an MMU-based IOMMU driver
On 2021/3/5 2:15 下午, Yongji Xie wrote:

Jason Wang: Sorry if I've asked this before. But what's the reason for maintaining a dedicated IOTLB here? I think we could reuse vduse_dev->iommu, since the device can not be used by both virtio and vhost at the same time, or use vduse_iova_domain->iotlb for set_map().

Yongji Xie: The main difference between domain->iotlb and dev->iotlb is the way to deal with the bounce buffer. In the domain->iotlb case, the bounce buffer needs to be mapped on each DMA transfer because we need to get the bounce pages by an IOVA during DMA unmapping. In the dev->iotlb case, the bounce buffer only needs to be mapped once during initialization, which will be used to tell userspace how to do mmap().

Jason Wang: Also, since the vhost IOTLB supports a per-mapping token (opaque), can we use that instead of the bounce_pages *?

Yongji Xie: Sorry, I didn't get you here. Which value do you mean to store in the opaque pointer?

Jason Wang: So I would like to have a way to use a single IOTLB for managing all kinds of mappings. Two possible ideas:

1) Map bounce pages one by one in vduse_dev_map_page(); in VDUSE_IOTLB_GET_FD, try to merge the result if we had the same fd. Then for bounce pages, userspace still only needs to map once, and we can maintain the actual mapping by storing the page or pa in the opaque field of the IOTLB entry.

Yongji Xie: Looks like userspace still needs to unmap the old region and map a new region (size is changed) with the fd in each VDUSE_IOTLB_GET_FD ioctl.

Jason Wang: I don't get it here. Can you give an example?

2) Map the bounce pages once in vduse_dev_map_page() and store struct page **bounce_pages in the opaque field of this single IOTLB entry.

Yongji Xie: We can get the struct page **bounce_pages from vduse_iova_domain. Why do we need to store it in the opaque field? Should the opaque field be used to store vdpa_map_file?

Jason Wang: Oh yes, you're right.

Yongji Xie: And I think it works. One problem is we need to find a place to store the original DMA buffer's address and size. I think we can modify the array of bounce_pages for this purpose.

Thanks,
Yongji

Jason Wang: Yes.

Thanks
Re: [RFC v4 11/11] vduse: Support binding irq to the specified cpu
On 2021/3/5 11:37 上午, Yongji Xie wrote:
On Fri, Mar 5, 2021 at 11:11 AM Jason Wang wrote:
On 2021/3/4 4:19 下午, Yongji Xie wrote:
On Thu, Mar 4, 2021 at 3:30 PM Jason Wang wrote:
On 2021/2/23 7:50 下午, Xie Yongji wrote:

Xie Yongji: Add a parameter to the ioctl VDUSE_INJECT_VQ_IRQ to support injecting a virtqueue's interrupt to a specified cpu.

Jason Wang: How does userspace know which CPU this irq is for? It looks to me we need to do it at a different level, e.g. introduce some API in sysfs to allow the admin to tune that. But I think we can do that in another patch on top of this series.

Yongji Xie: OK. I will think more about it.

Jason Wang: It should be something like /sys/class/vduse/$dev_name/vq/0/irq_affinity. Also, we need to make sure an eventfd can not be reused.

Yongji Xie: Looks like we don't use eventfd now. Do you mean we need to use eventfd in this case?

Jason Wang: No, I meant if we're using eventfd, do we allow a single eventfd to be used for injecting irqs for more than one virtqueue? (If not, I guess it should be ok.)

Thanks

Thanks,
Yongji
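If a sysfs knob like the suggested /sys/class/vduse/$dev_name/vq/0/irq_affinity were added, it would likely accept the kernel's usual cpulist format (e.g. "0-3,5"), as other affinity files in sysfs do. A hypothetical userspace parser for that format, just to illustrate the interface; this is an assumption about the knob's syntax, not anything the series defines:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical parser for the kernel cpulist format, e.g. "0-3,5,7".
 * Returns a bitmask of CPUs (so it only handles CPUs 0..63 here),
 * or 0 on a parse error. */
static uint64_t parse_cpulist(const char *s)
{
	uint64_t mask = 0;
	unsigned int a, b;
	int n;

	while (*s) {
		if (sscanf(s, "%u-%u%n", &a, &b, &n) == 2) {
			/* a range like "0-3" */
		} else if (sscanf(s, "%u%n", &a, &n) == 1) {
			b = a;                  /* a single cpu */
		} else {
			return 0;
		}
		if (a > b || b > 63)
			return 0;
		for (unsigned int c = a; c <= b; c++)
			mask |= 1ULL << c;
		s += n;
		if (*s == ',')
			s++;
		else if (*s)
			return 0;
	}
	return mask;
}
```

An admin tool would write such a string into the sysfs file; the kernel side would parse it into a cpumask and use it when picking the target CPU for the injected interrupt.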
Re: [RFC v4 06/11] vduse: Implement an MMU-based IOMMU driver
On 2021/3/4 1:12 下午, Yongji Xie wrote: On Thu, Mar 4, 2021 at 12:21 PM Jason Wang wrote: On 2021/2/23 7:50 下午, Xie Yongji wrote: This implements a MMU-based IOMMU driver to support mapping kernel dma buffer into userspace. The basic idea behind it is treating MMU (VA->PA) as IOMMU (IOVA->PA). The driver will set up MMU mapping instead of IOMMU mapping for the DMA transfer so that the userspace process is able to use its virtual address to access the dma buffer in kernel. And to avoid security issue, a bounce-buffering mechanism is introduced to prevent userspace accessing the original buffer directly. Signed-off-by: Xie Yongji --- drivers/vdpa/vdpa_user/iova_domain.c | 486 +++ drivers/vdpa/vdpa_user/iova_domain.h | 61 + 2 files changed, 547 insertions(+) create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c new file mode 100644 index ..9285d430d486 --- /dev/null +++ b/drivers/vdpa/vdpa_user/iova_domain.c @@ -0,0 +1,486 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * MMU-based IOMMU implementation + * + * Copyright (C) 2020 Bytedance Inc. and/or its affiliates. All rights reserved. 
+ * + * Author: Xie Yongji + * + */ + +#include +#include +#include +#include + +#include "iova_domain.h" + +#define IOVA_START_PFN 1 +#define IOVA_ALLOC_ORDER 12 +#define IOVA_ALLOC_SIZE (1 << IOVA_ALLOC_ORDER) + +static inline struct page * +vduse_domain_get_bounce_page(struct vduse_iova_domain *domain, u64 iova) +{ + u64 index = iova >> PAGE_SHIFT; + + return domain->bounce_pages[index]; +} + +static inline void +vduse_domain_set_bounce_page(struct vduse_iova_domain *domain, + u64 iova, struct page *page) +{ + u64 index = iova >> PAGE_SHIFT; + + domain->bounce_pages[index] = page; +} + +static enum dma_data_direction perm_to_dir(int perm) +{ + enum dma_data_direction dir; + + switch (perm) { + case VHOST_MAP_WO: + dir = DMA_FROM_DEVICE; + break; + case VHOST_MAP_RO: + dir = DMA_TO_DEVICE; + break; + case VHOST_MAP_RW: + dir = DMA_BIDIRECTIONAL; + break; + default: + break; + } + + return dir; +} + +static int dir_to_perm(enum dma_data_direction dir) +{ + int perm = -EFAULT; + + switch (dir) { + case DMA_FROM_DEVICE: + perm = VHOST_MAP_WO; + break; + case DMA_TO_DEVICE: + perm = VHOST_MAP_RO; + break; + case DMA_BIDIRECTIONAL: + perm = VHOST_MAP_RW; + break; + default: + break; + } + + return perm; +} Let's move the above two helpers to vhost_iotlb.h so they can be used by other drivers, e.g. vdpa_sim. Sure. + +static void do_bounce(phys_addr_t orig, void *addr, size_t size, + enum dma_data_direction dir) +{ + unsigned long pfn = PFN_DOWN(orig); + + if (PageHighMem(pfn_to_page(pfn))) { + unsigned int offset = offset_in_page(orig); + char *buffer; + unsigned int sz = 0; + unsigned long flags; + + while (size) { + sz = min_t(size_t, PAGE_SIZE - offset, size); + + local_irq_save(flags); + buffer = kmap_atomic(pfn_to_page(pfn)); + if (dir == DMA_TO_DEVICE) + memcpy(addr, buffer + offset, sz); + else + memcpy(buffer + offset, addr, sz); + kunmap_atomic(buffer); + local_irq_restore(flags); I wonder why we need to deal with highmem and irq flags explicitly like this. 
Won't kmap_atomic() take care of all of that? Yes, the irq flags are useless here. Will remove them. + + size -= sz; + pfn++; + addr += sz; + offset = 0; + } + } else if (dir == DMA_TO_DEVICE) { + memcpy(addr, phys_to_virt(orig), size); + } else { + memcpy(phys_to_virt(orig), addr, size); + } +} + +static struct page * +vduse_domain_get_mapping_page(struct vduse_iova_domain *domain, u64 iova) +{ + u64 start = iova & PAGE_MASK; + u64 last = start + PAGE_SIZE - 1; + struct vhost_iotlb_map *map; + struct page *page = NULL; + + spin_lock(&domain->iotlb_lock); + map = vhost_iotlb_itree_first(domain->iotlb, start, last); + if (!map) + goto out; + + page = pfn_to_page((map->addr + iova - map->start) >> PAGE_SHIFT); + get_page(page); +out: + spin_unlock(&domain->iotlb_lock); + + return page; +} + +static struct page * +vduse_domain_alloc_bounce_page(struct vduse_iova_domain *domain, u64 iova) +{ + u64 star
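The two helpers the review suggests moving into vhost_iotlb.h can be restated as a standalone userspace sketch. The enum and VHOST_MAP_* values below are stand-ins, not the kernel's definitions; note also that the original perm_to_dir() leaves `dir` uninitialized in the default case, so this sketch returns DMA_NONE there instead.

```c
/* Hypothetical userspace re-statement of perm_to_dir()/dir_to_perm();
 * constant values here are illustrative stand-ins. */
enum dma_data_direction {
    DMA_BIDIRECTIONAL = 0,
    DMA_TO_DEVICE     = 1,
    DMA_FROM_DEVICE   = 2,
    DMA_NONE          = 3,
};

#define VHOST_MAP_RO 0x1
#define VHOST_MAP_WO 0x2
#define VHOST_MAP_RW 0x3

static enum dma_data_direction perm_to_dir(int perm)
{
    switch (perm) {
    case VHOST_MAP_WO: return DMA_FROM_DEVICE;   /* device writes memory */
    case VHOST_MAP_RO: return DMA_TO_DEVICE;     /* device reads memory  */
    case VHOST_MAP_RW: return DMA_BIDIRECTIONAL;
    default:           return DMA_NONE;          /* avoid uninitialized dir */
    }
}

static int dir_to_perm(enum dma_data_direction dir)
{
    switch (dir) {
    case DMA_FROM_DEVICE:   return VHOST_MAP_WO;
    case DMA_TO_DEVICE:     return VHOST_MAP_RO;
    case DMA_BIDIRECTIONAL: return VHOST_MAP_RW;
    default:                return -14;          /* -EFAULT, as in the patch */
    }
}
```

Keeping the two mappings inverse to each other (round-trip safe) is what makes them reusable by other consumers such as vdpa_sim.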
Re: [RFC v4 07/11] vduse: Introduce VDUSE - vDPA Device in Userspace
On 2021/3/4 4:05 PM, Yongji Xie wrote: On Thu, Mar 4, 2021 at 2:27 PM Jason Wang wrote: On 2021/2/23 7:50 PM, Xie Yongji wrote: This VDUSE driver enables implementing vDPA devices in userspace. Both the control path and the data path of vDPA devices can be handled in userspace. In the control path, the VDUSE driver will make use of a message mechanism to forward config operations from the vdpa bus driver to userspace. Userspace can use read()/write() to receive/reply to those control messages. In the data path, the VDUSE_IOTLB_GET_FD ioctl will be used to get the file descriptors referring to the vDPA device's iova regions. Then userspace can use mmap() to access those iova regions. Besides, userspace can use ioctl() to inject interrupts and use the eventfd mechanism to receive virtqueue kicks. Signed-off-by: Xie Yongji --- Documentation/userspace-api/ioctl/ioctl-number.rst |1 + drivers/vdpa/Kconfig | 10 + drivers/vdpa/Makefile |1 + drivers/vdpa/vdpa_user/Makefile|5 + drivers/vdpa/vdpa_user/vduse_dev.c | 1348 include/uapi/linux/vduse.h | 136 ++ 6 files changed, 1501 insertions(+) create mode 100644 drivers/vdpa/vdpa_user/Makefile create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c create mode 100644 include/uapi/linux/vduse.h diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst index a4c75a28c839..71722e6f8f23 100644 --- a/Documentation/userspace-api/ioctl/ioctl-number.rst +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst @@ -300,6 +300,7 @@ Code Seq#Include File Comments 'z' 10-4F drivers/s390/crypto/zcrypt_api.h conflict! 
'|' 00-7F linux/media.h 0x80 00-1F linux/fb.h +0x81 00-1F linux/vduse.h 0x89 00-06 arch/x86/include/asm/sockios.h 0x89 0B-DF linux/sockios.h 0x89 E0-EF linux/sockios.h SIOCPROTOPRIVATE range diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig index ffd1e098bfd2..92f07715e3b6 100644 --- a/drivers/vdpa/Kconfig +++ b/drivers/vdpa/Kconfig @@ -25,6 +25,16 @@ config VDPA_SIM_NET help vDPA networking device simulator which loops TX traffic back to RX. +config VDPA_USER + tristate "VDUSE (vDPA Device in Userspace) support" + depends on EVENTFD && MMU && HAS_DMA + select DMA_OPS + select VHOST_IOTLB + select IOMMU_IOVA + help + With VDUSE it is possible to emulate a vDPA Device + in a userspace program. + config IFCVF tristate "Intel IFC VF vDPA driver" depends on PCI_MSI diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile index d160e9b63a66..66e97778ad03 100644 --- a/drivers/vdpa/Makefile +++ b/drivers/vdpa/Makefile @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_VDPA) += vdpa.o obj-$(CONFIG_VDPA_SIM) += vdpa_sim/ +obj-$(CONFIG_VDPA_USER) += vdpa_user/ obj-$(CONFIG_IFCVF)+= ifcvf/ obj-$(CONFIG_MLX5_VDPA) += mlx5/ diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile new file mode 100644 index ..260e0b26af99 --- /dev/null +++ b/drivers/vdpa/vdpa_user/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 + +vduse-y := vduse_dev.o iova_domain.o + +obj-$(CONFIG_VDPA_USER) += vduse.o diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c new file mode 100644 index ..393bf99c48be --- /dev/null +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -0,0 +1,1348 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * VDUSE: vDPA Device in Userspace + * + * Copyright (C) 2020 Bytedance Inc. and/or its affiliates. All rights reserved. 
+ * + * Author: Xie Yongji + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "iova_domain.h" + +#define DRV_VERSION "1.0" +#define DRV_AUTHOR "Yongji Xie " +#define DRV_DESC "vDPA Device in Userspace" +#define DRV_LICENSE "GPL v2" + +#define VDUSE_DEV_MAX (1U << MINORBITS) + +struct vduse_virtqueue { + u16 index; + bool ready; + spinlock_t kick_lock; + spinlock_t irq_lock; + struct eventfd_ctx *kickfd; + struct vdpa_callback cb; +}; + +struct vduse_dev; + +struct vduse_vdpa { + struct vdpa_device vdpa; + struct vduse_dev *dev; +}; + +struct vduse_dev { + struct vduse_vdpa *vdev; + struct device dev; + struct cdev cdev; + struct vduse_virtqueue *vqs; + struct vduse_iova_domain *domain; + struct vhost_iotlb *iommu; + spinlock_t iommu_lock; + atomic_t bounce_map; + struct mutex msg_lock; + atomic64_t msg_unique; "next_request_id" should be better. OK
Re: [RFC v4 11/11] vduse: Support binding irq to the specified cpu
On 2021/3/4 4:19 PM, Yongji Xie wrote: On Thu, Mar 4, 2021 at 3:30 PM Jason Wang wrote: On 2021/2/23 7:50 PM, Xie Yongji wrote: Add a parameter to the VDUSE_INJECT_VQ_IRQ ioctl to support injecting a virtqueue's interrupt on the specified cpu. How does userspace know which CPU this irq is for? It looks to me like we need to do it at a different level, e.g. introduce some API in sysfs to allow the admin to tune that. But I think we can do that in another patch on top of this series. OK. I will think more about it. It should be something like /sys/class/vduse/$dev_name/vq/0/irq_affinity. Also need to make sure the eventfd could not be reused. Thanks Thanks, Yongji
Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero
On 2021/3/4 9:50 PM, Cornelia Huck wrote: On Thu, 4 Mar 2021 16:24:16 +0800 Jason Wang wrote: On 2021/3/3 4:29 PM, Cornelia Huck wrote: On Wed, 3 Mar 2021 12:01:01 +0800 Jason Wang wrote: On 2021/3/2 8:08 PM, Cornelia Huck wrote: On Mon, 1 Mar 2021 11:51:08 +0800 Jason Wang wrote: On 2021/3/1 5:25 AM, Michael S. Tsirkin wrote: On Fri, Feb 26, 2021 at 04:19:16PM +0800, Jason Wang wrote: On 2021/2/26 2:53 AM, Michael S. Tsirkin wrote: Confused. What is wrong with the above? It never reads the field unless the feature has been offered by the device. So the spec said: " The following driver-read-only field, max_virtqueue_pairs only exists if VIRTIO_NET_F_MQ is set. " If I read this correctly, there will be no max_virtqueue_pairs field if VIRTIO_NET_F_MQ is not offered by the device? If so, the offsetof() violates what the spec says. Thanks I think that's a misunderstanding. This text was never intended to imply that field offsets change based on feature bits. We had this pain with legacy and we never wanted to go back there. This merely implies that without VIRTIO_NET_F_MQ the field should not be accessed. "Exists" in the sense "is accessible to the driver". Let's just clarify that in the spec, job done. Ok, agree. That will make things easier. Yes, that makes much more sense. What about adding the following to the "Basic Facilities of a Virtio Device/Device Configuration Space" section of the spec: "If an optional configuration field does not exist, the corresponding space is still present, but reserved." This became interesting after re-reading some of the QEMU code. 
E.g. in virtio-net.c we had: static VirtIOFeature feature_sizes[] = { {.flags = 1ULL << VIRTIO_NET_F_MAC, .end = endof(struct virtio_net_config, mac)}, {.flags = 1ULL << VIRTIO_NET_F_STATUS, .end = endof(struct virtio_net_config, status)}, {.flags = 1ULL << VIRTIO_NET_F_MQ, .end = endof(struct virtio_net_config, max_virtqueue_pairs)}, {.flags = 1ULL << VIRTIO_NET_F_MTU, .end = endof(struct virtio_net_config, mtu)}, {.flags = 1ULL << VIRTIO_NET_F_SPEED_DUPLEX, .end = endof(struct virtio_net_config, duplex)}, {.flags = (1ULL << VIRTIO_NET_F_RSS) | (1ULL << VIRTIO_NET_F_HASH_REPORT), .end = endof(struct virtio_net_config, supported_hash_types)}, {} }; It has an implicit dependency chain, e.g. MTU isn't present if DUPLEX/RSS is not offered ... But I think it covers everything up to the relevant field, no? So MTU is included if we have the feature bit, even if we don't have DUPLEX/RSS. Given that a config space may be shorter (but must not collapse non-existing fields), maybe a better wording would be: "If an optional configuration field does not exist, the corresponding space will still be present if it is not at the end of the configuration space (i.e., further configuration fields exist.)" This should work, but I think we need to define the end of the configuration space first? What about sidestepping this: "...the corresponding space will still be present, unless no further configuration fields exist." ? It might work. (I wonder if we can give some examples in the spec.) Thanks This implies that a given field, if it exists, is always at the same offset from the beginning of the configuration space."
Re: [PATCH v6] i2c: virtio: add a virtio i2c frontend driver
On 2021/3/4 14:06, Viresh Kumar wrote: Please always supply version history; it is difficult to review otherwise. I will add the history. drivers/i2c/busses/Kconfig | 11 ++ drivers/i2c/busses/Makefile | 3 + drivers/i2c/busses/i2c-virtio.c | 289 include/uapi/linux/virtio_i2c.h | 42 ++ include/uapi/linux/virtio_ids.h | 1 + 5 files changed, 346 insertions(+) create mode 100644 drivers/i2c/busses/i2c-virtio.c create mode 100644 include/uapi/linux/virtio_i2c.h diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig index 05ebf75..0860395 100644 --- a/drivers/i2c/busses/Kconfig +++ b/drivers/i2c/busses/Kconfig @@ -21,6 +21,17 @@ config I2C_ALI1535 This driver can also be built as a module. If so, the module will be called i2c-ali1535. +config I2C_VIRTIO + tristate "Virtio I2C Adapter" + depends on VIRTIO depends on I2C as well ? No need for that. The dependency on I2C is included in the Kconfig in its parent directory. + help + If you say yes to this option, support will be included for the virtio + I2C adapter driver. The hardware can be emulated by any device model + software according to the virtio protocol. + + This driver can also be built as a module. If so, the module + will be called i2c-virtio. + config I2C_ALI1563 tristate "ALI 1563" depends on PCI diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile index 615f35e..b88da08 100644 --- a/drivers/i2c/busses/Makefile +++ b/drivers/i2c/busses/Makefile @@ -6,6 +6,9 @@ # ACPI drivers obj-$(CONFIG_I2C_SCMI)+= i2c-scmi.o +# VIRTIO I2C host controller driver +obj-$(CONFIG_I2C_VIRTIO) += i2c-virtio.o + Not that it is important, but I would have added it towards the end instead of at the top of the file. I'm OK with adding it to the end. 
# PC SMBus host controller drivers obj-$(CONFIG_I2C_ALI1535) += i2c-ali1535.o obj-$(CONFIG_I2C_ALI1563) += i2c-ali1563.o diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c new file mode 100644 index 000..98c0fcc --- /dev/null +++ b/drivers/i2c/busses/i2c-virtio.c @@ -0,0 +1,289 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Virtio I2C Bus Driver + * + * The Virtio I2C Specification: + * https://raw.githubusercontent.com/oasis-tcs/virtio-spec/master/virtio-i2c.tex + * + * Copyright (c) 2021 Intel Corporation. All rights reserved. + */ + +#include +#include +#include +#include +#include +#include I don't think above two are required here.. +#include +#include +#include same here. +#include same here. Will confirm and remove them if they are not required. Thank you. + And this blank line as well, since all are standard linux headers. +#include +#include + +/** + * struct virtio_i2c - virtio I2C data + * @vdev: virtio device for this controller + * @completion: completion of virtio I2C message + * @adap: I2C adapter for this controller + * @i2c_lock: lock for virtqueue processing + * @vq: the virtio virtqueue for communication + */ +struct virtio_i2c { + struct virtio_device *vdev; + struct completion completion; + struct i2c_adapter adap; + struct mutex i2c_lock; i2c_ is redundant here. "lock" sounds good enough. OK. 
+ struct virtqueue *vq; +}; + +/** + * struct virtio_i2c_req - the virtio I2C request structure + * @out_hdr: the OUT header of the virtio I2C message + * @buf: the buffer into which data is read, or from which it's written + * @in_hdr: the IN header of the virtio I2C message + */ +struct virtio_i2c_req { + struct virtio_i2c_out_hdr out_hdr; + u8 *buf; + struct virtio_i2c_in_hdr in_hdr; +}; + +static void virtio_i2c_msg_done(struct virtqueue *vq) +{ + struct virtio_i2c *vi = vq->vdev->priv; + + complete(&vi->completion); +} + +static int virtio_i2c_send_reqs(struct virtqueue *vq, + struct virtio_i2c_req *reqs, + struct i2c_msg *msgs, int nr) +{ + struct scatterlist *sgs[3], out_hdr, msg_buf, in_hdr; + int i, outcnt, incnt, err = 0; + u8 *buf; + + for (i = 0; i < nr; i++) { + if (!msgs[i].len) + break; + + /* Only 7-bit mode supported for this moment. For the address format, +* Please check the Virtio I2C Specification. +*/ Please use proper comment style. will fix it. /* * Only 7-bit mode supported for now, check Virtio I2C * specification for format of "addr" field. */ + reqs[i].out_hdr.addr = cpu_to_le16(msgs[i].addr << 1); + + if (i != nr - 1) + reqs[i].out_hdr.flags |= VIRTIO_I2C_FLAGS_FAIL_NEXT; Since flags field hasn't been touched anywhe
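The address formatting discussed in the review (7-bit mode, address shifted left by one in out_hdr.addr) and the FAIL_NEXT chaining (set on every message except the last) can be sketched as two tiny helpers. This is an illustrative sketch: the flag bit position is assumed here, and the little-endian byte-swapping (cpu_to_le16) is elided; check the virtio I2C specification for the authoritative layout.

```c
#include <stdint.h>

#define VIRTIO_I2C_FLAGS_FAIL_NEXT (1U << 0) /* bit position assumed */

/* 7-bit mode: out_hdr.addr carries the address shifted left by one
 * (endianness conversion elided in this sketch). */
static uint16_t virtio_i2c_addr(uint16_t addr7)
{
    return (uint16_t)(addr7 << 1);
}

/* FAIL_NEXT is set on all but the last message of a transfer, so a
 * failure aborts the remaining chained requests. */
static uint32_t virtio_i2c_flags(int i, int nr)
{
    return (i != nr - 1) ? VIRTIO_I2C_FLAGS_FAIL_NEXT : 0;
}
```

For example, a device at 7-bit address 0x50 would be encoded as 0xA0 in the header's address field.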
RE: [RFC PATCH 06/18] virt/mshv: create, initialize, finalize, delete partition hypercalls
From: Nuno Das Neves Sent: Thursday, March 4, 2021 3:49 PM > > On 2/8/2021 11:42 AM, Michael Kelley wrote: > > From: Nuno Das Neves Sent: Friday, > > November > 20, 2020 4:30 PM > >> [snip] > >> + > >> +static int > >> +hv_call_create_partition( > >> + u64 flags, > >> + struct hv_partition_creation_properties creation_properties, > >> + u64 *partition_id) > >> +{ > >> + struct hv_create_partition_in *input; > >> + struct hv_create_partition_out *output; > >> + int status; > >> + int ret; > >> + unsigned long irq_flags; > >> + int i; > >> + > >> + do { > >> + local_irq_save(irq_flags); > >> + input = (struct hv_create_partition_in *)(*this_cpu_ptr( > >> + hyperv_pcpu_input_arg)); > >> + output = (struct hv_create_partition_out *)(*this_cpu_ptr( > >> + hyperv_pcpu_output_arg)); > >> + > >> + input->flags = flags; > >> + input->proximity_domain_info.as_uint64 = 0; > >> + input->compatibility_version = HV_COMPATIBILITY_MANGANESE; > >> + for (i = 0; i < HV_PARTITION_PROCESSOR_FEATURE_BANKS; ++i) > >> + input->partition_creation_properties > >> + .disabled_processor_features.as_uint64[i] = 0; > >> + input->partition_creation_properties > >> + .disabled_processor_xsave_features.as_uint64 = 0; > >> + input->isolation_properties.as_uint64 = 0; > >> + > >> + status = hv_do_hypercall(HVCALL_CREATE_PARTITION, > >> + input, output); > > > > hv_do_hypercall returns a u64, which should then be masked with > > HV_HYPERCALL_RESULT_MASK before checking the result. > > > > Yes, I'll fix this everywhere. 
> > >> + if (status != HV_STATUS_INSUFFICIENT_MEMORY) { > >> + if (status == HV_STATUS_SUCCESS) > >> + *partition_id = output->partition_id; > >> + else > >> + pr_err("%s: %s\n", > >> + __func__, hv_status_to_string(status)); > >> + local_irq_restore(irq_flags); > >> + ret = -hv_status_to_errno(status); > >> + break; > >> + } > >> + local_irq_restore(irq_flags); > >> + ret = hv_call_deposit_pages(NUMA_NO_NODE, > >> + hv_current_partition_id, 1); > >> + } while (!ret); > >> + > >> + return ret; > >> +} > >> + I had a separate thread on the linux-hyperv mailing list about the inconsistency in how we check hypercall status in current upstream code, and proposed some helper functions to make it easier and more consistent. Joe Salisbury has started work on a patch to provide those helper functions and to start using them in current upstream code. You could coordinate with Joe to get the helper functions as well and use them as discussed in that thread. Then later on we won't have to come back and fix up the uses in this patch series. Michael
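The review point — hv_do_hypercall() returns a u64 that must be masked with HV_HYPERCALL_RESULT_MASK before comparing against HV_STATUS_* values — can be sketched as the kind of helper being discussed. Constants here follow the Hyper-V TLFS convention (status in the low 16 bits), but treat the exact values as assumptions of this sketch rather than the final upstream helpers.

```c
#include <stdint.h>

/* Low 16 bits of the hypercall return value carry the status code
 * (mask value per the Hyper-V TLFS; assumed here). */
#define HV_HYPERCALL_RESULT_MASK      0xFFFFULL
#define HV_STATUS_SUCCESS             0
#define HV_STATUS_INSUFFICIENT_MEMORY 11

/* Extract the status code from a raw hypercall return value. */
static uint16_t hv_result(uint64_t hypercall_ret)
{
    return (uint16_t)(hypercall_ret & HV_HYPERCALL_RESULT_MASK);
}

static int hv_result_success(uint64_t hypercall_ret)
{
    return hv_result(hypercall_ret) == HV_STATUS_SUCCESS;
}
```

Comparing the raw u64 directly (as in the original patch) can mis-evaluate when the upper bits carry rep-count or other fields; masking first makes the `status == HV_STATUS_*` checks reliable.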
Re: [Freedreno] [PATCH 16/17] iommu: remove DOMAIN_ATTR_IO_PGTABLE_CFG
On Thu, Mar 4, 2021 at 7:48 AM Robin Murphy wrote: > > On 2021-03-01 08:42, Christoph Hellwig wrote: > > Signed-off-by: Christoph Hellwig > > Moreso than the previous patch, where the feature is at least relatively > generic (note that there's a bunch of in-flight development around > DOMAIN_ATTR_NESTING), I'm really not convinced that it's beneficial to > bloat the generic iommu_ops structure with private driver-specific > interfaces. The attribute interface is a great compromise for these > kinds of things, and you can easily add type-checked wrappers around it > for external callers (maybe even make the actual attributes internal > between the IOMMU core and drivers) if that's your concern. I suppose if this is *just* for the GPU we could move it into adreno_smmu_priv.. But one thing I'm not sure about is whether IO_PGTABLE_QUIRK_ARM_OUTER_WBWA is something that other devices *should* be using as well, but just haven't gotten around to yet. BR, -R > Robin. > > > --- > > drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +- > > drivers/iommu/arm/arm-smmu/arm-smmu.c | 40 +++-- > > drivers/iommu/iommu.c | 9 ++ > > include/linux/iommu.h | 9 +- > > 4 files changed, 29 insertions(+), 31 deletions(-) > > > > diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c > > b/drivers/gpu/drm/msm/adreno/adreno_gpu.c > > index 0f184c3dd9d9ec..78d98ab2ee3a68 100644 > > --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c > > +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c > > @@ -191,7 +191,7 @@ void adreno_set_llc_attributes(struct iommu_domain > > *iommu) > > struct io_pgtable_domain_attr pgtbl_cfg; > > > > pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA; > > - iommu_domain_set_attr(iommu, DOMAIN_ATTR_IO_PGTABLE_CFG, &pgtbl_cfg); > > + iommu_domain_set_pgtable_attr(iommu, &pgtbl_cfg); > > } > > > > struct msm_gem_address_space * > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c > > b/drivers/iommu/arm/arm-smmu/arm-smmu.c > > index 2e17d990d04481..2858999c86dfd1 100644 > > --- 
a/drivers/iommu/arm/arm-smmu/arm-smmu.c > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c > > @@ -1515,40 +1515,22 @@ static int arm_smmu_domain_enable_nesting(struct > > iommu_domain *domain) > > return ret; > > } > > > > -static int arm_smmu_domain_set_attr(struct iommu_domain *domain, > > - enum iommu_attr attr, void *data) > > +static int arm_smmu_domain_set_pgtable_attr(struct iommu_domain *domain, > > + struct io_pgtable_domain_attr *pgtbl_cfg) > > { > > - int ret = 0; > > struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); > > + int ret = -EPERM; > > > > - mutex_lock(&smmu_domain->init_mutex); > > - > > - switch(domain->type) { > > - case IOMMU_DOMAIN_UNMANAGED: > > - switch (attr) { > > - case DOMAIN_ATTR_IO_PGTABLE_CFG: { > > - struct io_pgtable_domain_attr *pgtbl_cfg = data; > > - > > - if (smmu_domain->smmu) { > > - ret = -EPERM; > > - goto out_unlock; > > - } > > + if (domain->type != IOMMU_DOMAIN_UNMANAGED) > > + return -EINVAL; > > > > - smmu_domain->pgtbl_cfg = *pgtbl_cfg; > > - break; > > - } > > - default: > > - ret = -ENODEV; > > - } > > - break; > > - case IOMMU_DOMAIN_DMA: > > - ret = -ENODEV; > > - break; > > - default: > > - ret = -EINVAL; > > + mutex_lock(&smmu_domain->init_mutex); > > + if (!smmu_domain->smmu) { > > + smmu_domain->pgtbl_cfg = *pgtbl_cfg; > > + ret = 0; > > } > > -out_unlock: > > mutex_unlock(&smmu_domain->init_mutex); > > + > > return ret; > > } > > > > @@ -1609,7 +1591,7 @@ static struct iommu_ops arm_smmu_ops = { > > .device_group = arm_smmu_device_group, > > .dma_use_flush_queue= arm_smmu_dma_use_flush_queue, > > .dma_enable_flush_queue = arm_smmu_dma_enable_flush_queue, > > - .domain_set_attr= arm_smmu_domain_set_attr, > > + .domain_set_pgtable_attr = arm_smmu_domain_set_pgtable_attr, > > .domain_enable_nesting = arm_smmu_domain_enable_nesting, > > .of_xlate = arm_smmu_of_xlate, > > .get_resv_regions = arm_smmu_get_resv_regions, > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > > index 
2e9e058501a953..8490aefd4b41f8 100644 > > --- a/drivers/iommu/iommu.c > > +++ b/drivers/iommu/iommu.c > > @@ -2693,6 +2693,15 @@ int iommu_domain_enable_nesting(struct iommu_domain > > *domain) > > } > > EXPORT_SYMBOL_GPL(iommu_domain_enable_nesting); > > > > +int iommu_domain_set_pgtable_attr(struct iommu_domain *domain, > > + struct io_pgtable_domain_attr *pgtbl_
Re: [patch 1/7] drm/ttm: Replace kmap_atomic() usage
Am 03.03.21 um 14:20 schrieb Thomas Gleixner: From: Thomas Gleixner There is no reason to disable pagefaults and preemption as a side effect of kmap_atomic_prot(). Use kmap_local_page_prot() instead and document the reasoning for the mapping usage with the given pgprot. Remove the NULL pointer check for the map. These functions return a valid address for valid pages and the return was bogus anyway as it would have left preemption and pagefaults disabled. Signed-off-by: Thomas Gleixner Cc: Christian Koenig Cc: Huang Rui Cc: David Airlie Cc: Daniel Vetter Cc: dri-de...@lists.freedesktop.org --- drivers/gpu/drm/ttm/ttm_bo_util.c | 20 1 file changed, 12 insertions(+), 8 deletions(-) --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -181,13 +181,15 @@ static int ttm_copy_io_ttm_page(struct t return -ENOMEM; src = (void *)((unsigned long)src + (page << PAGE_SHIFT)); - dst = kmap_atomic_prot(d, prot); - if (!dst) - return -ENOMEM; + /* +* Ensure that a highmem page is mapped with the correct +* pgprot. For non highmem the mapping is already there. +*/ I find the comment a bit misleading. Maybe write: /* * Locally map highmem pages with the correct pgprot. * Normal memory should already have the correct pgprot in the linear mapping. */ Apart from that looks good to me. Regards, Christian. + dst = kmap_local_page_prot(d, prot); memcpy_fromio(dst, src, PAGE_SIZE); - kunmap_atomic(dst); + kunmap_local(dst); return 0; } @@ -203,13 +205,15 @@ static int ttm_copy_ttm_io_page(struct t return -ENOMEM; dst = (void *)((unsigned long)dst + (page << PAGE_SHIFT)); - src = kmap_atomic_prot(s, prot); - if (!src) - return -ENOMEM; + /* +* Ensure that a highmem page is mapped with the correct +* pgprot. For non highmem the mapping is already there. 
+*/ + src = kmap_local_page_prot(s, prot); memcpy_toio(dst, src, PAGE_SIZE); - kunmap_atomic(src); + kunmap_local(src); return 0; }
Re: [PATCH] scsi: target: vhost-scsi: remove redundant initialization of variable ret
On Wed, Mar 03, 2021 at 01:43:39PM +, Colin King wrote: > From: Colin Ian King > > The variable ret is being initialized with a value that is never read > and it is being updated later with a new value. The initialization is > redundant and can be removed. > > Addresses-Coverity: ("Unused value") > Signed-off-by: Colin Ian King > --- > drivers/vhost/scsi.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Which kernel version is this patch based on? If it's a fix for a patch that hasn't landed yet, please indicate this. A "Fixes: ..." tag should be added to this patch as well. I looked at linux.git/master commit f69d02e37a85645aa90d18cacfff36dba370f797 and see this: static int __init vhost_scsi_init(void) { int ret = -ENOMEM; pr_debug("TCM_VHOST fabric module %s on %s/%s" " on "UTS_RELEASE"\n", VHOST_SCSI_VERSION, utsname()->sysname, utsname()->machine); /* * Use our own dedicated workqueue for submitting I/O into * target core to avoid contention within system_wq. */ vhost_scsi_workqueue = alloc_workqueue("vhost_scsi", 0, 0); if (!vhost_scsi_workqueue) goto out; We need ret's initialization value here ^ ret = vhost_scsi_register(); if (ret < 0) goto out_destroy_workqueue; ret = target_register_template(&vhost_scsi_ops); if (ret < 0) goto out_vhost_scsi_deregister; return 0; out_vhost_scsi_deregister: vhost_scsi_deregister(); out_destroy_workqueue: destroy_workqueue(vhost_scsi_workqueue); out: return ret; }; > > diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c > index d16c04dcc144..9129ab8187fd 100644 > --- a/drivers/vhost/scsi.c > +++ b/drivers/vhost/scsi.c > @@ -2465,7 +2465,7 @@ static const struct target_core_fabric_ops > vhost_scsi_ops = { > > static int __init vhost_scsi_init(void) > { > - int ret = -ENOMEM; > + int ret; > > pr_debug("TCM_VHOST fabric module %s on %s/%s" > " on "UTS_RELEASE"\n", VHOST_SCSI_VERSION, utsname()->sysname, > -- > 2.30.0
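Stefan's point — that the `ret = -ENOMEM` initializer *is* read on the allocation-failure path — can be reduced to a minimal stand-in for the vhost_scsi_init() pattern. This is a userspace sketch of the goto-style error handling, not the driver code itself.

```c
#include <stdlib.h>

#define ENOMEM 12

/* Stand-in for alloc_workqueue(): returns NULL on failure. */
static void *fake_alloc(int should_fail)
{
    return should_fail ? NULL : malloc(1);
}

/* ret starts as -ENOMEM precisely so the early "goto out" on allocation
 * failure returns a meaningful error without an extra assignment. */
static int init_like_vhost_scsi(int alloc_fails)
{
    int ret = -ENOMEM;        /* read only on the failure path below */
    void *wq = fake_alloc(alloc_fails);

    if (!wq)
        goto out;             /* returns -ENOMEM: the initializer is used */

    ret = 0;                  /* later registration steps would update ret */
    free(wq);
out:
    return ret;
}
```

Removing the initializer (as the Coverity-driven patch proposed) would make the failure path return an uninitialized value, which is why the patch only applies on top of a tree where that `goto out` no longer exists.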
Re: [PATCH 16/17] iommu: remove DOMAIN_ATTR_IO_PGTABLE_CFG
On 2021-03-01 08:42, Christoph Hellwig wrote: Signed-off-by: Christoph Hellwig Moreso than the previous patch, where the feature is at least relatively generic (note that there's a bunch of in-flight development around DOMAIN_ATTR_NESTING), I'm really not convinced that it's beneficial to bloat the generic iommu_ops structure with private driver-specific interfaces. The attribute interface is a great compromise for these kinds of things, and you can easily add type-checked wrappers around it for external callers (maybe even make the actual attributes internal between the IOMMU core and drivers) if that's your concern. Robin. --- drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +- drivers/iommu/arm/arm-smmu/arm-smmu.c | 40 +++-- drivers/iommu/iommu.c | 9 ++ include/linux/iommu.h | 9 +- 4 files changed, 29 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index 0f184c3dd9d9ec..78d98ab2ee3a68 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -191,7 +191,7 @@ void adreno_set_llc_attributes(struct iommu_domain *iommu) struct io_pgtable_domain_attr pgtbl_cfg; pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA; - iommu_domain_set_attr(iommu, DOMAIN_ATTR_IO_PGTABLE_CFG, &pgtbl_cfg); + iommu_domain_set_pgtable_attr(iommu, &pgtbl_cfg); } struct msm_gem_address_space * diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 2e17d990d04481..2858999c86dfd1 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -1515,40 +1515,22 @@ static int arm_smmu_domain_enable_nesting(struct iommu_domain *domain) return ret; } -static int arm_smmu_domain_set_attr(struct iommu_domain *domain, - enum iommu_attr attr, void *data) +static int arm_smmu_domain_set_pgtable_attr(struct iommu_domain *domain, + struct io_pgtable_domain_attr *pgtbl_cfg) { - int ret = 0; struct arm_smmu_domain 
*smmu_domain = to_smmu_domain(domain); + int ret = -EPERM; - mutex_lock(&smmu_domain->init_mutex); - - switch(domain->type) { - case IOMMU_DOMAIN_UNMANAGED: - switch (attr) { - case DOMAIN_ATTR_IO_PGTABLE_CFG: { - struct io_pgtable_domain_attr *pgtbl_cfg = data; - - if (smmu_domain->smmu) { - ret = -EPERM; - goto out_unlock; - } + if (domain->type != IOMMU_DOMAIN_UNMANAGED) + return -EINVAL; - smmu_domain->pgtbl_cfg = *pgtbl_cfg; - break; - } - default: - ret = -ENODEV; - } - break; - case IOMMU_DOMAIN_DMA: - ret = -ENODEV; - break; - default: - ret = -EINVAL; + mutex_lock(&smmu_domain->init_mutex); + if (!smmu_domain->smmu) { + smmu_domain->pgtbl_cfg = *pgtbl_cfg; + ret = 0; } -out_unlock: mutex_unlock(&smmu_domain->init_mutex); + return ret; } @@ -1609,7 +1591,7 @@ static struct iommu_ops arm_smmu_ops = { .device_group = arm_smmu_device_group, .dma_use_flush_queue= arm_smmu_dma_use_flush_queue, .dma_enable_flush_queue = arm_smmu_dma_enable_flush_queue, - .domain_set_attr= arm_smmu_domain_set_attr, + .domain_set_pgtable_attr = arm_smmu_domain_set_pgtable_attr, .domain_enable_nesting = arm_smmu_domain_enable_nesting, .of_xlate = arm_smmu_of_xlate, .get_resv_regions = arm_smmu_get_resv_regions, diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 2e9e058501a953..8490aefd4b41f8 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2693,6 +2693,15 @@ int iommu_domain_enable_nesting(struct iommu_domain *domain) } EXPORT_SYMBOL_GPL(iommu_domain_enable_nesting); +int iommu_domain_set_pgtable_attr(struct iommu_domain *domain, + struct io_pgtable_domain_attr *pgtbl_cfg) +{ + if (!domain->ops->domain_set_pgtable_attr) + return -EINVAL; + return domain->ops->domain_set_pgtable_attr(domain, pgtbl_cfg); +} +EXPORT_SYMBOL_GPL(iommu_domain_set_pgtable_attr); + void iommu_get_resv_regions(struct device *dev, struct list_head *list) { const struct iommu_ops *ops = dev->bus->iommu_ops; diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 
aed88aa3bd3edf..39d3ed4d2700ac 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -40,6 +40,7 @@ struct iommu_domain; struct notifier_block; struct iommu_sva; struct iommu_fault_event; +stru
Re: [PATCH 14/17] iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE
On 2021-03-01 08:42, Christoph Hellwig wrote: Use explicit methods for setting and querying the information instead. Now that everyone's using iommu-dma, is there any point in bouncing this through the drivers at all? Seems like it would make more sense for the x86 drivers to reflect their private options back to iommu_dma_strict (and allow Intel's caching mode to override it as well), then have iommu_dma_init_domain just test !iommu_dma_strict && domain->ops->flush_iotlb_all. Robin. Also remove the now unused iommu_domain_get_attr functionality. Signed-off-by: Christoph Hellwig --- drivers/iommu/amd/iommu.c | 23 ++--- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 47 ++--- drivers/iommu/arm/arm-smmu/arm-smmu.c | 56 + drivers/iommu/dma-iommu.c | 8 ++- drivers/iommu/intel/iommu.c | 27 ++ drivers/iommu/iommu.c | 19 +++ include/linux/iommu.h | 17 ++- 7 files changed, 51 insertions(+), 146 deletions(-) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index a69a8b573e40d0..37a8e51db17656 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -1771,24 +1771,11 @@ static struct iommu_group *amd_iommu_device_group(struct device *dev) return acpihid_device_group(dev); } -static int amd_iommu_domain_get_attr(struct iommu_domain *domain, - enum iommu_attr attr, void *data) +static bool amd_iommu_dma_use_flush_queue(struct iommu_domain *domain) { - switch (domain->type) { - case IOMMU_DOMAIN_UNMANAGED: - return -ENODEV; - case IOMMU_DOMAIN_DMA: - switch (attr) { - case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - *(int *)data = !amd_iommu_unmap_flush; - return 0; - default: - return -ENODEV; - } - break; - default: - return -EINVAL; - } + if (domain->type != IOMMU_DOMAIN_DMA) + return false; + return !amd_iommu_unmap_flush; } /* @@ -2257,7 +2244,7 @@ const struct iommu_ops amd_iommu_ops = { .release_device = amd_iommu_release_device, .probe_finalize = amd_iommu_probe_finalize, .device_group = amd_iommu_device_group, - .domain_get_attr = 
amd_iommu_domain_get_attr, + .dma_use_flush_queue = amd_iommu_dma_use_flush_queue, .get_resv_regions = amd_iommu_get_resv_regions, .put_resv_regions = generic_iommu_put_resv_regions, .is_attach_deferred = amd_iommu_is_attach_deferred, diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 8594b4a8304375..bf96172e8c1f71 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2449,33 +2449,21 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev) return group; } -static int arm_smmu_domain_get_attr(struct iommu_domain *domain, - enum iommu_attr attr, void *data) +static bool arm_smmu_dma_use_flush_queue(struct iommu_domain *domain) { struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); - switch (domain->type) { - case IOMMU_DOMAIN_UNMANAGED: - switch (attr) { - case DOMAIN_ATTR_NESTING: - *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED); - return 0; - default: - return -ENODEV; - } - break; - case IOMMU_DOMAIN_DMA: - switch (attr) { - case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - *(int *)data = smmu_domain->non_strict; - return 0; - default: - return -ENODEV; - } - break; - default: - return -EINVAL; - } + if (domain->type != IOMMU_DOMAIN_DMA) + return false; + return smmu_domain->non_strict; +} + + +static void arm_smmu_dma_enable_flush_queue(struct iommu_domain *domain) +{ + if (domain->type != IOMMU_DOMAIN_DMA) + return; + to_smmu_domain(domain)->non_strict = true; } static int arm_smmu_domain_set_attr(struct iommu_domain *domain, @@ -2505,13 +2493,7 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain, } break; case IOMMU_DOMAIN_DMA: - switch(attr) { - case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - smmu_domain->non_strict = *(int *)data; - break; - default: - ret = -EN
[patch 5/7] drm/nouveau/device: Replace io_mapping_map_atomic_wc()
From: Thomas Gleixner Neither fbmem_peek() nor fbmem_poke() require to disable pagefaults and preemption as a side effect of io_mapping_map_atomic_wc(). Use io_mapping_map_local_wc() instead. Signed-off-by: Thomas Gleixner Cc: Ben Skeggs Cc: David Airlie Cc: Daniel Vetter Cc: dri-de...@lists.freedesktop.org Cc: nouv...@lists.freedesktop.org --- drivers/gpu/drm/nouveau/nvkm/subdev/devinit/fbmem.h |8 1 file changed, 4 insertions(+), 4 deletions(-) --- a/drivers/gpu/drm/nouveau/nvkm/subdev/devinit/fbmem.h +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/devinit/fbmem.h @@ -60,19 +60,19 @@ fbmem_fini(struct io_mapping *fb) static inline u32 fbmem_peek(struct io_mapping *fb, u32 off) { - u8 __iomem *p = io_mapping_map_atomic_wc(fb, off & PAGE_MASK); + u8 __iomem *p = io_mapping_map_local_wc(fb, off & PAGE_MASK); u32 val = ioread32(p + (off & ~PAGE_MASK)); - io_mapping_unmap_atomic(p); + io_mapping_unmap_local(p); return val; } static inline void fbmem_poke(struct io_mapping *fb, u32 off, u32 val) { - u8 __iomem *p = io_mapping_map_atomic_wc(fb, off & PAGE_MASK); + u8 __iomem *p = io_mapping_map_local_wc(fb, off & PAGE_MASK); iowrite32(val, p + (off & ~PAGE_MASK)); wmb(); - io_mapping_unmap_atomic(p); + io_mapping_unmap_local(p); } static inline bool ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[patch 2/7] drm/vmwgfx: Replace kmap_atomic()
From: Thomas Gleixner There is no reason to disable pagefaults and preemption as a side effect of kmap_atomic_prot(). Use kmap_local_page_prot() instead and document the reasoning for the mapping usage with the given pgprot. Remove the NULL pointer check for the map. These functions return a valid address for valid pages and the return was bogus anyway as it would have left preemption and pagefaults disabled. Signed-off-by: Thomas Gleixner Cc: VMware Graphics Cc: Roland Scheidegger Cc: Zack Rusin Cc: David Airlie Cc: Daniel Vetter Cc: dri-de...@lists.freedesktop.org --- drivers/gpu/drm/vmwgfx/vmwgfx_blit.c | 30 -- 1 file changed, 12 insertions(+), 18 deletions(-) --- a/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_blit.c @@ -375,12 +375,12 @@ static int vmw_bo_cpu_blit_line(struct v copy_size = min_t(u32, copy_size, PAGE_SIZE - src_page_offset); if (unmap_src) { - kunmap_atomic(d->src_addr); + kunmap_local(d->src_addr); d->src_addr = NULL; } if (unmap_dst) { - kunmap_atomic(d->dst_addr); + kunmap_local(d->dst_addr); d->dst_addr = NULL; } @@ -388,12 +388,8 @@ static int vmw_bo_cpu_blit_line(struct v if (WARN_ON_ONCE(dst_page >= d->dst_num_pages)) return -EINVAL; - d->dst_addr = - kmap_atomic_prot(d->dst_pages[dst_page], -d->dst_prot); - if (!d->dst_addr) - return -ENOMEM; - + d->dst_addr = kmap_local_page_prot(d->dst_pages[dst_page], + d->dst_prot); d->mapped_dst = dst_page; } @@ -401,12 +397,8 @@ static int vmw_bo_cpu_blit_line(struct v if (WARN_ON_ONCE(src_page >= d->src_num_pages)) return -EINVAL; - d->src_addr = - kmap_atomic_prot(d->src_pages[src_page], -d->src_prot); - if (!d->src_addr) - return -ENOMEM; - + d->src_addr = kmap_local_page_prot(d->src_pages[src_page], + d->src_prot); d->mapped_src = src_page; } diff->do_cpy(diff, d->dst_addr + dst_page_offset, @@ -436,8 +428,10 @@ static int vmw_bo_cpu_blit_line(struct v * * Performs a CPU blit from one buffer object to another avoiding a full * bo vmap which may exhaust- or fragment 
vmalloc space. - * On supported architectures (x86), we're using kmap_atomic which avoids - * cross-processor TLB- and cache flushes and may, on non-HIGHMEM systems + * + * On supported architectures (x86), we're using kmap_local_prot() which + * avoids cross-processor TLB- and cache flushes. kmap_local_prot() will + * either map a highmem page with the proper pgprot on HIGHMEM=y systems or * reference already set-up mappings. * * Neither of the buffer objects may be placed in PCI memory @@ -500,9 +494,9 @@ int vmw_bo_cpu_blit(struct ttm_buffer_ob } out: if (d.src_addr) - kunmap_atomic(d.src_addr); + kunmap_local(d.src_addr); if (d.dst_addr) - kunmap_atomic(d.dst_addr); + kunmap_local(d.dst_addr); return ret; } ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[patch 6/7] drm/i915: Replace io_mapping_map_atomic_wc()
From: Thomas Gleixner None of these mapping requires the side effect of disabling pagefaults and preemption. Use io_mapping_map_local_wc() instead, and clean up gtt_user_read() and gtt_user_write() to use a plain copy_from_user() as the local maps are not disabling pagefaults. Signed-off-by: Thomas Gleixner Cc: Jani Nikula Cc: Joonas Lahtinen Cc: Rodrigo Vivi Cc: David Airlie Cc: Daniel Vetter Cc: Chris Wilson Cc: intel-...@lists.freedesktop.org Cc: dri-de...@lists.freedesktop.org --- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c |7 +--- drivers/gpu/drm/i915/i915_gem.c| 40 - drivers/gpu/drm/i915/selftests/i915_gem.c |4 +- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |8 ++--- 4 files changed, 22 insertions(+), 37 deletions(-) --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1080,7 +1080,7 @@ static void reloc_cache_reset(struct rel struct i915_ggtt *ggtt = cache_to_ggtt(cache); intel_gt_flush_ggtt_writes(ggtt->vm.gt); - io_mapping_unmap_atomic((void __iomem *)vaddr); + io_mapping_unmap_local((void __iomem *)vaddr); if (drm_mm_node_allocated(&cache->node)) { ggtt->vm.clear_range(&ggtt->vm, @@ -1146,7 +1146,7 @@ static void *reloc_iomap(struct drm_i915 if (cache->vaddr) { intel_gt_flush_ggtt_writes(ggtt->vm.gt); - io_mapping_unmap_atomic((void __force __iomem *) unmask_page(cache->vaddr)); + io_mapping_unmap_local((void __force __iomem *) unmask_page(cache->vaddr)); } else { struct i915_vma *vma; int err; @@ -1194,8 +1194,7 @@ static void *reloc_iomap(struct drm_i915 offset += page << PAGE_SHIFT; } - vaddr = (void __force *)io_mapping_map_atomic_wc(&ggtt->iomap, -offset); + vaddr = (void __force *)io_mapping_map_local_wc(&ggtt->iomap, offset); cache->page = page; cache->vaddr = (unsigned long)vaddr; --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -253,22 +253,15 @@ gtt_user_read(struct io_mapping *mapping char __user *user_data, int length) { void __iomem *vaddr; - unsigned 
long unwritten; + bool fail = false; /* We can use the cpu mem copy function because this is X86. */ - vaddr = io_mapping_map_atomic_wc(mapping, base); - unwritten = __copy_to_user_inatomic(user_data, - (void __force *)vaddr + offset, - length); - io_mapping_unmap_atomic(vaddr); - if (unwritten) { - vaddr = io_mapping_map_wc(mapping, base, PAGE_SIZE); - unwritten = copy_to_user(user_data, -(void __force *)vaddr + offset, -length); - io_mapping_unmap(vaddr); - } - return unwritten; + vaddr = io_mapping_map_local_wc(mapping, base); + if (copy_to_user(user_data, (void __force *)vaddr + offset, length)) + fail = true; + io_mapping_unmap_local(vaddr); + + return fail; } static int @@ -437,21 +430,14 @@ ggtt_write(struct io_mapping *mapping, char __user *user_data, int length) { void __iomem *vaddr; - unsigned long unwritten; + bool fail = false; /* We can use the cpu mem copy function because this is X86. */ - vaddr = io_mapping_map_atomic_wc(mapping, base); - unwritten = __copy_from_user_inatomic_nocache((void __force *)vaddr + offset, - user_data, length); - io_mapping_unmap_atomic(vaddr); - if (unwritten) { - vaddr = io_mapping_map_wc(mapping, base, PAGE_SIZE); - unwritten = copy_from_user((void __force *)vaddr + offset, - user_data, length); - io_mapping_unmap(vaddr); - } - - return unwritten; + vaddr = io_mapping_map_local_wc(mapping, base); + if (copy_from_user((void __force *)vaddr + offset, user_data, length)) + fail = true; + io_mapping_unmap_local(vaddr); + return fail; } /** --- a/drivers/gpu/drm/i915/selftests/i915_gem.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem.c @@ -58,12 +58,12 @@ static void trash_stolen(struct drm_i915 ggtt->vm.insert_page(&ggtt->vm, dma, slot, I915_CACHE_NONE, 0); - s = io_mapping_map_atomic_wc(&ggtt->iomap, slot); + s = io_mapping_map_local_wc(&ggtt->iomap, slot); for (x = 0; x < PAGE_SIZE / sizeof(u32); x++) { prng = next_pseudo_random32(prng); io
[patch 7/7] io-mapping: Remove io_mapping_map_atomic_wc()
From: Thomas Gleixner No more users. Get rid of it and remove the traces in documentation. Signed-off-by: Thomas Gleixner Cc: Andrew Morton Cc: linux...@kvack.org --- Documentation/driver-api/io-mapping.rst | 22 +--- include/linux/io-mapping.h | 42 +--- 2 files changed, 9 insertions(+), 55 deletions(-) --- a/Documentation/driver-api/io-mapping.rst +++ b/Documentation/driver-api/io-mapping.rst @@ -21,19 +21,15 @@ mappable, while 'size' indicates how lar enable. Both are in bytes. This _wc variant provides a mapping which may only be used with -io_mapping_map_atomic_wc(), io_mapping_map_local_wc() or -io_mapping_map_wc(). +io_mapping_map_local_wc() or io_mapping_map_wc(). With this mapping object, individual pages can be mapped either temporarily or long term, depending on the requirements. Of course, temporary maps are -more efficient. They come in two flavours:: +more efficient. void *io_mapping_map_local_wc(struct io_mapping *mapping, unsigned long offset) - void *io_mapping_map_atomic_wc(struct io_mapping *mapping, - unsigned long offset) - 'offset' is the offset within the defined mapping region. Accessing addresses beyond the region specified in the creation function yields undefined results. Using an offset which is not page aligned yields an @@ -50,9 +46,6 @@ io_mapping_map_local_wc() has a side eff migration to make the mapping code work. No caller can rely on this side effect. -io_mapping_map_atomic_wc() has the side effect of disabling preemption and -pagefaults. Don't use in new code. Use io_mapping_map_local_wc() instead. - Nested mappings need to be undone in reverse order because the mapping code uses a stack for keeping track of them:: @@ -65,11 +58,10 @@ Nested mappings need to be undone in rev The mappings are released with:: void io_mapping_unmap_local(void *vaddr) - void io_mapping_unmap_atomic(void *vaddr) -'vaddr' must be the value returned by the last io_mapping_map_local_wc() or -io_mapping_map_atomic_wc() call. 
This unmaps the specified mapping and -undoes the side effects of the mapping functions. +'vaddr' must be the value returned by the last io_mapping_map_local_wc() +call. This unmaps the specified mapping and undoes eventual side effects of +the mapping function. If you need to sleep while holding a mapping, you can use the regular variant, although this may be significantly slower:: @@ -77,8 +69,8 @@ If you need to sleep while holding a map void *io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset) -This works like io_mapping_map_atomic/local_wc() except it has no side -effects and the pointer is globaly visible. +This works like io_mapping_map_local_wc() except it has no side effects and +the pointer is globaly visible. The mappings are released with:: --- a/include/linux/io-mapping.h +++ b/include/linux/io-mapping.h @@ -60,28 +60,7 @@ io_mapping_fini(struct io_mapping *mappi iomap_free(mapping->base, mapping->size); } -/* Atomic map/unmap */ -static inline void __iomem * -io_mapping_map_atomic_wc(struct io_mapping *mapping, -unsigned long offset) -{ - resource_size_t phys_addr; - - BUG_ON(offset >= mapping->size); - phys_addr = mapping->base + offset; - preempt_disable(); - pagefault_disable(); - return __iomap_local_pfn_prot(PHYS_PFN(phys_addr), mapping->prot); -} - -static inline void -io_mapping_unmap_atomic(void __iomem *vaddr) -{ - kunmap_local_indexed((void __force *)vaddr); - pagefault_enable(); - preempt_enable(); -} - +/* Temporary mappings which are only valid in the current context */ static inline void __iomem * io_mapping_map_local_wc(struct io_mapping *mapping, unsigned long offset) { @@ -163,24 +142,7 @@ io_mapping_unmap(void __iomem *vaddr) { } -/* Atomic map/unmap */ -static inline void __iomem * -io_mapping_map_atomic_wc(struct io_mapping *mapping, -unsigned long offset) -{ - preempt_disable(); - pagefault_disable(); - return io_mapping_map_wc(mapping, offset, PAGE_SIZE); -} - -static inline void -io_mapping_unmap_atomic(void 
__iomem *vaddr) -{ - io_mapping_unmap(vaddr); - pagefault_enable(); - preempt_enable(); -} - +/* Temporary mappings which are only valid in the current context */ static inline void __iomem * io_mapping_map_local_wc(struct io_mapping *mapping, unsigned long offset) { ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[patch 4/7] drm/qxl: Replace io_mapping_map_atomic_wc()
From: Thomas Gleixner None of these mapping requires the side effect of disabling pagefaults and preemption. Use io_mapping_map_local_wc() instead, rename the related functions accordingly and clean up qxl_process_single_command() to use a plain copy_from_user() as the local maps are not disabling pagefaults. Signed-off-by: Thomas Gleixner Cc: David Airlie Cc: Gerd Hoffmann Cc: Daniel Vetter Cc: virtualization@lists.linux-foundation.org Cc: spice-de...@lists.freedesktop.org Cc: dri-de...@lists.freedesktop.org --- drivers/gpu/drm/qxl/qxl_image.c | 18 +- drivers/gpu/drm/qxl/qxl_ioctl.c | 27 +-- drivers/gpu/drm/qxl/qxl_object.c | 12 ++-- drivers/gpu/drm/qxl/qxl_object.h |4 ++-- drivers/gpu/drm/qxl/qxl_release.c |4 ++-- 5 files changed, 32 insertions(+), 33 deletions(-) --- a/drivers/gpu/drm/qxl/qxl_image.c +++ b/drivers/gpu/drm/qxl/qxl_image.c @@ -124,12 +124,12 @@ qxl_image_init_helper(struct qxl_device wrong (check the bitmaps are sent correctly first) */ - ptr = qxl_bo_kmap_atomic_page(qdev, chunk_bo, 0); + ptr = qxl_bo_kmap_local_page(qdev, chunk_bo, 0); chunk = ptr; chunk->data_size = height * chunk_stride; chunk->prev_chunk = 0; chunk->next_chunk = 0; - qxl_bo_kunmap_atomic_page(qdev, chunk_bo, ptr); + qxl_bo_kunmap_local_page(qdev, chunk_bo, ptr); { void *k_data, *i_data; @@ -143,7 +143,7 @@ qxl_image_init_helper(struct qxl_device i_data = (void *)data; while (remain > 0) { - ptr = qxl_bo_kmap_atomic_page(qdev, chunk_bo, page << PAGE_SHIFT); + ptr = qxl_bo_kmap_local_page(qdev, chunk_bo, page << PAGE_SHIFT); if (page == 0) { chunk = ptr; @@ -157,7 +157,7 @@ qxl_image_init_helper(struct qxl_device memcpy(k_data, i_data, size); - qxl_bo_kunmap_atomic_page(qdev, chunk_bo, ptr); + qxl_bo_kunmap_local_page(qdev, chunk_bo, ptr); i_data += size; remain -= size; page++; @@ -175,10 +175,10 @@ qxl_image_init_helper(struct qxl_device page_offset = offset_in_page(out_offset); size = min((int)(PAGE_SIZE - page_offset), remain); - ptr = qxl_bo_kmap_atomic_page(qdev, 
chunk_bo, page_base); + ptr = qxl_bo_kmap_local_page(qdev, chunk_bo, page_base); k_data = ptr + page_offset; memcpy(k_data, i_data, size); - qxl_bo_kunmap_atomic_page(qdev, chunk_bo, ptr); + qxl_bo_kunmap_local_page(qdev, chunk_bo, ptr); remain -= size; i_data += size; out_offset += size; @@ -189,7 +189,7 @@ qxl_image_init_helper(struct qxl_device qxl_bo_kunmap(chunk_bo); image_bo = dimage->bo; - ptr = qxl_bo_kmap_atomic_page(qdev, image_bo, 0); + ptr = qxl_bo_kmap_local_page(qdev, image_bo, 0); image = ptr; image->descriptor.id = 0; @@ -212,7 +212,7 @@ qxl_image_init_helper(struct qxl_device break; default: DRM_ERROR("unsupported image bit depth\n"); - qxl_bo_kunmap_atomic_page(qdev, image_bo, ptr); + qxl_bo_kunmap_local_page(qdev, image_bo, ptr); return -EINVAL; } image->u.bitmap.flags = QXL_BITMAP_TOP_DOWN; @@ -222,7 +222,7 @@ qxl_image_init_helper(struct qxl_device image->u.bitmap.palette = 0; image->u.bitmap.data = qxl_bo_physical_address(qdev, chunk_bo, 0); - qxl_bo_kunmap_atomic_page(qdev, image_bo, ptr); + qxl_bo_kunmap_local_page(qdev, image_bo, ptr); return 0; } --- a/drivers/gpu/drm/qxl/qxl_ioctl.c +++ b/drivers/gpu/drm/qxl/qxl_ioctl.c @@ -89,11 +89,11 @@ apply_reloc(struct qxl_device *qdev, str { void *reloc_page; - reloc_page = qxl_bo_kmap_atomic_page(qdev, info->dst_bo, info->dst_offset & PAGE_MASK); + reloc_page = qxl_bo_kmap_local_page(qdev, info->dst_bo, info->dst_offset & PAGE_MASK); *(uint64_t *)(reloc_page + (info->dst_offset & ~PAGE_MASK)) = qxl_bo_physical_address(qdev, info->src_bo, info->src_o
[patch 1/7] drm/ttm: Replace kmap_atomic() usage
From: Thomas Gleixner There is no reason to disable pagefaults and preemption as a side effect of kmap_atomic_prot(). Use kmap_local_page_prot() instead and document the reasoning for the mapping usage with the given pgprot. Remove the NULL pointer check for the map. These functions return a valid address for valid pages and the return was bogus anyway as it would have left preemption and pagefaults disabled. Signed-off-by: Thomas Gleixner Cc: Christian Koenig Cc: Huang Rui Cc: David Airlie Cc: Daniel Vetter Cc: dri-de...@lists.freedesktop.org --- drivers/gpu/drm/ttm/ttm_bo_util.c | 20 1 file changed, 12 insertions(+), 8 deletions(-) --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -181,13 +181,15 @@ static int ttm_copy_io_ttm_page(struct t return -ENOMEM; src = (void *)((unsigned long)src + (page << PAGE_SHIFT)); - dst = kmap_atomic_prot(d, prot); - if (!dst) - return -ENOMEM; + /* +* Ensure that a highmem page is mapped with the correct +* pgprot. For non highmem the mapping is already there. +*/ + dst = kmap_local_page_prot(d, prot); memcpy_fromio(dst, src, PAGE_SIZE); - kunmap_atomic(dst); + kunmap_local(dst); return 0; } @@ -203,13 +205,15 @@ static int ttm_copy_ttm_io_page(struct t return -ENOMEM; dst = (void *)((unsigned long)dst + (page << PAGE_SHIFT)); - src = kmap_atomic_prot(s, prot); - if (!src) - return -ENOMEM; + /* +* Ensure that a highmem page is mapped with the correct +* pgprot. For non highmem the mapping is already there. +*/ + src = kmap_local_page_prot(s, prot); memcpy_toio(dst, src, PAGE_SIZE); - kunmap_atomic(src); + kunmap_local(src); return 0; } ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[patch 3/7] highmem: Remove kmap_atomic_prot()
From: Thomas Gleixner No more users. Signed-off-by: Thomas Gleixner Cc: Andrew Morton Cc: linux...@kvack.org --- include/linux/highmem-internal.h | 14 ++ 1 file changed, 2 insertions(+), 12 deletions(-) --- a/include/linux/highmem-internal.h +++ b/include/linux/highmem-internal.h @@ -88,16 +88,11 @@ static inline void __kunmap_local(void * kunmap_local_indexed(vaddr); } -static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot) +static inline void *kmap_atomic(struct page *page) { preempt_disable(); pagefault_disable(); - return __kmap_local_page_prot(page, prot); -} - -static inline void *kmap_atomic(struct page *page) -{ - return kmap_atomic_prot(page, kmap_prot); + return __kmap_local_page_prot(page, kmap_prot); } static inline void *kmap_atomic_pfn(unsigned long pfn) @@ -184,11 +179,6 @@ static inline void *kmap_atomic(struct p return page_address(page); } -static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot) -{ - return kmap_atomic(page); -} - static inline void *kmap_atomic_pfn(unsigned long pfn) { return kmap_atomic(pfn_to_page(pfn));
[patch 0/7] drm, highmem: Cleanup io/kmap_atomic*() usage
None of the DRM usage sites of temporary mappings requires the side effects of io/kmap_atomic(), i.e. preemption and pagefault disable. Replace them with the io/kmap_local() variants, simplify the copy_to/from_user() error handling and remove the atomic variants. Thanks, tglx --- Documentation/driver-api/io-mapping.rst | 22 +++--- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c |7 +-- drivers/gpu/drm/i915/i915_gem.c | 40 ++- drivers/gpu/drm/i915/selftests/i915_gem.c |4 - drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |8 +-- drivers/gpu/drm/nouveau/nvkm/subdev/devinit/fbmem.h |8 +-- drivers/gpu/drm/qxl/qxl_image.c | 18 drivers/gpu/drm/qxl/qxl_ioctl.c | 27 ++-- drivers/gpu/drm/qxl/qxl_object.c| 12 ++--- drivers/gpu/drm/qxl/qxl_object.h|4 - drivers/gpu/drm/qxl/qxl_release.c |4 - drivers/gpu/drm/ttm/ttm_bo_util.c | 20 + drivers/gpu/drm/vmwgfx/vmwgfx_blit.c| 30 +- include/linux/highmem-internal.h| 14 -- include/linux/io-mapping.h | 42 15 files changed, 93 insertions(+), 167 deletions(-)
Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero
On Thu, 4 Mar 2021 16:24:16 +0800 Jason Wang wrote: > On 2021/3/3 4:29 下午, Cornelia Huck wrote: > > On Wed, 3 Mar 2021 12:01:01 +0800 > > Jason Wang wrote: > > > >> On 2021/3/2 8:08 下午, Cornelia Huck wrote: > >>> On Mon, 1 Mar 2021 11:51:08 +0800 > >>> Jason Wang wrote: > >>> > On 2021/3/1 5:25 上午, Michael S. Tsirkin wrote: > > On Fri, Feb 26, 2021 at 04:19:16PM +0800, Jason Wang wrote: > >> On 2021/2/26 2:53 上午, Michael S. Tsirkin wrote: > >>> Confused. What is wrong with the above? It never reads the > >>> field unless the feature has been offered by device. > >> So the spec said: > >> > >> " > >> > >> The following driver-read-only field, max_virtqueue_pairs only exists > >> if > >> VIRTIO_NET_F_MQ is set. > >> > >> " > >> > >> If I read this correctly, there will be no max_virtqueue_pairs field > >> if the > >> VIRTIO_NET_F_MQ is not offered by device? If yes the offsetof() > >> violates > >> what spec said. > >> > >> Thanks > > I think that's a misunderstanding. This text was never intended to > > imply that field offsets change beased on feature bits. > > We had this pain with legacy and we never wanted to go back there. > > > > This merely implies that without VIRTIO_NET_F_MQ the field > > should not be accessed. Exists in the sense "is accessible to driver". > > > > Let's just clarify that in the spec, job done. > Ok, agree. That will make things more eaiser. > >>> Yes, that makes much more sense. > >>> > >>> What about adding the following to the "Basic Facilities of a Virtio > >>> Device/Device Configuration Space" section of the spec: > >>> > >>> "If an optional configuration field does not exist, the corresponding > >>> space is still present, but reserved." > >> > >> This became interesting after re-reading some of the qemu codes. 
> >> > >> E.g in virtio-net.c we had: > >> > >> *static VirtIOFeature feature_sizes[] = { > >> {.flags = 1ULL << VIRTIO_NET_F_MAC, > >> .end = endof(struct virtio_net_config, mac)}, > >> {.flags = 1ULL << VIRTIO_NET_F_STATUS, > >> .end = endof(struct virtio_net_config, status)}, > >> {.flags = 1ULL << VIRTIO_NET_F_MQ, > >> .end = endof(struct virtio_net_config, max_virtqueue_pairs)}, > >> {.flags = 1ULL << VIRTIO_NET_F_MTU, > >> .end = endof(struct virtio_net_config, mtu)}, > >> {.flags = 1ULL << VIRTIO_NET_F_SPEED_DUPLEX, > >> .end = endof(struct virtio_net_config, duplex)}, > >> {.flags = (1ULL << VIRTIO_NET_F_RSS) | (1ULL << > >> VIRTIO_NET_F_HASH_REPORT), > >> .end = endof(struct virtio_net_config, supported_hash_types)}, > >> {} > >> };* > >> > >> *It has a implict dependency chain. E.g MTU doesn't presnet if > >> DUPLEX/RSS is not offered ... > >> * > > But I think it covers everything up to the relevant field, no? So MTU > > is included if we have the feature bit, even if we don't have > > DUPLEX/RSS. > > > > Given that a config space may be shorter (but must not collapse > > non-existing fields), maybe a better wording would be: > > > > "If an optional configuration field does not exist, the corresponding > > space will still be present if it is not at the end of the > > configuration space (i.e., further configuration fields exist.) > > > This should work but I think we need to define the end of configuration > space first? What about sidestepping this: "...the corresponding space will still be present, unless no further configuration fields exist." ? > > > This > > implies that a given field, if it exists, is always at the same offset > > from the beginning of the configuration space." ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH RFC 0/2] virtio-ccw: allow to disable legacy virtio
Unlike virtio-pci, virtio-ccw is currently always a transitional driver (i.e. it always includes support for legacy devices.) The differences between legacy and virtio-1+ virtio-ccw devices are not that big (the most interesting things are in common virtio code anyway.) It might be beneficial to make support for legacy virtio generally configurable, in case we want to remove it completely in a future where we all have flying cars. As a prereq, we need to make it configurable for virtio-ccw. Patch 1 introduces a parameter; now that I look at it, it's probably not that useful (not even for testing), so I'm inclined to drop it again. Patch 2 adds a new config symbol for generic legacy virtio support, which currently does not do anything but being selected by the legacy options for virtio-pci and virtio-ccw. A virtio-ccw driver without legacy support will require a revision of 1 or higher to be supported by the device. A virtio-ccw driver with legacy turned off works well for me with transitional devices and fails onlining gracefully for legacy devices (max_revision=0 in QEMU). (I also have some code that allows to make devices non-transitional in QEMU, but I haven't yet found time to polish the patches.) Cornelia Huck (2): virtio/s390: add parameter for minimum revision virtio/s390: make legacy support configurable arch/s390/Kconfig | 11 ++ drivers/s390/virtio/Makefile| 1 + drivers/s390/virtio/virtio_ccw.c| 179 drivers/s390/virtio/virtio_ccw_common.h | 113 +++ drivers/s390/virtio/virtio_ccw_legacy.c | 138 ++ drivers/virtio/Kconfig | 8 ++ 6 files changed, 330 insertions(+), 120 deletions(-) create mode 100644 drivers/s390/virtio/virtio_ccw_common.h create mode 100644 drivers/s390/virtio/virtio_ccw_legacy.c base-commit: cf6acb8bdb1d829b85a4daa2944bf9e71c93f4b9 -- 2.26.2
[PATCH RFC 2/2] virtio/s390: make legacy support configurable
We may want to remove legacy virtio support in the future. In order to prepare virtio-ccw for that, add an option to disable legacy support there. This means raising the minimum transport revision to 1. Signed-off-by: Cornelia Huck --- arch/s390/Kconfig | 11 ++ drivers/s390/virtio/Makefile| 1 + drivers/s390/virtio/virtio_ccw.c| 160 ++-- drivers/s390/virtio/virtio_ccw_common.h | 113 + drivers/s390/virtio/virtio_ccw_legacy.c | 138 drivers/virtio/Kconfig | 8 ++ 6 files changed, 312 insertions(+), 119 deletions(-) create mode 100644 drivers/s390/virtio/virtio_ccw_common.h create mode 100644 drivers/s390/virtio/virtio_ccw_legacy.c diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index e8f7216f6c63..b2743dcf5b36 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -941,6 +941,17 @@ config S390_GUEST Select this option if you want to run the kernel as a guest under the KVM hypervisor. +config VIRTIO_CCW_LEGACY + def_bool y + prompt "Support for legacy virtio-ccw devices" + depends on S390_GUEST + select VIRTIO_LEGACY + help + Enabling this option adds support for virtio-ccw paravirtual device + drivers with legacy support, i.e. it enables a transitional + virtio-ccw driver. + + If unsure, say Y. endmenu menu "Selftests" diff --git a/drivers/s390/virtio/Makefile b/drivers/s390/virtio/Makefile index 2dc4d9aab634..96dd68411d64 100644 --- a/drivers/s390/virtio/Makefile +++ b/drivers/s390/virtio/Makefile @@ -4,3 +4,4 @@ # Copyright IBM Corp.
2008 obj-$(CONFIG_S390_GUEST) += virtio_ccw.o +obj-$(CONFIG_VIRTIO_CCW_LEGACY) += virtio_ccw_legacy.o diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c index 0d3971dbc109..32234b6b9074 100644 --- a/drivers/s390/virtio/virtio_ccw.c +++ b/drivers/s390/virtio/virtio_ccw.c @@ -11,11 +11,8 @@ #include #include #include -#include -#include #include #include -#include #include #include #include @@ -30,16 +27,16 @@ #include #include #include -#include #include #include +#include "virtio_ccw_common.h" /* * Provide a knob to turn off support for older revisions. This is useful * if we want to act as a non-transitional virtio device driver: requiring * a minimum revision of 1 turns off support for legacy devices. */ -static int min_revision; +int min_revision = VIRTIO_CCW_REV_MIN; module_param(min_revision, int, 0444); MODULE_PARM_DESC(min_revision, "minimum transport revision to accept"); @@ -53,9 +50,6 @@ struct vq_config_block { __u16 num; } __packed; -#define VIRTIO_CCW_CONFIG_SIZE 0x100 -/* same as PCI config space size, should be enough for all drivers */ - struct vcdev_dma_area { unsigned long indicators; unsigned long indicators2; @@ -63,25 +57,6 @@ struct vcdev_dma_area { __u8 status; }; -struct virtio_ccw_device { - struct virtio_device vdev; - __u8 config[VIRTIO_CCW_CONFIG_SIZE]; - struct ccw_device *cdev; - __u32 curr_io; - int err; - unsigned int revision; /* Transport revision */ - wait_queue_head_t wait_q; - spinlock_t lock; - struct mutex io_lock; /* Serializes I/O requests */ - struct list_head virtqueues; - bool is_thinint; - bool going_away; - bool device_lost; - unsigned int config_ready; - void *airq_info; - struct vcdev_dma_area *dma_area; -}; - static inline unsigned long *indicators(struct virtio_ccw_device *vcdev) { return &vcdev->dma_area->indicators; @@ -92,13 +67,6 @@ static inline unsigned long *indicators2(struct virtio_ccw_device *vcdev) return &vcdev->dma_area->indicators2; } -struct vq_info_block_legacy { - __u64 
queue; - __u32 align; - __u16 index; - __u16 num; -} __packed; - struct vq_info_block { __u64 desc; __u32 res0; @@ -129,18 +97,6 @@ struct virtio_rev_info { /* the highest virtio-ccw revision we support */ #define VIRTIO_CCW_REV_MAX 2 -struct virtio_ccw_vq_info { - struct virtqueue *vq; - int num; - union { - struct vq_info_block s; - struct vq_info_block_legacy l; - } *info_block; - int bit_nr; - struct list_head node; - long cookie; -}; - #define VIRTIO_AIRQ_ISC IO_SCH_ISC /* inherit from subchannel */ #define VIRTIO_IV_BITS (L1_CACHE_BYTES * 8) @@ -164,40 +120,6 @@ static inline u8 *get_summary_indicator(struct airq_info *info) return summary_indicators + info->summary_indicator_idx; } -#define CCW_CMD_SET_VQ 0x13 -#define CCW_CMD_VDEV_RESET 0x33 -#define CCW_CMD_SET_IND 0x43 -#define CCW_CMD_SET_CONF_IND 0x53 -#define CCW_CMD_READ_FEAT 0x12 -#define CCW_CMD_WRITE_FEAT 0x11 -#define CCW_CMD_READ_CONF 0x22 -#define CCW_CMD_WRITE_CONF 0x21 -#define CCW
[PATCH RFC 1/2] virtio/s390: add parameter for minimum revision
We use transport revisions in virtio-ccw for introducing new commands etc.; revision 1 denotes operating according to the standard. Legacy devices do not understand the command to set a revision; for those, we presume to operate at revision 0. Add a parameter min_revision to be able to actively restrict use of old transport revisions. In particular, setting a minimum revision of 1 makes our driver act as a non-transitional driver. With the default min_revision of 0, we continue to act as a transitional driver. Signed-off-by: Cornelia Huck --- drivers/s390/virtio/virtio_ccw.c | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c index 54e686dca6de..0d3971dbc109 100644 --- a/drivers/s390/virtio/virtio_ccw.c +++ b/drivers/s390/virtio/virtio_ccw.c @@ -34,6 +34,16 @@ #include #include +/* + * Provide a knob to turn off support for older revisions. This is useful + * if we want to act as a non-transitional virtio device driver: requiring + * a minimum revision of 1 turns off support for legacy devices. 
+ */ +static int min_revision; + +module_param(min_revision, int, 0444); +MODULE_PARM_DESC(min_revision, "minimum transport revision to accept"); + /* * virtio related functions */ @@ -1271,7 +1281,10 @@ static int virtio_ccw_set_transport_rev(struct virtio_ccw_device *vcdev) else vcdev->revision--; } - } while (ret == -EOPNOTSUPP); + } while (vcdev->revision >= min_revision && ret == -EOPNOTSUPP); + + if (ret == -EOPNOTSUPP && vcdev->revision < min_revision) + ret = -EINVAL; ccw_device_dma_free(vcdev->cdev, ccw, sizeof(*ccw)); ccw_device_dma_free(vcdev->cdev, rev, sizeof(*rev)); @@ -1315,8 +1328,12 @@ static int virtio_ccw_online(struct ccw_device *cdev) vcdev->vdev.id.device = cdev->id.cu_model; ret = virtio_ccw_set_transport_rev(vcdev); - if (ret) + if (ret) { + dev_warn(&cdev->dev, +"Could not set a supported transport revision: %d\n", +ret); goto out_free; + } ret = register_virtio_device(&vcdev->vdev); if (ret) { -- 2.26.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
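The negotiation logic in the hunk above can be sketched outside the kernel: start at the highest supported revision, step down on -EOPNOTSUPP, and give up once the floor set by min_revision is crossed. The `fake_dev`/`set_rev` pair below is purely illustrative (a stand-in for the real set-revision channel command; a device whose max_revision is 0 models a legacy device that only works at revision 0):

```c
#include <assert.h>
#include <errno.h>

#define VIRTIO_CCW_REV_MAX 2

/* Illustrative stand-in for a device: accepts revisions up to max_revision. */
struct fake_dev { int max_revision; };

static int set_rev(struct fake_dev *dev, int rev)
{
	return rev <= dev->max_revision ? 0 : -EOPNOTSUPP;
}

/*
 * Mirrors the loop in virtio_ccw_set_transport_rev() after the patch:
 * returns the negotiated revision, or -EINVAL if nothing at or above
 * min_revision is accepted.
 */
static int negotiate(struct fake_dev *dev, int min_revision)
{
	int rev = VIRTIO_CCW_REV_MAX;
	int ret;

	do {
		ret = set_rev(dev, rev);
		if (ret == -EOPNOTSUPP)
			rev--;
	} while (rev >= min_revision && ret == -EOPNOTSUPP);

	if (ret == -EOPNOTSUPP && rev < min_revision)
		return -EINVAL;
	return rev;
}
```

With min_revision=1 the driver refuses revision-0 (legacy) devices, which is exactly the non-transitional behaviour the parameter is meant to enable.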
Re: cleanup unused or almost unused IOMMU APIs and the FSL PAMU driver
On Mon, Mar 01, 2021 at 09:42:40AM +0100, Christoph Hellwig wrote: > Diffstat: > arch/powerpc/include/asm/fsl_pamu_stash.h | 12 > drivers/gpu/drm/msm/adreno/adreno_gpu.c |2 > drivers/iommu/amd/iommu.c | 23 > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 85 --- > drivers/iommu/arm/arm-smmu/arm-smmu.c | 122 +--- > drivers/iommu/dma-iommu.c |8 > drivers/iommu/fsl_pamu.c| 264 -- > drivers/iommu/fsl_pamu.h| 10 > drivers/iommu/fsl_pamu_domain.c | 694 > ++-- > drivers/iommu/fsl_pamu_domain.h | 46 - > drivers/iommu/intel/iommu.c | 55 -- > drivers/iommu/iommu.c | 75 --- > drivers/soc/fsl/qbman/qman_portal.c | 56 -- > drivers/vfio/vfio_iommu_type1.c | 31 - > drivers/vhost/vdpa.c| 10 > include/linux/iommu.h | 81 --- > 16 files changed, 214 insertions(+), 1360 deletions(-) Nice cleanup, thanks. The fsl_pamu driver and interface has always been a little bit of an alien compared to other IOMMU drivers. I am inclined to merge this after -rc3 is out, given some reviews. Can you also please add changelogs to the last three patches? Thanks, Joerg ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH] qxl: Fix uninitialised struct field head.surface_id
From: Colin Ian King The surface_id struct field in head is not being initialized and static analysis warns that this is being passed through to dev->monitors_config->heads[i] on an assignment. Clear up this warning by initializing it to zero. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: a6d3c4d79822 ("qxl: hook monitors_config updates into crtc, not encoder.") Signed-off-by: Colin Ian King --- drivers/gpu/drm/qxl/qxl_display.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c index 012bce0cdb65..10738e04c09b 100644 --- a/drivers/gpu/drm/qxl/qxl_display.c +++ b/drivers/gpu/drm/qxl/qxl_display.c @@ -328,6 +328,7 @@ static void qxl_crtc_update_monitors_config(struct drm_crtc *crtc, head.id = i; head.flags = 0; + head.surface_id = 0; oldcount = qdev->monitors_config->count; if (crtc->state->active) { struct drm_display_mode *mode = &crtc->mode; -- 2.30.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
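The class of bug fixed here is easy to demonstrate outside the kernel: assigning only some members of a stack struct leaves the rest indeterminate, while a designated initializer zeroes every unnamed member (C11 6.7.9p21). A minimal sketch, with a hypothetical stand-in for the qxl head struct:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the qxl head structure. */
struct head {
	int id;
	int flags;
	int surface_id;
};

/*
 * Build a head with every unnamed member zeroed: the designated
 * initializer guarantees surface_id == 0 even though we never
 * mention it, unlike plain field-by-field assignment on an
 * uninitialized stack variable (the pattern the patch fixes by
 * adding an explicit "head.surface_id = 0;").
 */
static struct head make_head(int id, int flags)
{
	struct head h = { .id = id, .flags = flags };
	return h;
}
```

Either approach (designated initializer, memset, or explicit per-field assignment as chosen in the patch) silences the Coverity finding; the patch takes the smallest diff.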
Re: [PATCH v6] i2c: virtio: add a virtio i2c frontend driver
On 2021/3/4 9:59 上午, Jie Deng wrote: Add an I2C bus driver for virtio para-virtualization. The controller can be emulated by the backend driver in any device model software by following the virtio protocol. The device specification can be found on https://lists.oasis-open.org/archives/virtio-comment/202101/msg8.html. By following the specification, people may implement different backend drivers to emulate different controllers according to their needs. Co-developed-by: Conghui Chen Signed-off-by: Conghui Chen Signed-off-by: Jie Deng --- drivers/i2c/busses/Kconfig | 11 ++ drivers/i2c/busses/Makefile | 3 + drivers/i2c/busses/i2c-virtio.c | 289 include/uapi/linux/virtio_i2c.h | 42 ++ include/uapi/linux/virtio_ids.h | 1 + 5 files changed, 346 insertions(+) create mode 100644 drivers/i2c/busses/i2c-virtio.c create mode 100644 include/uapi/linux/virtio_i2c.h diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig index 05ebf75..0860395 100644 --- a/drivers/i2c/busses/Kconfig +++ b/drivers/i2c/busses/Kconfig @@ -21,6 +21,17 @@ config I2C_ALI1535 This driver can also be built as a module. If so, the module will be called i2c-ali1535. +config I2C_VIRTIO + tristate "Virtio I2C Adapter" + depends on VIRTIO Please use select here. There's no prompt for VIRTIO. + help + If you say yes to this option, support will be included for the virtio + I2C adapter driver. The hardware can be emulated by any device model + software according to the virtio protocol. + + This driver can also be built as a module. If so, the module + will be called i2c-virtio. 
+ config I2C_ALI1563 tristate "ALI 1563" depends on PCI diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile index 615f35e..b88da08 100644 --- a/drivers/i2c/busses/Makefile +++ b/drivers/i2c/busses/Makefile @@ -6,6 +6,9 @@ # ACPI drivers obj-$(CONFIG_I2C_SCMI)+= i2c-scmi.o +# VIRTIO I2C host controller driver +obj-$(CONFIG_I2C_VIRTIO) += i2c-virtio.o + # PC SMBus host controller drivers obj-$(CONFIG_I2C_ALI1535) += i2c-ali1535.o obj-$(CONFIG_I2C_ALI1563) += i2c-ali1563.o diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c new file mode 100644 index 000..98c0fcc --- /dev/null +++ b/drivers/i2c/busses/i2c-virtio.c @@ -0,0 +1,289 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Virtio I2C Bus Driver + * + * The Virtio I2C Specification: + * https://raw.githubusercontent.com/oasis-tcs/virtio-spec/master/virtio-i2c.tex + * + * Copyright (c) 2021 Intel Corporation. All rights reserved. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +/** + * struct virtio_i2c - virtio I2C data + * @vdev: virtio device for this controller + * @completion: completion of virtio I2C message + * @adap: I2C adapter for this controller + * @i2c_lock: lock for virtqueue processing + * @vq: the virtio virtqueue for communication + */ +struct virtio_i2c { + struct virtio_device *vdev; + struct completion completion; + struct i2c_adapter adap; + struct mutex i2c_lock; + struct virtqueue *vq; +}; + +/** + * struct virtio_i2c_req - the virtio I2C request structure + * @out_hdr: the OUT header of the virtio I2C message + * @buf: the buffer into which data is read, or from which it's written + * @in_hdr: the IN header of the virtio I2C message + */ +struct virtio_i2c_req { + struct virtio_i2c_out_hdr out_hdr; + u8 *buf; + struct virtio_i2c_in_hdr in_hdr; +}; + +static void virtio_i2c_msg_done(struct virtqueue *vq) +{ + struct virtio_i2c *vi = vq->vdev->priv; + + 
complete(&vi->completion); +} + +static int virtio_i2c_send_reqs(struct virtqueue *vq, + struct virtio_i2c_req *reqs, + struct i2c_msg *msgs, int nr) +{ + struct scatterlist *sgs[3], out_hdr, msg_buf, in_hdr; + int i, outcnt, incnt, err = 0; + u8 *buf; + + for (i = 0; i < nr; i++) { + if (!msgs[i].len) + break; + + /* Only 7-bit mode supported for this moment. For the address format, +* Please check the Virtio I2C Specification. +*/ + reqs[i].out_hdr.addr = cpu_to_le16(msgs[i].addr << 1); + + if (i != nr - 1) + reqs[i].out_hdr.flags |= VIRTIO_I2C_FLAGS_FAIL_NEXT; + + outcnt = incnt = 0; + sg_init_one(&out_hdr, &reqs[i].out_hdr, sizeof(reqs[i].out_hdr)); + sgs[outcnt++] = &out_hdr; + + buf = kzalloc(msgs[i].len, GFP_KERNEL); + if (!buf) + break; + + reqs[i].buf = buf; +
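The address handling in the truncated hunk above (`msgs[i].addr << 1`) places the 7-bit I2C client address in bits 7..1 of the out header's addr field, matching the comment's pointer to the spec's address format. A quick sketch of just that encoding (the cpu_to_le16() conversion is a no-op on little-endian hosts and is omitted here):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode a 7-bit I2C client address the way the driver fills
 * out_hdr.addr: shifted left by one, bit 0 left clear.
 */
static uint16_t virtio_i2c_encode_addr(uint8_t addr7)
{
	return (uint16_t)(addr7 << 1);
}
```

So a device at the common EEPROM address 0x50 ends up as 0xA0 on the wire, the same value a raw-bus implementation would put in the address byte before the R/W bit.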
Re: [RFC PATCH 01/10] vdpa: add get_config_size callback in vdpa_config_ops
On 2021/3/2 10:15 下午, Stefano Garzarella wrote: On Tue, Mar 02, 2021 at 12:14:13PM +0800, Jason Wang wrote: On 2021/2/16 5:44 下午, Stefano Garzarella wrote: This new callback is used to get the size of the configuration space of vDPA devices. Signed-off-by: Stefano Garzarella --- include/linux/vdpa.h | 4 drivers/vdpa/ifcvf/ifcvf_main.c | 6 ++ drivers/vdpa/mlx5/net/mlx5_vnet.c | 6 ++ drivers/vdpa/vdpa_sim/vdpa_sim.c | 9 + 4 files changed, 25 insertions(+) diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index 4ab5494503a8..fddf42b17573 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -150,6 +150,9 @@ struct vdpa_iova_range { * @set_status: Set the device status * @vdev: vdpa device * @status: virtio device status + * @get_config_size: Get the size of the configuration space + * @vdev: vdpa device + * Returns size_t: configuration size Rethink about this, how much we could gain by introducing a dedicated ops here? E.g would it be simpler if we simply introduce a max_config_size to vdpa device? Mainly because in this way we don't have to add new parameters to the vdpa_alloc_device() function. We do the same for example for 'get_device_id', 'get_vendor_id', 'get_vq_num_max'. All of these are usually static, but we have ops. I think because it's easier to extend. I don't know if it's worth adding a new structure for these static values at this point, like 'struct vdpa_config_params'. Yes, that's the point. I think for any static values, it should be set during device allocation. I'm fine with both. Thanks Thanks, Stefano ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
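The two alternatives under discussion (an op in vdpa_config_ops versus a static value set at device allocation) differ mainly in where the number lives. A toy sketch of the ops-based variant the patch proposes, with trimmed-down stand-ins for the kernel structures and an illustrative 256-byte backend:

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed-down stand-ins for the kernel structures. */
struct vdpa_device;

struct vdpa_config_ops {
	size_t (*get_config_size)(struct vdpa_device *vdev);
};

struct vdpa_device {
	const struct vdpa_config_ops *config;
};

/* A backend reporting a fixed config size; 256 is illustrative only. */
static size_t sim_get_config_size(struct vdpa_device *vdev)
{
	(void)vdev;
	return 256;
}

static const struct vdpa_config_ops sim_ops = {
	.get_config_size = sim_get_config_size,
};
```

The static-field alternative would instead put `size_t config_size;` in struct vdpa_device and fill it in at allocation time; the op keeps vdpa_alloc_device()'s signature unchanged, which is the trade-off Stefano points out.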
Re: [RFC PATCH 10/10] vhost/vdpa: return configuration bytes read and written to user space
On 2021/3/2 10:06 下午, Stefano Garzarella wrote: On Tue, Mar 02, 2021 at 12:05:35PM +0800, Jason Wang wrote: On 2021/2/16 5:44 下午, Stefano Garzarella wrote: vdpa_get_config() and vdpa_set_config() now return the amount of bytes read and written, so let's return them to the user space. We also modify vhost_vdpa_config_validate() to return 0 (bytes read or written) instead of an error, when the buffer length is 0. Signed-off-by: Stefano Garzarella --- drivers/vhost/vdpa.c | 26 +++--- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 21eea2be5afa..b754c53171a7 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -191,9 +191,6 @@ static ssize_t vhost_vdpa_config_validate(struct vhost_vdpa *v, struct vdpa_device *vdpa = v->vdpa; u32 size = vdpa->config->get_config_size(vdpa); - if (c->len == 0) - return -EINVAL; - return min(c->len, size); } @@ -204,6 +201,7 @@ static long vhost_vdpa_get_config(struct vhost_vdpa *v, struct vhost_vdpa_config config; unsigned long size = offsetof(struct vhost_vdpa_config, buf); ssize_t config_size; + long ret; u8 *buf; if (copy_from_user(&config, c, size)) @@ -217,15 +215,18 @@ static long vhost_vdpa_get_config(struct vhost_vdpa *v, if (!buf) return -ENOMEM; - vdpa_get_config(vdpa, config.off, buf, config_size); - - if (copy_to_user(c->buf, buf, config_size)) { - kvfree(buf); - return -EFAULT; + ret = vdpa_get_config(vdpa, config.off, buf, config_size); + if (ret < 0) { + ret = -EFAULT; + goto out; } + if (copy_to_user(c->buf, buf, config_size)) + ret = -EFAULT; + +out: kvfree(buf); - return 0; + return ret; } static long vhost_vdpa_set_config(struct vhost_vdpa *v, @@ -235,6 +236,7 @@ static long vhost_vdpa_set_config(struct vhost_vdpa *v, struct vhost_vdpa_config config; unsigned long size = offsetof(struct vhost_vdpa_config, buf); ssize_t config_size; + long ret; u8 *buf; if (copy_from_user(&config, c, size)) @@ -248,10 +250,12 @@ static long 
vhost_vdpa_set_config(struct vhost_vdpa *v, if (IS_ERR(buf)) return PTR_ERR(buf); - vdpa_set_config(vdpa, config.off, buf, config_size); + ret = vdpa_set_config(vdpa, config.off, buf, config_size); + if (ret < 0) + ret = -EFAULT; kvfree(buf); - return 0; + return ret; } So I wonder whether it's worth returning the number of bytes, since we can't propagate the result to the driver, or the driver doesn't care about that. Okay, but IIUC a user-space application that issues the VHOST_VDPA_GET_CONFIG ioctl can use the return value. Yes, but it looks like it's too late to change since it's a userspace-noticeable behaviour. Should we also change 'struct virtio_config_ops' to propagate this value to virtio drivers? I think not; the reason is that drivers don't expect get()/set() to fail... Thanks Thanks, Stefano ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
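The semantics being changed, clamping the user-supplied length against the device's config size and returning the number of bytes actually copied (0 for a zero-length request, rather than -EINVAL), can be sketched as:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Clamp a requested length to the available size, mimicking
 * vhost_vdpa_config_validate() after this patch: a zero-length
 * request simply yields 0 bytes.
 */
static size_t config_clamp(size_t req_len, size_t avail)
{
	return req_len < avail ? req_len : avail;
}

/*
 * Copy out up to `len` bytes starting at `off`, returning how many
 * bytes were actually read; 0 if the offset is past the end. This
 * mirrors the "return bytes read/written" semantics under discussion.
 */
static size_t config_read(const unsigned char *cfg, size_t config_size,
			  size_t off, unsigned char *buf, size_t len)
{
	size_t n;

	if (off >= config_size)
		return 0;
	n = config_clamp(len, config_size - off);
	memcpy(buf, cfg + off, n);
	return n;
}
```

A caller asking for more bytes than the device exposes gets a short count back instead of an error, which is the behaviour user space could observe through the ioctl return value.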
Re: [virtio-dev] Re: [PATCH] vdpa/mlx5: set_features should allow reset to zero
On 2021/3/3 4:29 下午, Cornelia Huck wrote: On Wed, 3 Mar 2021 12:01:01 +0800 Jason Wang wrote: On 2021/3/2 8:08 下午, Cornelia Huck wrote: On Mon, 1 Mar 2021 11:51:08 +0800 Jason Wang wrote: On 2021/3/1 5:25 上午, Michael S. Tsirkin wrote: On Fri, Feb 26, 2021 at 04:19:16PM +0800, Jason Wang wrote: On 2021/2/26 2:53 上午, Michael S. Tsirkin wrote: Confused. What is wrong with the above? It never reads the field unless the feature has been offered by the device. So the spec said: " The following driver-read-only field, max_virtqueue_pairs only exists if VIRTIO_NET_F_MQ is set. " If I read this correctly, there will be no max_virtqueue_pairs field if VIRTIO_NET_F_MQ is not offered by the device? If yes, the offsetof() violates what the spec says. Thanks I think that's a misunderstanding. This text was never intended to imply that field offsets change based on feature bits. We had this pain with legacy and we never wanted to go back there. This merely implies that without VIRTIO_NET_F_MQ the field should not be accessed. Exists in the sense "is accessible to driver". Let's just clarify that in the spec, job done. Ok, agree. That will make things easier. Yes, that makes much more sense. What about adding the following to the "Basic Facilities of a Virtio Device/Device Configuration Space" section of the spec: "If an optional configuration field does not exist, the corresponding space is still present, but reserved." This became interesting after re-reading some of the QEMU code.
E.g. in virtio-net.c we have:

static VirtIOFeature feature_sizes[] = {
    {.flags = 1ULL << VIRTIO_NET_F_MAC,
     .end = endof(struct virtio_net_config, mac)},
    {.flags = 1ULL << VIRTIO_NET_F_STATUS,
     .end = endof(struct virtio_net_config, status)},
    {.flags = 1ULL << VIRTIO_NET_F_MQ,
     .end = endof(struct virtio_net_config, max_virtqueue_pairs)},
    {.flags = 1ULL << VIRTIO_NET_F_MTU,
     .end = endof(struct virtio_net_config, mtu)},
    {.flags = 1ULL << VIRTIO_NET_F_SPEED_DUPLEX,
     .end = endof(struct virtio_net_config, duplex)},
    {.flags = (1ULL << VIRTIO_NET_F_RSS) | (1ULL << VIRTIO_NET_F_HASH_REPORT),
     .end = endof(struct virtio_net_config, supported_hash_types)},
    {}
};

It has an implicit dependency chain. E.g. MTU isn't present if DUPLEX/RSS is not offered ... But I think it covers everything up to the relevant field, no? So MTU is included if we have the feature bit, even if we don't have DUPLEX/RSS. Given that a config space may be shorter (but must not collapse non-existing fields), maybe a better wording would be: "If an optional configuration field does not exist, the corresponding space will still be present if it is not at the end of the configuration space (i.e., further configuration fields exist.) This should work but I think we need to define the end of the configuration space first? This implies that a given field, if it exists, is always at the same offset from the beginning of the configuration space." (Do we need to specify what a device needs to do if the driver tries to access a non-existing field? We cannot really make assumptions about config space accesses; virtio-ccw can just copy a chunk of config space that contains non-existing fields... Right, it looks to me like ccw doesn't depend on the config len, which is kind of interesting. I think the answer depends on the implementation of both transport and device: (virtio-ccw is a bit odd, because channel I/O does not have the concept of a config space and I needed to come up with something) Ok.
Let's take virtio-net-pci as an example: it fills status unconditionally in virtio_net_set_config() even if VIRTIO_NET_F_STATUS is not negotiated. So with the above feature_sizes:
1) if one of MQ, MTU, DUPLEX or RSS is implemented, the driver can still read status in this case
2) if none of the above four is negotiated, the driver can only read 0xFF (virtio_config_readb())
I guess the device could ignore writes and return zeroes on read?) So considering the above, it might be too late to fix/clarify in the spec the expected behaviour of reading/writing non-existent fields. We could still advise behaviour via SHOULD, though I'm not sure it adds much at this point in time. It really feels a bit odd that a driver can still read and write fields for features that it did not negotiate, but I fear we're stuck with it. Yes, since the device (at least virtio-pci) implements things like this. Thanks Thanks I've opened https://github.com/oasis-tcs/virtio-spec/issues/98 for the spec issues. - To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virt
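The feature_sizes table quoted above effectively makes the visible config length the largest `end` among the negotiated feature bits, which is what creates the implicit dependency Jason describes: a later field being present drags every earlier one along. A standalone reimplementation of that lookup, using the real virtio-net feature bit numbers but illustrative (not the real virtio_net_config) offsets:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified copy of QEMU's VirtIOFeature idea: each entry maps a
 * feature mask to the config-space offset just past its field.
 * Offsets here are illustrative only.
 */
struct feature_size {
	uint64_t flags;
	size_t end;
};

static const struct feature_size feature_sizes[] = {
	{ 1ULL << 5,  /* VIRTIO_NET_F_MAC */    6 },
	{ 1ULL << 16, /* VIRTIO_NET_F_STATUS */ 8 },
	{ 1ULL << 22, /* VIRTIO_NET_F_MQ */     10 },
	{ 1ULL << 3,  /* VIRTIO_NET_F_MTU */    12 },
	{ 0, 0 },
};

/* Config size = largest `end` whose feature bit is negotiated. */
static size_t virtio_feature_get_config_size(uint64_t host_features)
{
	const struct feature_size *f;
	size_t size = 0;

	for (f = feature_sizes; f->flags; f++)
		if ((host_features & f->flags) && f->end > size)
			size = f->end;
	return size;
}
```

Note how negotiating only MTU (the entry with the largest `end` here) exposes all bytes up to its offset, including the status and max_virtqueue_pairs slots, regardless of whether STATUS or MQ were negotiated; that is the behaviour the spec clarification needs to bless.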