Re: [External] Re: [RFC PATCH] virtio-vsock: add description for datagram type
Hi Michael and Stefan,

I thought about this and discussed it with my colleague Cong Wang. One idea is to make the current asynchronous send_pkt flow synchronous: if the virtqueue is full, the function can return -ENOMEM all the way back to the caller, and the caller can check the return value of sendmsg and slow down when necessary. In the spec, we can put something like: if the virtqueue is full, the caller should be notified with an error.

In terms of implementation, that means we will remove the current send_pkt_work for both stream and dgram sockets. Currently, the code path uses RCU and a work queue, then grabs a mutex in the work queue function. Since we cannot grab a mutex in an RCU critical section, we would have to change RCU to a normal reference-counting mechanism. I think this is doable. The drawback is that reference counting generally costs a few more cycles than RCU, so there is a small price to pay. Another option is to use sleepable RCU (SRCU) and remove the work queue. What do you guys think?

btw, I will also add some SO_SNDBUF restrictions for the dgram sockets, but I don't think they need to be in the spec.

Regards,

Jiang

On Tue, Mar 23, 2021 at 1:53 AM Stefan Hajnoczi wrote:
>
> On Mon, Mar 22, 2021 at 07:23:14PM -0700, Jiang Wang . wrote:
> > Got it. Will do.
>
> You could look at udp_sendmsg() to see how sockets compete when
> transmitting to the same net device.
>
> I'm not very familiar with this but I guess that the qdisc (like
> fq_codel) decides which packets to place into the device's tx queue. I
> guess sk_buffs waiting to be put onto the device's tx queue are
> accounted against the socket's sndbuf. Further sendmsg calls will
> fail with -ENOBUFS when the sndbuf limit is reached.
>
> It's not clear to me how much of the existing code can be reused since
> vsock does not use sk_buff or netdev :(.
>
> Stefan

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio-balloon: move release_pages_balloon() outside of mutex_unlock(&vb->balloon_lock)
On 26.03.21 10:53, Liu Xiang wrote:
> Since pages have been deflated to a local list, there is no race
> between fill and leak.
>
> Signed-off-by: Liu Xiang
> ---
>  drivers/virtio/virtio_balloon.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 8985fc2ce..7da25b87f 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -303,8 +303,8 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>  	 */
>  	if (vb->num_pfns != 0)
>  		tell_host(vb, vb->deflate_vq);
> -	release_pages_balloon(vb, &pages);
>  	mutex_unlock(&vb->balloon_lock);
> +	release_pages_balloon(vb, &pages);
>  	return num_freed_pages;
>  }

I think this should be fine.

Reviewed-by: David Hildenbrand

--
Thanks,

David / dhildenb
Re: [PATCH v5 08/11] vduse: Implement an MMU-based IOMMU driver
On 2021/3/26 2:56 PM, Yongji Xie wrote:

On Fri, Mar 26, 2021 at 2:16 PM Jason Wang wrote:

On 2021/3/26 1:14 PM, Yongji Xie wrote:

+	}
+	map->bounce_page = page;
+
+	/* paired with vduse_domain_map_page() */
+	smp_mb();

So this is suspicious. It's better to explain like: we need to make sure A must be done after B.

OK. I see. It's used to protect this pattern:

  vduse_domain_alloc_bounce_page:        vduse_domain_map_page:
      write map->bounce_page                 write map->orig_phys
      mb()                                   mb()
      read map->orig_phys                    read map->bounce_page

Make sure there will always be a path to do the bouncing.

Ok. And it looks to me the iotlb_lock is sufficient to do the synchronization here. E.g. is there any reason that you don't take it in vduse_domain_map_bounce_page()?

Yes, we can. But the performance in multi-queue cases will go down if we use iotlb_lock on this critical path.

And what's more, is there any way to avoid holding the spinlock during bouncing?

Looks like we can't. In the case that multiple page faults happen on the same page, we should make sure the bouncing is done before any page fault handler returns.

So it looks to me all this extra complexity comes from the fact that bounce_page and orig_phys are set in different places, so we need to do the bouncing in two places. I wonder how much we can gain from the "lazy" bouncing in the page fault handler. The buffer mapped via dma_ops from the virtio driver is expected to be accessed by the userspace soon. It looks to me we can do all of that during dma_map(); then things would be greatly simplified.

If so, we need to allocate lots of pages from the pool reserved for atomic memory allocation requests.

This should be fine; a lot of drivers try to allocate pages in atomic context. The point is to simplify the code to make it easy to determine the correctness, so we can add optimizations on top simply by benchmarking the difference.

OK. I will use this way in the next version.
E.g. we have several places that access orig_phys:

1) map_page(), write
2) unmap_page(), write
3) page fault handler, read

It's not clear to me how they were synchronized. Or, if it was synchronized implicitly (via the iova allocator?), we'd better document it.

Yes. Or simply use a spinlock (which is the preferable way I'd like to go). We probably don't need to worry too much about the cost of the spinlock since the iova allocator uses it heavily.

Actually the iova allocator implements a per-CPU cache to optimize it.

Thanks,
Yongji

Right, but having had a quick glance, I guess what you meant is that usually there's no lock contention unless CPU hot-plug happens. This can work, but the problem is that such synchronization depends on the internal implementation of the IOVA allocator, which is kind of fragile. I still think we should do that on our own.

I might be missing something, but it looks like we don't need any synchronization if the page fault handler is removed as you suggested. We should not access the same orig_phys concurrently (in map_page() and unmap_page()) unless we free the iova before accessing.

Thanks,
Yongji

You're right. I overestimated the complexity that is required by the synchronization.

Thanks