[virtio-dev] RE: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
> From: zhenwei pi
> Sent: Thursday, April 27, 2023 4:21 AM
>
> On 4/25/23 13:03, Parav Pandit wrote:
> >
> > [...]
> >
> > I briefly see your rdma command descriptor example, which is not
> > aligned to 16B. Performance-wise it will be poorer than NVMe RDMA
> > fabrics.
>
> Hi,
> I'm confused here, could you please give me more hints?
> 1, The size of the command descriptor (as I defined it in the example)
> is larger than the command size of NVMe over RDMA; the extra overhead
> makes performance worse than NVMe over RDMA.

Which structure? I am guessing from the header file that you have
virtio_of_command_vring followed by virtio_of_vring_desc[cmd.ndesc],
with cmd.opcode = virtio_of_op_vring. If so, it seems fine to me.

However, the actual command being missing from the virtio_of_command_vring
struct is not so good. Such indirection overhead only reduces performance,
because the data arriving at the blk storage target side is not of
constant size. And even if it somehow were, it requires two levels of
protocol parsers.

This can be simplified, since you are not starting with any history
here; the abstraction point could possibly be virtio commands rather
than the vring. I don't see a need for the desc to have id and flags
the way it is drafted over the RDMA fabrics.

What I had in mind was:

struct virtio_of_descriptor {
	le64 addr;
	le32 len;
	union {
		le32 rdma_key;
		le32 id;	/* plus reserved bits */
		le32 tcp_desc_id;
	};
};

We can possibly define appropriate virtio fabric descriptors; at that
point, the abstraction point is not literally taking the vring across
the fabric. Depending on the use case, it may make sense to start with
either one of TCP or RDMA, instead of cooking all of them at once.

> 2, The command size not being aligned to 16B leads to a performance
> issue on the RDMA SEND operation.
> My colleague Zhuo helped me test the performance of sending 16/24/32
> bytes:
>
> taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 16 -t 1 xx.xx.xx.xx
> taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 24 -t 1 xx.xx.xx.xx
> taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 32 -t 1 xx.xx.xx.xx
>
> The QPS seems almost the same.

Structure [1] generates the subsequent vring_desc[] descriptors at
addresses not aligned to 8B, which results in partial writes of the
desc. It is hard to say from an ib_send_bw test what is being done; I
remember mlx5 has cache-aligned accesses, nop wqe segments and more.

I also don't see the 'id' field coming back in the response
command_status. So why transmit something over the fabric that is not
used? Did I miss the id on the completion side?

[1] https://github.com/pizhenwei/linux/blob/7a13b310d1338c462f8e0b13d39a571645bc4698/include/uapi/linux/virtio_of.h#L129
[virtio-dev] Re: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
On 4/25/23 13:03, Parav Pandit wrote:
> [...]
> I briefly see your rdma command descriptor example, which is not
> aligned to 16B. Performance-wise it will be poorer than NVMe RDMA
> fabrics.

Hi,
I'm confused here, could you please give me more hints?

1, The size of the command descriptor (as I defined it in the example)
is larger than the command size of NVMe over RDMA; the extra overhead
makes performance worse than NVMe over RDMA.

2, The command size not being aligned to 16B leads to a performance
issue on the RDMA SEND operation. My colleague Zhuo helped me test the
performance of sending 16/24/32 bytes:

taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 16 -t 1 xx.xx.xx.xx
taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 24 -t 1 xx.xx.xx.xx
taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 32 -t 1 xx.xx.xx.xx

The QPS seems almost the same.

For the PCI transport for net, we intend to start work on improving
descriptors, the transport binding for the net device. From our
research I see that some abstract virtio descriptors are great today,
but if you want to get the best out of the system (sw, hw, cpu), such
abstraction is not the best. Sharing the "id" all the way to the target
and bringing it back is an example of such inefficiency in your
example.

--
zhenwei pi

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
Re: [virtio-dev] Re: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
On Tue, 25 Apr 2023 14:36:04 +0800, Jason Wang wrote:
> On Mon, Apr 24, 2023 at 9:38 PM zhenwei pi wrote:
> >
> > On 4/24/23 11:40, Jason Wang wrote:
> > > On Sun, Apr 23, 2023 at 7:31 PM zhenwei pi wrote:
> > > >
> > > > Hi,
> > > >
> > > > In the past years, virtio has supported lots of device
> > > > specifications over PCI/MMIO/CCW. These devices work fine in the
> > > > virtualization environment, and we have a chance to support the
> > > > virtio device family for the container/host scenario.
> > >
> > > PCI can work for containers for sure (or does it meet any issue
> > > like scalability?). It's better to describe what problems you met
> > > and why you chose this way to solve them.
> > >
> > > It's better to compare this with
> > >
> > > 1) hiding the fabrics details via DPU
> > > 2) vDPA
> > >
> > Hi,
> >
> > Sorry, I missed this part. "Network defined peripheral devices of
> > the virtio family" is the main purpose of this proposal,
>
> This can be achieved by either DPU or vDPA.

I agree with this, so I don't understand the point of this realization.
I am also very excited about this idea; it broadens the possibilities
of virtio. But I still really want to know what the point of this idea
is: better performance? Or achieving some scenarios that we cannot
achieve now?

> I think the advantage is,
> if we standardize this in the spec, it avoids vendor-specific
> protocols.

Sorry, I don't get this.

Thanks.

> > this allows us to use many
> > types of remote resources which are provided by virtio targets.
> >
> > From my point of view, there are 3 cases:
> > 1, Host/container scenario. For example, the host kernel connects to
> > a virtio target block service and maps it as a vdx (virtio-blk)
> > device (used by a Map-Reduce service which needs a fast/large
> > disk). The host kernel also connects to a virtio target crypto
> > service and maps it as a virtio crypto device (used by nginx to
> > accelerate HTTPS). And so on.
> >
> > +----------++------+     +----------+
> > |Map-Reduce|| nginx | ... | processes|
> > +----------++------+     +----------+
> > ------------------------------------------
> > Host        |         |             |
> > Kernel  +-------+ +-------+     +-------+
> >         | ext4  | | LKCF  |     | HWRNG |
> >         +-------+ +-------+     +-------+
> >             |         |             |
> >         +-------+ +-------+     +-------+
> >         |  vdx  | |vCrypto|     | vRNG  |
> >         +-------+ +-------+     +-------+
> >             |         |             |
> >             |     +--------+        |
> >             +---->|TCP/RDMA|<-------+
> >                   +--------+
> >                       |
> >                   +------+
> >                   |NIC/IB|
> >                   +------+
> >                       |    +-------------+
> >                       +--->|virtio target|
> >                            +-------------+
> >
> > 2, Typical virtualization environment. The workloads run in a guest,
> > and QEMU handles virtio-pci (or MMIO) and forwards requests to the
> > target.
> >
> > +----------++------+     +----------+
> > |Map-Reduce|| nginx | ... | processes|
> > +----------++------+     +----------+
> > ------------------------------------------
> > Guest       |         |             |
> > Kernel  +-------+ +-------+     +-------+
> >         | ext4  | | LKCF  |     | HWRNG |
> >         +-------+ +-------+     +-------+
> >             |         |             |
> >         +-------+ +-------+     +-------+
> >         |  vdx  | |vCrypto|     | vRNG  |
> >         +-------+ +-------+     +-------+
> >             |         |             |
> > --------------------PCI--------------------
> >                       |
> > QEMU          +--------------+
> >               |virtio backend|
> >               +--------------+
> >                       |
> >                   +------+
> >                   |NIC/IB|
> >                   +------+
> >                       |    +-------------+
> >                       +--->|virtio target|
> >                            +-------------+
>
> So it's the job of QEMU to do the translation from virtqueue to packet
> here?
>
> > 3, SmartNIC/DPU/vDPA environment. It's possible to convert a
> > virtio-pci request to a virtio-of request in hardware, and forward
[virtio-dev] Re: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
On 4/25/23 21:55, Stefan Hajnoczi wrote:
> On Mon, Apr 24, 2023 at 11:40:02AM +0800, Jason Wang wrote:
> > On Sun, Apr 23, 2023 at 7:31 PM zhenwei pi wrote:
> > > "Virtio Over Fabrics" aims at "reuse virtio device
> > > specifications", and provides network defined peripheral devices.
> > > And this protocol also could be used in a virtualization
> > > environment: typically a hypervisor (or vhost-user process)
> > > handles requests from virtio PCI/MMIO/CCW, remaps the request and
> > > forwards it to the target by fabrics.
> >
> > This requires mediation in the datapath, doesn't it?
> >
> > > - Protocol
> > > The detailed protocol definition is at:
> > > https://github.com/pizhenwei/linux/blob/virtio-of-github/include/uapi/linux/virtio_of.h
> >
> > I'd say an RFC patch for the virtio spec is more suitable than the
> > code.
>
> VIRTIO over TCP has long been anticipated but so far no one has posted
> an implementation. There are probably mentions of it from 10+ years
> ago. I'm excited to see this!
>
> Both the VIRTIO spec and the Linux drivers provide an abstraction that
> allows fabrics (e.g. TCP) to fit in as a VIRTIO Transport. vrings are
> not the only way to implement virtqueues. Many VIRTIO devices will
> work fine over a message-passing transport like TCP. A few devices
> like the balloon device may not make sense. Shared Memory Regions
> won't work.

Fully agree.

> Please define VIRTIO over Fabrics as a Transport in the VIRTIO spec so
> that the integration with the VIRTIO device model is seamless. I look
> forward to discussing spec patches.
>
> Stefan

Thanks, I'm working on it.

--
zhenwei pi
[virtio-dev] Re: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
On Mon, Apr 24, 2023 at 9:38 PM zhenwei pi wrote:
>
> On 4/24/23 11:40, Jason Wang wrote:
> > On Sun, Apr 23, 2023 at 7:31 PM zhenwei pi wrote:
> > >
> > > Hi,
> > >
> > > In the past years, virtio has supported lots of device
> > > specifications over PCI/MMIO/CCW. These devices work fine in the
> > > virtualization environment, and we have a chance to support the
> > > virtio device family for the container/host scenario.
> >
> > PCI can work for containers for sure (or does it meet any issue like
> > scalability?). It's better to describe what problems you met and why
> > you chose this way to solve them.
> >
> > It's better to compare this with
> >
> > 1) hiding the fabrics details via DPU
> > 2) vDPA
> >
> Hi,
>
> Sorry, I missed this part. "Network defined peripheral devices of the
> virtio family" is the main purpose of this proposal,

This can be achieved by either DPU or vDPA. I think the advantage is,
if we standardize this in the spec, it avoids vendor-specific
protocols.

> this allows us to use many
> types of remote resources which are provided by virtio targets.
>
> From my point of view, there are 3 cases:
> 1, Host/container scenario. For example, the host kernel connects to a
> virtio target block service and maps it as a vdx (virtio-blk) device
> (used by a Map-Reduce service which needs a fast/large disk). The host
> kernel also connects to a virtio target crypto service and maps it as
> a virtio crypto device (used by nginx to accelerate HTTPS). And so on.
>
> +----------++------+     +----------+
> |Map-Reduce|| nginx | ... | processes|
> +----------++------+     +----------+
> ------------------------------------------
> Host        |         |             |
> Kernel  +-------+ +-------+     +-------+
>         | ext4  | | LKCF  |     | HWRNG |
>         +-------+ +-------+     +-------+
>             |         |             |
>         +-------+ +-------+     +-------+
>         |  vdx  | |vCrypto|     | vRNG  |
>         +-------+ +-------+     +-------+
>             |         |             |
>             |     +--------+        |
>             +---->|TCP/RDMA|<-------+
>                   +--------+
>                       |
>                   +------+
>                   |NIC/IB|
>                   +------+
>                       |    +-------------+
>                       +--->|virtio target|
>                            +-------------+
>
> 2, Typical virtualization environment. The workloads run in a guest,
> and QEMU handles virtio-pci (or MMIO) and forwards requests to the
> target.
> +----------++------+     +----------+
> |Map-Reduce|| nginx | ... | processes|
> +----------++------+     +----------+
> ------------------------------------------
> Guest       |         |             |
> Kernel  +-------+ +-------+     +-------+
>         | ext4  | | LKCF  |     | HWRNG |
>         +-------+ +-------+     +-------+
>             |         |             |
>         +-------+ +-------+     +-------+
>         |  vdx  | |vCrypto|     | vRNG  |
>         +-------+ +-------+     +-------+
>             |         |             |
> --------------------PCI--------------------
>                       |
> QEMU          +--------------+
>               |virtio backend|
>               +--------------+
>                       |
>                   +------+
>                   |NIC/IB|
>                   +------+
>                       |    +-------------+
>                       +--->|virtio target|
>                            +-------------+

So it's the job of QEMU to do the translation from virtqueue to packet
here?

> 3, SmartNIC/DPU/vDPA environment. It's possible to convert a
> virtio-pci request to a virtio-of request in hardware, and forward the
> request to the virtio target directly.
>
> +----------++------+     +----------+
> |Map-Reduce|| nginx | ... | processes|
> +----------++------+     +----------+
> ------------------------------------------
> Host        |         |             |
> Kernel  +-------+ +-------+     +-------+
>         | ext4  | | LKCF  |     | HWRNG |
>         +-------+ +-------+     +-------+
>             |         |             |
>         +-------+ +-------+     +-------+