[virtio-dev] RE: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)

2023-04-27 Thread Parav Pandit


> From: zhenwei pi 
> Sent: Thursday, April 27, 2023 4:21 AM
> 
> On 4/25/23 13:03, Parav Pandit wrote:
> >
> >
> [...]
> >
> > I briefly looked at your rdma command descriptor example, which is not
> > aligned to 16B. Performance-wise it will be worse than nvme rdma fabrics.
> >
> 
> Hi,
> I'm confused here, could you please give me more hint?
> 1, The size of the command descriptor (as I defined it in the example) is
> larger than the nvme rdma command size, and the extra overhead makes
> performance worse than nvme over rdma.
> 
> 
Which structure?

I am guessing from the header file that you have,

virtio_of_command_vring
followed by
virtio_of_vring_desc[cmd.ndesc] where cmd.opcode = virtio_of_op_vring

If so, it seems fine to me.
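
Roughly, as a compile-only C sketch of that layout (field names and widths
are my guesses for illustration, not copied from virtio_of.h):

#include <stdint.h>

/* Guessed command header; the real fields live in virtio_of.h. */
struct virtio_of_command_vring {
        uint16_t opcode;        /* virtio_of_op_vring */
        uint16_t command_id;    /* hypothetical request tag */
        uint16_t ndesc;         /* number of descriptors that follow */
        uint16_t reserved;
};

/* Guessed per-descriptor layout carried behind the command header. */
struct virtio_of_vring_desc {
        uint64_t addr;          /* le64 on the wire */
        uint32_t len;           /* le32 on the wire */
        uint16_t id;
        uint16_t flags;
};

/*
 * A virtio_of_op_vring message is then:
 *     struct virtio_of_command_vring hdr;
 *     struct virtio_of_vring_desc    desc[hdr.ndesc];
 */
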
However, the actual device command is missing from the virtio_of_command_vring
struct, which is not so good.
Such indirection overhead only reduces perf, as the blk storage target side no
longer receives constant-size data.
And even if it arrives somehow, it requires a two-level protocol parser.
This can be simplified since you are not starting with any history here; the
abstraction point can possibly be virtio commands rather than the vring.

I don't see a need for the desc to carry id and flags the way it is drafted
over the rdma fabrics.
What I had in mind is something like:
struct virtio_of_descriptor {
        le64 addr;
        le32 len;
        union {
                le32 rdma_key;
                le32 id;        /* + reserved */
                le32 tcp_desc_id;
        };
};

We can possibly define appropriate virtio fabric descriptors; at that point, 
the abstraction point is not literally taking the vring across the fabric.
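
As a compilable rendering of that sketch, just to pin down the 16B point
(anything beyond the field names above is an assumption):

#include <stdint.h>

struct virtio_of_descriptor {
        uint64_t addr;                  /* le64 on the wire */
        uint32_t len;                   /* le32 on the wire */
        union {
                uint32_t rdma_key;      /* RDMA: remote key of the region */
                uint32_t id;            /* id + reserved bits */
                uint32_t tcp_desc_id;   /* TCP: descriptor identifier */
        };
};

/* The whole point: keep descriptors 16B so arrays of them stay aligned. */
_Static_assert(sizeof(struct virtio_of_descriptor) == 16,
               "fabric descriptor must stay 16 bytes");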

Depending on the use case, maybe starting with either TCP or RDMA makes
sense, instead of cooking everything at once.

> 2, The command size not being aligned to 16B leads to a performance issue on
> the RDMA SEND operation. My colleague Zhuo helped me test the performance of
> sending 16/24/32 bytes:
> taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 16 -t 1 xx.xx.xx.xx
> taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 24 -t 1 xx.xx.xx.xx
> taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 32 -t 1 xx.xx.xx.xx
> The QPS seems almost the same.
> 
Structure [1] results in the subsequent vring_desc[] descriptors landing on an
unaligned 8B address, which causes partial writes of the desc.

It is hard to say from the ib_send_bw test what is actually being exercised.
I remember mlx5 has cache-aligned accesses, NOP WQE segments and more.
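
To make the alignment concern concrete, a small stand-alone C sketch (the
16/24/32B header sizes are the ones tested above; it only prints where
trailing 16B descriptors would start for each header size):

#include <stdio.h>

int main(void)
{
        const unsigned hdr_sizes[] = { 16, 24, 32 };

        for (unsigned h = 0; h < 3; h++) {
                printf("header %2uB:", hdr_sizes[h]);
                for (unsigned i = 0; i < 4; i++) {
                        unsigned off = hdr_sizes[h] + i * 16;
                        printf(" desc[%u]@%u%s", i, off,
                               off % 16 ? "(unaligned)" : "");
                }
                printf("\n");
        }
        return 0;
}

With a 24B header every following descriptor starts off a 16B boundary
(24, 40, 56, ...); with a 16B or 32B header they all stay aligned.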

I also don't see the 'id' field coming back in the response command_status.
So why transmit something over the fabric that is not used?
Did I miss the id on the completion side?

[1] 
https://github.com/pizhenwei/linux/blob/7a13b310d1338c462f8e0b13d39a571645bc4698/include/uapi/linux/virtio_of.h#L129


[virtio-dev] Re: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)

2023-04-27 Thread zhenwei pi

On 4/25/23 13:03, Parav Pandit wrote:




[...]


I briefly looked at your rdma command descriptor example, which is not aligned
to 16B. Performance-wise it will be worse than nvme rdma fabrics.




Hi,
I'm confused here, could you please give me more hint?
1, The size of the command descriptor (as I defined it in the example) is
larger than the nvme rdma command size, and the extra overhead makes
performance worse than nvme over rdma.


2, The command size not being aligned to 16B leads to a performance issue on
the RDMA SEND operation. My colleague Zhuo helped me test the performance of
sending 16/24/32 bytes:

taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 16 -t 1 xx.xx.xx.xx
taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 24 -t 1 xx.xx.xx.xx
taskset -c 30 ib_send_bw -d mlx5_2 -i 1 -x 3 -s 32 -t 1 xx.xx.xx.xx
The QPS seems almost the same.

For the PCI transport for net, we intend to start the work to improve the
descriptors, i.e. the transport binding for the net device. From our research
I see that the abstract virtio descriptors are great today, but if you want to
get the best out of the system (sw, hw, cpu), such an abstraction is not the
best. Sharing the "id" all the way to the target and bringing it back is an
example of such inefficiency in your example.
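
A minimal sketch of that asymmetry (struct and field names are hypothetical,
chosen only to make the point visible): the per-descriptor id crosses the
fabric with every request, while nothing on the completion path needs it if
the transport already correlates request and response by a command-level tag:

#include <stdint.h>

struct fabric_request {
        uint16_t command_id;    /* enough to match the completion */
        uint16_t ndesc;
        struct {
                uint64_t addr;
                uint32_t len;
                uint16_t id;    /* virtqueue-style id: unused by the target */
                uint16_t flags;
        } desc[];               /* ndesc entries follow the header */
};

struct fabric_completion {
        uint16_t command_id;    /* correlation happens here ...        */
        uint16_t status;        /* ... no per-descriptor id comes back */
};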


--
zhenwei pi




Re: [virtio-dev] Re: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)

2023-04-26 Thread Xuan Zhuo
On Tue, 25 Apr 2023 14:36:04 +0800, Jason Wang  wrote:
> On Mon, Apr 24, 2023 at 9:38 PM zhenwei pi  wrote:
> >
> >
> >
> > On 4/24/23 11:40, Jason Wang wrote:
> > > On Sun, Apr 23, 2023 at 7:31 PM zhenwei pi  
> > > wrote:
> > >>
> > >> Hi,
> > >>
> > >> In the past years, virtio has supported lots of device specifications over
> > >> PCI/MMIO/CCW. These devices work fine in the virtualization environment,
> > >> and we have a chance to support the virtio device family for the
> > >> container/host scenario.
> > >
> > > PCI can work for containers for sure (or does it hit any issues like
> > > scalability?). It's better to describe what problems you met and why
> > > you chose this way to solve them.
> > >
> > > It's better to compare this with
> > >
> > > 1) hiding the fabrics details via DPU
> > > 2) vDPA
> > >
> > Hi,
> >
> > Sorry, I missed this part. "Network defined peripheral devices of virtio
> > family" is the main purpose of this proposal,
>
> This can be achieved by either DPU or vDPA.

I agree with this.

So I don't understand the point of this realization. Although I am also very
excited about this idea, since it broadens the possibilities of virtio, I still
really want to know what the benefit of this idea is: better performance? Or
does it enable use cases that we cannot achieve now?

> I think the advantage is that
> if we standardize this in the spec, it avoids a vendor-specific
> protocol.


Sorry, I don't get this.

Thanks.

>
> > this allows us to use many
> > types of remote resources which are provided by virtio target.
> >
> > From my point of view, there are 3 cases:
> > 1, Host/container scenario. For example, the host kernel connects to a virtio
> > target block service and maps it as a vdx (virtio-blk) device (used by a
> > Map-Reduce service which needs a fast/large disk). The host kernel also
> > connects to a virtio target crypto service and maps it as a virtio crypto
> > device (used by nginx to accelerate HTTPS). And so on.
> >
> >      +----------+   +----------+     +----------+
> >      |Map-Reduce|   |  nginx   | ... | processes|
> >      +----------+   +----------+     +----------+
> > ----------------------------------------------------
> > Host       |              |                |
> > Kernel +-------+      +-------+        +-------+
> >        | ext4  |      | LKCF  |        | HWRNG |
> >        +-------+      +-------+        +-------+
> >            |              |                |
> >        +-------+      +-------+        +-------+
> >        |  vdx  |      |vCrypto|        | vRNG  |
> >        +-------+      +-------+        +-------+
> >            |              |                |
> >            |          +--------+           |
> >            +--------->|TCP/RDMA|<----------+
> >                       +--------+
> >                           |
> >                       +--------+
> >                       | NIC/IB |
> >                       +--------+
> >                           |   +-------------+
> >                           +-->|virtio target|
> >                               +-------------+
> >
> > 2, Typical virtualization environment. The workloads run in a guest, and
> > QEMU handles virtio-pci(or MMIO), and forwards requests to target.
> >      +----------+   +----------+     +----------+
> >      |Map-Reduce|   |  nginx   | ... | processes|
> >      +----------+   +----------+     +----------+
> > ----------------------------------------------------
> > Guest      |              |                |
> > Kernel +-------+      +-------+        +-------+
> >        | ext4  |      | LKCF  |        | HWRNG |
> >        +-------+      +-------+        +-------+
> >            |              |                |
> >        +-------+      +-------+        +-------+
> >        |  vdx  |      |vCrypto|        | vRNG  |
> >        +-------+      +-------+        +-------+
> >            |              |                |
> > PCI -------------------------------------------------
> >                           |
> > QEMU              +--------------+
> >                   |virtio backend|
> >                   +--------------+
> >                           |
> >                       +--------+
> >                       | NIC/IB |
> >                       +--------+
> >                           |   +-------------+
> >                           +-->|virtio target|
> >                               +-------------+
> >
>
> So it's the job of QEMU to do the translation from virtqueue to packet
> here?
>
> > 3, SmartNIC/DPU/vDPA environment. It's possible to convert virtio-pci
> > request to virtio-of request by hardware, and forward 

[virtio-dev] Re: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)

2023-04-25 Thread zhenwei pi




On 4/25/23 21:55, Stefan Hajnoczi wrote:

On Mon, Apr 24, 2023 at 11:40:02AM +0800, Jason Wang wrote:

On Sun, Apr 23, 2023 at 7:31 PM zhenwei pi  wrote:

"Virtio Over Fabrics" aims at "reuse virtio device specifications", and
provides network defined peripheral devices.
And this protocol also could be used in virtualization environment,
typically hypervisor(or vhost-user process) handles request from virtio
PCI/MMIO/CCW, remaps request and forwards to target by fabrics.


This requires mediation in the datapath, doesn't it?



- Protocol
The detail protocol definition see:
https://github.com/pizhenwei/linux/blob/virtio-of-github/include/uapi/linux/virtio_of.h


I'd say an RFC patch for the virtio spec is more suitable than the code.


VIRTIO over TCP has long been anticipated but so far no one has posted an
implementation. There are probably mentions of it from 10+ years ago.
I'm excited to see this!

Both the VIRTIO spec and the Linux drivers provide an abstraction that
allows fabrics (e.g. TCP) to fit in as a VIRTIO Transport. vrings are
not the only way to implement virtqueues.

Many VIRTIO devices will work fine over a message passing transport like
TCP. A few devices like the balloon device may not make sense. Shared
Memory Regions won't work.



Fully agree.


Please define VIRTIO over Fabrics as a Transport in the VIRTIO spec so
that the integration with the VIRTIO device model is seamless. I look
forward to discussing spec patches.

Stefan


Thanks, I'm working on it.

--
zhenwei pi




[virtio-dev] Re: Re: [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)

2023-04-25 Thread Jason Wang
On Mon, Apr 24, 2023 at 9:38 PM zhenwei pi  wrote:
>
>
>
> On 4/24/23 11:40, Jason Wang wrote:
> > On Sun, Apr 23, 2023 at 7:31 PM zhenwei pi  wrote:
> >>
> >> Hi,
> >>
> >> In the past years, virtio has supported lots of device specifications over
> >> PCI/MMIO/CCW. These devices work fine in the virtualization environment,
> >> and we have a chance to support the virtio device family for the
> >> container/host scenario.
> >
> > PCI can work for containers for sure (or does it hit any issues like
> > scalability?). It's better to describe what problems you met and why
> > you chose this way to solve them.
> >
> > It's better to compare this with
> >
> > 1) hiding the fabrics details via DPU
> > 2) vDPA
> >
> Hi,
>
> Sorry, I missed this part. "Network defined peripheral devices of virtio
> family" is the main purpose of this proposal,

This can be achieved by either a DPU or vDPA. I think the advantage is that
if we standardize this in the spec, it avoids a vendor-specific
protocol.

> this allows us to use many
> types of remote resources which are provided by virtio target.
>
> From my point of view, there are 3 cases:
> 1, Host/container scenario. For example, the host kernel connects to a virtio
> target block service and maps it as a vdx (virtio-blk) device (used by a
> Map-Reduce service which needs a fast/large disk). The host kernel also
> connects to a virtio target crypto service and maps it as a virtio crypto
> device (used by nginx to accelerate HTTPS). And so on.
>
>      +----------+   +----------+     +----------+
>      |Map-Reduce|   |  nginx   | ... | processes|
>      +----------+   +----------+     +----------+
> ----------------------------------------------------
> Host       |              |                |
> Kernel +-------+      +-------+        +-------+
>        | ext4  |      | LKCF  |        | HWRNG |
>        +-------+      +-------+        +-------+
>            |              |                |
>        +-------+      +-------+        +-------+
>        |  vdx  |      |vCrypto|        | vRNG  |
>        +-------+      +-------+        +-------+
>            |              |                |
>            |          +--------+           |
>            +--------->|TCP/RDMA|<----------+
>                       +--------+
>                           |
>                       +--------+
>                       | NIC/IB |
>                       +--------+
>                           |   +-------------+
>                           +-->|virtio target|
>                               +-------------+
>
> 2, Typical virtualization environment. The workloads run in a guest, and
> QEMU handles virtio-pci(or MMIO), and forwards requests to target.
>      +----------+   +----------+     +----------+
>      |Map-Reduce|   |  nginx   | ... | processes|
>      +----------+   +----------+     +----------+
> ----------------------------------------------------
> Guest      |              |                |
> Kernel +-------+      +-------+        +-------+
>        | ext4  |      | LKCF  |        | HWRNG |
>        +-------+      +-------+        +-------+
>            |              |                |
>        +-------+      +-------+        +-------+
>        |  vdx  |      |vCrypto|        | vRNG  |
>        +-------+      +-------+        +-------+
>            |              |                |
> PCI -------------------------------------------------
>                           |
> QEMU              +--------------+
>                   |virtio backend|
>                   +--------------+
>                           |
>                       +--------+
>                       | NIC/IB |
>                       +--------+
>                           |   +-------------+
>                           +-->|virtio target|
>                               +-------------+
>

So it's the job of QEMU to do the translation from virtqueue to packet here?

> 3, SmartNIC/DPU/vDPA environment. It's possible to convert virtio-pci
> request to virtio-of request by hardware, and forward request to virtio
> target directly.
>      +----------+   +----------+     +----------+
>      |Map-Reduce|   |  nginx   | ... | processes|
>      +----------+   +----------+     +----------+
> ----------------------------------------------------
> Host       |              |                |
> Kernel +-------+      +-------+        +-------+
>        | ext4  |      | LKCF  |        | HWRNG |
>        +-------+      +-------+        +-------+
>            |              |                |
>        +-------+      +-------+        +-------+