Re: [PATCH RFC 0/2] Packed ring for vhost

2018-02-13 Thread Jason Wang



On 2018-02-14 10:47, Michael S. Tsirkin wrote:

> On Wed, Feb 14, 2018 at 10:37:07AM +0800, Jason Wang wrote:
> > Hi all:
> >
> > This RFC implements a subset of the packed ring described at
> > https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd07.pdf
> > . The code was tested with the pmd implemented by Jens at
> > http://dpdk.org/ml/archives/dev/2018-January/089417.html. A minor
> > change to the pmd code was needed to kick the virtqueue, since it assumes a
> > busy polling backend.
> >
> > Tests were done between localhost and guest. Testpmd (rxonly) in guest
> > reports 2.4Mpps. Testpmd (txonly) reports about 2.1Mpps.
>
> How does this compare with the split ring design?

No obvious difference (+-5%). I believe we have reached the bottleneck of vhost.

> > It's not a complete implementation; here's what is missing:
> >
> > - Device Area
> > - Driver Area
> > - Descriptor indirection
> > - Zerocopy may not be functional
> > - Migration path is not tested
> > - Vhost devices except for net
> > - vIOMMU cannot work (mainly because the metadata prefetch is not
> >   implemented).
> > - See FIXME/TODO in the code for more details
> > - No batching or other optimizations were implemented
>
> ioeventfd for PIO/mmio/s390.

Probably, but I don't think this is specific to the packed ring.

Thanks


Re: [PATCH RFC 0/2] Packed ring for vhost

2018-02-13 Thread Michael S. Tsirkin
On Wed, Feb 14, 2018 at 10:37:07AM +0800, Jason Wang wrote:
> Hi all:
> 
> This RFC implements a subset of the packed ring described at
> https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd07.pdf
> . The code was tested with the pmd implemented by Jens at
> http://dpdk.org/ml/archives/dev/2018-January/089417.html. A minor
> change to the pmd code was needed to kick the virtqueue, since it assumes a
> busy polling backend.
> 
> Tests were done between localhost and guest. Testpmd (rxonly) in guest
> reports 2.4Mpps. Testpmd (txonly) reports about 2.1Mpps.

How does this compare with the split ring design?

> It's not a complete implementation; here's what is missing:
> 
> - Device Area
> - Driver Area
> - Descriptor indirection
> - Zerocopy may not be functional
> - Migration path is not tested
> - Vhost devices except for net
> - vIOMMU cannot work (mainly because the metadata prefetch is not
>   implemented).
> - See FIXME/TODO in the code for more details
> - No batching or other optimizations were implemented

ioeventfd for PIO/mmio/s390.

> For a quick prototype, this series open-codes the tracking of the wrap
> counter and descriptor index in the net device. This will be addressed in
> the future by:
> 
> - Moving get_rx_bufs() from net.c to vhost.c
> - Letting vhost_get_vq_desc() return vring_used_elem instead of head id
> 
> With the above, we can hide (at least part of) the internal vring
> layout from specific devices.
> 
> Please review.
> 
> Thanks
> 
> Jason Wang (2):
>   virtio: introduce packed ring defines
>   vhost: packed ring support
> 
>  drivers/vhost/net.c                |  14 +-
>  drivers/vhost/vhost.c              | 351 ++---
>  drivers/vhost/vhost.h              |   6 +-
>  include/uapi/linux/virtio_config.h |   9 +
>  include/uapi/linux/virtio_ring.h   |  17 ++
>  5 files changed, 369 insertions(+), 28 deletions(-)
> 
> -- 
> 2.7.4


Re: [PATCH RFC 0/2] Packed ring for vhost

2018-02-13 Thread Jason Wang



On 2018-02-14 10:37, Jason Wang wrote:

> Hi all:
>
> This RFC implements a subset of the packed ring described at
> https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd07.pdf
> . The code was tested with the pmd implemented by Jens at
> http://dpdk.org/ml/archives/dev/2018-January/089417.html. A minor
> change to the pmd code was needed to kick the virtqueue, since it assumes a
> busy polling backend.
>
> Tests were done between localhost and guest. Testpmd (rxonly) in guest
> reports 2.4Mpps. Testpmd (txonly) reports about 2.1Mpps.
>
> It's not a complete implementation; here's what is missing:
>
> - Device Area
> - Driver Area
> - Descriptor indirection
> - Zerocopy may not be functional
> - Migration path is not tested
> - Vhost devices except for net
> - vIOMMU cannot work (mainly because the metadata prefetch is not
>   implemented).
> - See FIXME/TODO in the code for more details
> - No batching or other optimizations were implemented
>
> For a quick prototype, this series open-codes the tracking of the wrap
> counter and descriptor index in the net device. This will be addressed in
> the future by:
>
> - Moving get_rx_bufs() from net.c to vhost.c
> - Letting vhost_get_vq_desc() return vring_used_elem instead of head id
>
> With the above, we can hide (at least part of) the internal vring
> layout from specific devices.
>
> Please review.
>
> Thanks


It's nearly Spring Festival in China, so I will probably reply after the holiday.

Thanks
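
For anyone following the thread, the wrap counter and descriptor index
tracking mentioned in the cover letter can be sketched roughly as below.
This is only an illustration and not code from the series: struct ring_pos,
desc_is_avail() and ring_advance() are hypothetical names, and the flag bit
positions are assumed to match the published virtio 1.1 packed ring layout,
which may differ from the wd07 draft the patches target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag bits (virtio 1.1 packed ring); the wd07 draft may differ. */
#define DESC_F_AVAIL (1u << 7)
#define DESC_F_USED  (1u << 15)

/* Hypothetical per-virtqueue position; not the fields used by the patches. */
struct ring_pos {
    uint16_t idx;   /* next descriptor slot to inspect */
    uint16_t size;  /* number of descriptors in the ring */
    bool wrap;      /* wrap counter, initially true */
};

/*
 * A descriptor is available to the device when its AVAIL bit matches the
 * current wrap counter and its USED bit does not.
 */
static bool desc_is_avail(uint16_t flags, bool wrap)
{
    bool avail = flags & DESC_F_AVAIL;
    bool used = flags & DESC_F_USED;

    return avail == wrap && used != wrap;
}

/* Advance by n slots, flipping the wrap counter when passing the ring end. */
static void ring_advance(struct ring_pos *p, uint16_t n)
{
    assert(n <= p->size);
    p->idx += n;
    if (p->idx >= p->size) {
        p->idx -= p->size;
        p->wrap = !p->wrap;
    }
}
```

Each device that consumes descriptors has to carry this (index, wrap counter)
pair, which is presumably why keeping it inside vhost.c rather than net.c
would let the ring layout stay hidden from individual devices.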