Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

2020-09-17 Thread Jason Wang


On 2020/9/16 7:47 PM, Kishon Vijay Abraham I wrote:

Hi Jason,

On 16/09/20 8:40 am, Jason Wang wrote:

On 2020/9/15 11:47 PM, Kishon Vijay Abraham I wrote:

Hi Jason,

On 15/09/20 1:48 pm, Jason Wang wrote:

Hi Kishon:

On 2020/9/14 3:23 PM, Kishon Vijay Abraham I wrote:

Then you need something that is functional equivalent to virtio PCI
which is actually the concept of vDPA (e.g vDPA provides
alternatives if
the queue_sel is hard in the EP implementation).

Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
the VHOST driver to configure VHOST device).

struct vdpa_config_ops {
  /* Virtqueue ops */
  int (*set_vq_address)(struct vdpa_device *vdev,
    u16 idx, u64 desc_area, u64 driver_area,
    u64 device_area);
  void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
  void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
  void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
    struct vdpa_callback *cb);
  void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool
ready);
  bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
  int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
  const struct vdpa_vq_state *state);
  int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
  struct vdpa_vq_state *state);
  struct vdpa_notification_area
  (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
  /* vq irq is not expected to be changed once DRIVER_OK is set */
  int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);

  /* Device ops */
  u32 (*get_vq_align)(struct vdpa_device *vdev);
  u64 (*get_features)(struct vdpa_device *vdev);
  int (*set_features)(struct vdpa_device *vdev, u64 features);
  void (*set_config_cb)(struct vdpa_device *vdev,
    struct vdpa_callback *cb);
  u16 (*get_vq_num_max)(struct vdpa_device *vdev);
  u32 (*get_device_id)(struct vdpa_device *vdev);
  u32 (*get_vendor_id)(struct vdpa_device *vdev);
  u8 (*get_status)(struct vdpa_device *vdev);
  void (*set_status)(struct vdpa_device *vdev, u8 status);
  void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
     void *buf, unsigned int len);
  void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
     const void *buf, unsigned int len);
  u32 (*get_generation)(struct vdpa_device *vdev);

  /* DMA ops */
  int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb
*iotlb);
  int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
     u64 pa, u32 perm);
  int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);

  /* Free device resources */
  void (*free)(struct vdpa_device *vdev);
};

+struct vhost_config_ops {
+    int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
+  unsigned int num_bufs, struct vhost_virtqueue *vqs[],
+  vhost_vq_callback_t *callbacks[],
+  const char * const names[]);
+    void (*del_vqs)(struct vhost_dev *vdev);
+    int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src,
int len);
+    int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int
len);
+    int (*set_features)(struct vhost_dev *vdev, u64 device_features);
+    int (*set_status)(struct vhost_dev *vdev, u8 status);
+    u8 (*get_status)(struct vhost_dev *vdev);
+};
+
struct virtio_config_ops
I think there's some overlap here and some of the ops try to do the
same thing.

I think it differs in (*set_vq_address)() and (*create_vqs)().
[create_vqs() introduced in struct vhost_config_ops provides
complementary functionality to (*find_vqs)() in struct
virtio_config_ops. It seemingly encapsulates the functionality of
(*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(), ...].

Back to the difference between (*set_vq_address)() and (*create_vqs)(),
set_vq_address() directly provides the virtqueue address to the vdpa
device but create_vqs() only provides the parameters of the virtqueue
(like the number of virtqueues, number of buffers) but does not
directly
provide the address. IMO the backend client drivers (like net or vhost)
shouldn't/cannot by itself know how to access the vring created on
virtio front-end. The vdpa device/vhost device should have logic for
that. That will help the client drivers to work with different types of
vdpa device/vhost device and can access the vring created by virtio
irrespective of whether the vring can be accessed via mmio or kernel
space or user space.
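
As an illustration of this point, a client driver under the proposed ops
might look roughly like the sketch below. This is not taken from the
series: the "ops" member name on struct vhost_dev and the
vhost_vq_callback_t signature are assumptions made by analogy with
virtio's find_vqs()/vq_callback_t.

static void example_rx_done(struct vhost_virtqueue *vq) { /* ... */ }
static void example_tx_done(struct vhost_virtqueue *vq) { /* ... */ }

static vhost_vq_callback_t *example_cbs[] = { example_rx_done, example_tx_done };
static const char * const example_names[] = { "rx", "tx" };

static int example_client_setup(struct vhost_dev *vdev)
{
	struct vhost_virtqueue *vqs[2];

	/* Only queue and buffer counts are passed in; how the vrings are
	 * actually reached (MMIO, kernel or userspace mapping) stays
	 * inside the vhost/vDPA device behind these ops. */
	return vdev->ops->create_vqs(vdev, 2, 256, vqs, example_cbs,
				     example_names);
}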

I think vdpa always works with client drivers in userspace and
providing
userspace address for vring.

Sorry for being unclear. What I meant is not replacing vDPA with the
vhost(bus) you proposed but the possibility of replacing virtio-pci-epf
with vDPA in:

Okay, so the virtio back-end still use vhost and front end

Re: [vhost next 0/2] mlx5 vdpa fix netdev status

2020-09-17 Thread Jason Wang


On 2020/9/17 8:13 PM, Eli Cohen wrote:

Hi Michael,

the following two patches aim to fix a failure to set the vdpa driver
status bit VIRTIO_NET_S_LINK_UP thus causing failure to bring the link
up. I break it to two patches:

1. Introduce proper mlx5 API to set 16 bit status fields per virtio
requirements.
2. Fix the failure to set the bit

Eli Cohen (2):
   vdpa/mlx5: Make use of a specific 16 bit endianness API
   vdpa/mlx5: Fix failure to bring link up

  drivers/vdpa/mlx5/net/mlx5_vnet.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)



Acked-by: Jason Wang 



Re: [PATCH v2 -next] vdpa: mlx5: change Kconfig depends to fix build errors

2020-09-17 Thread Jason Wang


On 2020/9/18 3:45 AM, Randy Dunlap wrote:

From: Randy Dunlap 

drivers/vdpa/mlx5/ uses vhost_iotlb*() interfaces, so add a dependency
on VHOST to eliminate build errors.

ld: drivers/vdpa/mlx5/core/mr.o: in function `add_direct_chain':
mr.c:(.text+0x106): undefined reference to `vhost_iotlb_itree_first'
ld: mr.c:(.text+0x1cf): undefined reference to `vhost_iotlb_itree_next'
ld: mr.c:(.text+0x30d): undefined reference to `vhost_iotlb_itree_first'
ld: mr.c:(.text+0x3e8): undefined reference to `vhost_iotlb_itree_next'
ld: drivers/vdpa/mlx5/core/mr.o: in function `_mlx5_vdpa_create_mr':
mr.c:(.text+0x908): undefined reference to `vhost_iotlb_itree_first'
ld: mr.c:(.text+0x9e6): undefined reference to `vhost_iotlb_itree_next'
ld: drivers/vdpa/mlx5/core/mr.o: in function `mlx5_vdpa_handle_set_map':
mr.c:(.text+0xf1d): undefined reference to `vhost_iotlb_itree_first'

Signed-off-by: Randy Dunlap 
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: virtualization@lists.linux-foundation.org
Cc: Saeed Mahameed 
Cc: Leon Romanovsky 
Cc: net...@vger.kernel.org
---
v2: change from select to depends (Saeed)

  drivers/vdpa/Kconfig |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-next-20200917.orig/drivers/vdpa/Kconfig
+++ linux-next-20200917/drivers/vdpa/Kconfig
@@ -31,7 +31,7 @@ config IFCVF
  
  config MLX5_VDPA

bool "MLX5 VDPA support library for ConnectX devices"
-   depends on MLX5_CORE
+   depends on VHOST && MLX5_CORE



It looks to me that depending on VHOST is too heavyweight.

I guess what it really needs is VHOST_IOTLB. So we can use select 
VHOST_IOTLB here.


Thanks



default n
help
  Support library for Mellanox VDPA drivers. Provides code that is




Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

2020-09-15 Thread Jason Wang


On 2020/9/15 11:47 PM, Kishon Vijay Abraham I wrote:

Hi Jason,

On 15/09/20 1:48 pm, Jason Wang wrote:

Hi Kishon:

On 2020/9/14 3:23 PM, Kishon Vijay Abraham I wrote:

Then you need something that is functional equivalent to virtio PCI
which is actually the concept of vDPA (e.g vDPA provides alternatives if
the queue_sel is hard in the EP implementation).

Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
the VHOST driver to configure VHOST device).

struct vdpa_config_ops {
 /* Virtqueue ops */
 int (*set_vq_address)(struct vdpa_device *vdev,
   u16 idx, u64 desc_area, u64 driver_area,
   u64 device_area);
 void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
 void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
 void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
   struct vdpa_callback *cb);
 void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
 bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
 int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
     const struct vdpa_vq_state *state);
 int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
     struct vdpa_vq_state *state);
 struct vdpa_notification_area
 (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
 /* vq irq is not expected to be changed once DRIVER_OK is set */
 int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);

 /* Device ops */
 u32 (*get_vq_align)(struct vdpa_device *vdev);
 u64 (*get_features)(struct vdpa_device *vdev);
 int (*set_features)(struct vdpa_device *vdev, u64 features);
 void (*set_config_cb)(struct vdpa_device *vdev,
   struct vdpa_callback *cb);
 u16 (*get_vq_num_max)(struct vdpa_device *vdev);
 u32 (*get_device_id)(struct vdpa_device *vdev);
 u32 (*get_vendor_id)(struct vdpa_device *vdev);
 u8 (*get_status)(struct vdpa_device *vdev);
 void (*set_status)(struct vdpa_device *vdev, u8 status);
 void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
    void *buf, unsigned int len);
 void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
    const void *buf, unsigned int len);
 u32 (*get_generation)(struct vdpa_device *vdev);

 /* DMA ops */
 int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
 int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
    u64 pa, u32 perm);
 int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);

 /* Free device resources */
 void (*free)(struct vdpa_device *vdev);
};

+struct vhost_config_ops {
+    int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
+  unsigned int num_bufs, struct vhost_virtqueue *vqs[],
+  vhost_vq_callback_t *callbacks[],
+  const char * const names[]);
+    void (*del_vqs)(struct vhost_dev *vdev);
+    int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src,
int len);
+    int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int
len);
+    int (*set_features)(struct vhost_dev *vdev, u64 device_features);
+    int (*set_status)(struct vhost_dev *vdev, u8 status);
+    u8 (*get_status)(struct vhost_dev *vdev);
+};
+
struct virtio_config_ops
I think there's some overlap here and some of the ops try to do the
same thing.

I think it differs in (*set_vq_address)() and (*create_vqs)().
[create_vqs() introduced in struct vhost_config_ops provides
complementary functionality to (*find_vqs)() in struct
virtio_config_ops. It seemingly encapsulates the functionality of
(*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(), ...].

Back to the difference between (*set_vq_address)() and (*create_vqs)(),
set_vq_address() directly provides the virtqueue address to the vdpa
device but create_vqs() only provides the parameters of the virtqueue
(like the number of virtqueues, number of buffers) but does not directly
provide the address. IMO the backend client drivers (like net or vhost)
shouldn't/cannot by itself know how to access the vring created on
virtio front-end. The vdpa device/vhost device should have logic for
that. That will help the client drivers to work with different types of
vdpa device/vhost device and can access the vring created by virtio
irrespective of whether the vring can be accessed via mmio or kernel
space or user space.

I think vdpa always works with client drivers in userspace and providing
userspace address for vring.


Sorry for being unclear. What I meant is not replacing vDPA with the
vhost(bus) you proposed but the possibility of replacing virtio-pci-epf
with vDPA in:

Okay, so the virtio back-end still use vhost and front end should use
vDPA. I see. So the host side PCI driver for EPF should populate
vdpa_config_ops and invoke vdpa_register_device().



Yes.



My

Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

2020-09-15 Thread Jason Wang

Hi Kishon:

On 2020/9/14 3:23 PM, Kishon Vijay Abraham I wrote:

Then you need something that is functional equivalent to virtio PCI
which is actually the concept of vDPA (e.g vDPA provides alternatives if
the queue_sel is hard in the EP implementation).

Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
the VHOST driver to configure VHOST device).

struct vdpa_config_ops {
/* Virtqueue ops */
int (*set_vq_address)(struct vdpa_device *vdev,
  u16 idx, u64 desc_area, u64 driver_area,
  u64 device_area);
void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
  struct vdpa_callback *cb);
void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
const struct vdpa_vq_state *state);
int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
struct vdpa_vq_state *state);
struct vdpa_notification_area
(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
/* vq irq is not expected to be changed once DRIVER_OK is set */
int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);

/* Device ops */
u32 (*get_vq_align)(struct vdpa_device *vdev);
u64 (*get_features)(struct vdpa_device *vdev);
int (*set_features)(struct vdpa_device *vdev, u64 features);
void (*set_config_cb)(struct vdpa_device *vdev,
  struct vdpa_callback *cb);
u16 (*get_vq_num_max)(struct vdpa_device *vdev);
u32 (*get_device_id)(struct vdpa_device *vdev);
u32 (*get_vendor_id)(struct vdpa_device *vdev);
u8 (*get_status)(struct vdpa_device *vdev);
void (*set_status)(struct vdpa_device *vdev, u8 status);
void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
   void *buf, unsigned int len);
void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
   const void *buf, unsigned int len);
u32 (*get_generation)(struct vdpa_device *vdev);

/* DMA ops */
int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
   u64 pa, u32 perm);
int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);

/* Free device resources */
void (*free)(struct vdpa_device *vdev);
};

+struct vhost_config_ops {
+   int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
+ unsigned int num_bufs, struct vhost_virtqueue *vqs[],
+ vhost_vq_callback_t *callbacks[],
+ const char * const names[]);
+   void (*del_vqs)(struct vhost_dev *vdev);
+   int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src, int len);
+   int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int len);
+   int (*set_features)(struct vhost_dev *vdev, u64 device_features);
+   int (*set_status)(struct vhost_dev *vdev, u8 status);
+   u8 (*get_status)(struct vhost_dev *vdev);
+};
+
struct virtio_config_ops
I think there's some overlap here and some of the ops try to do the
same thing.

I think it differs in (*set_vq_address)() and (*create_vqs)().
[create_vqs() introduced in struct vhost_config_ops provides
complementary functionality to (*find_vqs)() in struct
virtio_config_ops. It seemingly encapsulates the functionality of
(*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(), ...].

Back to the difference between (*set_vq_address)() and (*create_vqs)(),
set_vq_address() directly provides the virtqueue address to the vdpa
device but create_vqs() only provides the parameters of the virtqueue
(like the number of virtqueues, number of buffers) but does not directly
provide the address. IMO the backend client drivers (like net or vhost)
shouldn't/cannot by itself know how to access the vring created on
virtio front-end. The vdpa device/vhost device should have logic for
that. That will help the client drivers to work with different types of
vdpa device/vhost device and can access the vring created by virtio
irrespective of whether the vring can be accessed via mmio or kernel
space or user space.

I think vdpa always works with client drivers in userspace and providing
userspace address for vring.



Sorry for being unclear. What I meant is not replacing vDPA with the 
vhost(bus) you proposed but the possibility of replacing virtio-pci-epf 
with vDPA in:


My question is basically for the part of 

Re: [PATCH] vhost: reduce stack usage in log_used

2020-09-15 Thread Jason Wang


On 2020/9/15 2:08 AM, Li Wang wrote:

Fix the warning: [-Werror=-Wframe-larger-than=]

drivers/vhost/vhost.c: In function log_used:
drivers/vhost/vhost.c:1906:1:
warning: the frame size of 1040 bytes is larger than 1024 bytes

Signed-off-by: Li Wang 
---
  drivers/vhost/vhost.c | 2 +-
  drivers/vhost/vhost.h | 1 +
  2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index b45519c..31837a5
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1884,7 +1884,7 @@ static int log_write_hva(struct vhost_virtqueue *vq, u64 
hva, u64 len)
  
  static int log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)

  {
-   struct iovec iov[64];
+   struct iovec *iov = vq->log_iov;
int i, ret;
  
  	if (!vq->iotlb)

diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 9032d3c..5fe4b47
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -123,6 +123,7 @@ struct vhost_virtqueue {
/* Log write descriptors */
void __user *log_base;
struct vhost_log *log;
+   struct iovec log_iov[64];
  
  	/* Ring endianness. Defaults to legacy native endianness.

 * Set to true when starting a modern virtio device. */



Acked-by: Jason Wang 



Re: [PATCH] vhost_vdpa: Fix duplicate included kernel.h

2020-09-15 Thread Jason Wang


On 2020/9/15 8:51 AM, Tian Tao wrote:

linux/kernel.h is included more than once, Remove the one that isn't
necessary.

Signed-off-by: Tian Tao 
---
  drivers/vhost/vdpa.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 3fab94f..95e2b83 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -22,7 +22,6 @@
  #include 
  #include 
  #include 
-#include 
  
  #include "vhost.h"
  



Acked-by: Jason Wang 



Re: [PATCH v2] i2c: virtio: add a virtio i2c frontend driver

2020-09-13 Thread Jason Wang


On 2020/9/11 11:48 AM, Jie Deng wrote:

Add an I2C bus driver for virtio para-virtualization.

The controller can be emulated by the backend driver in
any device model software by following the virtio protocol.

This driver communicates with the backend driver through a
virtio I2C message structure which includes following parts:

- Header: i2c_msg addr, flags, len.
- Data buffer: the pointer to the I2C msg data.
- Status: the processing result from the backend.

People may implement different backend drivers to emulate
different controllers according to their needs. A backend
example can be found in the device model of the open source
project ACRN. For more information, please refer to
https://projectacrn.org.

The virtio device ID 34 is used for this I2C adapter since IDs
before 34 have been reserved by other virtio devices.

Co-developed-by: Conghui Chen 
Signed-off-by: Conghui Chen 
Signed-off-by: Jie Deng 
Reviewed-by: Shuo Liu 
Reviewed-by: Andy Shevchenko 
---
The device ID request:
https://github.com/oasis-tcs/virtio-spec/issues/85

Changes in v2:
- Addressed comments received from Michael, Andy and Jason.

  drivers/i2c/busses/Kconfig  |  11 ++
  drivers/i2c/busses/Makefile |   3 +
  drivers/i2c/busses/i2c-virtio.c | 271 
  include/uapi/linux/virtio_ids.h |   1 +
  4 files changed, 286 insertions(+)
  create mode 100644 drivers/i2c/busses/i2c-virtio.c

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 293e7a0..70c8e30 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -21,6 +21,17 @@ config I2C_ALI1535
  This driver can also be built as a module.  If so, the module
  will be called i2c-ali1535.
  
+config I2C_VIRTIO

+   tristate "Virtio I2C Adapter"
+   depends on VIRTIO
+   help
+ If you say yes to this option, support will be included for the virtio
+ i2c adapter driver. The hardware can be emulated by any device model
+ software according to the virtio protocol.
+
+ This driver can also be built as a module. If so, the module
+ will be called i2c-virtio.
+
  config I2C_ALI1563
tristate "ALI 1563"
depends on PCI
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 19aff0e..821acfa 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -6,6 +6,9 @@
  # ACPI drivers
  obj-$(CONFIG_I2C_SCMI)+= i2c-scmi.o
  
+# VIRTIO I2C host controller driver

+obj-$(CONFIG_I2C_VIRTIO)   += i2c-virtio.o
+
  # PC SMBus host controller drivers
  obj-$(CONFIG_I2C_ALI1535) += i2c-ali1535.o
  obj-$(CONFIG_I2C_ALI1563) += i2c-ali1563.o
diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
new file mode 100644
index 000..aff1a9a
--- /dev/null
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -0,0 +1,271 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtio I2C Bus Driver
+ *
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define VIRTIO_I2C_MSG_OK  0
+#define VIRTIO_I2C_MSG_ERR 1
+
+/**
+ * struct virtio_i2c_hdr - the virtio I2C message header structure
+ * @addr: i2c_msg addr, the slave address
+ * @flags: i2c_msg flags
+ * @len: i2c_msg len
+ */
+struct virtio_i2c_hdr {
+   __le16 addr;
+   __le16 flags;
+   __le16 len;
+};



As said in v1, this should belong to uapi.
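
For illustration, moving the wire format to uAPI could look roughly like
this (the file name and header guard are assumptions; the fields and the
status values are the ones from the patch):

/* include/uapi/linux/virtio_i2c.h (hypothetical) */
#ifndef _UAPI_LINUX_VIRTIO_I2C_H
#define _UAPI_LINUX_VIRTIO_I2C_H

#include <linux/types.h>

/* Status returned by the backend for each request. */
#define VIRTIO_I2C_MSG_OK	0
#define VIRTIO_I2C_MSG_ERR	1

/* The wire header: addr, flags and len taken from struct i2c_msg. */
struct virtio_i2c_hdr {
	__le16 addr;
	__le16 flags;
	__le16 len;
};

#endif /* _UAPI_LINUX_VIRTIO_I2C_H */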



+
+/**
+ * struct virtio_i2c_msg - the virtio I2C message structure
+ * @hdr: the virtio I2C message header
+ * @buf: virtio I2C message data buffer
+ * @status: the processing result from the backend
+ */
+struct virtio_i2c_msg {
+   struct virtio_i2c_hdr hdr;
+   u8 *buf;
+   u8 status;
+};



I'm not quite sure this is the best layout.

E.g. virtio-scsi keeps the in buffer separate from the out one:

struct virtio_scsi_req_cmd {
    ...
    u8 dataout[];
    ...
    u8 datain[];
};

And I would like to have a look at the spec patch.

Thanks



+
+/**
+ * struct virtio_i2c - virtio I2C data
+ * @vdev: virtio device for this controller
+ * @completion: completion of virtio I2C message
+ * @vmsg: the virtio I2C message for communication
+ * @adap: I2C adapter for this controller
+ * @i2c_lock: lock for virtqueue processing
+ * @vq: the virtio virtqueue for communication
+ */
+struct virtio_i2c {
+   struct virtio_device *vdev;
+   struct completion completion;
+   struct virtio_i2c_msg vmsg;
+   struct i2c_adapter adap;
+   struct mutex i2c_lock;
+   struct virtqueue *vq;
+};
+
+static void virtio_i2c_msg_done(struct virtqueue *vq)
+{
+   struct virtio_i2c *vi = vq->vdev->priv;
+
+   complete(&vi->completion);
+}
+
+static int virtio_i2c_add_msg(struct virtqueue *vq,
+ struct virtio_i2c_msg *vmsg,
+ 

Re: [PATCH] vhost: reduce stack usage in log_used

2020-09-13 Thread Jason Wang



- Original Message -
> Fix the warning: [-Werror=-Wframe-larger-than=]
> 
> drivers/vhost/vhost.c: In function log_used:
> drivers/vhost/vhost.c:1906:1:
> warning: the frame size of 1040 bytes is larger than 1024 bytes
> 
> Signed-off-by: Li Wang 
> ---
>  drivers/vhost/vhost.c | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index b45519c..41769de 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1884,25 +1884,31 @@ static int log_write_hva(struct vhost_virtqueue *vq,
> u64 hva, u64 len)
>  
>  static int log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
>  {
> - struct iovec iov[64];
> + struct iovec *iov;
>   int i, ret;
>  
>   if (!vq->iotlb)
>   return log_write(vq->log_base, vq->log_addr + used_offset, len);
>  
> + iov = kcalloc(64, sizeof(*iov), GFP_KERNEL);
> + if (!iov)
> + return -ENOMEM;

Let's preallocate it in e.g vhost_net_open().

We don't want to fail the log due to -ENOMEM.

Thanks

> +
>   ret = translate_desc(vq, (uintptr_t)vq->used + used_offset,
>len, iov, 64, VHOST_ACCESS_WO);
>   if (ret < 0)
> - return ret;
> + goto out;
>  
>   for (i = 0; i < ret; i++) {
>   ret = log_write_hva(vq, (uintptr_t)iov[i].iov_base,
>   iov[i].iov_len);
>   if (ret)
> - return ret;
> + goto out;
>   }
>  
> - return 0;
> +out:
> + kfree(iov);
> + return ret;
>  }
>  
>  int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
> --
> 2.7.4
> 
> 



Re: [PATCH] vhost-vdpa: fix memory leak in error path

2020-09-09 Thread Jason Wang


On 2020/9/9 11:41 PM, Li Qiang wrote:

Free the 'page_list' when the 'npages' is zero.

Signed-off-by: Li Qiang 
---
  drivers/vhost/vdpa.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 3fab94f88894..6a9fcaf1831d 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -609,8 +609,10 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
gup_flags |= FOLL_WRITE;
  
  	npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT;

-   if (!npages)
-   return -EINVAL;
+   if (!npages) {
+   ret = -EINVAL;
+   goto free_page;
+   }
  
  	mmap_read_lock(dev->mm);
  
@@ -666,6 +668,8 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,

	atomic64_sub(npages, &dev->mm->pinned_vm);
}
mmap_read_unlock(dev->mm);
+
+free_page:
free_page((unsigned long)page_list);
return ret;
  }



Cc: sta...@vger.kernel.org

Acked-by: Jason Wang 




Re: [PATCH] vhost_vdpa: remove unnecessary spin_lock in vhost_vring_call

2020-09-09 Thread Jason Wang


On 2020/9/9 2:52 PM, Zhu Lingshan wrote:

This commit removed unnecessary spin_locks in vhost_vring_call
and related operations. Because we manipulate irq offloading
contents in vhost_vdpa ioctl code path which is already
protected by dev mutex and vq mutex.

Signed-off-by: Zhu Lingshan 



Acked-by: Jason Wang 



---
  drivers/vhost/vdpa.c  | 8 +---
  drivers/vhost/vhost.c | 3 ---
  drivers/vhost/vhost.h | 1 -
  3 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 3fab94f88894..bc679d0b7b87 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -97,26 +97,20 @@ static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, 
u16 qid)
return;
  
  	irq = ops->get_vq_irq(vdpa, qid);

-	spin_lock(&vq->call_ctx.ctx_lock);
	irq_bypass_unregister_producer(&vq->call_ctx.producer);
-	if (!vq->call_ctx.ctx || irq < 0) {
-		spin_unlock(&vq->call_ctx.ctx_lock);
+	if (!vq->call_ctx.ctx || irq < 0)
		return;
-	}

	vq->call_ctx.producer.token = vq->call_ctx.ctx;
	vq->call_ctx.producer.irq = irq;
	ret = irq_bypass_register_producer(&vq->call_ctx.producer);
-	spin_unlock(&vq->call_ctx.ctx_lock);
  }
  
  static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, u16 qid)

  {
	struct vhost_virtqueue *vq = &v->vqs[qid];

-	spin_lock(&vq->call_ctx.ctx_lock);
	irq_bypass_unregister_producer(&vq->call_ctx.producer);
-	spin_unlock(&vq->call_ctx.ctx_lock);
  }
  
  static void vhost_vdpa_reset(struct vhost_vdpa *v)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index b45519ca66a7..99f27ce982da 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -302,7 +302,6 @@ static void vhost_vring_call_reset(struct vhost_vring_call 
*call_ctx)
  {
call_ctx->ctx = NULL;
	memset(&call_ctx->producer, 0x0, sizeof(struct irq_bypass_producer));
-   spin_lock_init(_ctx->ctx_lock);
  }
  
  static void vhost_vq_reset(struct vhost_dev *dev,

@@ -1637,9 +1636,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int 
ioctl, void __user *arg
break;
}
  
-		spin_lock(&vq->call_ctx.ctx_lock);
		swap(ctx, vq->call_ctx.ctx);
-		spin_unlock(&vq->call_ctx.ctx_lock);
		break;
	case VHOST_SET_VRING_ERR:
		if (copy_from_user(&f, argp, sizeof f)) {
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 9032d3c2a9f4..486dcf371e06 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -64,7 +64,6 @@ enum vhost_uaddr_type {
  struct vhost_vring_call {
struct eventfd_ctx *ctx;
struct irq_bypass_producer producer;
-   spinlock_t ctx_lock;
  };
  
  /* The virtqueue structure describes a queue attached to a device. */



Re: [PATCH] i2c: virtio: add a virtio i2c frontend driver

2020-09-09 Thread Jason Wang


On 2020/9/3 1:34 PM, Jie Deng wrote:

--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -6,6 +6,9 @@
  # ACPI drivers
  obj-$(CONFIG_I2C_SCMI)+= i2c-scmi.o
  
+# VIRTIO I2C host controller driver

+obj-$(CONFIG_I2C_VIRTIO)   += i2c-virtio.o
+
  # PC SMBus host controller drivers
  obj-$(CONFIG_I2C_ALI1535) += i2c-ali1535.o
  obj-$(CONFIG_I2C_ALI1563) += i2c-ali1563.o
diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
new file mode 100644
index 000..47f9fd1
--- /dev/null
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -0,0 +1,276 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtio I2C Bus Driver
+ *
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define VIRTIO_I2C_MSG_OK  0
+#define VIRTIO_I2C_MSG_ERR 1
+
+/**
+ * struct virtio_i2c_hdr - the virtio I2C message header structure
+ * @addr: i2c_msg addr, the slave address
+ * @flags: i2c_msg flags
+ * @len: i2c_msg len
+ */
+struct virtio_i2c_hdr {
+   __virtio16 addr;
+   __virtio16 flags;
+   __virtio16 len;
+} __packed;



Btw, this part should belong to uAPI, and you need to define the status 
in uAPI.


Thanks


Re: [PATCH] i2c: virtio: add a virtio i2c frontend driver

2020-09-09 Thread Jason Wang


On 2020/9/8 9:40 AM, Jie Deng wrote:



On 2020/9/7 13:40, Jason Wang wrote:









+struct virtio_i2c_msg {
+    struct virtio_i2c_hdr hdr;
+    char *buf;
+    u8 status;



Any reason for separating status out of virtio_i2c_hdr?

The status is not from i2c_msg. 



You meant i2c_hdr? You embed status in virtio_i2c_msg anyway.



The "i2c_msg" structure defined in i2c.h.


So I put it out of virtio_i2c_hdr.



Something like status or response is pretty common in virtio request 
(e.g net or scsi), if no special reason, it's better to keep it in 
the hdr.



Mainly based on IN or OUT.

The addr, flags and len are from "i2c_msg". They are put in one
structure as an OUT scatterlist.

The buf can be an OUT or an IN scatterlist depending on write or read.
The status is a result from the backend, which is defined as an IN
scatterlist.



Ok. I get this.

Thanks



Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

2020-09-09 Thread Jason Wang


On 2020/9/9 12:37 AM, Cornelia Huck wrote:

Then you need something that is functional equivalent to virtio PCI
which is actually the concept of vDPA (e.g vDPA provides alternatives if
the queue_sel is hard in the EP implementation).

It seems I really need to read up on vDPA more... do you have a pointer
for diving into this alternatives aspect?



See vpda_config_ops in include/linux/vdpa.h

Especially this part:

    int (*set_vq_address)(struct vdpa_device *vdev,
              u16 idx, u64 desc_area, u64 driver_area,
              u64 device_area);

This means that for devices (e.g. an endpoint device) where the
virtio-pci layout is hard to implement, any other register layout or
vendor-specific way can be used to configure the virtqueue.
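
A sketch of the idea, not real driver code: everything prefixed "epf_"
below is made up, and only the callback prototype comes from
include/linux/vdpa.h.

struct epf_vdpa {
	struct vdpa_device vdpa;
	void __iomem *regs;
};

/* Hypothetical vendor register offsets for the three vring addresses. */
#define EPF_VQ_DESC(i)	(0x00 + (i) * 0x18)
#define EPF_VQ_AVAIL(i)	(0x08 + (i) * 0x18)
#define EPF_VQ_USED(i)	(0x10 + (i) * 0x18)

static int epf_vdpa_set_vq_address(struct vdpa_device *vdev, u16 idx,
				   u64 desc_area, u64 driver_area,
				   u64 device_area)
{
	struct epf_vdpa *ev = container_of(vdev, struct epf_vdpa, vdpa);

	/* No virtio-pci common config here: the ring addresses go into
	 * whatever registers the endpoint firmware defines. */
	writeq(desc_area, ev->regs + EPF_VQ_DESC(idx));
	writeq(driver_area, ev->regs + EPF_VQ_AVAIL(idx));
	writeq(device_area, ev->regs + EPF_VQ_USED(idx));

	return 0;
}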






"Virtio Over NTB" should anyways be a new transport.

Does that make any sense?

yeah, in the approach I used the initial features are hard-coded in
vhost-rpmsg (inherent to the rpmsg) but when we have to use adapter
layer (vhost only for accessing virtio ring and use virtio drivers on
both front end and backend), based on the functionality (e.g, rpmsg),
the vhost should be configured with features (to be presented to the
virtio) and that's why additional layer or APIs will be required.

A question here, if we go with vhost bus approach, does it mean the
virtio device can only be implemented in EP's userspace?

Can we maybe implement an alternative bus as well that would allow us
to support different virtio device implementations (in addition to the
vhost bus + userspace combination)?



That should be fine, but I'm not quite sure that implementing the device
in kernel (a kthread) is a good approach.


Thanks







Re: [PATCH] vhost: new vhost_vdpa SET/GET_BACKEND_FEATURES handlers

2020-09-08 Thread Jason Wang



- Original Message -
> This commit introduced vhost_vdpa_set/get_backend_features() to
> resolve these issues:
> (1)In vhost_vdpa ioctl SET_BACKEND_FEATURES path, current code
> would try to acquire vhost dev mutex twice
> (first shown in vhost_vdpa_unlocked_ioctl), which can lead
> to a deadlock issue.
> (2)SET_BACKEND_FEATURES was blindly added to vring ioctl instead
> of vdpa device ioctl
> 
> To resolve these issues, this commit (1)removed mutex operations
> in vhost_set_backend_features. (2)Handle ioctl
> SET/GET_BACKEND_FEATURES in vdpa ioctl. (3)introduce a new
> function vhost_net_set_backend_features() for vhost_net,
> which is a wrap of vhost_set_backend_features() with
> necessary mutex lockings.
> 
> Signed-off-by: Zhu Lingshan 

So this patch touches not only vhost-vDPA.

Though the function looks correct, I'd prefer you do cleanup on top of
my patches which has been tested by Eli.

Note that, vhost generic handlers tend to hold mutex by itself.

So we probably need more thought on this.

Thanks

> ---
>  drivers/vhost/net.c   |  9 -
>  drivers/vhost/vdpa.c  | 47 ++-
>  drivers/vhost/vhost.c |  2 --
>  3 files changed, 41 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 531a00d703cd..e01da77538c8 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -1679,6 +1679,13 @@ static long vhost_net_set_owner(struct vhost_net *n)
>   return r;
>  }
>  
> +static void vhost_net_set_backend_features(struct vhost_dev *dev, u64
> features)
> +{
> +	mutex_lock(&dev->mutex);
> +	vhost_set_backend_features(dev, features);
> +	mutex_unlock(&dev->mutex);
> +}
> +
>  static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>   unsigned long arg)
>  {
> @@ -1715,7 +1722,7 @@ static long vhost_net_ioctl(struct file *f, unsigned
> int ioctl,
>   return -EFAULT;
>   if (features & ~VHOST_NET_BACKEND_FEATURES)
>   return -EOPNOTSUPP;
> -	vhost_set_backend_features(&n->dev, features);
> +	vhost_net_set_backend_features(&n->dev, features);
>   return 0;
>   case VHOST_RESET_OWNER:
>   return vhost_net_reset_owner(n);
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 3fab94f88894..ade33c566a81 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -344,6 +344,33 @@ static long vhost_vdpa_set_config_call(struct vhost_vdpa
> *v, u32 __user *argp)
>   return 0;
>  }
>  
> +
> +static long vhost_vdpa_get_backend_features(void __user *argp)
> +{
> + u64 features = VHOST_VDPA_BACKEND_FEATURES;
> + u64 __user *featurep = argp;
> + long r;
> +
> +	r = copy_to_user(featurep, &features, sizeof(features));
> +
> + return r;
> +}
> +static long vhost_vdpa_set_backend_features(struct vhost_vdpa *v, void
> __user *argp)
> +{
> + u64 __user *featurep = argp;
> + u64 features;
> +
> +	if (copy_from_user(&features, featurep, sizeof(features)))
> + return -EFAULT;
> +
> + if (features & ~VHOST_VDPA_BACKEND_FEATURES)
> + return -EOPNOTSUPP;
> +
> +	vhost_set_backend_features(&v->vdev, features);
> +
> + return 0;
> +}
> +
>  static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
>  void __user *argp)
>  {
> @@ -353,8 +380,6 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v,
> unsigned int cmd,
>   struct vdpa_callback cb;
>   struct vhost_virtqueue *vq;
>   struct vhost_vring_state s;
> - u64 __user *featurep = argp;
> - u64 features;
>   u32 idx;
>   long r;
>  
> @@ -381,18 +406,6 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v,
> unsigned int cmd,
>  
>   vq->last_avail_idx = vq_state.avail_index;
>   break;
> - case VHOST_GET_BACKEND_FEATURES:
> - features = VHOST_VDPA_BACKEND_FEATURES;
> -	if (copy_to_user(featurep, &features, sizeof(features)))
> - return -EFAULT;
> - return 0;
> - case VHOST_SET_BACKEND_FEATURES:
> -	if (copy_from_user(&features, featurep, sizeof(features)))
> - return -EFAULT;
> - if (features & ~VHOST_VDPA_BACKEND_FEATURES)
> - return -EOPNOTSUPP;
> -	vhost_set_backend_features(&v->vdev, features);
> - return 0;
>   }
>  
> 	r = vhost_vring_ioctl(&v->vdev, cmd, argp);
> @@ -476,6 +489,12 @@ static long vhost_vdpa_unlocked_ioctl(struct file
> *filep,
>   case VHOST_VDPA_SET_CONFIG_CALL:
>   r = vhost_vdpa_set_config_call(v, argp);
>   break;
> + case VHOST_SET_BACKEND_FEATURES:
> + r = vhost_vdpa_set_backend_features(v, argp);
> + break;
> + case VHOST_GET_BACKEND_FEATURES:
> + r = vhost_vdpa_get_backend_features(argp);
> + break;
>   

Re: [PATCH v2] vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK

2020-09-08 Thread Jason Wang



- Original Message -
> set_map() is used by mlx5 vdpa to create a memory region based on the
> address map passed by the iotlb argument. If we get successive calls, we
> will destroy the current memory region and build another one based on
> the new address mapping. We also need to setup the hardware resources
> since they depend on the memory region.
> 
> If these calls happen before DRIVER_OK, It means that driver VQs may
> also not been setup and we may not create them yet. In this case we want
> to avoid setting up the other resources and defer this till we get
> DRIVER OK.
> 
> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> Signed-off-by: Eli Cohen 
> ---
> V1->V2: Improve changelog description
> 
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 9df69d5efe8c..c89cd48a0aab 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1645,6 +1645,9 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_net
> *ndev, struct vhost_iotlb *
>   if (err)
>   goto err_mr;
>  
> + if (!(ndev->mvdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
> + return 0;
> +

Is there any reason that we still need to do vq suspending and saving before?

Thanks

>   restore_channels_info(ndev);
>   err = setup_driver(ndev);
>   if (err)
> --
> 2.26.0
> 
> 



Re: [PATCH] vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK

2020-09-08 Thread Jason Wang



- Original Message -
> On Mon, Sep 07, 2020 at 06:53:23AM -0400, Jason Wang wrote:
> > 
> > 
> > - Original Message -
> > > If the memory map changes before the driver status is
> > > VIRTIO_CONFIG_S_DRIVER_OK, don't attempt to create resources because it
> > > may fail. For example, if the VQ is not ready there is no point in
> > > creating resources.
> > > 
> > > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5
> > > devices")
> > > Signed-off-by: Eli Cohen 
> > > ---
> > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > index 9df69d5efe8c..c89cd48a0aab 100644
> > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > @@ -1645,6 +1645,9 @@ static int mlx5_vdpa_change_map(struct
> > > mlx5_vdpa_net
> > > *ndev, struct vhost_iotlb *
> > >   if (err)
> > >   goto err_mr;
> > >  
> > > + if (!(ndev->mvdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
> > > + return 0;
> > > +
> > 
> > I'm not sure I get this.
> > 
> > It looks to me if set_map() is called before DRIVER_OK, we won't build
> > any mapping?
> > 
> What would prevent that? Is it some qemu logic you're relying upon?

Ok, I think the map is still there, we just avoid creating some
resources.

> With current qemu 5.1 with lack of batching support, I get plenty calls
> to set_map which result in calls to mlx5_vdpa_change_map().
> If that happens before VIRTIO_CONFIG_S_DRIVER_OK then Imay fail (in case
> I was not called to set VQs ready).

Right, this could be solved by adding the batched IOTLB updating.

Thanks

> 
> > 
> > >   restore_channels_info(ndev);
> > >   err = setup_driver(ndev);
> > >   if (err)
> > > --
> > > 2.26.0
> > > 
> > > 
> > 
> 
> 



Re: [PATCH] vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK

2020-09-07 Thread Jason Wang



- Original Message -
> If the memory map changes before the driver status is
> VIRTIO_CONFIG_S_DRIVER_OK, don't attempt to create resources because it
> may fail. For example, if the VQ is not ready there is no point in
> creating resources.
> 
> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> Signed-off-by: Eli Cohen 
> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 9df69d5efe8c..c89cd48a0aab 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1645,6 +1645,9 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_net
> *ndev, struct vhost_iotlb *
>   if (err)
>   goto err_mr;
>  
> + if (!(ndev->mvdev.status & VIRTIO_CONFIG_S_DRIVER_OK))
> + return 0;
> +

I'm not sure I get this.

It looks to me if set_map() is called before DRIVER_OK, we won't build
any mapping?

Thanks

>   restore_channels_info(ndev);
>   err = setup_driver(ndev);
>   if (err)
> --
> 2.26.0
> 
> 



[PATCH] vhost-vdpa: fix backend feature ioctls

2020-09-07 Thread Jason Wang
Commit 653055b9acd4 ("vhost-vdpa: support get/set backend features")
introduces two malfunctioning backend features ioctls:

1) the ioctls was blindly added to vring ioctl instead of vdpa device
   ioctl
2) vhost_set_backend_features() was called when dev mutex has already
   been held, which will lead to a deadlock

This patch fixes the above issues.

Cc: Eli Cohen 
Reported-by: Zhu Lingshan 
Fixes: 653055b9acd4 ("vhost-vdpa: support get/set backend features")
Signed-off-by: Jason Wang 
---
 drivers/vhost/vdpa.c | 30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 3fab94f88894..796fe979f997 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -353,8 +353,6 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, 
unsigned int cmd,
struct vdpa_callback cb;
struct vhost_virtqueue *vq;
struct vhost_vring_state s;
-   u64 __user *featurep = argp;
-   u64 features;
u32 idx;
long r;
 
@@ -381,18 +379,6 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, 
unsigned int cmd,
 
vq->last_avail_idx = vq_state.avail_index;
break;
-   case VHOST_GET_BACKEND_FEATURES:
-   features = VHOST_VDPA_BACKEND_FEATURES;
-		if (copy_to_user(featurep, &features, sizeof(features)))
-   return -EFAULT;
-   return 0;
-   case VHOST_SET_BACKEND_FEATURES:
-		if (copy_from_user(&features, featurep, sizeof(features)))
-   return -EFAULT;
-   if (features & ~VHOST_VDPA_BACKEND_FEATURES)
-   return -EOPNOTSUPP;
-		vhost_set_backend_features(&v->vdev, features);
-   return 0;
}
 
	r = vhost_vring_ioctl(&v->vdev, cmd, argp);
@@ -440,8 +426,20 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
struct vhost_vdpa *v = filep->private_data;
	struct vhost_dev *d = &v->vdev;
void __user *argp = (void __user *)arg;
+   u64 __user *featurep = argp;
+   u64 features;
long r;
 
+   if (cmd == VHOST_SET_BACKEND_FEATURES) {
+		r = copy_from_user(&features, featurep, sizeof(features));
+   if (r)
+   return r;
+   if (features & ~VHOST_VDPA_BACKEND_FEATURES)
+   return -EOPNOTSUPP;
+		vhost_set_backend_features(&v->vdev, features);
+   return 0;
+   }
+
	mutex_lock(&d->mutex);
 
switch (cmd) {
@@ -476,6 +474,10 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
case VHOST_VDPA_SET_CONFIG_CALL:
r = vhost_vdpa_set_config_call(v, argp);
break;
+   case VHOST_GET_BACKEND_FEATURES:
+   features = VHOST_VDPA_BACKEND_FEATURES;
+		r = copy_to_user(featurep, &features, sizeof(features));
+   break;
default:
		r = vhost_dev_ioctl(&v->vdev, cmd, argp);
if (r == -ENOIOCTLCMD)
-- 
2.20.1



Re: [PATCH] i2c: virtio: add a virtio i2c frontend driver

2020-09-06 Thread Jason Wang


On 2020/9/4 9:21 PM, Jie Deng wrote:


On 2020/9/4 12:06, Jason Wang wrote:



diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 293e7a0..70c8e30 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -21,6 +21,17 @@ config I2C_ALI1535
    This driver can also be built as a module.  If so, the module
    will be called i2c-ali1535.
  +config I2C_VIRTIO
+    tristate "Virtio I2C Adapter"
+    depends on VIRTIO



I guess it should depend on some I2C module here.


The dependency of I2C is included in the Kconfig in its parent directory.
So there is nothing special to add here.



Ok.









+struct virtio_i2c_msg {
+    struct virtio_i2c_hdr hdr;
+    char *buf;
+    u8 status;



Any reason for separating status out of virtio_i2c_hdr?

The status is not from i2c_msg. 



You meant i2c_hdr? You embed status in virtio_i2c_msg anyway.



So I put it out of virtio_i2c_hdr.



Something like status or response is pretty common in virtio request 
(e.g net or scsi), if no special reason, it's better to keep it in the hdr.








+};
+
+/**
+ * struct virtio_i2c - virtio I2C data
+ * @vdev: virtio device for this controller
+ * @completion: completion of virtio I2C message
+ * @adap: I2C adapter for this controller
+ * @i2c_lock: lock for virtqueue processing
+ * @vq: the virtio virtqueue for communication
+ */
+struct virtio_i2c {
+    struct virtio_device *vdev;
+    struct completion completion;
+    struct i2c_adapter adap;
+    struct mutex i2c_lock;
+    struct virtqueue *vq;
+};
+
+static void virtio_i2c_msg_done(struct virtqueue *vq)
+{
+    struct virtio_i2c *vi = vq->vdev->priv;
+
+    complete(&vi->completion);
+}
+
+static int virtio_i2c_add_msg(struct virtqueue *vq,
+  struct virtio_i2c_msg *vmsg,
+  struct i2c_msg *msg)
+{
+    struct scatterlist *sgs[3], hdr, bout, bin, status;
+    int outcnt = 0, incnt = 0;
+
+    if (!msg->len)
+    return -EINVAL;
+
+    vmsg->hdr.addr = msg->addr;
+    vmsg->hdr.flags = msg->flags;
+    vmsg->hdr.len = msg->len;



Missing endian conversion?


You are right. Need conversion here.
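
For reference, the conversion could look roughly like this with the
__le16 header from v2 (with __virtio16 fields it would be
cpu_to_virtio16() taking the virtio_device pointer instead; this snippet
is a sketch, not part of the patch):

	vmsg->hdr.addr = cpu_to_le16(msg->addr);
	vmsg->hdr.flags = cpu_to_le16(msg->flags);
	vmsg->hdr.len = cpu_to_le16(msg->len);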



+
+    vmsg->buf = kzalloc(vmsg->hdr.len, GFP_KERNEL);
+    if (!vmsg->buf)
+    return -ENOMEM;
+
+    sg_init_one(&hdr, &vmsg->hdr, sizeof(struct virtio_i2c_hdr));
+    sgs[outcnt++] = &hdr;
+    if (vmsg->hdr.flags & I2C_M_RD) {
+        sg_init_one(&bin, vmsg->buf, msg->len);
+        sgs[outcnt + incnt++] = &bin;
+    } else {
+        memcpy(vmsg->buf, msg->buf, msg->len);
+        sg_init_one(&bout, vmsg->buf, msg->len);
+        sgs[outcnt++] = &bout;
+    }
+    sg_init_one(&status, &vmsg->status, sizeof(vmsg->status));
+    sgs[outcnt + incnt++] = &status;
+
+    return virtqueue_add_sgs(vq, sgs, outcnt, incnt, vmsg, 
GFP_KERNEL);

+}
+
+static int virtio_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg 
*msgs, int num)

+{
+    struct virtio_i2c *vi = i2c_get_adapdata(adap);
+    struct virtio_i2c_msg *vmsg_o, *vmsg_i;
+    struct virtqueue *vq = vi->vq;
+    unsigned long time_left;
+    int len, i, ret = 0;
+
+    vmsg_o = kzalloc(sizeof(*vmsg_o), GFP_KERNEL);
+    if (!vmsg_o)
+    return -ENOMEM;



It looks to me we can avoid the allocation by embedding 
virtio_i2c_msg into struct virtio_i2c;



Yeah... That's better. Thanks.





+
+    mutex_lock(&vi->i2c_lock);
+    vmsg_o->buf = NULL;
+    for (i = 0; i < num; i++) {
+        ret = virtio_i2c_add_msg(vq, vmsg_o, &msgs[i]);
+        if (ret) {
+            dev_err(&adap->dev, "failed to add msg[%d] to virtqueue.\n", i);
+            goto err_unlock_free;
+        }
+
+        virtqueue_kick(vq);
+
+        time_left = wait_for_completion_timeout(&vi->completion, adap->timeout);
+        if (!time_left) {
+            dev_err(&adap->dev, "msg[%d]: addr=0x%x timeout.\n", i, msgs[i].addr);
+            ret = i;
+            goto err_unlock_free;
+        }
+
+        vmsg_i = (struct virtio_i2c_msg *)virtqueue_get_buf(vq, &len);
+        if (vmsg_i) {
+            /* vmsg_i should point to the same address with vmsg_o */
+            if (vmsg_i != vmsg_o) {
+                dev_err(&adap->dev, "msg[%d]: addr=0x%x virtqueue error.\n",
+                        i, vmsg_i->hdr.addr);
+                ret = i;
+                goto err_unlock_free;
+            }
+    }



Does this imply in order completion of i2c device?  (E.g what happens 
if multiple virtio i2c requests are submitted)


Btw, this always uses a single descriptor at a time, which makes me
wonder whether a virtqueue (virtio) is really needed. It looks to me we
could utilize the virtqueue by submitting the requests in a batch.


I'm afraid not all physical devices support batch. 



Yes, but I think I meant the virtio device, not the physical one. It's
impossible to forbid batching if you have a queue anyway ...
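
To make the suggestion concrete, a rough sketch of a batched transfer
loop (illustrative only: it assumes one virtio_i2c_msg per i2c_msg
instead of the single vmsg_o in the patch, and it omits locking,
timeouts and error unwinding):

static int virtio_i2c_xfer_batched(struct virtio_i2c *vi,
				   struct virtio_i2c_msg *vmsgs,
				   struct i2c_msg *msgs, int num)
{
	struct virtqueue *vq = vi->vq;
	struct virtio_i2c_msg *vmsg;
	unsigned int len;
	int i, done;

	/* Queue every message first, then notify the device once. */
	for (i = 0; i < num; i++)
		if (virtio_i2c_add_msg(vq, &vmsgs[i], &msgs[i]))
			break;
	virtqueue_kick(vq);

	/* Reap completions; each used buffer carries its own status. */
	for (done = 0; done < i; ) {
		wait_for_completion(&vi->completion);
		while ((vmsg = virtqueue_get_buf(vq, &len)) != NULL)
			done++;
	}

	return i;
}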




I'd like to keep the current design and consider
your suggestion as a possible optimization in t

Re: [PATCH] i2c: virtio: add a virtio i2c frontend driver

2020-09-03 Thread Jason Wang


On 2020/9/3 1:34 PM, Jie Deng wrote:

Add an I2C bus driver for virtio para-virtualization.

The controller can be emulated by the backend driver in
any device model software by following the virtio protocol.

This driver communicates with the backend driver through a
virtio I2C message structure which includes following parts:

- Header: i2c_msg addr, flags, len.
- Data buffer: the pointer to the i2c msg data.
- Status: the processing result from the backend.

People may implement different backend drivers to emulate
different controllers according to their needs. A backend
example can be found in the device model of the open source
project ACRN. For more information, please refer to
https://projectacrn.org.

The virtio device ID 34 is used for this I2C adapter since IDs
before 34 have been reserved by other virtio devices.

Co-developed-by: Conghui Chen 
Signed-off-by: Conghui Chen 
Signed-off-by: Jie Deng 
Reviewed-by: Shuo Liu 
Reviewed-by: Andy Shevchenko 
---
  drivers/i2c/busses/Kconfig  |  11 ++
  drivers/i2c/busses/Makefile |   3 +
  drivers/i2c/busses/i2c-virtio.c | 276 
  include/uapi/linux/virtio_ids.h |   1 +
  4 files changed, 291 insertions(+)
  create mode 100644 drivers/i2c/busses/i2c-virtio.c

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 293e7a0..70c8e30 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -21,6 +21,17 @@ config I2C_ALI1535
  This driver can also be built as a module.  If so, the module
  will be called i2c-ali1535.
  
+config I2C_VIRTIO

+   tristate "Virtio I2C Adapter"
+   depends on VIRTIO



I guess it should depend on some I2C module here.



+   help
+ If you say yes to this option, support will be included for the virtio
+ i2c adapter driver. The hardware can be emulated by any device model
+ software according to the virtio protocol.
+
+ This driver can also be built as a module. If so, the module
+ will be called i2c-virtio.
+
  config I2C_ALI1563
tristate "ALI 1563"
depends on PCI
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 19aff0e..821acfa 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -6,6 +6,9 @@
  # ACPI drivers
  obj-$(CONFIG_I2C_SCMI)+= i2c-scmi.o
  
+# VIRTIO I2C host controller driver

+obj-$(CONFIG_I2C_VIRTIO)   += i2c-virtio.o
+
  # PC SMBus host controller drivers
  obj-$(CONFIG_I2C_ALI1535) += i2c-ali1535.o
  obj-$(CONFIG_I2C_ALI1563) += i2c-ali1563.o
diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
new file mode 100644
index 000..47f9fd1
--- /dev/null
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -0,0 +1,276 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtio I2C Bus Driver
+ *
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define VIRTIO_I2C_MSG_OK  0
+#define VIRTIO_I2C_MSG_ERR 1
+
+/**
+ * struct virtio_i2c_hdr - the virtio I2C message header structure
+ * @addr: i2c_msg addr, the slave address
+ * @flags: i2c_msg flags
+ * @len: i2c_msg len
+ */
+struct virtio_i2c_hdr {
+   __virtio16 addr;
+   __virtio16 flags;
+   __virtio16 len;
+} __packed;
+
+/**
+ * struct virtio_i2c_msg - the virtio I2C message structure
+ * @hdr: the virtio I2C message header
+ * @buf: virtio I2C message data buffer
+ * @status: the processing result from the backend
+ */
+struct virtio_i2c_msg {
+   struct virtio_i2c_hdr hdr;
+   char *buf;
+   u8 status;



Any reason for separating status out of virtio_i2c_hdr?



+};
+
+/**
+ * struct virtio_i2c - virtio I2C data
+ * @vdev: virtio device for this controller
+ * @completion: completion of virtio I2C message
+ * @adap: I2C adapter for this controller
+ * @i2c_lock: lock for virtqueue processing
+ * @vq: the virtio virtqueue for communication
+ */
+struct virtio_i2c {
+   struct virtio_device *vdev;
+   struct completion completion;
+   struct i2c_adapter adap;
+   struct mutex i2c_lock;
+   struct virtqueue *vq;
+};
+
+static void virtio_i2c_msg_done(struct virtqueue *vq)
+{
+   struct virtio_i2c *vi = vq->vdev->priv;
+
+   complete(&vi->completion);
+}
+
+static int virtio_i2c_add_msg(struct virtqueue *vq,
+ struct virtio_i2c_msg *vmsg,
+ struct i2c_msg *msg)
+{
+   struct scatterlist *sgs[3], hdr, bout, bin, status;
+   int outcnt = 0, incnt = 0;
+
+   if (!msg->len)
+   return -EINVAL;
+
+   vmsg->hdr.addr = msg->addr;
+   vmsg->hdr.flags = msg->flags;
+   vmsg->hdr.len = msg->len;



Missing endian conversion?



+
+   vmsg->buf = kzalloc(vmsg->hdr.len, GFP_KERNEL);
+   if 

Re: [PATCH] i2c: virtio: add a virtio i2c frontend driver

2020-09-03 Thread Jason Wang


On 2020/9/3 3:19 PM, Jie Deng wrote:


On 2020/9/3 14:12, Jason Wang wrote:


On 2020/9/3 1:34 PM, Jie Deng wrote:

Add an I2C bus driver for virtio para-virtualization.

The controller can be emulated by the backend driver in
any device model software by following the virtio protocol.

This driver communicates with the backend driver through a
virtio I2C message structure which includes following parts:

- Header: i2c_msg addr, flags, len.
- Data buffer: the pointer to the i2c msg data.
- Status: the processing result from the backend.

People may implement different backend drivers to emulate
different controllers according to their needs. A backend
example can be found in the device model of the open source
project ACRN. For more information, please refer to
https://projectacrn.org.



May I know the reason why you don't use i2c or virtio directly?


We don't want to add virtio drivers for every I2C device in the guest.
This bus driver is designed to provide a way to flexibly expose the 
physical
I2C slave devices to the guest without adding or changing the drivers 
of the

I2C slave devices in the guest OS.



Ok, if I understand this correctly, this is a virtio transport of i2c 
messages (similar to virtio-scsi).










The virtio device ID 34 is used for this I2C adapter since IDs
before 34 have been reserved by other virtio devices.



Is there a link to the spec patch?

Thanks


I haven't submitted the patch to reserve the ID in spec yet.
I write the ID here because I want to see your opinions first.



It would be helpful to send a spec draft for early review.

Thanks




Thanks





Re: [PATCH] i2c: virtio: add a virtio i2c frontend driver

2020-09-03 Thread Jason Wang


On 2020/9/3 下午1:34, Jie Deng wrote:

Add an I2C bus driver for virtio para-virtualization.

The controller can be emulated by the backend driver in
any device model software by following the virtio protocol.

This driver communicates with the backend driver through a
virtio I2C message structure which includes following parts:

- Header: i2c_msg addr, flags, len.
- Data buffer: the pointer to the i2c msg data.
- Status: the processing result from the backend.

People may implement different backend drivers to emulate
different controllers according to their needs. A backend
example can be found in the device model of the open source
project ACRN. For more information, please refer to
https://projectacrn.org.



May I know the reason why you don't use i2c or virtio directly?




The virtio device ID 34 is used for this I2C adapter since IDs
before 34 have been reserved by other virtio devices.



Is there a link to the spec patch?

Thanks




Co-developed-by: Conghui Chen 
Signed-off-by: Conghui Chen 
Signed-off-by: Jie Deng 
Reviewed-by: Shuo Liu 
Reviewed-by: Andy Shevchenko 
---
  drivers/i2c/busses/Kconfig  |  11 ++
  drivers/i2c/busses/Makefile |   3 +
  drivers/i2c/busses/i2c-virtio.c | 276 
  include/uapi/linux/virtio_ids.h |   1 +
  4 files changed, 291 insertions(+)
  create mode 100644 drivers/i2c/busses/i2c-virtio.c

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 293e7a0..70c8e30 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -21,6 +21,17 @@ config I2C_ALI1535
  This driver can also be built as a module.  If so, the module
  will be called i2c-ali1535.
  
+config I2C_VIRTIO

+   tristate "Virtio I2C Adapter"
+   depends on VIRTIO
+   help
+ If you say yes to this option, support will be included for the virtio
+ i2c adapter driver. The hardware can be emulated by any device model
+ software according to the virtio protocol.
+
+ This driver can also be built as a module. If so, the module
+ will be called i2c-virtio.
+
  config I2C_ALI1563
tristate "ALI 1563"
depends on PCI
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 19aff0e..821acfa 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -6,6 +6,9 @@
  # ACPI drivers
  obj-$(CONFIG_I2C_SCMI)+= i2c-scmi.o
  
+# VIRTIO I2C host controller driver

+obj-$(CONFIG_I2C_VIRTIO)   += i2c-virtio.o
+
  # PC SMBus host controller drivers
  obj-$(CONFIG_I2C_ALI1535) += i2c-ali1535.o
  obj-$(CONFIG_I2C_ALI1563) += i2c-ali1563.o
diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
new file mode 100644
index 000..47f9fd1
--- /dev/null
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -0,0 +1,276 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtio I2C Bus Driver
+ *
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define VIRTIO_I2C_MSG_OK  0
+#define VIRTIO_I2C_MSG_ERR 1
+
+/**
+ * struct virtio_i2c_hdr - the virtio I2C message header structure
+ * @addr: i2c_msg addr, the slave address
+ * @flags: i2c_msg flags
+ * @len: i2c_msg len
+ */
+struct virtio_i2c_hdr {
+   __virtio16 addr;
+   __virtio16 flags;
+   __virtio16 len;
+} __packed;
+
+/**
+ * struct virtio_i2c_msg - the virtio I2C message structure
+ * @hdr: the virtio I2C message header
+ * @buf: virtio I2C message data buffer
+ * @status: the processing result from the backend
+ */
+struct virtio_i2c_msg {
+   struct virtio_i2c_hdr hdr;
+   char *buf;
+   u8 status;
+};
+
+/**
+ * struct virtio_i2c - virtio I2C data
+ * @vdev: virtio device for this controller
+ * @completion: completion of virtio I2C message
+ * @adap: I2C adapter for this controller
+ * @i2c_lock: lock for virtqueue processing
+ * @vq: the virtio virtqueue for communication
+ */
+struct virtio_i2c {
+   struct virtio_device *vdev;
+   struct completion completion;
+   struct i2c_adapter adap;
+   struct mutex i2c_lock;
+   struct virtqueue *vq;
+};
+
+static void virtio_i2c_msg_done(struct virtqueue *vq)
+{
+   struct virtio_i2c *vi = vq->vdev->priv;
+
+   complete(&vi->completion);
+}
+
+static int virtio_i2c_add_msg(struct virtqueue *vq,
+ struct virtio_i2c_msg *vmsg,
+ struct i2c_msg *msg)
+{
+   struct scatterlist *sgs[3], hdr, bout, bin, status;
+   int outcnt = 0, incnt = 0;
+
+   if (!msg->len)
+   return -EINVAL;
+
+   vmsg->hdr.addr = msg->addr;
+   vmsg->hdr.flags = msg->flags;
+   vmsg->hdr.len = msg->len;
+
+   vmsg->buf = kzalloc(vmsg->hdr.len, GFP_KERNEL);
+   if (!vmsg->buf)
+   return 
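The quoted driver is truncated at this point in the archive. To make the header / data buffer / status layout from the commit message concrete, here is a rough sketch (names follow the quoted code rather than the final upstream driver) of how one i2c_msg could be mapped onto the virtqueue:

/* Sketch: queue one request as "header out, data out or in, status in". */
static int virtio_i2c_queue_msg(struct virtqueue *vq,
                                struct virtio_i2c_msg *vmsg,
                                struct i2c_msg *msg)
{
        struct scatterlist hdr, data, status, *sgs[3];
        int outcnt = 0, incnt = 0;

        sg_init_one(&hdr, &vmsg->hdr, sizeof(vmsg->hdr));
        sgs[outcnt++] = &hdr;

        sg_init_one(&data, vmsg->buf, msg->len);
        if (msg->flags & I2C_M_RD)
                sgs[outcnt + incnt++] = &data;  /* device fills the buffer */
        else
                sgs[outcnt++] = &data;          /* driver supplies the data */

        sg_init_one(&status, &vmsg->status, sizeof(vmsg->status));
        sgs[outcnt + incnt++] = &status;        /* device reports the result */

        return virtqueue_add_sgs(vq, sgs, outcnt, incnt, vmsg, GFP_KERNEL);
}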

Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

2020-09-01 Thread Jason Wang


On 2020/9/1 下午1:24, Kishon Vijay Abraham I wrote:

Hi,

On 28/08/20 4:04 pm, Cornelia Huck wrote:

On Thu, 9 Jul 2020 14:26:53 +0800
Jason Wang  wrote:

[Let me note right at the beginning that I first noted this while
listening to Kishon's talk at LPC on Wednesday. I might be very
confused about the background here, so let me apologize beforehand for
any confusion I might spread.]


On 2020/7/8 下午9:13, Kishon Vijay Abraham I wrote:

Hi Jason,

On 7/8/2020 4:52 PM, Jason Wang wrote:

On 2020/7/7 下午10:45, Kishon Vijay Abraham I wrote:

Hi Jason,

On 7/7/2020 3:17 PM, Jason Wang wrote:

On 2020/7/6 下午5:32, Kishon Vijay Abraham I wrote:

Hi Jason,

On 7/3/2020 12:46 PM, Jason Wang wrote:

On 2020/7/2 下午9:35, Kishon Vijay Abraham I wrote:

Hi Jason,

On 7/2/2020 3:40 PM, Jason Wang wrote:

On 2020/7/2 下午5:51, Michael S. Tsirkin wrote:
On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay 
Abraham I wrote:

This series enhances Linux Vhost support to enable SoC-to-SoC
communication over MMIO. This series enables rpmsg 
communication between

two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2

1) Modify vhost to use standard Linux driver model
2) Add support in vring to access virtqueue over MMIO
3) Add vhost client driver for rpmsg
4) Add PCIe RC driver (uses virtio) and PCIe EP driver 
(uses vhost) for
     rpmsg communication between two SoCs connected to 
each other
5) Add NTB Virtio driver and NTB Vhost driver for rpmsg 
communication

     between two SoCs connected via NTB
6) Add configfs to configure the components

UseCase1:

       VHOST RPMSG                      VIRTIO RPMSG
            +                                +
            |                                |
            |                                |
            |                                |
   +--------v-------+              +---------v--------+
   |     Linux      |              |      Linux       |
   |    Endpoint    |<------------>|   Root Complex   |
   |                |              |                  |
   |      SOC1      |              |       SOC2       |
   +----------------+              +------------------+

UseCase 2:

   [Diagram: VHOST RPMSG runs on HOST1 and VIRTIO RPMSG on HOST2. Each host
   is connected to its own endpoint controller (EP CONTROLLER1 and
   EP CONTROLLER2) on a single SoC with multiple EP instances, configured
   using the NTB function, and the two EP controllers are linked to each
   other inside that SoC.]



First of all, to clarify the terminology:
Is "vhost rpmsg" acting as what the virtio standard calls the 'device',
and "virtio rpmsg" as the 'driver'? Or is the "vhost" part mostly just


Right, vhost_rpmsg is 'device' and virtio_rpmsg is 'driver'.

virtqueues + the existing vhost interfaces?


It's implemented to provide the full 'device' functionality.




Software Layering:

The high-level SW layering should look something like below. This series
adds support only for RPMSG VHOST, however something similar should be
done for net and scsi. With that any vhost device (PCI, NTB, Platform
device, user) can use any of the vhost client drivers.


   [Diagram: the vhost client drivers (RPMSG VHOST, NET VHOST, SCSI VHOST,
   and so on) sit on top of a common VHOST CORE, which in turn sits on top
   of the transport drivers: PCI EPF VHOST, NTB VHOST, PLATFORM DEVICE
   VHOST, and so on.]


So, the upper half is basically various functionality types, e.g. a net
device. What is the lower half, a hardware interface? Would it be
equivalent to e.g. a normal PCI device?


Right, the upper half should provide the functionality.
The bottom layer could be a HW interface (like PCIe device or NTB 
device) or it could be a SW interface (for

Re: [PATCH net-next] vhost: fix typo in error message

2020-08-31 Thread Jason Wang


On 2020/9/1 上午10:39, Yunsheng Lin wrote:

"enable" should be "disable" when the function name is
vhost_disable_notify(), which does the disabling work.

Signed-off-by: Yunsheng Lin 
---
  drivers/vhost/vhost.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5857d4e..b45519c 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2537,7 +2537,7 @@ void vhost_disable_notify(struct vhost_dev *dev, struct 
vhost_virtqueue *vq)
if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
r = vhost_update_used_flags(vq);
if (r)
-   vq_err(vq, "Failed to enable notification at %p: %d\n",
+   vq_err(vq, "Failed to disable notification at %p: %d\n",
   &vq->used->flags, r);
}
  }



Acked-by: Jason Wang 




Re: [PATCH V2 2/3] vhost: vdpa: report iova range

2020-08-31 Thread Jason Wang


On 2020/8/23 下午2:40, Eli Cohen wrote:

+static void vhost_vdpa_set_iova_range(struct vhost_vdpa *v)
+{
+   struct vdpa_iova_range *range = &v->range;
+   struct iommu_domain_geometry geo;
+   struct vdpa_device *vdpa = v->vdpa;
+   const struct vdpa_config_ops *ops = vdpa->config;
+
+   if (ops->get_iova_range) {
+   *range = ops->get_iova_range(vdpa);
+   } else if (v->domain &&
+  !iommu_domain_get_attr(v->domain,
+  DOMAIN_ATTR_GEOMETRY, &geo) &&
+  geo.force_aperture) {
+   range->first = geo.aperture_start;
+   range->last = geo.aperture_end;
+   } else {
+   range->first = 0;
+   range->last = ULLONG_MAX;
+   }

Shouldn't we require drivers that publish VIRTIO_F_ACCESS_PLATFORM to
implement get_iova_range?



Probably not, since ACCESS_PLATFORM does not exclude the device that 
depends on the chipset IOMMU to work. So in that case, we should query 
IOMMU driver instead of vDPA device driver.


Thanks





+}



[PATCH V2 3/3] vdpa_sim: implement get_iova_range()

2020-08-21 Thread Jason Wang
Signed-off-by: Jason Wang 
---
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 62d640327145..89854e17c3c2 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -574,6 +574,16 @@ static u32 vdpasim_get_generation(struct vdpa_device *vdpa)
return vdpasim->generation;
 }
 
+struct vdpa_iova_range vdpasim_get_iova_range(struct vdpa_device *vdpa)
+{
+   struct vdpa_iova_range range = {
+   .first = 0ULL,
+   .last = ULLONG_MAX,
+   };
+
+   return range;
+}
+
 static int vdpasim_set_map(struct vdpa_device *vdpa,
   struct vhost_iotlb *iotlb)
 {
@@ -657,6 +667,7 @@ static const struct vdpa_config_ops vdpasim_net_config_ops 
= {
.get_config = vdpasim_get_config,
.set_config = vdpasim_set_config,
.get_generation = vdpasim_get_generation,
+   .get_iova_range = vdpasim_get_iova_range,
.dma_map= vdpasim_dma_map,
.dma_unmap  = vdpasim_dma_unmap,
.free   = vdpasim_free,
@@ -683,6 +694,7 @@ static const struct vdpa_config_ops 
vdpasim_net_batch_config_ops = {
.get_config = vdpasim_get_config,
.set_config = vdpasim_set_config,
.get_generation = vdpasim_get_generation,
+   .get_iova_range = vdpasim_get_iova_range,
.set_map= vdpasim_set_map,
.free   = vdpasim_free,
 };
-- 
2.18.1



[PATCH V2 2/3] vhost: vdpa: report iova range

2020-08-21 Thread Jason Wang
This patch introduces a new ioctl for the vhost-vdpa device that can
report the iova range supported by the device.

For a device that implements the get_iova_range() method, we fetch the
range from the vDPA device. If the device doesn't implement
get_iova_range() but depends on the platform IOMMU, we query it via
DOMAIN_ATTR_GEOMETRY; otherwise [0, ULLONG_MAX] is assumed.

For safety, this patch also rejects map requests that are not in the
valid range.

Signed-off-by: Jason Wang 
---
 drivers/vhost/vdpa.c | 41 
 include/uapi/linux/vhost.h   |  4 
 include/uapi/linux/vhost_types.h |  9 +++
 3 files changed, 54 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 3fab94f88894..1adb4adb0345 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -48,6 +48,7 @@ struct vhost_vdpa {
int minor;
struct eventfd_ctx *config_ctx;
int in_batch;
+   struct vdpa_iova_range range;
 };
 
 static DEFINE_IDA(vhost_vdpa_ida);
@@ -344,6 +345,16 @@ static long vhost_vdpa_set_config_call(struct vhost_vdpa 
*v, u32 __user *argp)
return 0;
 }
 
+static long vhost_vdpa_get_iova_range(struct vhost_vdpa *v, u32 __user *argp)
+{
+   struct vhost_vdpa_iova_range range = {
+   .first = v->range.first,
+   .last = v->range.last,
+   };
+
+   return copy_to_user(argp, &range, sizeof(range));
+}
+
 static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
   void __user *argp)
 {
@@ -476,6 +487,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
case VHOST_VDPA_SET_CONFIG_CALL:
r = vhost_vdpa_set_config_call(v, argp);
break;
+   case VHOST_VDPA_GET_IOVA_RANGE:
+   r = vhost_vdpa_get_iova_range(v, argp);
+   break;
default:
	r = vhost_dev_ioctl(&v->vdev, cmd, argp);
if (r == -ENOIOCTLCMD)
@@ -597,6 +611,10 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
u64 iova = msg->iova;
int ret = 0;
 
+   if (msg->iova < v->range.first ||
+   msg->iova + msg->size - 1 > v->range.last)
+   return -EINVAL;
+
if (vhost_iotlb_itree_first(iotlb, msg->iova,
msg->iova + msg->size - 1))
return -EEXIST;
@@ -762,6 +780,27 @@ static void vhost_vdpa_free_domain(struct vhost_vdpa *v)
v->domain = NULL;
 }
 
+static void vhost_vdpa_set_iova_range(struct vhost_vdpa *v)
+{
+   struct vdpa_iova_range *range = &v->range;
+   struct iommu_domain_geometry geo;
+   struct vdpa_device *vdpa = v->vdpa;
+   const struct vdpa_config_ops *ops = vdpa->config;
+
+   if (ops->get_iova_range) {
+   *range = ops->get_iova_range(vdpa);
+   } else if (v->domain &&
+  !iommu_domain_get_attr(v->domain,
+  DOMAIN_ATTR_GEOMETRY, &geo) &&
+  geo.force_aperture) {
+   range->first = geo.aperture_start;
+   range->last = geo.aperture_end;
+   } else {
+   range->first = 0;
+   range->last = ULLONG_MAX;
+   }
+}
+
 static int vhost_vdpa_open(struct inode *inode, struct file *filep)
 {
struct vhost_vdpa *v;
@@ -802,6 +841,8 @@ static int vhost_vdpa_open(struct inode *inode, struct file 
*filep)
if (r)
goto err_init_iotlb;
 
+   vhost_vdpa_set_iova_range(v);
+
filep->private_data = v;
 
return 0;
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index 75232185324a..c998860d7bbc 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -146,4 +146,8 @@
 
 /* Set event fd for config interrupt*/
 #define VHOST_VDPA_SET_CONFIG_CALL _IOW(VHOST_VIRTIO, 0x77, int)
+
+/* Get the valid iova range */
+#define VHOST_VDPA_GET_IOVA_RANGE  _IOR(VHOST_VIRTIO, 0x78, \
+struct vhost_vdpa_iova_range)
 #endif
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index 9a269a88a6ff..f7f6a3a28977 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -138,6 +138,15 @@ struct vhost_vdpa_config {
__u8 buf[0];
 };
 
+/* vhost vdpa IOVA range
+ * @first: First address that can be mapped by vhost-vDPA
+ * @last: Last address that can be mapped by vhost-vDPA
+ */
+struct vhost_vdpa_iova_range {
+   __u64 first;
+   __u64 last;
+};
+
 /* Feature bits */
 /* Log all write descriptors. Can be changed while device is active. */
 #define VHOST_F_LOG_ALL 26
-- 
2.18.1



[PATCH V2 0/3] vDPA: API for reporting IOVA range

2020-08-21 Thread Jason Wang
Hi All:

This series introduces an API for reporting the IOVA range. This is a
must for userspace to work correctly:

- for the process that uses vhost-vDPA directly to properly allocate
  IOVA
- for VM(qemu), when vIOMMU is not enabled, fail early if GPA is out
  of range
- for VM(qemu), when vIOMMU is enabled, determine a valid guest
  address width

Please review.
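To illustrate the first point, a process driving vhost-vDPA directly could query the window before building its IOTLB, roughly like this (a sketch against the uapi added in this series, not code from the patches):

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>        /* assumes the headers from this series */

/* Sketch: query the usable IOVA window before programming the IOTLB. */
static int query_iova_range(int vhost_vdpa_fd)
{
        struct vhost_vdpa_iova_range range;

        if (ioctl(vhost_vdpa_fd, VHOST_VDPA_GET_IOVA_RANGE, &range) < 0)
                return -1;

        printf("usable IOVA: [0x%llx, 0x%llx]\n",
               (unsigned long long)range.first,
               (unsigned long long)range.last);

        /* Any later VHOST_IOTLB_UPDATE must stay inside this window. */
        return 0;
}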

Changes from V1:

- do not mandate get_iova_range() for devices with their own DMA
  translation logic; assume a [0, ULLONG_MAX] range for them
- mandate the IOVA range only for IOMMUs that force the aperture
- forbid map requests that are outside the IOVA range in vhost-vDPA

Thanks

Jason Wang (3):
  vdpa: introduce config op to get valid iova range
  vhost: vdpa: report iova range
  vdpa_sim: implement get_iova_range()

 drivers/vdpa/vdpa_sim/vdpa_sim.c | 12 ++
 drivers/vhost/vdpa.c | 41 
 include/linux/vdpa.h | 15 
 include/uapi/linux/vhost.h   |  4 
 include/uapi/linux/vhost_types.h |  9 +++
 5 files changed, 81 insertions(+)

-- 
2.18.1



[PATCH V2 1/3] vdpa: introduce config op to get valid iova range

2020-08-21 Thread Jason Wang
This patch introduces a config op to get the valid iova range from the
vDPA device.

Signed-off-by: Jason Wang 
---
 include/linux/vdpa.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index eae0bfd87d91..30bc7a7223bb 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -52,6 +52,16 @@ struct vdpa_device {
int nvqs;
 };
 
+/**
+ * vDPA IOVA range - the IOVA range supported by the device
+ * @first: start of the IOVA range
+ * @last: end of the IOVA range
+ */
+struct vdpa_iova_range {
+   u64 first;
+   u64 last;
+};
+
 /**
  * vDPA_config_ops - operations for configuring a vDPA device.
  * Note: vDPA device drivers are required to implement all of the
@@ -151,6 +161,10 @@ struct vdpa_device {
  * @get_generation:Get device config generation (optional)
  * @vdev: vdpa device
  * Returns u32: device generation
+ * @get_iova_range:Get supported iova range (optional)
+ * @vdev: vdpa device
+ * Returns the iova range supported by
+ * the device.
  * @set_map:   Set device memory mapping (optional)
  * Needed for device that using device
  * specific DMA translation (on-chip IOMMU)
@@ -216,6 +230,7 @@ struct vdpa_config_ops {
void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
   const void *buf, unsigned int len);
u32 (*get_generation)(struct vdpa_device *vdev);
+   struct vdpa_iova_range (*get_iova_range)(struct vdpa_device *vdev);
 
/* DMA ops */
int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
-- 
2.18.1



Re: [PATCH -next] vdpa: Remove duplicate include

2020-08-18 Thread Jason Wang


On 2020/8/18 下午7:49, YueHaibing wrote:

Remove duplicate include file

Signed-off-by: YueHaibing 
---
  drivers/vhost/vdpa.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 3fab94f88894..95e2b8307a2a 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -22,7 +22,6 @@
  #include 
  #include 
  #include 
-#include 
  
  #include "vhost.h"
  



Acked-by: Jason Wang 



Re: [PATCH -next] vdpa/mlx5: Remove duplicate include

2020-08-18 Thread Jason Wang


On 2020/8/18 下午7:46, YueHaibing wrote:

Remove duplicate include file

Signed-off-by: YueHaibing 
---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 9df69d5efe8c..12fb83dc1de9 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -8,7 +8,6 @@
  #include 
  #include 
  #include 
-#include 
  #include "mlx5_vnet.h"
  #include "mlx5_vdpa_ifc.h"
  #include "mlx5_vdpa.h"



Acked-by: Jason Wang 



Re: VDPA Debug/Statistics

2020-08-11 Thread Jason Wang


On 2020/8/11 下午7:58, Eli Cohen wrote:

On Tue, Aug 11, 2020 at 11:26:20AM +, Eli Cohen wrote:

Hi All

Currently, the only statistics we get for a VDPA instance comes from the 
virtio_net device instance. Since VDPA involves hardware acceleration, there 
can be quite a lot of information that can be fetched from the underlying 
device. Currently there is no generic method to fetch this information.

One way of doing this can be to create, at the host, a net device for
each VDPA instance, and use it to get this information or do some
configuration. Ethtool can be used in such a case



The problems are:

- vDPA is not net specific
- vDPA should be transparent to host networking stack




I would like to hear what you think about this or maybe you have some other 
ideas to address this topic.

Thanks,
Eli

Something I'm not sure I understand is how are vdpa instances created on 
mellanox cards? There's a devlink command for that, is that right?
Can that be extended for stats?

Currently any VF will be probed as a VDPA device. We're adding devlink support 
but I am not sure if devlink is suitable for displaying statistics. We will 
discuss internally but I wanted to know what you guys think.



I agree with Michael: if it's possible, integrating stats with devlink 
should be best. Having another interface that is just for stats does not 
look good.


Thanks




--
MST




Re: vdpa: handling of VIRTIO_F_ACCESS_PLATFORM/VIRTIO_F_ORDER_PLATFORM

2020-08-11 Thread Jason Wang


On 2020/8/11 下午5:52, Michael S. Tsirkin wrote:

Hi!
I'd like to raise the question of whether we can drop the requirement
of VIRTIO_F_ACCESS_PLATFORM from vdpa?
As far as I can see, it is merely required for virtio vdpa -
so should we not enforce it there?



If we don't enforce it, virtio will use PA which breaks the setup when 
IOMMU is enabled. As discussed in the past, mandating the DMA API for virtio 
can just solve this issue.





The point is support for legacy guests - which mostly just works
on x86.



Legacy guest should work even if we mandate ACCESS_PLATFORM.

This is because we don't simply pass through guest features (qemu will 
always set ACCESS_PLATFORM to vhost-vdpa).





Also, what is the plan for VIRTIO_F_ORDER_PLATFORM?



I think we should mandate ORDER_PLATFORM, (even for guest).

Thanks






Re: [PATCH 1/4] vdpa: introduce config op to get valid iova range

2020-08-11 Thread Jason Wang


On 2020/8/11 下午4:29, Michael S. Tsirkin wrote:

On Tue, Aug 11, 2020 at 10:53:09AM +0800, Jason Wang wrote:

On 2020/8/10 下午8:05, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 03:43:54PM +0300, Eli Cohen wrote:

On Thu, Aug 06, 2020 at 08:29:22AM -0400, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 03:03:55PM +0300, Eli Cohen wrote:

On Wed, Aug 05, 2020 at 08:51:56AM -0400, Michael S. Tsirkin wrote:

On Wed, Jun 17, 2020 at 11:29:44AM +0800, Jason Wang wrote:

This patch introduce a config op to get valid iova range from the vDPA
device.

Signed-off-by: Jason Wang
---
   include/linux/vdpa.h | 14 ++
   1 file changed, 14 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 239db794357c..b7633ed2500c 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -41,6 +41,16 @@ struct vdpa_device {
unsigned int index;
   };
+/**
+ * vDPA IOVA range - the IOVA range support by the device
+ * @start: start of the IOVA range
+ * @end: end of the IOVA range
+ */
+struct vdpa_iova_range {
+   u64 start;
+   u64 end;
+};
+

This is ambiguous. Is end in the range or just behind it?
How about first/last?

It is customary in the kernel to use start-end where end corresponds to
the byte following the last in the range. See struct vm_area_struct
vm_start and vm_end fields

Exactly my point:

include/linux/mm_types.h:   unsigned long vm_end;   /* The first 
byte after our end address

in this case Jason wants it to be the last byte, not one behind.



Maybe start, size? Not ambiguous, and you don't need to do annoying
calculations like size = last - start + 1

Size has a bunch of issues: can overlap, can not cover the entire 64 bit
range. The requisite checks are arguably easier to get wrong than
getting the size if you need it.

Yes, so do you still prefer first/last or just begin/end which is consistent
with iommu_domain_geometry?

Thanks

I prefer first/last I think, these are unambiguous.
E.g.

 dma_addr_t aperture_start; /* First address that can be mapped*/
 dma_addr_t aperture_end;   /* Last address that can be mapped */

instead of addressing ambiguity with a comment, let's just name the field well.



Ok, will do.

Thanks










[PATCH 1/2] MAINTAINERS: add a dedicated entry for vDPA

2020-08-10 Thread Jason Wang
vDPA is an independent subsystem, so use a dedicated entry for that.

Signed-off-by: Jason Wang 
---
 MAINTAINERS | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4e2698cc7e23..314398f0e276 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18173,13 +18173,19 @@ F:Documentation/devicetree/bindings/virtio/
 F: drivers/block/virtio_blk.c
 F: drivers/crypto/virtio/
 F: drivers/net/virtio_net.c
-F: drivers/vdpa/
 F: drivers/virtio/
-F: include/linux/vdpa.h
 F: include/linux/virtio*.h
 F: include/uapi/linux/virtio_*.h
 F: tools/virtio/
 
+VDPA CORE AND DRIVERS
+M: "Michael S. Tsirkin" 
+M: Jason Wang 
+L: virtualization@lists.linux-foundation.org
+S: Maintained
+F: drivers/vdpa/
+F: include/linux/vdpa.h
+
 VIRTIO BALLOON
 M: "Michael S. Tsirkin" 
 M: David Hildenbrand 
-- 
2.20.1



[PATCH 2/2] vDPA: add Eli Cohen as mellanox vDPA driver supporter

2020-08-10 Thread Jason Wang
Cc: Eli Cohen 
Signed-off-by: Jason Wang 
---
 MAINTAINERS | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 314398f0e276..ed1851413fcc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18186,6 +18186,12 @@ S: Maintained
 F: drivers/vdpa/
 F: include/linux/vdpa.h
 
+MELLANOX VDPA DRIVERS
+M: Eli Cohen 
+L: virtualization@lists.linux-foundation.org
+S: Supported
+F: drivers/vdpa/mlx5/
+
 VIRTIO BALLOON
 M: "Michael S. Tsirkin" 
 M: David Hildenbrand 
-- 
2.20.1



Re: [PATCH] vdpa_sim: init iommu lock

2020-08-10 Thread Jason Wang


On 2020/8/10 下午8:48, Michael S. Tsirkin wrote:

The patch adding the iommu lock did not initialize it.
The struct is zero-initialized so this is mostly a problem
when using lockdep.

Reported-by: kernel test robot 
Cc: Max Gurtovoy 
Fixes: 0ea9ee430e74 ("vdpasim: protect concurrent access to iommu iotlb")
Signed-off-by: Michael S. Tsirkin 



Acked-by: Jason Wang 



---
  drivers/vdpa/vdpa_sim/vdpa_sim.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index df3224b138ee..604d9d25ca47 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -358,6 +358,7 @@ static struct vdpasim *vdpasim_create(void)
  
  	INIT_WORK(&vdpasim->work, vdpasim_work);

	spin_lock_init(&vdpasim->lock);
+   spin_lock_init(&vdpasim->iommu_lock);
  
  	dev = &vdpasim->vdpa.dev;

dev->coherent_dma_mask = DMA_BIT_MASK(64);



Re: [PATCH V5 1/6] vhost: introduce vhost_vring_call

2020-08-10 Thread Jason Wang


On 2020/8/10 下午9:37, Michael S. Tsirkin wrote:

On Wed, Aug 05, 2020 at 10:16:16AM +0800, Jason Wang wrote:

On 2020/8/4 下午5:21, Michael S. Tsirkin wrote:

    +struct vhost_vring_call {
+    struct eventfd_ctx *ctx;
+    struct irq_bypass_producer producer;
+    spinlock_t ctx_lock;

It's not clear to me why we need ctx_lock here.

Thanks

Hi Jason,

we use this lock to protect the eventfd_ctx and irq from race conditions,

We don't support irq notification from vDPA device driver in this version,
do we still have race condition?

Thanks

Jason I'm not sure what you are trying to say here.


I meant we change the API from V4 so driver won't notify us if irq is
changed.

Then it looks to me there's no need for the ctx_lock, everyhing could be
synchronized with vq mutex.

Thanks

Jason do you want to post a cleanup patch simplifying code along these
lines?



Ling Shan promised to post a patch to fix this.

Thanks




Thanks,







Re: [PATCH 1/4] vdpa: introduce config op to get valid iova range

2020-08-10 Thread Jason Wang


On 2020/8/10 下午8:05, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 03:43:54PM +0300, Eli Cohen wrote:

On Thu, Aug 06, 2020 at 08:29:22AM -0400, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 03:03:55PM +0300, Eli Cohen wrote:

On Wed, Aug 05, 2020 at 08:51:56AM -0400, Michael S. Tsirkin wrote:

On Wed, Jun 17, 2020 at 11:29:44AM +0800, Jason Wang wrote:

This patch introduce a config op to get valid iova range from the vDPA
device.

Signed-off-by: Jason Wang 
---
  include/linux/vdpa.h | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 239db794357c..b7633ed2500c 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -41,6 +41,16 @@ struct vdpa_device {
unsigned int index;
  };
  
+/**

+ * vDPA IOVA range - the IOVA range support by the device
+ * @start: start of the IOVA range
+ * @end: end of the IOVA range
+ */
+struct vdpa_iova_range {
+   u64 start;
+   u64 end;
+};
+


This is ambiguous. Is end in the range or just behind it?
How about first/last?

It is customary in the kernel to use start-end where end corresponds to
the byte following the last in the range. See struct vm_area_struct
vm_start and vm_end fields

Exactly my point:

include/linux/mm_types.h:   unsigned long vm_end;   /* The first 
byte after our end address

in this case Jason wants it to be the last byte, not one behind.



Maybe start, size? Not ambiguous, and you don't need to do annoying
calculations like size = last - start + 1

Size has a bunch of issues: can overlap, can not cover the entire 64 bit
range. The requisite checks are arguably easier to get wrong than
getting the size if you need it.



Yes, so do you still prefer first/last or just begin/end which is 
consistent with iommu_domain_geometry?


Thanks







Re: [PATCH] vdpa/mlx5: Fix erroneous null pointer checks

2020-08-06 Thread Jason Wang


On 2020/8/7 上午11:37, Jason Wang wrote:


On 2020/8/7 上午3:18, Alex Dewar wrote:

In alloc_inout() in net/mlx5_vnet.c, there are a few places where memory
is allocated to *in and *out, but only the values of in and out are
null-checked (i.e. there is a missing dereference). Fix this.

Addresses-Coverity: ("CID 1496603: (REVERSE_INULL)")
Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
devices")

Signed-off-by: Alex Dewar 



Acked-by: Jason Wang 



Colin posted something similar: [PATCH][next] vdpa/mlx5: fix memory 
allocation failure checks


And I think his fix is better since it prevents raw pointers from being freed.

Thanks


Re: [PATCH][next] vdpa/mlx5: fix memory allocation failure checks

2020-08-06 Thread Jason Wang


On 2020/8/7 上午12:08, Colin King wrote:

From: Colin Ian King 

The memory allocation failure checking for in and out is currently
checking if the pointers are valid rather than the contents of what
they point to. Hence the null check on failed memory allocations is
incorrect.  Fix this by adding the missing indirection in the check.
Also for the default case, just set the *in and *out to null as
these don't have anything allocated to kfree. Finally remove the
redundant *in and *out check as these have been already done on each
allocation in the case statement.

Addresses-Coverity: ("Null pointer dereference")
Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Colin Ian King 



Acked-by: Jason Wang 



---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 13 ++---
  1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 3ec44a4f0e45..55bc58e1dae9 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -867,7 +867,7 @@ static void alloc_inout(struct mlx5_vdpa_net *ndev, int 
cmd, void **in, int *inl
*outlen = MLX5_ST_SZ_BYTES(qp_2rst_out);
*in = kzalloc(*inlen, GFP_KERNEL);
*out = kzalloc(*outlen, GFP_KERNEL);
-   if (!in || !out)
+   if (!*in || !*out)
goto outerr;
  
  		MLX5_SET(qp_2rst_in, *in, opcode, cmd);

@@ -879,7 +879,7 @@ static void alloc_inout(struct mlx5_vdpa_net *ndev, int 
cmd, void **in, int *inl
*outlen = MLX5_ST_SZ_BYTES(rst2init_qp_out);
*in = kzalloc(*inlen, GFP_KERNEL);
*out = kzalloc(MLX5_ST_SZ_BYTES(rst2init_qp_out), GFP_KERNEL);
-   if (!in || !out)
+   if (!*in || !*out)
goto outerr;
  
  		MLX5_SET(rst2init_qp_in, *in, opcode, cmd);

@@ -896,7 +896,7 @@ static void alloc_inout(struct mlx5_vdpa_net *ndev, int 
cmd, void **in, int *inl
*outlen = MLX5_ST_SZ_BYTES(init2rtr_qp_out);
*in = kzalloc(*inlen, GFP_KERNEL);
*out = kzalloc(MLX5_ST_SZ_BYTES(init2rtr_qp_out), GFP_KERNEL);
-   if (!in || !out)
+   if (!*in || !*out)
goto outerr;
  
  		MLX5_SET(init2rtr_qp_in, *in, opcode, cmd);

@@ -914,7 +914,7 @@ static void alloc_inout(struct mlx5_vdpa_net *ndev, int 
cmd, void **in, int *inl
*outlen = MLX5_ST_SZ_BYTES(rtr2rts_qp_out);
*in = kzalloc(*inlen, GFP_KERNEL);
*out = kzalloc(MLX5_ST_SZ_BYTES(rtr2rts_qp_out), GFP_KERNEL);
-   if (!in || !out)
+   if (!*in || !*out)
goto outerr;
  
  		MLX5_SET(rtr2rts_qp_in, *in, opcode, cmd);

@@ -927,16 +927,15 @@ static void alloc_inout(struct mlx5_vdpa_net *ndev, int 
cmd, void **in, int *inl
MLX5_SET(qpc, qpc, rnr_retry, 7);
break;
default:
-   goto outerr;
+   goto outerr_nullify;
}
-   if (!*in || !*out)
-   goto outerr;
  
  	return;
  
  outerr:

kfree(*in);
kfree(*out);
+outerr_nullify:
*in = NULL;
*out = NULL;
  }



Re: [PATCH] vdpa/mlx5: Fix uninitialised variable in core/mr.c

2020-08-06 Thread Jason Wang


On 2020/8/7 上午2:56, Alex Dewar wrote:

If the kernel is unable to allocate memory for the variable dmr then
err will be returned without being set. Set err to -ENOMEM in this
case.

Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code")
Addresses-Coverity: ("Uninitialized variables")
Signed-off-by: Alex Dewar 



Acked-by: Jason Wang 



---
  drivers/vdpa/mlx5/core/mr.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index f5dec0274133..ef1c550f8266 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -319,8 +319,10 @@ static int add_direct_chain(struct mlx5_vdpa_dev *mvdev, 
u64 start, u64 size, u8
while (size) {
sz = (u32)min_t(u64, MAX_KLM_SIZE, size);
dmr = kzalloc(sizeof(*dmr), GFP_KERNEL);
-   if (!dmr)
+   if (!dmr) {
+   err = -ENOMEM;
goto err_alloc;
+   }

dmr->start = st;
dmr->end = st + sz;
--
2.28.0




Re: [PATCH 1/2] vdpa: ifcvf: return err when fail to request config irq

2020-08-06 Thread Jason Wang


On 2020/7/23 下午5:12, Jason Wang wrote:

We ignore the err of requesting config interrupt, fix this.

Fixes: e7991f376a4d ("ifcvf: implement config interrupt in IFCVF")
Cc: Zhu Lingshan 
Signed-off-by: Jason Wang 
---
  drivers/vdpa/ifcvf/ifcvf_main.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index f5a60c14b979..ae7110955a44 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -76,6 +76,10 @@ static int ifcvf_request_irq(struct ifcvf_adapter *adapter)
	ret = devm_request_irq(&pdev->dev, irq,
   ifcvf_config_changed, 0,
   vf->config_msix_name, vf);
+   if (ret) {
+   IFCVF_ERR(pdev, "Failed to request config irq\n");
+   return ret;
+   }
  
  	for (i = 0; i < IFCVF_MAX_QUEUE_PAIRS * 2; i++) {

snprintf(vf->vring[i].msix_name, 256, "ifcvf[%s]-%d\n",



Hi Michael:

Any comments on this series?

Thanks



Re: [PATCH] vdpa/mlx5: Fix erroneous null pointer checks

2020-08-06 Thread Jason Wang


On 2020/8/7 上午3:18, Alex Dewar wrote:

In alloc_inout() in net/mlx5_vnet.c, there are a few places where memory
is allocated to *in and *out, but only the values of in and out are
null-checked (i.e. there is a missing dereference). Fix this.

Addresses-Coverity: ("CID 1496603: (REVERSE_INULL)")
Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Alex Dewar 



Acked-by: Jason Wang 



---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 3ec44a4f0e45..bcb6600c2839 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -867,7 +867,7 @@ static void alloc_inout(struct mlx5_vdpa_net *ndev, int 
cmd, void **in, int *inl
*outlen = MLX5_ST_SZ_BYTES(qp_2rst_out);
*in = kzalloc(*inlen, GFP_KERNEL);
*out = kzalloc(*outlen, GFP_KERNEL);
-   if (!in || !out)
+   if (!*in || !*out)
goto outerr;
  
  		MLX5_SET(qp_2rst_in, *in, opcode, cmd);

@@ -879,7 +879,7 @@ static void alloc_inout(struct mlx5_vdpa_net *ndev, int 
cmd, void **in, int *inl
*outlen = MLX5_ST_SZ_BYTES(rst2init_qp_out);
*in = kzalloc(*inlen, GFP_KERNEL);
*out = kzalloc(MLX5_ST_SZ_BYTES(rst2init_qp_out), GFP_KERNEL);
-   if (!in || !out)
+   if (!*in || !*out)
goto outerr;
  
  		MLX5_SET(rst2init_qp_in, *in, opcode, cmd);

@@ -896,7 +896,7 @@ static void alloc_inout(struct mlx5_vdpa_net *ndev, int 
cmd, void **in, int *inl
*outlen = MLX5_ST_SZ_BYTES(init2rtr_qp_out);
*in = kzalloc(*inlen, GFP_KERNEL);
*out = kzalloc(MLX5_ST_SZ_BYTES(init2rtr_qp_out), GFP_KERNEL);
-   if (!in || !out)
+   if (!*in || !*out)
goto outerr;
  
  		MLX5_SET(init2rtr_qp_in, *in, opcode, cmd);

@@ -914,7 +914,7 @@ static void alloc_inout(struct mlx5_vdpa_net *ndev, int 
cmd, void **in, int *inl
*outlen = MLX5_ST_SZ_BYTES(rtr2rts_qp_out);
*in = kzalloc(*inlen, GFP_KERNEL);
*out = kzalloc(MLX5_ST_SZ_BYTES(rtr2rts_qp_out), GFP_KERNEL);
-   if (!in || !out)
+   if (!*in || !*out)
goto outerr;
  
  		MLX5_SET(rtr2rts_qp_in, *in, opcode, cmd);



Re: [PATCH v2 03/24] virtio: allow __virtioXX, __leXX in config space

2020-08-06 Thread Jason Wang


On 2020/8/6 下午1:58, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 11:37:38AM +0800, Jason Wang wrote:

On 2020/8/5 下午7:45, Michael S. Tsirkin wrote:

#define virtio_cread(vdev, structname, member, ptr) \
do {\
might_sleep();  \
/* Must match the member's type, and be integer */  \
-   if (!typecheck(typeof((((structname*)0)->member)), *(ptr))) \
+   if (!__virtio_typecheck(structname, member, *(ptr)))\
(*ptr) = 1; \

A silly question,  compare to using set()/get() directly, what's the value
of the accessors macro here?

Thanks

get/set don't convert to the native endian, I guess that's why
drivers use cread/cwrite. It is also nice that there's type
safety, checking the correct integer width is used.


Yes, but this is simply because a macro is used here, how about just doing
things similar like virtio_cread_bytes():

static inline void virtio_cread(struct virtio_device *vdev,
                   unsigned int offset,
                   void *buf, size_t len)


And do the endian conversion inside?

Thanks


Then you lose type safety. It's very easy to have an le32 field
and try to read it into a u16 by mistake.

These macros are all about preventing bugs: and the whole patchset
is about several bugs sparse found - that is what prompted me to make
type checks more strict.



Yes, but we need to do the macro with compiler extensions. I wonder
whether the kernel already has something for this, since this kind of
request is pretty common?


Thanks
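As a side note on the question above: the typecheck() helper that the quoted virtio_cread() macro builds on already lives in <linux/typecheck.h>. A minimal sketch of the idea (illustrative only, mirroring that header):

/* Sketch: comparing the addresses of two differently-typed dummies makes
 * the compiler warn at build time when the types differ. */
#define typecheck_sketch(type, x)              \
({                                             \
        type __dummy;                          \
        typeof(x) __dummy2;                    \
        (void)(&__dummy == &__dummy2);         \
        1;                                     \
})

So reading, say, a 32-bit config field into a u16 trips a compiler warning instead of silently truncating.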








Re: [PATCH 1/4] vdpa: introduce config op to get valid iova range

2020-08-06 Thread Jason Wang


On 2020/8/6 下午8:29, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 03:03:55PM +0300, Eli Cohen wrote:

On Wed, Aug 05, 2020 at 08:51:56AM -0400, Michael S. Tsirkin wrote:

On Wed, Jun 17, 2020 at 11:29:44AM +0800, Jason Wang wrote:

This patch introduce a config op to get valid iova range from the vDPA
device.

Signed-off-by: Jason Wang
---
  include/linux/vdpa.h | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 239db794357c..b7633ed2500c 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -41,6 +41,16 @@ struct vdpa_device {
unsigned int index;
  };
  
+/**

+ * vDPA IOVA range - the IOVA range support by the device
+ * @start: start of the IOVA range
+ * @end: end of the IOVA range
+ */
+struct vdpa_iova_range {
+   u64 start;
+   u64 end;
+};
+

This is ambiguous. Is end in the range or just behind it?
How about first/last?

It is customary in the kernel to use start-end where end corresponds to
the byte following the last in the range. See struct vm_area_struct
vm_start and vm_end fields

Exactly my point:

include/linux/mm_types.h:   unsigned long vm_end;   /* The first 
byte after our end address

in this case Jason wants it to be the last byte, not one behind.



Ok, I somehow recall the reason :)

See:

struct iommu_domain_geometry {
    dma_addr_t aperture_start; /* First address that can be mapped    */
    dma_addr_t aperture_end;   /* Last address that can be mapped */
    bool force_aperture;   /* DMA only allowed in mappable range? */
};


So what I proposed here is to be consistent with it.

Thanks








Re: [PATCH 1/4] vdpa: introduce config op to get valid iova range

2020-08-06 Thread Jason Wang


On 2020/8/6 下午8:10, Eli Cohen wrote:

On Wed, Jun 17, 2020 at 06:29:44AM +0300, Jason Wang wrote:

This patch introduce a config op to get valid iova range from the vDPA
device.

Signed-off-by: Jason Wang
---
  include/linux/vdpa.h | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 239db794357c..b7633ed2500c 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -41,6 +41,16 @@ struct vdpa_device {
unsigned int index;
  };
  
+/**

+ * vDPA IOVA range - the IOVA range support by the device
+ * @start: start of the IOVA range
+ * @end: end of the IOVA range
+ */
+struct vdpa_iova_range {
+   u64 start;
+   u64 end;
+};
+

What do you do with this information? Suppose some device tells you it
supports some limited range, say, from 0x4000 to 0x8000. What
does qemu do with this information?



For qemu, it will fail the vDPA device creation when:

1) vIOMMU is not enabled and the GPA is out of this range
2) vIOMMU is enabled but it can't report such a range to the guest

For other userspace applications, they will know they can only use this
range as their IOVA.


Thanks


Re: [PATCH v2 19/24] vdpa: make sure set_features in invoked for legacy

2020-08-06 Thread Jason Wang


On 2020/8/6 下午6:00, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 03:27:38PM +0800, Jason Wang wrote:

On 2020/8/6 下午1:53, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 11:23:05AM +0800, Jason Wang wrote:

On 2020/8/5 下午7:40, Michael S. Tsirkin wrote:

On Wed, Aug 05, 2020 at 02:14:07PM +0800, Jason Wang wrote:

On 2020/8/4 上午5:00, Michael S. Tsirkin wrote:

Some legacy guests just assume features are 0 after reset.
We detect that config space is accessed before features are
set and set features to 0 automatically.
Note: some legacy guests might not even access config space, if this is
reported in the field we might need to catch a kick to handle these.

I wonder whether it's easier to just support modern device?

Thanks

Well hardware vendors are I think interested in supporting legacy
guests. Limiting vdpa to modern only would make it uncompetitive.

My understanding is that, IOMMU_PLATFORM is mandatory for hardware vDPA to
work.

Hmm I don't really see why. Assume host maps guest memory properly,
VM does not have an IOMMU, legacy guest can just work.


Yes, guest may not set IOMMU_PLATFORM.



Care explaining what's wrong with this picture?


The problem is virtio_vdpa, without IOMMU_PLATFORM it uses PA which can not
work if IOMMU is enabled.

Thanks

So that's a virtio_vdpa limitation.



Probably not, I think this goes back to the long debate of whether to 
use DMA API unconditionally. If we did that, we can support legacy 
virtio driver.


The vDPA device needs to provide a DMA device to the virtio core and 
perform the DMA API with that device, which should work for all of the cases.


But a big question is, does upstream care about out-of-tree virtio drivers?

Thanks



In the same way, if a device
does not have an on-device iommu *and* is not behind an iommu,
then vdpa can't bind to it.

But this virtio_vdpa specific hack does not belong in a generic vdpa code.




So it can only work for modern device ...

Thanks







Re: [PATCH v2 19/24] vdpa: make sure set_features in invoked for legacy

2020-08-06 Thread Jason Wang


On 2020/8/6 下午1:53, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 11:23:05AM +0800, Jason Wang wrote:

On 2020/8/5 下午7:40, Michael S. Tsirkin wrote:

On Wed, Aug 05, 2020 at 02:14:07PM +0800, Jason Wang wrote:

On 2020/8/4 上午5:00, Michael S. Tsirkin wrote:

Some legacy guests just assume features are 0 after reset.
We detect that config space is accessed before features are
set and set features to 0 automatically.
Note: some legacy guests might not even access config space, if this is
reported in the field we might need to catch a kick to handle these.

I wonder whether it's easier to just support modern device?

Thanks

Well hardware vendors are I think interested in supporting legacy
guests. Limiting vdpa to modern only would make it uncompetitive.


My understanding is that, IOMMU_PLATFORM is mandatory for hardware vDPA to
work.

Hmm I don't really see why. Assume host maps guest memory properly,
VM does not have an IOMMU, legacy guest can just work.



Yes, guest may not set IOMMU_PLATFORM.




Care explaining what's wrong with this picture?



The problem is virtio_vdpa, without IOMMU_PLATFORM it uses PA which can 
not work if IOMMU is enabled.


Thanks






So it can only work for modern device ...

Thanks








Re: [PATCH v2 03/24] virtio: allow __virtioXX, __leXX in config space

2020-08-05 Thread Jason Wang


On 2020/8/5 下午7:45, Michael S. Tsirkin wrote:

   #define virtio_cread(vdev, structname, member, ptr)  \
do {\
might_sleep();  \
/* Must match the member's type, and be integer */  \
-   if (!typecheck(typeof((((structname*)0)->member)), *(ptr))) \
+   if (!__virtio_typecheck(structname, member, *(ptr)))\
(*ptr) = 1; \

A silly question,  compare to using set()/get() directly, what's the value
of the accessors macro here?

Thanks

get/set don't convert to the native endian, I guess that's why
drivers use cread/cwrite. It is also nice that there's type
safety, checking the correct integer width is used.



Yes, but this is simply because a macro is used here; how about just 
doing something similar to virtio_cread_bytes():


static inline void virtio_cread(struct virtio_device *vdev,
                  unsigned int offset,
                  void *buf, size_t len)


And do the endian conversion inside?

Thanks
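A minimal sketch of that suggestion for a 16-bit field (assumed helper name; the in-tree virtio_cread16() already works essentially this way, at the cost of the type check discussed above):

/* Sketch: read a 16-bit config field by offset and convert inside, so the
 * caller never sees __virtio16. */
static inline u16 virtio_cread16_sketch(struct virtio_device *vdev,
                                        unsigned int offset)
{
        __virtio16 raw;

        virtio_cread_bytes(vdev, offset, &raw, sizeof(raw));
        return virtio16_to_cpu(vdev, raw);
}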







Re: [PATCH 4/4] vhost: vdpa: report iova range

2020-08-05 Thread Jason Wang


On 2020/8/5 下午8:58, Michael S. Tsirkin wrote:

On Wed, Jun 17, 2020 at 11:29:47AM +0800, Jason Wang wrote:

This patch introduces a new ioctl for vhost-vdpa device that can
report the iova range by the device. For device that depends on
platform IOMMU, we fetch the iova range via DOMAIN_ATTR_GEOMETRY. For
devices that has its own DMA translation unit, we fetch it directly
from vDPA bus operation.

Signed-off-by: Jason Wang 
---
  drivers/vhost/vdpa.c | 27 +++
  include/uapi/linux/vhost.h   |  4 
  include/uapi/linux/vhost_types.h |  5 +
  3 files changed, 36 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 77a0c9fb6cc3..ad23e66cbf57 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -332,6 +332,30 @@ static long vhost_vdpa_set_config_call(struct vhost_vdpa 
*v, u32 __user *argp)
  
  	return 0;

  }
+
+static long vhost_vdpa_get_iova_range(struct vhost_vdpa *v, u32 __user *argp)
+{
+   struct iommu_domain_geometry geo;
+   struct vdpa_device *vdpa = v->vdpa;
+   const struct vdpa_config_ops *ops = vdpa->config;
+   struct vhost_vdpa_iova_range range;
+   struct vdpa_iova_range vdpa_range;
+
+   if (!ops->set_map && !ops->dma_map) {

Why not just check if (ops->get_iova_range) directly?



Because set_map || dma_ops is a hint that the device has its own DMA 
translation logic.


Device without get_iova_range does not necessarily meant it use IOMMU 
driver.


Thanks








+   iommu_domain_get_attr(v->domain,
+ DOMAIN_ATTR_GEOMETRY, &geo);
+   range.start = geo.aperture_start;
+   range.end = geo.aperture_end;
+   } else {
+   vdpa_range = ops->get_iova_range(vdpa);
+   range.start = vdpa_range.start;
+   range.end = vdpa_range.end;
+   }
+
+   return copy_to_user(argp, &range, sizeof(range));
+
+}
+
  static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
   void __user *argp)
  {
@@ -442,6 +466,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
case VHOST_VDPA_SET_CONFIG_CALL:
r = vhost_vdpa_set_config_call(v, argp);
break;
+   case VHOST_VDPA_GET_IOVA_RANGE:
+   r = vhost_vdpa_get_iova_range(v, argp);
+   break;
default:
	r = vhost_dev_ioctl(&v->vdev, cmd, argp);
if (r == -ENOIOCTLCMD)
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index 0c2349612e77..850956980e27 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -144,4 +144,8 @@
  
  /* Set event fd for config interrupt*/

  #define VHOST_VDPA_SET_CONFIG_CALL_IOW(VHOST_VIRTIO, 0x77, int)
+
+/* Get the valid iova range */
+#define VHOST_VDPA_GET_IOVA_RANGE  _IOW(VHOST_VIRTIO, 0x78, \
+struct vhost_vdpa_iova_range)
  #endif
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index 669457ce5c48..4025b5a36177 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -127,6 +127,11 @@ struct vhost_vdpa_config {
__u8 buf[0];
  };
  
+struct vhost_vdpa_iova_range {

+   __u64 start;
+   __u64 end;
+};
+


Pls document fields. And I think first/last is a better API ...


  /* Feature bits */
  /* Log all write descriptors. Can be changed while device is active. */
  #define VHOST_F_LOG_ALL 26
--
2.20.1



Re: [PATCH 3/4] vdpa: get_iova_range() is mandatory for device specific DMA translation

2020-08-05 Thread Jason Wang


On 2020/8/5 下午8:55, Michael S. Tsirkin wrote:

On Wed, Jun 17, 2020 at 11:29:46AM +0800, Jason Wang wrote:

In order to let userspace work correctly, get_iova_range() is a must
for the device that has its own DMA translation logic.

I guess you mean for a device.

However in absence of ths op, I don't see what is wrong with just
assuming device can access any address.



It's just for safe, if you want, we can assume any address without this op.





Signed-off-by: Jason Wang 
---
  drivers/vdpa/vdpa.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index de211ef3738c..ab7af978ef70 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -82,6 +82,10 @@ struct vdpa_device *__vdpa_alloc_device(struct device 
*parent,
if (!!config->dma_map != !!config->dma_unmap)
goto err;
  
+	if ((config->dma_map || config->set_map) &&

+   !config->get_iova_range)
+   goto err;
+
err = -ENOMEM;
vdev = kzalloc(size, GFP_KERNEL);
if (!vdev)

What about devices using an IOMMU for translation?
IOMMUs generally have a limited IOVA range too, right?



See patch 4 which query the IOMMU geometry in this case:

+        iommu_domain_get_attr(v->domain,
+                  DOMAIN_ATTR_GEOMETRY, &geo);
+        range.start = geo.aperture_start;
+        range.end = geo.aperture_end;

Thanks







--
2.20.1



Re: [PATCH 1/4] vdpa: introduce config op to get valid iova range

2020-08-05 Thread Jason Wang


On 2020/8/5 下午8:51, Michael S. Tsirkin wrote:

On Wed, Jun 17, 2020 at 11:29:44AM +0800, Jason Wang wrote:

This patch introduce a config op to get valid iova range from the vDPA
device.

Signed-off-by: Jason Wang
---
  include/linux/vdpa.h | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 239db794357c..b7633ed2500c 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -41,6 +41,16 @@ struct vdpa_device {
unsigned int index;
  };
  
+/**

+ * vDPA IOVA range - the IOVA range support by the device
+ * @start: start of the IOVA range
+ * @end: end of the IOVA range
+ */
+struct vdpa_iova_range {
+   u64 start;
+   u64 end;
+};
+

This is ambiguous. Is end in the range or just behind it?



In the range.



How about first/last?



Sure.

Thanks
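A minimal sketch of the renamed in-kernel structure and the corresponding config op
(assuming an inclusive last and a by-value return; the kernel-doc wording is illustrative):

/**
 * struct vdpa_iova_range - the IOVA range supported by the device
 * @first: start of the IOVA range (inclusive)
 * @last: end of the IOVA range (inclusive)
 */
struct vdpa_iova_range {
        u64 first;
        u64 last;
};

/* in struct vdpa_config_ops */
        struct vdpa_iova_range (*get_iova_range)(struct vdpa_device *vdev);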









Re: [PATCH v2 19/24] vdpa: make sure set_features is invoked for legacy

2020-08-05 Thread Jason Wang


On 2020/8/5 下午7:40, Michael S. Tsirkin wrote:

On Wed, Aug 05, 2020 at 02:14:07PM +0800, Jason Wang wrote:

On 2020/8/4 上午5:00, Michael S. Tsirkin wrote:

Some legacy guests just assume features are 0 after reset.
We detect that config space is accessed before features are
set and set features to 0 automatically.
Note: some legacy guests might not even access config space, if this is
reported in the field we might need to catch a kick to handle these.

I wonder whether it's easier to just support modern device?

Thanks

Well hardware vendors are I think interested in supporting legacy
guests. Limiting vdpa to modern only would make it uncompetitive.



My understanding is that IOMMU_PLATFORM is mandatory for hardware vDPA 
to work, so it can only work for modern devices ...


Thanks









Re: [PATCH v2 22/24] vdpa_sim: fix endian-ness of config space

2020-08-05 Thread Jason Wang


On 2020/8/5 下午8:06, Michael S. Tsirkin wrote:

On Wed, Aug 05, 2020 at 02:21:07PM +0800, Jason Wang wrote:

On 2020/8/4 上午5:00, Michael S. Tsirkin wrote:

VDPA sim accesses config space as native endian - this is
wrong since it's a modern device and actually uses LE.

It only supports modern guests so we could punt and
just force LE, but let's use the full virtio APIs since people
tend to copy/paste code, and this is not data path anyway.

Signed-off-by: Michael S. Tsirkin
---
   drivers/vdpa/vdpa_sim/vdpa_sim.c | 31 ++-
   1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index a9bc5e0fb353..fa05e065ff69 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -24,6 +24,7 @@
   #include 
   #include 
   #include 
+#include 
   #include 
   #include 
   #include 
@@ -72,6 +73,23 @@ struct vdpasim {
u64 features;
   };
+/* TODO: cross-endian support */
+static inline bool vdpasim_is_little_endian(struct vdpasim *vdpasim)
+{
+   return virtio_legacy_is_little_endian() ||
+   (vdpasim->features & (1ULL << VIRTIO_F_VERSION_1));
+}
+
+static inline u16 vdpasim16_to_cpu(struct vdpasim *vdpasim, __virtio16 val)
+{
+   return __virtio16_to_cpu(vdpasim_is_little_endian(vdpasim), val);
+}
+
+static inline __virtio16 cpu_to_vdpasim16(struct vdpasim *vdpasim, u16 val)
+{
+   return __cpu_to_virtio16(vdpasim_is_little_endian(vdpasim), val);
+}
+
   static struct vdpasim *vdpasim_dev;
   static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
@@ -306,7 +324,6 @@ static const struct vdpa_config_ops vdpasim_net_config_ops;
   static struct vdpasim *vdpasim_create(void)
   {
-   struct virtio_net_config *config;
struct vdpasim *vdpasim;
struct device *dev;
int ret = -ENOMEM;
@@ -331,10 +348,7 @@ static struct vdpasim *vdpasim_create(void)
if (!vdpasim->buffer)
goto err_iommu;
-	config = &vdpasim->config;
-   config->mtu = 1500;
-   config->status = VIRTIO_NET_S_LINK_UP;
-   eth_random_addr(config->mac);
+   eth_random_addr(vdpasim->config.mac);
	vringh_set_iotlb(&vdpasim->vqs[0].vring, vdpasim->iommu);
	vringh_set_iotlb(&vdpasim->vqs[1].vring, vdpasim->iommu);
@@ -448,6 +462,7 @@ static u64 vdpasim_get_features(struct vdpa_device *vdpa)
   static int vdpasim_set_features(struct vdpa_device *vdpa, u64 features)
   {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+   struct virtio_net_config *config = &vdpasim->config;
/* DMA mapping must be done by driver */
if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM)))
@@ -455,6 +470,12 @@ static int vdpasim_set_features(struct vdpa_device *vdpa, 
u64 features)
vdpasim->features = features & vdpasim_features;
+   /* We only know whether guest is using the legacy interface here, so
+* that's the earliest we can set config fields.
+*/

We check whether or not ACCESS_PLATFORM is set earlier, which is probably a
hint that only modern devices are supported. So I wonder whether just forcing LE
and failing if VERSION_1 is not set would be better?

Thanks

So how about I add a comment along the lines of

/*
  * vdpasim ATM requires VIRTIO_F_ACCESS_PLATFORM, so we don't need to
  * support legacy guests. Keep transitional device code around for
  * the benefit of people who might copy-and-paste this into transitional
  * device code.
  */



That's fine.

Thanks
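For comparison, a minimal sketch of the modern-only alternative floated above: reject
drivers that do not negotiate VIRTIO_F_VERSION_1 and then treat the config space as plain
little-endian. It assumes vdpasim_features advertises VIRTIO_F_VERSION_1; sparse endian
annotations (__force casts) are elided:

static int vdpasim_set_features(struct vdpa_device *vdpa, u64 features)
{
        struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
        struct virtio_net_config *config = &vdpasim->config;

        /* DMA mapping must be done by driver */
        if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM)))
                return -EINVAL;

        /* modern-only device: no legacy/transitional support */
        if (!(features & (1ULL << VIRTIO_F_VERSION_1)))
                return -EINVAL;

        vdpasim->features = features & vdpasim_features;

        /* config space of a modern device is always little-endian */
        config->mtu = cpu_to_le16(1500);
        config->status = cpu_to_le16(VIRTIO_NET_S_LINK_UP);

        return 0;
}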








Re: [PATCH V4 linux-next 00/12] VDPA support for Mellanox ConnectX devices

2020-08-05 Thread Jason Wang


On 2020/8/5 下午1:01, Eli Cohen wrote:

On Tue, Aug 04, 2020 at 05:29:09PM -0400, Michael S. Tsirkin wrote:

On Tue, Aug 04, 2020 at 07:20:36PM +0300, Eli Cohen wrote:

Hi Michael,
please note that this series depends on mlx5 core device driver patches
in mlx5-next branch in
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git.

Thanks! OK so what's the plan for merging this?
Do patches at least build well enough that I can push them
upstream? Or do they have to go on top of the mellanox tree?


The patches are built on your linux-next branch which I updated
yesterday.

I am based on this commit:
776b7b25f10b (origin/linux-next) vhost: add an RPMsg API

On top of that I merged
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

and after that I have Jason's patches (five patches), than one patch
from Max and then my patches (seven).

It builds fine on x86_64.
I fixed some conflicts on Jason's patches.

I also tested it to verify it's working.

BTW, for some reason I did not get all the patches into my mailbox and I
suspect they were not all sent. Did you get all the series 0-13?



I can see patch 0 to patch 12 but not patch 13 (I guess 12 is all).

Thanks




Please let me know, and if needed I'll resend.


git pull git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
mlx5-next

They also depend on Jason Wang's patches: https://lkml.org/lkml/2020/7/1/301

The ones you included, right?


Right.
  

Jason, I had to resolve some conflicts, so I would appreciate it if you can verify
that it is ok.

The following series of patches provide VDPA support for Mellanox
devices. The supported devices are ConnectX6 DX and newer.

Currently, only a network driver is implemented; future patches will
introduce a block device driver. iperf performance on a single queue is
around 12 Gbps.  Future patches will introduce multi queue support.

The files are organized in such a way that code that can be used by
different VDPA implementations will be placed in a common area that resides in
drivers/vdpa/mlx5/core.

Only virtual functions are currently supported. Also, certain firmware
capabilities must be set to enable the driver. Physical functions (PFs)
are skipped by the driver.

To make use of the VDPA net driver, one must load mlx5_vdpa. In such
case, VFs will be operated by the VDPA driver. Although one can see a
regular instance of a network driver on the VF, the VDPA driver takes
precedence over the NIC driver, steering-wise.

Currently, the device/interface infrastructure in mlx5_core is used to
probe drivers. Future patches will introduce virtbus as a means to
register devices and drivers and VDPA will be adapted to it.

The mlx5 mode of operation required to support VDPA is switchdev mode.
One can use a Linux or OVS bridge to take care of layer 2 switching.

In order to provide virtio networking to a guest, an updated version of
qemu is required. This series has been tested with the following qemu
version:

url: https://github.com/jasowang/qemu.git
branch: vdpa
Commit ID: 6f4e59b807db


V2->V3
Fix makefile to use include path relative to the root of the kernel

V3->V4
Rebase Jason's patches on linux-next branch
Fix krobot error on mips arch
Make use of the free callback to destroy resources on unload
Use VIRTIO_F_ACCESS_PLATFORM instead of legacy VIRTIO_F_IOMMU_PLATFORM
Add empty implementations for get_vq_notification() and get_vq_irq()


Eli Cohen (6):
   net/vdpa: Use struct for set/get vq state
   vdpa: Modify get_vq_state() to return error code
   vdpa/mlx5: Add hardware descriptive header file
   vdpa/mlx5: Add support library for mlx5 VDPA implementation
   vdpa/mlx5: Add shared memory registration code
   vdpa/mlx5: Add VDPA driver for supported mlx5 devices

Jason Wang (5):
   vhost-vdpa: refine ioctl pre-processing
   vhost: generialize backend features setting/getting
   vhost-vdpa: support get/set backend features
   vhost-vdpa: support IOTLB batching hints
   vdpasim: support batch updating

Max Gurtovoy (1):
   vdpa: remove hard coded virtq num

  drivers/vdpa/Kconfig   |   19 +
  drivers/vdpa/Makefile  |1 +
  drivers/vdpa/ifcvf/ifcvf_base.c|4 +-
  drivers/vdpa/ifcvf/ifcvf_base.h|4 +-
  drivers/vdpa/ifcvf/ifcvf_main.c|   13 +-
  drivers/vdpa/mlx5/Makefile |4 +
  drivers/vdpa/mlx5/core/mlx5_vdpa.h |   91 ++
  drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h |  168 ++
  drivers/vdpa/mlx5/core/mr.c|  484 ++
  drivers/vdpa/mlx5/core/resources.c |  284 
  drivers/vdpa/mlx5/net/main.c   |   76 +
  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 1965 
  drivers/vdpa/mlx5/net/mlx5_vnet.h  |   24 +
  drivers/vdpa/vdpa.c|3 +
  drivers/vdpa/vdpa_sim/vdpa_sim.c   |   53 +-
  drivers/vhost/net.c|   18 +-
  drivers/vhost/vdpa.c   |   76 +-
  drivers/vhost/vhost.c  |   15 +
  driv

Re: [PATCH V4 linux-next 00/12] VDPA support for Mellanox ConnectX devices

2020-08-05 Thread Jason Wang


On 2020/8/5 上午12:20, Eli Cohen wrote:

Hi Michael,
please note that this series depends on mlx5 core device driver patches
in mlx5-next branch in
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git.

git pull git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
mlx5-next

They also depend on Jason Wang's patches: https://lkml.org/lkml/2020/7/1/301

Jason, I had to resolve some conflicts, so I would appreciate it if you can verify
that it is ok.



Looks good to me.

Thanks


Re: [PATCH V4 linux-next 08/12] vdpa: Modify get_vq_state() to return error code

2020-08-05 Thread Jason Wang


On 2020/8/5 上午12:20, Eli Cohen wrote:

Modify get_vq_state() so it returns an error code. In case of hardware
acceleration, the available index may be retrieved from the device, an
operation that can possibly fail.

Reviewed-by: Parav Pandit 
Signed-off-by: Eli Cohen 



Acked-by: Jason Wang 



---
  drivers/vdpa/ifcvf/ifcvf_main.c  | 5 +++--
  drivers/vdpa/vdpa_sim/vdpa_sim.c | 5 +++--
  drivers/vhost/vdpa.c | 5 -
  include/linux/vdpa.h | 4 ++--
  4 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index dc311e972b9e..076d7ac5e723 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -237,12 +237,13 @@ static u16 ifcvf_vdpa_get_vq_num_max(struct vdpa_device 
*vdpa_dev)
return IFCVF_QUEUE_MAX;
  }
  
-static void ifcvf_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,

-   struct vdpa_vq_state *state)
+static int ifcvf_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
+  struct vdpa_vq_state *state)
  {
struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
  
  	state->avail_index = ifcvf_get_vq_state(vf, qid);

+   return 0;
  }
  
  static int ifcvf_vdpa_set_vq_state(struct vdpa_device *vdpa_dev, u16 qid,

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index f1c351d5959b..c68741363643 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -464,14 +464,15 @@ static int vdpasim_set_vq_state(struct vdpa_device *vdpa, 
u16 idx,
return 0;
  }
  
-static void vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx,

-struct vdpa_vq_state *state)
+static int vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx,
+   struct vdpa_vq_state *state)
  {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
	struct vringh *vrh = &vq->vring;
  
  	state->avail_index = vrh->last_avail_idx;

+   return 0;
  }
  
  static u32 vdpasim_get_vq_align(struct vdpa_device *vdpa)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index c2e1e2d55084..a0b7c91948e1 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -392,7 +392,10 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, 
unsigned int cmd,
ops->set_vq_ready(vdpa, idx, s.num);
return 0;
case VHOST_GET_VRING_BASE:
-   ops->get_vq_state(v->vdpa, idx, &vq_state);
+   r = ops->get_vq_state(v->vdpa, idx, &vq_state);
+   if (r)
+   return r;
+
vq->last_avail_idx = vq_state.avail_index;
break;
case VHOST_GET_BACKEND_FEATURES:
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 8f412620071d..fd6e560d70f9 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -192,8 +192,8 @@ struct vdpa_config_ops {
bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
const struct vdpa_vq_state *state);
-   void (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
-struct vdpa_vq_state *state);
+   int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
+   struct vdpa_vq_state *state);
struct vdpa_notification_area
(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
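As a usage illustration, a hypothetical hardware backend where reading the available index
can fail; myhw, vdpa_to_myhw() and myhw_query_avail_idx() are invented names standing in
for a real device/firmware query:

static int myhw_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
                                  struct vdpa_vq_state *state)
{
        struct myhw *hw = vdpa_to_myhw(vdpa_dev);       /* hypothetical */
        u16 avail_idx;
        int err;

        /* may fail, e.g. if the device/firmware query times out */
        err = myhw_query_avail_idx(hw, qid, &avail_idx);        /* hypothetical */
        if (err)
                return err;     /* propagated up through VHOST_GET_VRING_BASE */

        state->avail_index = avail_idx;
        return 0;
}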



Re: [PATCH V4 linux-next 12/12] vdpa/mlx5: Add VDPA driver for supported mlx5 devices

2020-08-05 Thread Jason Wang


On 2020/8/5 上午12:20, Eli Cohen wrote:

Add a front end VDPA driver that registers in the VDPA bus and provides
networking to a guest. The VDPA driver creates the necessary resources
on the VF it is driving such that data path will be offloaded.

Notifications are being communicated through the driver.

Currently, only VFs are supported. In subsequent patches we will have
devlink support to control which VF is used for VDPA and which function
is used for regular networking.

Reviewed-by: Parav Pandit
Signed-off-by: Eli Cohen
---



Acked-by: Jason Wang 



Re: [PATCH V5 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-08-05 Thread Jason Wang


On 2020/7/31 下午2:55, Zhu Lingshan wrote:

+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, u16 qid)
+{
+   struct vhost_virtqueue *vq = &v->vqs[qid];
+   const struct vdpa_config_ops *ops = v->vdpa->config;
+   struct vdpa_device *vdpa = v->vdpa;
+   int ret, irq;
+
+   spin_lock(&vq->call_ctx.ctx_lock);
+   irq = ops->get_vq_irq(vdpa, qid);



Btw, this assumes that get_vq_irq is mandatory. This looks wrong since 
there's no guarantee that the vDPA device driver can see an irq. And this 
breaks the vdpa simulator.


Let's add a check and make it optional, documenting this assumption in 
vdpa.h.


Thanks



+   if (!vq->call_ctx.ctx || irq < 0) {
+   spin_unlock(&vq->call_ctx.ctx_lock);
+   return;
+   }
+



Re: [PATCH v2 03/24] virtio: allow __virtioXX, __leXX in config space

2020-08-05 Thread Jason Wang


On 2020/8/4 上午4:58, Michael S. Tsirkin wrote:

Currently all config space fields are of the type __uXX.
This confuses people and some drivers (notably vdpa)
access them using CPU endian-ness - which only
works well for legacy or LE platforms.

Update virtio_cread/virtio_cwrite macros to allow __virtioXX
and __leXX field types. Follow-up patches will convert
config space to use these types.

Signed-off-by: Michael S. Tsirkin 
---
  include/linux/virtio_config.h | 50 +--
  1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 3b4eae5ac5e3..64da491936f7 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -6,6 +6,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  
  struct irq_affinity;

@@ -287,12 +288,57 @@ static inline __virtio64 cpu_to_virtio64(struct 
virtio_device *vdev, u64 val)
return __cpu_to_virtio64(virtio_is_little_endian(vdev), val);
  }
  
+/*

+ * Only the checker differentiates between __virtioXX and __uXX types. But we
+ * try to share as much code as we can with the regular GCC build.
+ */
+#if !defined(CONFIG_CC_IS_GCC) && !defined(__CHECKER__)
+
+/* Not a checker - we can keep things simple */
+#define __virtio_native_typeof(x) typeof(x)
+
+#else
+
+/*
+ * We build this out of a couple of helper macros in a vain attempt to
+ * help you keep your lunch down while reading it.
+ */
+#define __virtio_pick_value(x, type, then, otherwise)  \
+   __builtin_choose_expr(__same_type(x, type), then, otherwise)
+
+#define __virtio_pick_type(x, type, then, otherwise)   \
+   __virtio_pick_value(x, type, (then)0, otherwise)
+
+#define __virtio_pick_endian(x, x16, x32, x64, otherwise)  
\
+   __virtio_pick_type(x, x16, __u16,   
\
+   __virtio_pick_type(x, x32, __u32,   
\
+   __virtio_pick_type(x, x64, __u64,   
\
+   otherwise)))
+
+#define __virtio_native_typeof(x) typeof(  
\
+   __virtio_pick_type(x, __u8, __u8,   
\
+   __virtio_pick_endian(x, __virtio16, __virtio32, __virtio64, 
\
+   __virtio_pick_endian(x, __le16, __le32, __le64, 
\
+   __virtio_pick_endian(x, __u16, __u32, __u64,
\
+   /* No other type allowed */ 
\
+   (void)0)
+
+#endif
+
+#define __virtio_native_type(structname, member) \
+   __virtio_native_typeof(((structname*)0)->member)
+
+#define __virtio_typecheck(structname, member, val) \
+   /* Must match the member's type, and be integer */ \
+   typecheck(__virtio_native_type(structname, member), (val))
+
+
  /* Config space accessors. */
  #define virtio_cread(vdev, structname, member, ptr)   \
do {\
might_sleep();  \
/* Must match the member's type, and be integer */  \
-   if (!typecheck(typeof((((structname*)0)->member)), *(ptr))) \
+   if (!__virtio_typecheck(structname, member, *(ptr)))\
(*ptr) = 1; \



A silly question: compared to using set()/get() directly, what's the 
value of the accessor macros here?


Thanks



\
switch (sizeof(*ptr)) { \
@@ -322,7 +368,7 @@ static inline __virtio64 cpu_to_virtio64(struct 
virtio_device *vdev, u64 val)
do {\
might_sleep();  \
/* Must match the member's type, and be integer */  \
-   if (!typecheck(typeof((((structname*)0)->member)), *(ptr))) \
+   if (!__virtio_typecheck(structname, member, *(ptr)))\
BUG_ON((*ptr) == 1);\
\
switch (sizeof(*ptr)) { \
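For context, a minimal usage sketch of the accessor under discussion, assuming a virtio-net
device: the local variable has to be the native integer type matching the (now __virtio16)
config member, otherwise the typecheck above complains at build time:

        u16 status;

        /* reads the little-endian field and converts it to native endian */
        virtio_cread(vdev, struct virtio_net_config, status, &status);

        /* a __le16 or __virtio16 local here would trip the typecheck */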



Re: [PATCH v2 22/24] vdpa_sim: fix endian-ness of config space

2020-08-05 Thread Jason Wang


On 2020/8/4 上午5:00, Michael S. Tsirkin wrote:

VDPA sim accesses config space as native endian - this is
wrong since it's a modern device and actually uses LE.

It only supports modern guests so we could punt and
just force LE, but let's use the full virtio APIs since people
tend to copy/paste code, and this is not data path anyway.

Signed-off-by: Michael S. Tsirkin 
---
  drivers/vdpa/vdpa_sim/vdpa_sim.c | 31 ++-
  1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index a9bc5e0fb353..fa05e065ff69 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -24,6 +24,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -72,6 +73,23 @@ struct vdpasim {
u64 features;
  };
  
+/* TODO: cross-endian support */

+static inline bool vdpasim_is_little_endian(struct vdpasim *vdpasim)
+{
+   return virtio_legacy_is_little_endian() ||
+   (vdpasim->features & (1ULL << VIRTIO_F_VERSION_1));
+}
+
+static inline u16 vdpasim16_to_cpu(struct vdpasim *vdpasim, __virtio16 val)
+{
+   return __virtio16_to_cpu(vdpasim_is_little_endian(vdpasim), val);
+}
+
+static inline __virtio16 cpu_to_vdpasim16(struct vdpasim *vdpasim, u16 val)
+{
+   return __cpu_to_virtio16(vdpasim_is_little_endian(vdpasim), val);
+}
+
  static struct vdpasim *vdpasim_dev;
  
  static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)

@@ -306,7 +324,6 @@ static const struct vdpa_config_ops vdpasim_net_config_ops;
  
  static struct vdpasim *vdpasim_create(void)

  {
-   struct virtio_net_config *config;
struct vdpasim *vdpasim;
struct device *dev;
int ret = -ENOMEM;
@@ -331,10 +348,7 @@ static struct vdpasim *vdpasim_create(void)
if (!vdpasim->buffer)
goto err_iommu;
  
-	config = &vdpasim->config;

-   config->mtu = 1500;
-   config->status = VIRTIO_NET_S_LINK_UP;
-   eth_random_addr(config->mac);
+   eth_random_addr(vdpasim->config.mac);
  
	vringh_set_iotlb(&vdpasim->vqs[0].vring, vdpasim->iommu);

vringh_set_iotlb(&vdpasim->vqs[1].vring, vdpasim->iommu);
@@ -448,6 +462,7 @@ static u64 vdpasim_get_features(struct vdpa_device *vdpa)
  static int vdpasim_set_features(struct vdpa_device *vdpa, u64 features)
  {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+   struct virtio_net_config *config = &vdpasim->config;
  
  	/* DMA mapping must be done by driver */

if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM)))
@@ -455,6 +470,12 @@ static int vdpasim_set_features(struct vdpa_device *vdpa, 
u64 features)
  
  	vdpasim->features = features & vdpasim_features;
  
+	/* We only know whether guest is using the legacy interface here, so

+* that's the earliest we can set config fields.
+*/



We check whether or not ACCESS_PLATFORM is set earlier, which is probably 
a hint that only modern devices are supported. So I wonder whether just forcing LE 
and failing if VERSION_1 is not set would be better?


Thanks



+
+   config->mtu = cpu_to_vdpasim16(vdpasim, 1500);
+   config->status = cpu_to_vdpasim16(vdpasim, VIRTIO_NET_S_LINK_UP);
return 0;
  }
  



Re: [PATCH v2 19/24] vdpa: make sure set_features is invoked for legacy

2020-08-05 Thread Jason Wang


On 2020/8/4 上午5:00, Michael S. Tsirkin wrote:

Some legacy guests just assume features are 0 after reset.
We detect that config space is accessed before features are
set and set features to 0 automatically.
Note: some legacy guests might not even access config space, if this is
reported in the field we might need to catch a kick to handle these.



I wonder whether it's easier to just support modern device?

Thanks




Signed-off-by: Michael S. Tsirkin 
---
  drivers/vdpa/vdpa.c  |  1 +
  include/linux/vdpa.h | 34 ++
  2 files changed, 35 insertions(+)

diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index de211ef3738c..7105265e4793 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -96,6 +96,7 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent,
vdev->dev.release = vdpa_release_dev;
vdev->index = err;
vdev->config = config;
+   vdev->features_valid = false;
  
  	err = dev_set_name(>dev, "vdpa%u", vdev->index);

if (err)
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 239db794357c..29b8296f1414 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -33,12 +33,14 @@ struct vdpa_notification_area {
   * @dma_dev: the actual device that is performing DMA
   * @config: the configuration ops for this device.
   * @index: device index
+ * @features_valid: were features initialized? for legacy guests
   */
  struct vdpa_device {
struct device dev;
struct device *dma_dev;
const struct vdpa_config_ops *config;
unsigned int index;
+   bool features_valid;
  };
  
  /**

@@ -266,4 +268,36 @@ static inline struct device *vdpa_get_dma_dev(struct 
vdpa_device *vdev)
  {
return vdev->dma_dev;
  }
+
+static inline void vdpa_reset(struct vdpa_device *vdev)
+{
+const struct vdpa_config_ops *ops = vdev->config;
+
+   vdev->features_valid = false;
+ops->set_status(vdev, 0);
+}
+
+static inline int vdpa_set_features(struct vdpa_device *vdev, u64 features)
+{
+const struct vdpa_config_ops *ops = vdev->config;
+
+   vdev->features_valid = true;
+return ops->set_features(vdev, features);
+}
+
+
+static inline void vdpa_get_config(struct vdpa_device *vdev, unsigned offset,
+  void *buf, unsigned int len)
+{
+const struct vdpa_config_ops *ops = vdev->config;
+
+   /*
+* Config accesses aren't supposed to trigger before features are set.
+* If it does happen we assume a legacy guest.
+*/
+   if (!vdev->features_valid)
+   vdpa_set_features(vdev, 0);
+   ops->get_config(vdev, offset, buf, len);
+}
+
  #endif /* _LINUX_VDPA_H */



Re: [PATCH V5 1/6] vhost: introduce vhost_vring_call

2020-08-04 Thread Jason Wang


On 2020/8/5 下午1:49, Zhu, Lingshan wrote:



On 8/5/2020 10:16 AM, Jason Wang wrote:


On 2020/8/4 下午5:21, Michael S. Tsirkin wrote:

   +struct vhost_vring_call {
+    struct eventfd_ctx *ctx;
+    struct irq_bypass_producer producer;
+    spinlock_t ctx_lock;

It's not clear to me why we need ctx_lock here.

Thanks

Hi Jason,

we use this lock to protect the eventfd_ctx and irq from race 
conditions,
We don't support irq notification from vDPA device driver in this 
version,

do we still have race condition?

Thanks

Jason I'm not sure what you are trying to say here.



I meant we change the API from V4 so driver won't notify us if irq is 
changed.


Then it looks to me there's no need for the ctx_lock; everything could 
be synchronized with the vq mutex.


Thanks

From V4 to V5 there are only some minor improvements and bug fixes; get_vq_irq() 
stays almost untouched. A mutex can work for this, however I see the vq mutex is 
used in many scenarios.
We only use this lock to protect the producer information; could this give us 
less coupling and more defensive code, hence fewer bugs?



I think not; the vq mutex is used to protect all vq-related data structures, and 
introducing another one will increase the complexity.


Thanks




Thanks










Re: [PATCH V5 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-08-04 Thread Jason Wang


On 2020/8/5 下午1:45, Zhu, Lingshan wrote:



On 8/5/2020 10:36 AM, Jason Wang wrote:


On 2020/8/4 下午5:31, Zhu, Lingshan wrote:



On 8/4/2020 4:51 PM, Jason Wang wrote:


On 2020/7/31 下午2:55, Zhu Lingshan wrote:

This patch introduce a set of functions for setup/unsetup
and update irq offloading respectively by register/unregister
and re-register the irq_bypass_producer.

With these functions, this commit can setup/unsetup
irq offloading through setting DRIVER_OK/!DRIVER_OK, and
update irq offloading through SET_VRING_CALL.

Signed-off-by: Zhu Lingshan 
Suggested-by: Jason Wang 
---
  drivers/vhost/Kconfig |  1 +
  drivers/vhost/vdpa.c  | 79 
++-

  2 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index d3688c6afb87..587fbae06182 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -65,6 +65,7 @@ config VHOST_VDPA
  tristate "Vhost driver for vDPA-based backend"
  depends on EVENTFD
  select VHOST
+    select IRQ_BYPASS_MANAGER
  depends on VDPA
  help
    This kernel module can be loaded in host kernel to accelerate
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index df3cf386b0cd..278ea2f00172 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -115,6 +115,55 @@ static irqreturn_t vhost_vdpa_config_cb(void 
*private)

  return IRQ_HANDLED;
  }
  +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, u16 qid)
+{
+    struct vhost_virtqueue *vq = >vqs[qid];
+    const struct vdpa_config_ops *ops = v->vdpa->config;
+    struct vdpa_device *vdpa = v->vdpa;
+    int ret, irq;
+
+    spin_lock(>call_ctx.ctx_lock);
+    irq = ops->get_vq_irq(vdpa, qid);
+    if (!vq->call_ctx.ctx || irq < 0) {
+ spin_unlock(>call_ctx.ctx_lock);
+    return;
+    }
+
+    vq->call_ctx.producer.token = vq->call_ctx.ctx;
+    vq->call_ctx.producer.irq = irq;
+    ret = irq_bypass_register_producer(>call_ctx.producer);
+    spin_unlock(>call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, u16 qid)
+{
+    struct vhost_virtqueue *vq = >vqs[qid];
+
+    spin_lock(>call_ctx.ctx_lock);
+ irq_bypass_unregister_producer(>call_ctx.producer);



Any reason for not checking vq->call_ctx.producer.irq as below here?
we only need ctx as a token to unregister the vq from the irq bypass 
manager; if vq->call_ctx.producer.irq is 0, it means it is an unused or 
disabled vq,



This is not how the code is written, though. See above: you only check whether 
the irq is negative; irq 0 seems acceptable.

Yes, IRQ 0 is valid, so we check whether it is < 0.


+    spin_lock(>call_ctx.ctx_lock);
+    irq = ops->get_vq_irq(vdpa, qid);
+    if (!vq->call_ctx.ctx || irq < 0) {
+    spin_unlock(>call_ctx.ctx_lock);
+    return;
+    }
+
+    vq->call_ctx.producer.token = vq->call_ctx.ctx;
+    vq->call_ctx.producer.irq = irq;
+    ret = irq_bypass_register_producer(>call_ctx.producer);
+    spin_unlock(>call_ctx.ctx_lock);



no harm if we
perform an unregister on it.




+ spin_unlock(>call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_update_vq_irq(struct vhost_virtqueue *vq)
+{
+    spin_lock(>call_ctx.ctx_lock);
+    /*
+ * if it has a non-zero irq, means there is a
+ * previsouly registered irq_bypass_producer,
+ * we should update it when ctx (its token)
+ * changes.
+ */
+    if (!vq->call_ctx.producer.irq) {
+ spin_unlock(>call_ctx.ctx_lock);
+    return;
+    }
+
+ irq_bypass_unregister_producer(>call_ctx.producer);
+    vq->call_ctx.producer.token = vq->call_ctx.ctx;
+ irq_bypass_register_producer(>call_ctx.producer);
+    spin_unlock(>call_ctx.ctx_lock);
+}



I think setup_irq() and update_irq() could be unified with the 
following logic:


irq_bypass_unregister_producer(&vq->call_ctx.producer);
irq = ops->get_vq_irq(vdpa, qid);
    if (!vq->call_ctx.ctx || irq < 0) {
        spin_unlock(&vq->call_ctx.ctx_lock);
        return;
    }

vq->call_ctx.producer.token = vq->call_ctx.ctx;
vq->call_ctx.producer.irq = irq;
ret = irq_bypass_register_producer(&vq->call_ctx.producer);
Yes, this code piece can do both register and update. Though it's 
rare to call update_irq(),
setup_irq() is very likely to be called for every vq, so this may 
cause several rounds of useless irq_bypass_unregister_producer().



I'm not sure I get this but do you have a case for this?

I mean if we use this routine to set up irq offloading, it is very likely to do 
an unregister of the producer for every vq first, for nothing.



Does it really harm? See vfio_msi_set_vector_signal()






is it worth it just to simplify the code?



Less code(bug).

I can do this if we are chasing perfection; however, I believe the number of bugs 
correlates more with the complexity of the logic than with the number of code lines. If 
we only merge lines, th

Re: [PATCH V5 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-08-04 Thread Jason Wang


On 2020/8/4 下午5:31, Zhu, Lingshan wrote:



On 8/4/2020 4:51 PM, Jason Wang wrote:


On 2020/7/31 下午2:55, Zhu Lingshan wrote:

This patch introduce a set of functions for setup/unsetup
and update irq offloading respectively by register/unregister
and re-register the irq_bypass_producer.

With these functions, this commit can setup/unsetup
irq offloading through setting DRIVER_OK/!DRIVER_OK, and
update irq offloading through SET_VRING_CALL.

Signed-off-by: Zhu Lingshan 
Suggested-by: Jason Wang 
---
  drivers/vhost/Kconfig |  1 +
  drivers/vhost/vdpa.c  | 79 
++-

  2 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index d3688c6afb87..587fbae06182 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -65,6 +65,7 @@ config VHOST_VDPA
  tristate "Vhost driver for vDPA-based backend"
  depends on EVENTFD
  select VHOST
+    select IRQ_BYPASS_MANAGER
  depends on VDPA
  help
    This kernel module can be loaded in host kernel to accelerate
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index df3cf386b0cd..278ea2f00172 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -115,6 +115,55 @@ static irqreturn_t vhost_vdpa_config_cb(void 
*private)

  return IRQ_HANDLED;
  }
  +static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, u16 qid)
+{
+    struct vhost_virtqueue *vq = >vqs[qid];
+    const struct vdpa_config_ops *ops = v->vdpa->config;
+    struct vdpa_device *vdpa = v->vdpa;
+    int ret, irq;
+
+    spin_lock(>call_ctx.ctx_lock);
+    irq = ops->get_vq_irq(vdpa, qid);
+    if (!vq->call_ctx.ctx || irq < 0) {
+    spin_unlock(>call_ctx.ctx_lock);
+    return;
+    }
+
+    vq->call_ctx.producer.token = vq->call_ctx.ctx;
+    vq->call_ctx.producer.irq = irq;
+    ret = irq_bypass_register_producer(>call_ctx.producer);
+    spin_unlock(>call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, u16 qid)
+{
+    struct vhost_virtqueue *vq = >vqs[qid];
+
+    spin_lock(>call_ctx.ctx_lock);
+ irq_bypass_unregister_producer(>call_ctx.producer);



Any reason for not checking vq->call_ctx.producer.irq as below here?

we only need ctx as a token to unregister the vq from the irq bypass manager; if 
vq->call_ctx.producer.irq is 0, it means it is an unused or disabled vq,



This is not how the code is written, though. See above: you only check whether the irq 
is negative; irq 0 seems acceptable.


+    spin_lock(>call_ctx.ctx_lock);
+    irq = ops->get_vq_irq(vdpa, qid);
+    if (!vq->call_ctx.ctx || irq < 0) {
+    spin_unlock(>call_ctx.ctx_lock);
+    return;
+    }
+
+    vq->call_ctx.producer.token = vq->call_ctx.ctx;
+    vq->call_ctx.producer.irq = irq;
+    ret = irq_bypass_register_producer(>call_ctx.producer);
+    spin_unlock(>call_ctx.ctx_lock);



no harm if we
perform an unregister on it.




+ spin_unlock(>call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_update_vq_irq(struct vhost_virtqueue *vq)
+{
+    spin_lock(>call_ctx.ctx_lock);
+    /*
+ * if it has a non-zero irq, means there is a
+ * previsouly registered irq_bypass_producer,
+ * we should update it when ctx (its token)
+ * changes.
+ */
+    if (!vq->call_ctx.producer.irq) {
+    spin_unlock(>call_ctx.ctx_lock);
+    return;
+    }
+
+ irq_bypass_unregister_producer(>call_ctx.producer);
+    vq->call_ctx.producer.token = vq->call_ctx.ctx;
+ irq_bypass_register_producer(>call_ctx.producer);
+    spin_unlock(>call_ctx.ctx_lock);
+}



I think setup_irq() and update_irq() could be unified with the 
following logic:


irq_bypass_unregister_producer(&vq->call_ctx.producer);
irq = ops->get_vq_irq(vdpa, qid);
    if (!vq->call_ctx.ctx || irq < 0) {
        spin_unlock(&vq->call_ctx.ctx_lock);
        return;
    }

vq->call_ctx.producer.token = vq->call_ctx.ctx;
vq->call_ctx.producer.irq = irq;
ret = irq_bypass_register_producer(&vq->call_ctx.producer);

Yes, this code piece can do both register and update. Though it's rare to call 
update_irq(),
setup_irq() is very likely to be called for every vq, so this may cause several 
rounds of useless irq_bypass_unregister_producer().



I'm not sure I get this but do you have a case for this?



is it worth it just to simplify the code?



Less code(bug).





+
  static void vhost_vdpa_reset(struct vhost_vdpa *v)
  {
  struct vdpa_device *vdpa = v->vdpa;
@@ -155,11 +204,15 @@ static long vhost_vdpa_set_status(struct 
vhost_vdpa *v, u8 __user *statusp)

  {
  struct vdpa_device *vdpa = v->vdpa;
  const struct vdpa_config_ops *ops = vdpa->config;
-    u8 status;
+    u8 status, status_old;
+    int nvqs = v->nvqs;
+    u16 i;
    if (copy_from_user(, statusp, sizeof(status)))
  return -EFAULT;
  +    status_old = ops->get_

Re: [PATCH V5 1/6] vhost: introduce vhost_vring_call

2020-08-04 Thread Jason Wang


On 2020/8/4 下午5:21, Zhu, Lingshan wrote:

Hi Jason,

we use this lock to protect the eventfd_ctx and irq from race 
conditions,



We don't support irq notification from vDPA device driver in this 
version, do we still have race condition?

as we discussed before:
(1) if the vendor changes the IRQ after DRIVER_OK; though they should not do this, 
they can.
(2) if user space changes the ctx.

Thanks



Yes, but then everything happens in context of syscall (ioctl), so vq 
mutex is sufficient I guess?


Thanks


Re: [PATCH V5 1/6] vhost: introduce vhost_vring_call

2020-08-04 Thread Jason Wang


On 2020/8/4 下午5:21, Michael S. Tsirkin wrote:

   +struct vhost_vring_call {
+    struct eventfd_ctx *ctx;
+    struct irq_bypass_producer producer;
+    spinlock_t ctx_lock;

It's not clear to me why we need ctx_lock here.

Thanks

Hi Jason,

we use this lock to protect the eventfd_ctx and irq from race conditions,

We don't support irq notification from vDPA device driver in this version,
do we still have race condition?

Thanks

Jason I'm not sure what you are trying to say here.



I meant we change the API from V4 so driver won't notify us if irq is 
changed.


Then it looks to me there's no need for the ctx_lock; everything could be 
synchronized with the vq mutex.


Thanks
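A minimal sketch of what the spinlock-free variant could look like, assuming the setup
path is only ever reached from ioctl context where vq->mutex can be taken; names follow
the V5 patch, and the irq_bypass_register_producer() return value is ignored for brevity:

static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, u16 qid)
{
        struct vhost_virtqueue *vq = &v->vqs[qid];
        const struct vdpa_config_ops *ops = v->vdpa->config;
        int irq;

        if (!ops->get_vq_irq)
                return;

        irq = ops->get_vq_irq(v->vdpa, qid);

        mutex_lock(&vq->mutex);
        if (vq->call_ctx.ctx && irq >= 0) {
                vq->call_ctx.producer.token = vq->call_ctx.ctx;
                vq->call_ctx.producer.irq = irq;
                irq_bypass_register_producer(&vq->call_ctx.producer);
        }
        mutex_unlock(&vq->mutex);
}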







Re: [PATCH V5 1/6] vhost: introduce vhost_vring_call

2020-08-04 Thread Jason Wang


On 2020/8/4 下午4:42, Zhu, Lingshan wrote:



On 8/4/2020 4:38 PM, Jason Wang wrote:


On 2020/7/31 下午2:55, Zhu Lingshan wrote:

This commit introduces struct vhost_vring_call which replaced
raw struct eventfd_ctx *call_ctx in struct vhost_virtqueue.
Besides eventfd_ctx, it contains a spin lock and an
irq_bypass_producer in its structure.

Signed-off-by: Zhu Lingshan 
Suggested-by: Jason Wang 
---
  drivers/vhost/vdpa.c  |  4 ++--
  drivers/vhost/vhost.c | 22 --
  drivers/vhost/vhost.h |  9 -
  3 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index a54b60d6623f..df3cf386b0cd 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -96,7 +96,7 @@ static void handle_vq_kick(struct vhost_work *work)
  static irqreturn_t vhost_vdpa_virtqueue_cb(void *private)
  {
  struct vhost_virtqueue *vq = private;
-    struct eventfd_ctx *call_ctx = vq->call_ctx;
+    struct eventfd_ctx *call_ctx = vq->call_ctx.ctx;
    if (call_ctx)
  eventfd_signal(call_ctx, 1);
@@ -382,7 +382,7 @@ static long vhost_vdpa_vring_ioctl(struct 
vhost_vdpa *v, unsigned int cmd,

  break;
    case VHOST_SET_VRING_CALL:
-    if (vq->call_ctx) {
+    if (vq->call_ctx.ctx) {
  cb.callback = vhost_vdpa_virtqueue_cb;
  cb.private = vq;
  } else {
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d7b8df3edffc..9f1a845a9302 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -298,6 +298,13 @@ static void vhost_vq_meta_reset(struct 
vhost_dev *d)

  __vhost_vq_meta_reset(d->vqs[i]);
  }
  +static void vhost_vring_call_reset(struct vhost_vring_call 
*call_ctx)

+{
+    call_ctx->ctx = NULL;
+    memset(_ctx->producer, 0x0, sizeof(struct 
irq_bypass_producer));

+    spin_lock_init(_ctx->ctx_lock);
+}
+
  static void vhost_vq_reset(struct vhost_dev *dev,
 struct vhost_virtqueue *vq)
  {
@@ -319,13 +326,13 @@ static void vhost_vq_reset(struct vhost_dev *dev,
  vq->log_base = NULL;
  vq->error_ctx = NULL;
  vq->kick = NULL;
-    vq->call_ctx = NULL;
  vq->log_ctx = NULL;
  vhost_reset_is_le(vq);
  vhost_disable_cross_endian(vq);
  vq->busyloop_timeout = 0;
  vq->umem = NULL;
  vq->iotlb = NULL;
+    vhost_vring_call_reset(>call_ctx);
  __vhost_vq_meta_reset(vq);
  }
  @@ -685,8 +692,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
eventfd_ctx_put(dev->vqs[i]->error_ctx);
  if (dev->vqs[i]->kick)
  fput(dev->vqs[i]->kick);
-    if (dev->vqs[i]->call_ctx)
- eventfd_ctx_put(dev->vqs[i]->call_ctx);
+    if (dev->vqs[i]->call_ctx.ctx)
+ eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
  vhost_vq_reset(dev, dev->vqs[i]);
  }
  vhost_dev_free_iovecs(dev);
@@ -1629,7 +1636,10 @@ long vhost_vring_ioctl(struct vhost_dev *d, 
unsigned int ioctl, void __user *arg

  r = PTR_ERR(ctx);
  break;
  }
-    swap(ctx, vq->call_ctx);
+
+    spin_lock(>call_ctx.ctx_lock);
+    swap(ctx, vq->call_ctx.ctx);
+    spin_unlock(>call_ctx.ctx_lock);
  break;
  case VHOST_SET_VRING_ERR:
  if (copy_from_user(, argp, sizeof f)) {
@@ -2440,8 +2450,8 @@ static bool vhost_notify(struct vhost_dev 
*dev, struct vhost_virtqueue *vq)

  void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
  {
  /* Signal the Guest tell them we used something up. */
-    if (vq->call_ctx && vhost_notify(dev, vq))
-    eventfd_signal(vq->call_ctx, 1);
+    if (vq->call_ctx.ctx && vhost_notify(dev, vq))
+    eventfd_signal(vq->call_ctx.ctx, 1);
  }
  EXPORT_SYMBOL_GPL(vhost_signal);
  diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index c8e96a095d3b..38eb1aa3b68d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -13,6 +13,7 @@
  #include 
  #include 
  #include 
+#include 
    struct vhost_work;
  typedef void (*vhost_work_fn_t)(struct vhost_work *work);
@@ -60,6 +61,12 @@ enum vhost_uaddr_type {
  VHOST_NUM_ADDRS = 3,
  };
  +struct vhost_vring_call {
+    struct eventfd_ctx *ctx;
+    struct irq_bypass_producer producer;
+    spinlock_t ctx_lock;



It's not clear to me why we need ctx_lock here.

Thanks

Hi Jason,

we use this lock to protect the eventfd_ctx and irq from race conditions,



We don't support irq notification from vDPA device driver in this 
version, do we still have race condition?


Thanks



  are you suggesting a better name?

Thanks




+};
+
  /* The virtqueue structure describes a queue attached to a device. */
  struct vhost_virtqueue {
  struct vhost_dev *dev;
@@ -72,7 +79,7 @@ struct vhost_virtqueue {
  vring_used_t __user *used;
  const struct vhost_iotlb_map *meta_iotlb[VHOST_NUM_ADDRS];
  struct file *kick;

Re: [PATCH V5 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-08-04 Thread Jason Wang


On 2020/7/31 下午2:55, Zhu Lingshan wrote:

This patch introduce a set of functions for setup/unsetup
and update irq offloading respectively by register/unregister
and re-register the irq_bypass_producer.

With these functions, this commit can setup/unsetup
irq offloading through setting DRIVER_OK/!DRIVER_OK, and
update irq offloading through SET_VRING_CALL.

Signed-off-by: Zhu Lingshan 
Suggested-by: Jason Wang 
---
  drivers/vhost/Kconfig |  1 +
  drivers/vhost/vdpa.c  | 79 ++-
  2 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index d3688c6afb87..587fbae06182 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -65,6 +65,7 @@ config VHOST_VDPA
tristate "Vhost driver for vDPA-based backend"
depends on EVENTFD
select VHOST
+   select IRQ_BYPASS_MANAGER
depends on VDPA
help
  This kernel module can be loaded in host kernel to accelerate
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index df3cf386b0cd..278ea2f00172 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -115,6 +115,55 @@ static irqreturn_t vhost_vdpa_config_cb(void *private)
return IRQ_HANDLED;
  }
  
+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, u16 qid)

+{
+   struct vhost_virtqueue *vq = >vqs[qid];
+   const struct vdpa_config_ops *ops = v->vdpa->config;
+   struct vdpa_device *vdpa = v->vdpa;
+   int ret, irq;
+
+   spin_lock(>call_ctx.ctx_lock);
+   irq = ops->get_vq_irq(vdpa, qid);
+   if (!vq->call_ctx.ctx || irq < 0) {
+   spin_unlock(>call_ctx.ctx_lock);
+   return;
+   }
+
+   vq->call_ctx.producer.token = vq->call_ctx.ctx;
+   vq->call_ctx.producer.irq = irq;
+   ret = irq_bypass_register_producer(>call_ctx.producer);
+   spin_unlock(>call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, u16 qid)
+{
+   struct vhost_virtqueue *vq = >vqs[qid];
+
+   spin_lock(>call_ctx.ctx_lock);
+   irq_bypass_unregister_producer(>call_ctx.producer);



Any reason for not checking vq->call_ctx.producer.irq as below here?



+   spin_unlock(>call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_update_vq_irq(struct vhost_virtqueue *vq)
+{
+   spin_lock(>call_ctx.ctx_lock);
+   /*
+* if it has a non-zero irq, means there is a
+* previsouly registered irq_bypass_producer,
+* we should update it when ctx (its token)
+* changes.
+*/
+   if (!vq->call_ctx.producer.irq) {
+   spin_unlock(>call_ctx.ctx_lock);
+   return;
+   }
+
+   irq_bypass_unregister_producer(>call_ctx.producer);
+   vq->call_ctx.producer.token = vq->call_ctx.ctx;
+   irq_bypass_register_producer(>call_ctx.producer);
+   spin_unlock(>call_ctx.ctx_lock);
+}



I think setup_irq() and update_irq() could be unified with the following 
logic:


irq_bypass_unregister_producer(&vq->call_ctx.producer);
irq = ops->get_vq_irq(vdpa, qid);
    if (!vq->call_ctx.ctx || irq < 0) {
        spin_unlock(&vq->call_ctx.ctx_lock);
        return;
    }

vq->call_ctx.producer.token = vq->call_ctx.ctx;
vq->call_ctx.producer.irq = irq;
ret = irq_bypass_register_producer(&vq->call_ctx.producer);
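Spelled out, a sketch of that unified helper with the locking folded in; unregistering a
producer that was never registered is harmless (cf. the vfio_msi_set_vector_signal()
precedent mentioned elsewhere in this thread), and error handling is elided:

static void vhost_vdpa_update_vq_irq(struct vhost_vdpa *v, u16 qid)
{
        struct vhost_virtqueue *vq = &v->vqs[qid];
        const struct vdpa_config_ops *ops = v->vdpa->config;
        int irq;

        spin_lock(&vq->call_ctx.ctx_lock);

        /* always drop the old producer first; a no-op if none is registered */
        irq_bypass_unregister_producer(&vq->call_ctx.producer);

        irq = ops->get_vq_irq(v->vdpa, qid);
        if (vq->call_ctx.ctx && irq >= 0) {
                vq->call_ctx.producer.token = vq->call_ctx.ctx;
                vq->call_ctx.producer.irq = irq;
                irq_bypass_register_producer(&vq->call_ctx.producer);
        }

        spin_unlock(&vq->call_ctx.ctx_lock);
}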


+
  static void vhost_vdpa_reset(struct vhost_vdpa *v)
  {
struct vdpa_device *vdpa = v->vdpa;
@@ -155,11 +204,15 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, 
u8 __user *statusp)
  {
struct vdpa_device *vdpa = v->vdpa;
const struct vdpa_config_ops *ops = vdpa->config;
-   u8 status;
+   u8 status, status_old;
+   int nvqs = v->nvqs;
+   u16 i;
  
  	if (copy_from_user(, statusp, sizeof(status)))

return -EFAULT;
  
+	status_old = ops->get_status(vdpa);

+
/*
 * Userspace shouldn't remove status bits unless reset the
 * status to 0.
@@ -169,6 +222,15 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 
__user *statusp)
  
  	ops->set_status(vdpa, status);
  
+	/* vq irq is not expected to be changed once DRIVER_OK is set */



Let's move this comment to the get_vq_irq bus operation.



+   if ((status & VIRTIO_CONFIG_S_DRIVER_OK) && !(status_old & 
VIRTIO_CONFIG_S_DRIVER_OK))
+   for (i = 0; i < nvqs; i++)
+   vhost_vdpa_setup_vq_irq(v, i);
+
+   if ((status_old & VIRTIO_CONFIG_S_DRIVER_OK) && !(status & 
VIRTIO_CONFIG_S_DRIVER_OK))
+   for (i = 0; i < nvqs; i++)
+   vhost_vdpa_unsetup_vq_irq(v, i);
+
return 0;
  }
  
@@ -332,6 +394,7 @@ static long vhost_vdpa_set_config_call(struct vhost_vdpa *v, u32 __user *argp)
  
  	return 0;

  }
+
  static long vhost_vd

Re: [PATCH V5 1/6] vhost: introduce vhost_vring_call

2020-08-04 Thread Jason Wang


On 2020/7/31 下午2:55, Zhu Lingshan wrote:

This commit introduces struct vhost_vring_call which replaced
raw struct eventfd_ctx *call_ctx in struct vhost_virtqueue.
Besides eventfd_ctx, it contains a spin lock and an
irq_bypass_producer in its structure.

Signed-off-by: Zhu Lingshan 
Suggested-by: Jason Wang 
---
  drivers/vhost/vdpa.c  |  4 ++--
  drivers/vhost/vhost.c | 22 --
  drivers/vhost/vhost.h |  9 -
  3 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index a54b60d6623f..df3cf386b0cd 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -96,7 +96,7 @@ static void handle_vq_kick(struct vhost_work *work)
  static irqreturn_t vhost_vdpa_virtqueue_cb(void *private)
  {
struct vhost_virtqueue *vq = private;
-   struct eventfd_ctx *call_ctx = vq->call_ctx;
+   struct eventfd_ctx *call_ctx = vq->call_ctx.ctx;
  
  	if (call_ctx)

eventfd_signal(call_ctx, 1);
@@ -382,7 +382,7 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, 
unsigned int cmd,
break;
  
  	case VHOST_SET_VRING_CALL:

-   if (vq->call_ctx) {
+   if (vq->call_ctx.ctx) {
cb.callback = vhost_vdpa_virtqueue_cb;
cb.private = vq;
} else {
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d7b8df3edffc..9f1a845a9302 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -298,6 +298,13 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
__vhost_vq_meta_reset(d->vqs[i]);
  }
  
+static void vhost_vring_call_reset(struct vhost_vring_call *call_ctx)

+{
+   call_ctx->ctx = NULL;
+   memset(&call_ctx->producer, 0x0, sizeof(struct irq_bypass_producer));
+   spin_lock_init(&call_ctx->ctx_lock);
+}
+
  static void vhost_vq_reset(struct vhost_dev *dev,
   struct vhost_virtqueue *vq)
  {
@@ -319,13 +326,13 @@ static void vhost_vq_reset(struct vhost_dev *dev,
vq->log_base = NULL;
vq->error_ctx = NULL;
vq->kick = NULL;
-   vq->call_ctx = NULL;
vq->log_ctx = NULL;
vhost_reset_is_le(vq);
vhost_disable_cross_endian(vq);
vq->busyloop_timeout = 0;
vq->umem = NULL;
vq->iotlb = NULL;
+   vhost_vring_call_reset(&vq->call_ctx);
__vhost_vq_meta_reset(vq);
  }
  
@@ -685,8 +692,8 @@ void vhost_dev_cleanup(struct vhost_dev *dev)

eventfd_ctx_put(dev->vqs[i]->error_ctx);
if (dev->vqs[i]->kick)
fput(dev->vqs[i]->kick);
-   if (dev->vqs[i]->call_ctx)
-   eventfd_ctx_put(dev->vqs[i]->call_ctx);
+   if (dev->vqs[i]->call_ctx.ctx)
+   eventfd_ctx_put(dev->vqs[i]->call_ctx.ctx);
vhost_vq_reset(dev, dev->vqs[i]);
}
vhost_dev_free_iovecs(dev);
@@ -1629,7 +1636,10 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int 
ioctl, void __user *arg
r = PTR_ERR(ctx);
break;
}
-   swap(ctx, vq->call_ctx);
+
+   spin_lock(&vq->call_ctx.ctx_lock);
+   swap(ctx, vq->call_ctx.ctx);
+   spin_unlock(&vq->call_ctx.ctx_lock);
break;
case VHOST_SET_VRING_ERR:
if (copy_from_user(, argp, sizeof f)) {
@@ -2440,8 +2450,8 @@ static bool vhost_notify(struct vhost_dev *dev, struct 
vhost_virtqueue *vq)
  void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
  {
/* Signal the Guest tell them we used something up. */
-   if (vq->call_ctx && vhost_notify(dev, vq))
-   eventfd_signal(vq->call_ctx, 1);
+   if (vq->call_ctx.ctx && vhost_notify(dev, vq))
+   eventfd_signal(vq->call_ctx.ctx, 1);
  }
  EXPORT_SYMBOL_GPL(vhost_signal);
  
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h

index c8e96a095d3b..38eb1aa3b68d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -13,6 +13,7 @@
  #include 
  #include 
  #include 
+#include 
  
  struct vhost_work;

  typedef void (*vhost_work_fn_t)(struct vhost_work *work);
@@ -60,6 +61,12 @@ enum vhost_uaddr_type {
VHOST_NUM_ADDRS = 3,
  };
  
+struct vhost_vring_call {

+   struct eventfd_ctx *ctx;
+   struct irq_bypass_producer producer;
+   spinlock_t ctx_lock;



It's not clear to me why we need ctx_lock here.

Thanks



+};
+
  /* The virtqueue structure describes a queue attached to a device. */
  struct vhost_virtqueue {
struct vhost_dev *dev;
@@ -72,7 +79,7 @@ struct vhost_virtqueue {
vring_used_t __user *used;
const struct vhost_iotlb_map *meta_iotlb[VHOST_NUM_ADDRS];
struct file *kick;
-   struct eve

Re: [PATCH -next v3] virtio_ring: Avoid loop when vq is broken in virtqueue_poll

2020-08-03 Thread Jason Wang


On 2020/8/2 下午3:44, Mao Wenan wrote:

The loop may exist if vq->broken is true:
virtqueue_get_buf_ctx_packed or virtqueue_get_buf_ctx_split
will return NULL, so virtnet_poll will reschedule napi to
receive packets, which drives cpu usage (si) to 100%.

call trace as below:
virtnet_poll
virtnet_receive
virtqueue_get_buf_ctx
virtqueue_get_buf_ctx_packed
virtqueue_get_buf_ctx_split
virtqueue_napi_complete
virtqueue_poll   //return true
virtqueue_napi_schedule //it will reschedule napi

To fix this, return false in virtqueue_poll if the vq is broken.

Signed-off-by: Mao Wenan 
Acked-by: Michael S. Tsirkin 
---
  v2->v3: change subject, original is : "virtio_net: Avoid loop in virtnet_poll"
  v1->v2: fix it in virtqueue_poll suggested by Michael S. Tsirkin 

  drivers/virtio/virtio_ring.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 58b96ba..4f7c73e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1960,6 +1960,9 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned 
last_used_idx)
  {
struct vring_virtqueue *vq = to_vvq(_vq);
  
+	if (unlikely(vq->broken))

+   return false;
+
virtio_mb(vq->weak_barriers);
return vq->packed_ring ? virtqueue_poll_packed(_vq, last_used_idx) :
 virtqueue_poll_split(_vq, last_used_idx);



Acked-by: Jason Wang 
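For context, the caller in drivers/net/virtio_net.c looks roughly like the sketch below
(details elided); a stale "true" from virtqueue_poll() on a broken vq is what kept
re-arming NAPI:

static void virtqueue_napi_complete(struct napi_struct *napi,
                                    struct virtqueue *vq, int processed)
{
        int opaque;

        opaque = virtqueue_enable_cb_prepare(vq);
        if (napi_complete_done(napi, processed)) {
                /* before this fix, a broken vq made this return true forever */
                if (unlikely(virtqueue_poll(vq, opaque)))
                        virtqueue_napi_schedule(napi, vq);
        } else {
                virtqueue_disable_cb(vq);
        }
}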



Re: [PATCH] virtio_pci_modern: Fix the comment of virtio_pci_find_capability()

2020-08-03 Thread Jason Wang


On 2020/8/3 下午7:52, Yi Wang wrote:

From: Liao Pingfang 

Fix the comment of virtio_pci_find_capability() by adding missing comment
for the last parameter: bars.

Fixes: 59a5b0f7bf74 ("virtio-pci: alloc only resources actually used.")
Signed-off-by: Liao Pingfang 
Signed-off-by: Yi Wang 



Acked-by: Jason Wang 



---
  drivers/virtio/virtio_pci_modern.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/virtio/virtio_pci_modern.c 
b/drivers/virtio/virtio_pci_modern.c
index db93cedd262f..9bdc6f68221f 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -481,6 +481,7 @@ static const struct virtio_config_ops virtio_pci_config_ops 
= {
   * @dev: the pci device
   * @cfg_type: the VIRTIO_PCI_CAP_* value we seek
   * @ioresource_types: IORESOURCE_MEM and/or IORESOURCE_IO.
+ * @bars: the bitmask of BARs
   *
   * Returns offset of the capability, or 0.
   */



[PATCH] vdpasim: protect concurrent access to iommu iotlb

2020-07-31 Thread Jason Wang
From: Max Gurtovoy 

The IOMMU IOTLB can be accessed by different cores performing IO through
multiple virtqueues. Add a spinlock to synchronize IOTLB accesses.

The race could be easily reproduced when using more than one pktgen thread
to inject traffic into the vdpa simulator.

Fixes: 2c53d0f64c06f("vdpasim: vDPA device simulator")
Cc: sta...@vger.kernel.org
Signed-off-by: Max Gurtovoy 
Signed-off-by: Jason Wang 
---
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index a9bc5e0fb353..5b5725d951ce 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -70,6 +70,8 @@ struct vdpasim {
u32 status;
u32 generation;
u64 features;
+   /* spinlock to synchronize iommu table */
+   spinlock_t iommu_lock;
 };
 
 static struct vdpasim *vdpasim_dev;
@@ -118,7 +120,9 @@ static void vdpasim_reset(struct vdpasim *vdpasim)
for (i = 0; i < VDPASIM_VQ_NUM; i++)
vdpasim_vq_reset(>vqs[i]);
 
+   spin_lock(&vdpasim->iommu_lock);
	vhost_iotlb_reset(vdpasim->iommu);
+   spin_unlock(&vdpasim->iommu_lock);
 
vdpasim->features = 0;
vdpasim->status = 0;
@@ -236,8 +240,10 @@ static dma_addr_t vdpasim_map_page(struct device *dev, 
struct page *page,
/* For simplicity, use identical mapping to avoid e.g iova
 * allocator.
 */
+   spin_lock(&vdpasim->iommu_lock);
	ret = vhost_iotlb_add_range(iommu, pa, pa + size - 1,
			pa, dir_to_perm(dir));
+   spin_unlock(&vdpasim->iommu_lock);
if (ret)
return DMA_MAPPING_ERROR;
 
@@ -251,8 +257,10 @@ static void vdpasim_unmap_page(struct device *dev, 
dma_addr_t dma_addr,
struct vdpasim *vdpasim = dev_to_sim(dev);
struct vhost_iotlb *iommu = vdpasim->iommu;
 
+   spin_lock(&vdpasim->iommu_lock);
	vhost_iotlb_del_range(iommu, (u64)dma_addr,
		  (u64)dma_addr + size - 1);
+   spin_unlock(&vdpasim->iommu_lock);
 }
 
 static void *vdpasim_alloc_coherent(struct device *dev, size_t size,
@@ -264,9 +272,10 @@ static void *vdpasim_alloc_coherent(struct device *dev, 
size_t size,
void *addr = kmalloc(size, flag);
int ret;
 
-   if (!addr)
+   spin_lock(&vdpasim->iommu_lock);
+   if (!addr) {
*dma_addr = DMA_MAPPING_ERROR;
-   else {
+   } else {
u64 pa = virt_to_phys(addr);
 
ret = vhost_iotlb_add_range(iommu, (u64)pa,
@@ -279,6 +288,7 @@ static void *vdpasim_alloc_coherent(struct device *dev, 
size_t size,
} else
*dma_addr = (dma_addr_t)pa;
}
+   spin_unlock(&vdpasim->iommu_lock);
 
return addr;
 }
@@ -290,8 +300,11 @@ static void vdpasim_free_coherent(struct device *dev, 
size_t size,
struct vdpasim *vdpasim = dev_to_sim(dev);
struct vhost_iotlb *iommu = vdpasim->iommu;
 
+   spin_lock(&vdpasim->iommu_lock);
	vhost_iotlb_del_range(iommu, (u64)dma_addr,
			      (u64)dma_addr + size - 1);
+   spin_unlock(&vdpasim->iommu_lock);
+
kfree(phys_to_virt((uintptr_t)dma_addr));
 }
 
@@ -532,6 +545,7 @@ static int vdpasim_set_map(struct vdpa_device *vdpa,
u64 start = 0ULL, last = 0ULL - 1;
int ret;
 
+   spin_lock(&vdpasim->iommu_lock);
vhost_iotlb_reset(vdpasim->iommu);
 
for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
@@ -541,10 +555,12 @@ static int vdpasim_set_map(struct vdpa_device *vdpa,
if (ret)
goto err;
}
+   spin_unlock(&vdpasim->iommu_lock);
return 0;
 
 err:
vhost_iotlb_reset(vdpasim->iommu);
+   spin_unlock(&vdpasim->iommu_lock);
return ret;
 }
 
@@ -552,16 +568,23 @@ static int vdpasim_dma_map(struct vdpa_device *vdpa, u64 
iova, u64 size,
   u64 pa, u32 perm)
 {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+   int ret;
 
-   return vhost_iotlb_add_range(vdpasim->iommu, iova,
-iova + size - 1, pa, perm);
+   spin_lock(&vdpasim->iommu_lock);
+   ret = vhost_iotlb_add_range(vdpasim->iommu, iova, iova + size - 1, pa,
+				perm);
+   spin_unlock(&vdpasim->iommu_lock);
+
+   return ret;
 }
 
 static int vdpasim_dma_unmap(struct vdpa_device *vdpa, u64 iova, u64 size)
 {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 
+   spin_lock(&vdpasim->iommu_lock);
	vhost_iotlb_del_range(vdpasim->iommu, iova, iova + size - 1);
+   spin_unlock(&vdpasim->iommu_lock);
 
return 0;
 }
-- 
2.20.1



Re: [PATCH V4 0/6] IRQ offloading for vDPA

2020-07-30 Thread Jason Wang


On 2020/7/28 下午12:23, Zhu Lingshan wrote:

This series intends to implement IRQ offloading for
vhost_vdpa.

With the help of irq forwarding facilities such as posted
interrupts on x86, irq bypass can deliver
interrupts to the vCPU directly.

vDPA devices have dedicated hardware backends, like VFIO
passed-through devices. So it is possible to set up
irq offloading (irq bypass) for vDPA devices and gain
performance improvements.

In my testing, with this feature, we can save 0.1ms
in a ping between two VFs on average.
Changes from V3:
(1) removed the vDPA irq allocate/free helpers in the vDPA core.
(2) added a new function get_vq_irq() in struct vdpa_config_ops;
an upper layer driver can use this function to: A. query the
irq number of a vq, B. detect whether a vq is enabled.
(3) implemented get_vq_irq() in the ifcvf driver.
(4) in vhost_vdpa, set_status() sets up irq offloading when
setting DRIVER_OK, and tears it down on !DRIVER_OK.
(5) minor improvements.



Ok, I think you can go ahead to post a V5. It's not bad to start with 
get_vq_irq() and we can do any changes afterwards if it can work for 
some cases.


Thanks


Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-07-30 Thread Jason Wang


On 2020/7/29 下午10:15, Eli Cohen wrote:

OK, we have a mode of operation that does not require driver
intervention to manipulate the event queues so I think we're ok with
this design.



Good to know this.

Thanks



Re: [PATCH] virtio-blk: fix discard buffer overrun

2020-07-30 Thread Jason Wang


On 2020/7/30 下午4:30, Jeffle Xu wrote:

Before commit eded341c085b ("block: don't decrement nr_phys_segments for
physically contigous segments") was applied, the generic block layer did not
guarantee that @req->nr_phys_segments equals the number of bios in the
request. When limits.@max_discard_segments == 1 and the IO scheduler is
anything other than "none" (e.g. mq-deadline, kyber or bfq),
the request merge routine may be called when enqueuing a DISCARD bio.
When merging two requests, @req->nr_phys_segments is decremented by one if the
two adjacent bios of the requests can be merged into one physical segment,
leaving @req->nr_phys_segments one less than the number of bios in the
DISCARD request.

In this case, we risk overrunning the virtio_blk_discard_write_zeroes
buffer. Though this issue has been fixed by commit eded341c085b
("block: don't decrement nr_phys_segments for physically contigous segments"),
it can recur if future refactoring of the generic block layer breaks the
guarantee again.

Commit 8cb6af7b3a6d ("nvme: Fix discard buffer overrun") fixed the similar
issue in the nvme driver; this patch is inspired by that commit.

Signed-off-by: Jeffle Xu 
Reviewed-by: Joseph Qi 
---
  drivers/block/virtio_blk.c | 23 ++-
  1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 980df853ee49..248c8f46b51c 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -130,12 +130,19 @@ static int virtblk_setup_discard_write_zeroes(struct 
request *req, bool unmap)
u64 sector = bio->bi_iter.bi_sector;
u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT;
  
-		range[n].flags = cpu_to_le32(flags);

-   range[n].num_sectors = cpu_to_le32(num_sectors);
-   range[n].sector = cpu_to_le64(sector);
+   if (n < segments) {
+   range[n].flags = cpu_to_le32(flags);
+   range[n].num_sectors = cpu_to_le32(num_sectors);
+   range[n].sector = cpu_to_le64(sector);
+   }



Not familiar with block but if we start to duplicate checks like this, 
it's a hint to move it in the core.




n++;
}
  
+	if (WARN_ON_ONCE(n != segments)) {

+   kfree(range);
+   return -EIO;
+   }
+
req->special_vec.bv_page = virt_to_page(range);
req->special_vec.bv_offset = offset_in_page(range);
req->special_vec.bv_len = sizeof(*range) * segments;
@@ -246,8 +253,14 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx 
*hctx,
  
  	if (type == VIRTIO_BLK_T_DISCARD || type == VIRTIO_BLK_T_WRITE_ZEROES) {

err = virtblk_setup_discard_write_zeroes(req, unmap);
-   if (err)
-   return BLK_STS_RESOURCE;
+   if (err) {
+   switch (err) {
+   case -ENOMEM:
+   return BLK_STS_RESOURCE;
+   default:
+   return BLK_STS_IOERR;
+   }
+   }



This doesn't look elegant; why not a simple if (err == -ENOMEM) ... else if 
(err) ...


Thanks



}
  
  	num = blk_rq_map_sg(hctx->queue, req, vbr->sg);



Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-07-29 Thread Jason Wang


On 2020/7/29 下午5:55, Eli Cohen wrote:

On Wed, Jul 29, 2020 at 05:21:53PM +0800, Jason Wang wrote:

On 2020/7/28 下午5:04, Eli Cohen wrote:

On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote:

+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)
+{
+   struct vhost_virtqueue *vq = &v->vqs[qid];
+   const struct vdpa_config_ops *ops = v->vdpa->config;
+   struct vdpa_device *vdpa = v->vdpa;
+   int ret, irq;
+
+   spin_lock(&vq->call_ctx.ctx_lock);
+   irq = ops->get_vq_irq(vdpa, qid);
+   if (!vq->call_ctx.ctx || irq == -EINVAL) {
+   spin_unlock(&vq->call_ctx.ctx_lock);
+   return;
+   }
+

If I understand correctly, this will cause these IRQs to be forwarded
directly to the VCPU, e.g. will be handled by the guest/qemu.


Yes, if it can be bypassed, the interrupt will be delivered to the vCPU directly.


So, usually the network driver knows how to handle interrups for its
devices. I assume the virtio_net driver at the guest has some default
processing but what if the underlying hardware device (such as the case
of vdpa) needs to take some actions?



Virtio splits the bus (transport) operations out of the device operations. So 
does the driver.


The virtio-net driver depends on a transport driver to talk to the real 
device. Usually PCI is used as the transport for the device. In this 
case the virtio-pci driver is in charge of irq 
allocation/free/configuration, and it needs to cooperate with the platform 
specific irqchip (virtualized by KVM) to finish work like irq 
acknowledgement.  E.g. on x86, irq offloading can only work when there is 
hardware support for a virtual irqchip (APICv); then everything can be 
done without vmexits.


So there is no vendor-specific part, since the device and the transport are both standard.
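
To make the split concrete, it looks roughly like this (a deliberately
simplified sketch with made-up names, not the in-tree virtio code):

	/* Illustrative only: the transport (PCI/MSI-X, MMIO, vDPA, ...)
	 * owns the vector and irq plumbing ...
	 */
	struct my_transport_ops {
		int (*request_vq_irq)(struct my_device *mdev, u16 idx,
				      irq_handler_t handler, void *data);
		void (*free_vq_irq)(struct my_device *mdev, u16 idx);
	};

	/* ... while the device driver (e.g. virtio-net) only supplies the
	 * callback and never touches MSI-X tables or the irqchip itself.
	 */
	static irqreturn_t my_vq_interrupt(int irq, void *data)
	{
		return my_process_vq(data) ? IRQ_HANDLED : IRQ_NONE;
	}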



  Is there an option to bounce the
interrupt back to the vendor specific driver in the host so it can take
these actions?



Currently not, but even if we can do this, I'm afraid we will lose the 
performance advantage of irq bypassing.






Does this mean that the host will not handle this interrupt? How does it
work in case on level triggered interrupts?


There's no guarantee that the KVM arch code can make the irq
bypass work for any type of irq. So in such cases the irq will still need
to be handled by the host first. This means we should keep the host
interrupt handler as a slowpath (fallback).


In the case of ConnectX, I need to execute some code to acknowledge the
interrupt.


This makes it hard for irq bypassing to work. Is it because
the irq is shared, or what kind of ack do you need to do?

I have an EQ which is a queue for events coming from the hardware. This
EQ can be created so it reports only completion events, but I still need to
execute code that roughly tells the device that I saw these event
records and then arms it again so it can report more interrupts (e.g. if
more packets are received or sent). This is device specific code.
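
Schematically, the handler ends up looking something like this (a rough,
vendor-neutral sketch, not the actual mlx5 code):

	static irqreturn_t my_eq_interrupt(int irq, void *data)
	{
		struct my_eq *eq = data;
		struct my_eqe *eqe;

		/* consume the event records the device wrote into the EQ */
		while ((eqe = my_eq_get_next_eqe(eq)))
			my_handle_completion_event(eq, eqe);

		/* device specific part: publish the new consumer index and
		 * re-arm the EQ so the device may raise further interrupts
		 */
		my_eq_update_ci_and_arm(eq);

		return IRQ_HANDLED;
	}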



Any chance that the hardware can use MSI (which is not the case here)?

Thanks



Thanks



Can you explain how this should be done?




Re: [PATCH] virtio_balloon: fix up endian-ness for free cmd id

2020-07-29 Thread Jason Wang


On 2020/7/28 上午12:03, Michael S. Tsirkin wrote:

free cmd id is read using virtio endian, spec says all fields
in balloon are LE. Fix it up.

Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin 
---
  drivers/virtio/virtio_balloon.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 774deb65a9bb..798ec304fe3e 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -578,10 +578,14 @@ static int init_vqs(struct virtio_balloon *vb)
  static u32 virtio_balloon_cmd_id_received(struct virtio_balloon *vb)
  {
	if (test_and_clear_bit(VIRTIO_BALLOON_CONFIG_READ_CMD_ID,
-			       &vb->config_read_bitmap))
+			       &vb->config_read_bitmap)) {
virtio_cread(vb->vdev, struct virtio_balloon_config,
 free_page_hint_cmd_id,
			     &vb->cmd_id_received_cache);
+		/* Legacy balloon config space is LE, unlike all other devices. */
+		if (!virtio_has_feature(vb->vdev, VIRTIO_F_VERSION_1))
+			vb->cmd_id_received_cache = le32_to_cpu((__force __le32)vb->cmd_id_received_cache);
+   }
  
  	return vb->cmd_id_received_cache;

  }



Acked-by: Jason Wang 



Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-07-29 Thread Jason Wang


On 2020/7/28 下午5:04, Eli Cohen wrote:

On Tue, Jul 28, 2020 at 12:24:03PM +0800, Zhu Lingshan wrote:
  
+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)

+{
+   struct vhost_virtqueue *vq = &v->vqs[qid];
+   const struct vdpa_config_ops *ops = v->vdpa->config;
+   struct vdpa_device *vdpa = v->vdpa;
+   int ret, irq;
+
+   spin_lock(&vq->call_ctx.ctx_lock);
+   irq = ops->get_vq_irq(vdpa, qid);
+   if (!vq->call_ctx.ctx || irq == -EINVAL) {
+   spin_unlock(&vq->call_ctx.ctx_lock);
+   return;
+   }
+

If I understand correctly, this will cause these IRQs to be forwarded
directly to the VCPU, e.g. will be handled by the guest/qemu.



Yes, if it can be bypassed, the interrupt will be delivered to the vCPU directly.



Does this mean that the host will not handle this interrupt? How does it
work in case on level triggered interrupts?



There's no guarantee that the KVM arch code can make the irq bypass 
work for any type of irq. So in such cases the irq will still need to be 
handled by the host first. This means we should keep the host interrupt 
handler as a slowpath (fallback).





In the case of ConnectX, I need to execute some code to acknowledge the
interrupt.



This makes it hard for irq bypassing to work. Is it because the 
irq is shared, or what kind of ack do you need to do?


Thanks




Can you explain how this should be done?




Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-07-28 Thread Jason Wang


On 2020/7/28 下午5:18, Zhu, Lingshan wrote:


   * status to 0.
@@ -167,6 +220,15 @@ static long vhost_vdpa_set_status(struct 
vhost_vdpa *v, u8 __user *statusp)

  if (status != 0 && (ops->get_status(vdpa) & ~status) != 0)
  return -EINVAL;
  +    /* vq irq is not expected to be changed once DRIVER_OK is set */



So this basically limits the usage of get_vq_irq() to the context of 
vhost_vdpa_set_status() and other vDPA bus drivers' set_status(). If 
this is true, there's even no need to introduce any new config ops; 
we could just let set_status() return the irqs used for the device. Or 
if we want this to be more generic, we need vDPA's own irq manager 
(which should be similar to the irq bypass manager). That is:

I think there is no need for a driver to free / re-request its irqs after 
DRIVER_OK, though it can do so. If a driver changes the irq of a vq after 
DRIVER_OK, the vq is still operational but will lose irq offloading, which is 
reasonable.
If we want set_status() to return irqs, we need to record the irqs somewhere in 
vdpa_device,



Why, we can simply pass an array to the driver I think?

void (*set_status)(struct vdpa_device *vdev, u8 status, int *irqs);
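
Then vhost-vdpa could consume it along these lines (just a sketch; it
assumes a negative entry means "no irq for this vq"):

	int irqs[VHOST_VDPA_VQ_MAX];
	int i;

	ops->set_status(vdpa, status, irqs);

	if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
		for (i = 0; i < nvqs; i++) {
			struct vhost_virtqueue *vq = &v->vqs[i];

			if (irqs[i] < 0 || !vq->call_ctx.ctx)
				continue;
			vq->call_ctx.producer.token = vq->call_ctx.ctx;
			vq->call_ctx.producer.irq = irqs[i];
			irq_bypass_register_producer(&vq->call_ctx.producer);
		}
	}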



as we discussed in a previous thread, this may need initialization and cleanup 
work, so a new op
with proper comments (don't say the irq cannot change, but highlight that if the irq 
changes, irq offloading will not work until the next DRIVER_OK) could be more 
elegant.
However, if we really need to change the irq after DRIVER_OK, I think maybe we still 
need the vDPA vq irq allocate / free helpers; then the helpers cannot be used in 
probe() as we discussed before, which is a step back to the V3 series.



Still, it's not about whether the driver may change the irq after DRIVER_OK but 
the implication of that assumption. If one bus op must be called inside another 
op, it's better to just implement them as one op.





- bus driver can register itself as consumer
- vDPA device driver can register itself as producer
- matching via queue index

IMHO, isn't it too heavy for this feature,



Do you mean LOCs? We can:

1) refactor irq bypass manager
2) invent it our own (a much simplified version compared to bypass manager)
3) enforcing them via vDPA bus

Each of the above should not be a lot of coding. I think method 3 is 
partially done in your previous series, but in an implicit manner:


- a bus driver that has alloc_irq/free_irq implemented could be implicitly 
treated as a consumer registration

- every vDPA device driver could be treated as a producer
- vdpa_devm_alloc_irq() could be treated as a producer registration
- alloc_irq/free_irq is the add_producer/del_producer

We probably just lack some synchronization with driver probe/remove.
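
For 2), the "much simplified version" would not need a lot; something along
these lines could work (purely a sketch, all names are invented here):

	/* Per-vdpa_device registry keyed by virtqueue index: the device
	 * driver publishes the irq it allocated (producer side), the bus
	 * driver publishes the token/eventfd it wants wired up (consumer
	 * side), and whichever side registers last performs the match.
	 */
	struct vdpa_vq_irq_entry {
		int irq;	/* -1 until the producer registers */
		void *token;	/* NULL until the consumer registers */
	};

	struct vdpa_irq_registry {
		struct mutex lock;
		struct vdpa_vq_irq_entry *vqs;	/* nvqs entries */
	};

	static void vdpa_irq_try_match(struct vdpa_irq_registry *r, u16 idx)
	{
		struct vdpa_vq_irq_entry *e = &r->vqs[idx];

		if (e->irq >= 0 && e->token)
			vdpa_irq_bind(r, idx);	/* e.g. register a bypass producer */
	}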



and how can they match if two individual adapters both have vq idx = 1.



The matching is per vDPA device.

Thanks



Thanks!

- deal with registering/unregistering of consumer/producer

So there's no need to care when or where the vDPA device driver 
allocates the irq, and we don't need to care in which context the vDPA 
bus driver can use the irq. Either side may get notified when the 
other side is gone (though we probably only care about the producer 
going away).


Any thought on this?

Thanks





Re: [PATCH V4 4/6] vhost_vdpa: implement IRQ offloading in vhost_vdpa

2020-07-28 Thread Jason Wang


On 2020/7/28 下午12:24, Zhu Lingshan wrote:

This patch introduces a set of functions to set up/tear down
and update irq offloading, respectively by registering/unregistering
and re-registering the irq_bypass_producer.

With these functions, this commit can set up/tear down
irq offloading through setting DRIVER_OK/!DRIVER_OK, and
update irq offloading through SET_VRING_CALL.

Signed-off-by: Zhu Lingshan 
Suggested-by: Jason Wang 
---
  drivers/vhost/Kconfig |  1 +
  drivers/vhost/vdpa.c  | 79 ++-
  2 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index d3688c6afb87..587fbae06182 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -65,6 +65,7 @@ config VHOST_VDPA
tristate "Vhost driver for vDPA-based backend"
depends on EVENTFD
select VHOST
+   select IRQ_BYPASS_MANAGER
depends on VDPA
help
  This kernel module can be loaded in host kernel to accelerate
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index df3cf386b0cd..1dccced321f8 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -115,6 +115,55 @@ static irqreturn_t vhost_vdpa_config_cb(void *private)
return IRQ_HANDLED;
  }
  
+static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, int qid)

+{
+   struct vhost_virtqueue *vq = &v->vqs[qid];
+   const struct vdpa_config_ops *ops = v->vdpa->config;
+   struct vdpa_device *vdpa = v->vdpa;
+   int ret, irq;
+
+   spin_lock(&vq->call_ctx.ctx_lock);
+   irq = ops->get_vq_irq(vdpa, qid);
+   if (!vq->call_ctx.ctx || irq == -EINVAL) {



It's better to check returned irq as "irq < 0" to be more robust. 
Forcing a specific errno value is not good.




+   spin_unlock(&vq->call_ctx.ctx_lock);
+   return;
+   }
+
+   vq->call_ctx.producer.token = vq->call_ctx.ctx;
+   vq->call_ctx.producer.irq = irq;
+   ret = irq_bypass_register_producer(&vq->call_ctx.producer);
+   spin_unlock(&vq->call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, int qid)
+{
+   struct vhost_virtqueue *vq = &v->vqs[qid];
+
+   spin_lock(&vq->call_ctx.ctx_lock);
+   irq_bypass_unregister_producer(&vq->call_ctx.producer);
+   spin_unlock(&vq->call_ctx.ctx_lock);
+}
+
+static void vhost_vdpa_update_vq_irq(struct vhost_virtqueue *vq)
+{
+   spin_lock(&vq->call_ctx.ctx_lock);
+   /*
+    * If it has a non-zero irq, it means there is a
+    * previously registered irq_bypass_producer;
+    * we should update it when ctx (its token)
+    * changes.
+    */
+   if (!vq->call_ctx.producer.irq) {
+   spin_unlock(&vq->call_ctx.ctx_lock);
+   return;
+   }
+
+   irq_bypass_unregister_producer(&vq->call_ctx.producer);
+   vq->call_ctx.producer.token = vq->call_ctx.ctx;
+   irq_bypass_register_producer(&vq->call_ctx.producer);
+   spin_unlock(&vq->call_ctx.ctx_lock);
+}
+
  static void vhost_vdpa_reset(struct vhost_vdpa *v)
  {
struct vdpa_device *vdpa = v->vdpa;
@@ -155,11 +204,15 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, 
u8 __user *statusp)
  {
struct vdpa_device *vdpa = v->vdpa;
const struct vdpa_config_ops *ops = vdpa->config;
-   u8 status;
+   u8 status, status_old;
+   int i, nvqs;
  
	if (copy_from_user(&status, statusp, sizeof(status)))

return -EFAULT;
  
+	status_old = ops->get_status(vdpa);

+   nvqs = v->nvqs;
+
/*
 * Userspace shouldn't remove status bits unless reset the
 * status to 0.
@@ -167,6 +220,15 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 
__user *statusp)
if (status != 0 && (ops->get_status(vdpa) & ~status) != 0)
return -EINVAL;
  
+	/* vq irq is not expected to be changed once DRIVER_OK is set */



So this basically limits the usage of get_vq_irq() to the context of 
vhost_vdpa_set_status() and other vDPA bus drivers' set_status(). If 
this is true, there's even no need to introduce any new config ops; we 
could just let set_status() return the irqs used for the device. Or if we 
want this to be more generic, we need vDPA's own irq manager (which 
should be similar to the irq bypass manager). That is:


- bus driver can register itself as consumer
- vDPA device driver can register itself as producer
- matching via queue index
- deal with registering/unregistering of consumer/producer

So there's no need to care when or where the vDPA device driver 
allocates the irq, and we don't need to care in which context the vDPA 
bus driver can use the irq. Either side may get notified when the other 
side is gone (though we probably only care about the producer going away).


Any thought on this?

Thanks



+   if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&

Re: [PATCH V4 3/6] vDPA: add get_vq_irq() in vdpa_config_ops

2020-07-28 Thread Jason Wang


On 2020/7/28 下午12:24, Zhu Lingshan wrote:

This commit adds a new function get_vq_irq() in struct
vdpa_config_ops, which will return the irq number of a
virtqueue.

Signed-off-by: Zhu Lingshan 
Suggested-by: Jason Wang 
---
  include/linux/vdpa.h | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 239db794357c..cebc79173aaa 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -87,6 +87,11 @@ struct vdpa_device {
   *@vdev: vdpa device
   *@idx: virtqueue index
   *Returns the notification area
+ * @get_vq_irq:Get the irq number of a virtqueue
+ * @vdev: vdpa device
+ * @idx: virtqueue index
+ * Returns u32: irq number of a virtqueue,
+ * -EINVAL if no irq assigned.



I think we can not get -EINVAL since the function will return a u32.
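
E.g. the natural thing for a caller to write would be (sketch only):

	int irq = ops->get_vq_irq(vdev, idx);

	if (irq < 0)	/* no irq assigned */
		return;

so it is probably cleaner to declare the op as returning int and document
"a negative value if no irq is assigned" rather than -EINVAL specifically.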

Thanks



   * @get_vq_align: Get the virtqueue align requirement
   *for the device
   *@vdev: vdpa device
@@ -178,6 +183,7 @@ struct vdpa_config_ops {
u64 (*get_vq_state)(struct vdpa_device *vdev, u16 idx);
struct vdpa_notification_area
(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
+   u32 (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
  
  	/* Device ops */

u32 (*get_vq_align)(struct vdpa_device *vdev);



Re: [PATCH V3 vhost next 00/10] VDPA support for Mellanox ConnectX devices

2020-07-28 Thread Jason Wang


On 2020/7/28 下午2:40, Jason Wang wrote:


On 2020/7/28 下午2:32, Eli Cohen wrote:

On Tue, Jul 28, 2020 at 02:18:16PM +0800, Jason Wang wrote:

On 2020/7/28 下午2:05, Eli Cohen wrote:

Hi Michael,
please note that this series depends on mlx5 core device driver 
patches

in mlx5-next branch in
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git.

git pull 
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
mlx5-next


They also depend on Jason Wang's patches submitted a couple of weeks ago.

vdpa_sim: use the batching API
vhost-vdpa: support batch updating


Just noticed that a new version was posted [1] (you were cced). Talked
with Michael, and it's better for you to merge the new version in
this series.


OK, will send again. Just to make sure, I should put your series and my
series on Michal's vhost branch, the same tree I have been using so far?



Yes. I think so.

Thanks



Just noticed that Michael's vhost branch cannot compile due to this commit:

commit fee8fe6bd8ccacd27e963b71b4f943be3721779e
Author: Michael S. Tsirkin 
Date:   Mon Jul 27 10:51:55 2020 -0400

    vdpa: make sure set_features in invoked for legacy

Let's wait for Michael to clarify the correct branch to use then.

Thanks


Re: [PATCH V3 vhost next 00/10] VDPA support for Mellanox ConnectX devices

2020-07-28 Thread Jason Wang


On 2020/7/28 下午2:32, Eli Cohen wrote:

On Tue, Jul 28, 2020 at 02:18:16PM +0800, Jason Wang wrote:

On 2020/7/28 下午2:05, Eli Cohen wrote:

Hi Michael,
please note that this series depends on mlx5 core device driver patches
in mlx5-next branch in
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git.

git pull git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
mlx5-next

They also depend on Jason Wang's patches submitted a couple of weeks ago.

vdpa_sim: use the batching API
vhost-vdpa: support batch updating


Just noticed that a new version was posted [1] (you were cced). Talked
with Michael, and it's better for you to merge the new version in
this series.


OK, will send again. Just to make sure, I should put your series and my
series on Michal's vhost branch, the same tree I have been using so far?



Yes. I think so.

Thanks





Sorry for not spotting this before.

[1] https://lkml.org/lkml/2020/7/1/301

Thanks




The following series of patches provide VDPA support for Mellanox
devices. The supported devices are ConnectX6 DX and newer.

Currently, only a network driver is implemented; future patches will
introduce a block device driver. iperf performance on a single queue is
around 12 Gbps.  Future patches will introduce multi queue support.

The files are organized in such a way that code that can be used by
different VDPA implementations is placed in a common area residing in
drivers/vdpa/mlx5/core.

Only virtual functions are currently supported. Also, certain firmware
capabilities must be set to enable the driver. Physical functions (PFs)
are skipped by the driver.

To make use of the VDPA net driver, one must load mlx5_vdpa. In such
case, VFs will be operated by the VDPA driver. Although one can see a
regular instance of a network driver on the VF, the VDPA driver takes
precedence over the NIC driver, steering-wise.

Currently, the device/interface infrastructure in mlx5_core is used to
probe drivers. Future patches will introduce virtbus as a means to
register devices and drivers and VDPA will be adapted to it.

The mlx5 mode of operation required to support VDPA is switchdev mode.
One can use a Linux or OVS bridge to take care of layer 2 switching.

In order to provide virtio networking to a guest, an updated version of
qemu is required. This version has been tested with the following qemu
version:

url: https://github.com/jasowang/qemu.git
branch: vdpa
Commit ID: 6f4e59b807db


V2->V3
Fix makefile to use include path relative to the root of the kernel

Eli Cohen (7):
   net/vdpa: Use struct for set/get vq state
   vhost: Fix documentation
   vdpa: Modify get_vq_state() to return error code
   vdpa/mlx5: Add hardware descriptive header file
   vdpa/mlx5: Add support library for mlx5 VDPA implementation
   vdpa/mlx5: Add shared memory registration code
   vdpa/mlx5: Add VDPA driver for supported mlx5 devices

Jason Wang (2):
   vhost-vdpa: support batch updating
   vdpa_sim: use the batching API

Max Gurtovoy (1):
   vdpa: remove hard coded virtq num

  drivers/vdpa/Kconfig   |   18 +
  drivers/vdpa/Makefile  |1 +
  drivers/vdpa/ifcvf/ifcvf_base.c|4 +-
  drivers/vdpa/ifcvf/ifcvf_base.h|4 +-
  drivers/vdpa/ifcvf/ifcvf_main.c|   13 +-
  drivers/vdpa/mlx5/Makefile |4 +
  drivers/vdpa/mlx5/core/mlx5_vdpa.h |   91 ++
  drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h |  168 ++
  drivers/vdpa/mlx5/core/mr.c|  473 ++
  drivers/vdpa/mlx5/core/resources.c |  284 
  drivers/vdpa/mlx5/net/main.c   |   76 +
  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 1950 
  drivers/vdpa/mlx5/net/mlx5_vnet.h  |   24 +
  drivers/vdpa/vdpa.c|3 +
  drivers/vdpa/vdpa_sim/vdpa_sim.c   |   35 +-
  drivers/vhost/iotlb.c  |4 +-
  drivers/vhost/vdpa.c   |   46 +-
  include/linux/vdpa.h   |   24 +-
  include/uapi/linux/vhost_types.h   |2 +
  19 files changed, 3165 insertions(+), 59 deletions(-)
  create mode 100644 drivers/vdpa/mlx5/Makefile
  create mode 100644 drivers/vdpa/mlx5/core/mlx5_vdpa.h
  create mode 100644 drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h
  create mode 100644 drivers/vdpa/mlx5/core/mr.c
  create mode 100644 drivers/vdpa/mlx5/core/resources.c
  create mode 100644 drivers/vdpa/mlx5/net/main.c
  create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.c
  create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.h




Re: [PATCH V3 vhost next 00/10] VDPA support for Mellanox ConnectX devices

2020-07-28 Thread Jason Wang


On 2020/7/28 下午2:05, Eli Cohen wrote:

Hi Michael,
please note that this series depends on mlx5 core device driver patches
in mlx5-next branch in
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git.

git pull git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
mlx5-next

They also depend on Jason Wang's patches submitted a couple of weeks ago.

vdpa_sim: use the batching API
vhost-vdpa: support batch updating



Just noticed that a new version was posted [1] (you were cced). Talked with 
Michael, and it's better for you to merge the new version in this series.


Sorry for not spotting this before.

[1] https://lkml.org/lkml/2020/7/1/301

Thanks





The following series of patches provide VDPA support for Mellanox
devices. The supported devices are ConnectX6 DX and newer.

Currently, only a network driver is implemented; future patches will
introduce a block device driver. iperf performance on a single queue is
around 12 Gbps.  Future patches will introduce multi queue support.

The files are organized in such a way that code that can be used by
different VDPA implementations is placed in a common area residing in
drivers/vdpa/mlx5/core.

Only virtual functions are currently supported. Also, certain firmware
capabilities must be set to enable the driver. Physical functions (PFs)
are skipped by the driver.

To make use of the VDPA net driver, one must load mlx5_vdpa. In such
case, VFs will be operated by the VDPA driver. Although one can see a
regular instance of a network driver on the VF, the VDPA driver takes
precedence over the NIC driver, steering-wise.

Currently, the device/interface infrastructure in mlx5_core is used to
probe drivers. Future patches will introduce virtbus as a means to
register devices and drivers and VDPA will be adapted to it.

The mlx5 mode of operation required to support VDPA is switchdev mode.
One can use a Linux or OVS bridge to take care of layer 2 switching.

In order to provide virtio networking to a guest, an updated version of
qemu is required. This version has been tested with the following qemu
version:

url: https://github.com/jasowang/qemu.git
branch: vdpa
Commit ID: 6f4e59b807db


V2->V3
Fix makefile to use include path relative to the root of the kernel

Eli Cohen (7):
   net/vdpa: Use struct for set/get vq state
   vhost: Fix documentation
   vdpa: Modify get_vq_state() to return error code
   vdpa/mlx5: Add hardware descriptive header file
   vdpa/mlx5: Add support library for mlx5 VDPA implementation
   vdpa/mlx5: Add shared memory registration code
   vdpa/mlx5: Add VDPA driver for supported mlx5 devices

Jason Wang (2):
   vhost-vdpa: support batch updating
   vdpa_sim: use the batching API

Max Gurtovoy (1):
   vdpa: remove hard coded virtq num

  drivers/vdpa/Kconfig   |   18 +
  drivers/vdpa/Makefile  |1 +
  drivers/vdpa/ifcvf/ifcvf_base.c|4 +-
  drivers/vdpa/ifcvf/ifcvf_base.h|4 +-
  drivers/vdpa/ifcvf/ifcvf_main.c|   13 +-
  drivers/vdpa/mlx5/Makefile |4 +
  drivers/vdpa/mlx5/core/mlx5_vdpa.h |   91 ++
  drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h |  168 ++
  drivers/vdpa/mlx5/core/mr.c|  473 ++
  drivers/vdpa/mlx5/core/resources.c |  284 
  drivers/vdpa/mlx5/net/main.c   |   76 +
  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 1950 
  drivers/vdpa/mlx5/net/mlx5_vnet.h  |   24 +
  drivers/vdpa/vdpa.c|3 +
  drivers/vdpa/vdpa_sim/vdpa_sim.c   |   35 +-
  drivers/vhost/iotlb.c  |4 +-
  drivers/vhost/vdpa.c   |   46 +-
  include/linux/vdpa.h   |   24 +-
  include/uapi/linux/vhost_types.h   |2 +
  19 files changed, 3165 insertions(+), 59 deletions(-)
  create mode 100644 drivers/vdpa/mlx5/Makefile
  create mode 100644 drivers/vdpa/mlx5/core/mlx5_vdpa.h
  create mode 100644 drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h
  create mode 100644 drivers/vdpa/mlx5/core/mr.c
  create mode 100644 drivers/vdpa/mlx5/core/resources.c
  create mode 100644 drivers/vdpa/mlx5/net/main.c
  create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.c
  create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.h




[PATCH 1/2] vdpa: ifcvf: return err when fail to request config irq

2020-07-23 Thread Jason Wang
We ignore the error returned when requesting the config interrupt; fix this.

Fixes: e7991f376a4d ("ifcvf: implement config interrupt in IFCVF")
Cc: Zhu Lingshan 
Signed-off-by: Jason Wang 
---
 drivers/vdpa/ifcvf/ifcvf_main.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index f5a60c14b979..ae7110955a44 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -76,6 +76,10 @@ static int ifcvf_request_irq(struct ifcvf_adapter *adapter)
	ret = devm_request_irq(&pdev->dev, irq,
   ifcvf_config_changed, 0,
   vf->config_msix_name, vf);
+   if (ret) {
+   IFCVF_ERR(pdev, "Failed to request config irq\n");
+   return ret;
+   }
 
for (i = 0; i < IFCVF_MAX_QUEUE_PAIRS * 2; i++) {
snprintf(vf->vring[i].msix_name, 256, "ifcvf[%s]-%d\n",
-- 
2.20.1



[PATCH 2/2] vdpa: ifcvf: free config irq in ifcvf_free_irq()

2020-07-23 Thread Jason Wang
We don't free the config irq in ifcvf_free_irq(), which will trigger a
BUG() in the PCI core since we try to free vectors that still have an
action. Fix this by recording the config irq in the ifcvf_hw structure
and freeing it in ifcvf_free_irq().

Fixes: e7991f376a4d ("ifcvf: implement config interrupt in IFCVF")
Cc: Zhu Lingshan 
Signed-off-by: Jason Wang 
---
 drivers/vdpa/ifcvf/ifcvf_base.h | 2 +-
 drivers/vdpa/ifcvf/ifcvf_main.c | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_base.h b/drivers/vdpa/ifcvf/ifcvf_base.h
index f4554412e607..29efa75cdfce 100644
--- a/drivers/vdpa/ifcvf/ifcvf_base.h
+++ b/drivers/vdpa/ifcvf/ifcvf_base.h
@@ -84,7 +84,7 @@ struct ifcvf_hw {
void __iomem * const *base;
char config_msix_name[256];
struct vdpa_callback config_cb;
-
+   unsigned int config_irq;
 };
 
 struct ifcvf_adapter {
diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index ae7110955a44..7a6d899e541d 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -53,6 +53,7 @@ static void ifcvf_free_irq(struct ifcvf_adapter *adapter, int 
queues)
for (i = 0; i < queues; i++)
		devm_free_irq(&pdev->dev, vf->vring[i].irq, &vf->vring[i]);
 
+   devm_free_irq(&pdev->dev, vf->config_irq, vf);
ifcvf_free_irq_vectors(pdev);
 }
 
@@ -72,8 +73,8 @@ static int ifcvf_request_irq(struct ifcvf_adapter *adapter)
snprintf(vf->config_msix_name, 256, "ifcvf[%s]-config\n",
 pci_name(pdev));
vector = 0;
-   irq = pci_irq_vector(pdev, vector);
-   ret = devm_request_irq(&pdev->dev, irq,
+   vf->config_irq = pci_irq_vector(pdev, vector);
+   ret = devm_request_irq(&pdev->dev, vf->config_irq,
   ifcvf_config_changed, 0,
   vf->config_msix_name, vf);
if (ret) {
-- 
2.20.1



Re: [PATCH V3 3/6] vDPA: implement vq IRQ allocate/free helpers in vDPA core

2020-07-22 Thread Jason Wang


On 2020/7/22 下午6:08, Zhu Lingshan wrote:

+/*
+ * Request irq for a vq, and set up irq offloading if it's a vhost_vdpa vq.
+ * This function should only be called through setting virtio DRIVER_OK.
+ * If you want to request an irq during probe, you should use raw APIs
+ * like request_irq() or devm_request_irq().



This makes the API less flexible. The reason is we store the irq in 
vhost-vdpa, not vDPA.


I wonder whether the following looks better:

1) store irq in vdpa device
2) register producer when DRIVER_OK and unregister producer when 
!DRIVER_OK in vhost-vDPA

3) deal with the synchronization with SET_VRING_CALL
4) document that irq is not expected to be changed during DRIVER_OK

This can make sure the API works during driver probe, and we don't need 
the setup_irq and unsetup_irq methods in vdpa_driver.
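
A minimal sketch of 1) and 2) could look like this (the field and helper
names here are only illustrative, not an actual API):

	/* 1) the vDPA device driver records the per-vq irq it allocated,
	 *    e.g. in an illustrative vdpa_device field:  int *vq_irq;
	 *    with -1 meaning "no irq assigned".
	 */

	/* 2) vhost-vdpa only consumes it around DRIVER_OK */
	static void sketch_setup_vq_irq(struct vhost_vdpa *v, u16 qid)
	{
		struct vhost_virtqueue *vq = &v->vqs[qid];
		int irq = v->vdpa->vq_irq ? v->vdpa->vq_irq[qid] : -1;

		if (!vq->call_ctx.ctx || irq < 0)
			return;

		vq->call_ctx.producer.token = vq->call_ctx.ctx;
		vq->call_ctx.producer.irq = irq;
		irq_bypass_register_producer(&vq->call_ctx.producer);
	}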


Thanks



+ */
+int vdpa_devm_request_irq(struct device *dev, struct vdpa_device *vdev,
+ unsigned int irq, irq_handler_t handler,
+ unsigned long irqflags, const char *devname, void 
*dev_id,
+ int qid)
+{
+   int ret;
+
+   ret = devm_request_irq(dev, irq, handler, irqflags, devname, dev_id);
+   if (ret)
+   dev_err(dev, "Failed to request irq for vq %d\n", qid);
+   else
+   vdpa_setup_irq(vdev, qid, irq);
+
+   return ret;
+
+}
+EXPORT_SYMBOL_GPL(vdpa_devm_request_irq);



Re: [PATCH V2 vhost next 10/10] vdpa/mlx5: Add VDPA driver for supported mlx5 devices

2020-07-20 Thread Jason Wang


On 2020/7/20 下午3:14, Eli Cohen wrote:

Add a front end VDPA driver that registers in the VDPA bus and provides
networking to a guest. The VDPA driver creates the necessary resources
on the VF it is driving such that data path will be offloaded.

Notifications are being communicated through the driver.

Currently, only VFs are supported. In subsequent patches we will have
devlink support to control which VF is used for VDPA and which function
is used for regular networking.

Reviewed-by: Parav Pandit
Signed-off-by: Eli Cohen
---
Changes from V0:
1. Fix include path usage
2. Fix use after free in qp_create()
3. Consistently use mvq->initialized to check if a vq was initialized.
4. Remove unused local variable.
5. Defer modifying vq to ready to driver ok
6. Suspend hardware vq in set_vq_ready(0)
7. Remove reservation for control VQ since multi queue is not supported in 
this version
8. Avoid calling put_device() since this is not a pci device driver.



Looks good to me.

Acked-by: Jason Wang 



Re: [PATCH V2 vhost next 06/10] vdpa: Modify get_vq_state() to return error code

2020-07-20 Thread Jason Wang


On 2020/7/20 下午3:14, Eli Cohen wrote:

Modify get_vq_state() so it returns an error code. In case of hardware
acceleration, the available index may be retrieved from the device, an
operation that can possibly fail.

Reviewed-by: Parav Pandit 
Signed-off-by: Eli Cohen 



Acked-by: Jason Wang 



---
  drivers/vdpa/ifcvf/ifcvf_main.c  | 5 +++--
  drivers/vdpa/vdpa_sim/vdpa_sim.c | 5 +++--
  drivers/vhost/vdpa.c | 5 -
  include/linux/vdpa.h | 4 ++--
  4 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index 69032ee97824..d9b5f465ac81 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -235,12 +235,13 @@ static u16 ifcvf_vdpa_get_vq_num_max(struct vdpa_device 
*vdpa_dev)
return IFCVF_QUEUE_MAX;
  }
  
-static void ifcvf_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,

-   struct vdpa_vq_state *state)
+static int ifcvf_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
+  struct vdpa_vq_state *state)
  {
struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
  
  	state->avail_index = ifcvf_get_vq_state(vf, qid);

+   return 0;
  }
  
  static int ifcvf_vdpa_set_vq_state(struct vdpa_device *vdpa_dev, u16 qid,

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 599519039f8d..ddf6086d43c2 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -427,14 +427,15 @@ static int vdpasim_set_vq_state(struct vdpa_device *vdpa, 
u16 idx,
return 0;
  }
  
-static void vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx,

-struct vdpa_vq_state *state)
+static int vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx,
+   struct vdpa_vq_state *state)
  {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
	struct vringh *vrh = &vq->vring;
  
  	state->avail_index = vrh->last_avail_idx;

+   return 0;
  }
  
  static u32 vdpasim_get_vq_align(struct vdpa_device *vdpa)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index af98c11c9d26..fadad74f882e 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -360,7 +360,10 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, 
unsigned int cmd,
}
  
  	if (cmd == VHOST_GET_VRING_BASE) {

-   ops->get_vq_state(v->vdpa, idx, &vq_state);
+   r = ops->get_vq_state(v->vdpa, idx, &vq_state);
+   if (r)
+   return r;
+
vq->last_avail_idx = vq_state.avail_index;
}
  
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h

index 7b088bebffe8..000d71a9f988 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -185,8 +185,8 @@ struct vdpa_config_ops {
bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
const struct vdpa_vq_state *state);
-   void (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
-struct vdpa_vq_state *state);
+   int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
+   struct vdpa_vq_state *state);
struct vdpa_notification_area
(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
  



Re: [PATCH V2 vhost next 05/10] vhost: Fix documentation

2020-07-20 Thread Jason Wang


On 2020/7/20 下午3:14, Eli Cohen wrote:

Fix documentation to match actual function prototypes.

Reviewed-by: Parav Pandit 
Signed-off-by: Eli Cohen 



Acked-by: Jason Wang 



---
  drivers/vhost/iotlb.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
index 1f0ca6e44410..0d4213a54a88 100644
--- a/drivers/vhost/iotlb.c
+++ b/drivers/vhost/iotlb.c
@@ -149,7 +149,7 @@ EXPORT_SYMBOL_GPL(vhost_iotlb_free);
   * vhost_iotlb_itree_first - return the first overlapped range
   * @iotlb: the IOTLB
   * @start: start of IOVA range
- * @end: end of IOVA range
+ * @last: last byte in IOVA range
   */
  struct vhost_iotlb_map *
  vhost_iotlb_itree_first(struct vhost_iotlb *iotlb, u64 start, u64 last)
@@ -162,7 +162,7 @@ EXPORT_SYMBOL_GPL(vhost_iotlb_itree_first);
   * vhost_iotlb_itree_first - return the next overlapped range
   * @iotlb: the IOTLB
   * @start: start of IOVA range
- * @end: end of IOVA range
+ * @last: last byte IOVA range
   */
  struct vhost_iotlb_map *
  vhost_iotlb_itree_next(struct vhost_iotlb_map *map, u64 start, u64 last)



Re: [PATCH RFC v8 02/11] vhost: use batched get_vq_desc version

2020-07-20 Thread Jason Wang


On 2020/7/20 下午7:16, Eugenio Pérez wrote:

On Mon, Jul 20, 2020 at 11:27 AM Michael S. Tsirkin  wrote:

On Thu, Jul 16, 2020 at 07:16:27PM +0200, Eugenio Perez Martin wrote:

On Fri, Jul 10, 2020 at 7:58 AM Michael S. Tsirkin  wrote:

On Fri, Jul 10, 2020 at 07:39:26AM +0200, Eugenio Perez Martin wrote:

How about playing with the batch size? Make it a mod parameter instead
of the hard coded 64, and measure for all values 1 to 64 ...

Right, according to the test result, 64 seems to be too aggressive in
the case of TX.


Got it, thanks both!

In particular I wonder whether with batch size 1
we get same performance as without batching
(would indicate 64 is too aggressive)
or not (would indicate one of the code changes
affects performance in an unexpected way).

--
MST


Hi!

Varying batch_size as drivers/vhost/net.c:VHOST_NET_BATCH,

sorry this is not what I meant.

I mean something like this:


diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 0b509be8d7b1..b94680e5721d 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1279,6 +1279,10 @@ static void handle_rx_net(struct vhost_work *work)
 handle_rx(net);
  }

+static int batch_num = 0;
+module_param(batch_num, int, 0644);
+MODULE_PARM_DESC(batch_num, "Number of batched descriptors (offset from 64)");
+
  static int vhost_net_open(struct inode *inode, struct file *f)
  {
 struct vhost_net *n;
@@ -1333,7 +1337,7 @@ static int vhost_net_open(struct inode *inode, struct 
file *f)
		vhost_net_buf_init(&n->vqs[i].rxq);
 }
 vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
-  UIO_MAXIOV + VHOST_NET_BATCH,
+  UIO_MAXIOV + VHOST_NET_BATCH + batch_num,
VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT, true,
NULL);


then you can try tweaking batching and playing with mod parameter without
recompiling.


VHOST_NET_BATCH affects lots of other things.


Ok, got it. Since they were aligned from the start, I thought it was a good 
idea to maintain them in-sync.


and testing
the pps as previous mail says. This means that we have either only
vhost_net batching (in base testing, like previously to apply this
patch) or both batching sizes the same.

I've checked that vhost process (and pktgen) goes 100% cpu also.

For tx: Batching decrements always the performance, in all cases. Not
sure why bufapi made things better the last time.

Batching makes improvements until 64 bufs, I see increments of pps but like 1%.

For rx: Batching always improves performance. It seems that if we
batch little, bufapi decreases performance, but beyond 64, bufapi is
much better. The bufapi version keeps improving until I set a batching
of 1024. So I guess it is super good to have a bunch of buffers to
receive.

Since with this test I cannot disable event_idx or things like that,
what would be the next step for testing?

Thanks!

--
Results:
# Buf size: 1,16,32,64,128,256,512

# Tx
# ===
# Base
2293304.308,3396057.769,3540860.615,3636056.077,3332950.846,3694276.154,3689820
# Batch
2286723.857,3307191.643,3400346.571,3452527.786,3460766.857,3431042.5,3440722.286
# Batch + Bufapi
2257970.769,3151268.385,3260150.538,3379383.846,3424028.846,3433384.308,3385635.231,3406554.538

# Rx
# ==
# pktgen results (pps)
1223275,1668868,1728794,1769261,1808574,1837252,1846436
1456924,1797901,1831234,1868746,1877508,1931598,1936402
1368923,1719716,1794373,1865170,1884803,1916021,1975160

# Testpmd pps results
1222698.143,1670604,1731040.6,1769218,1811206,1839308.75,1848478.75
1450140.5,1799985.75,1834089.75,1871290,1880005.5,1934147.25,1939034
1370621,1721858,1796287.75,1866618.5,1885466.5,1918670.75,1976173.5,1988760.75,1978316

pktgen was run again for rx with 1024 and 2048 buf size, giving
1988760.75 and 1978316 pps. Testpmd goes the same way.

Don't really understand what does this data mean.
Which number of descs is batched for each run?


Sorry, I should have explained better. I will expand here, but feel free to 
skip it since we are going to discard the
data anyway. Or to propose a better way to tell them.

It is a CSV with the values I've obtained, in pps, from pktgen and testpmd. This 
way it is easy to plot them.

Maybe is easier as tables, if mail readers/gmail does not misalign them.


# Tx
# ===

Base: with the previous code, not integrating any patch. testpmd is in txonly 
mode, and the tap interface XDP_DROPs everything.
We vary VHOST_NET_BATCH (1, 16, 32, ...). As Jason put it in a previous mail:

TX: testpmd(txonly) -> virtio-user -> vhost_net -> XDP_DROP on TAP


batch:        1          16          32          64         128         256         512
pps:    2293304.308 3396057.769 3540860.615 3636056.077 3332950.846 3694276.154 3689820

If we add the batching part of the series, but not the bufapi:

batch:        1          16          32          64         128         256         512
pps:    2286723.857 3307191.643 3400346.571 3452527.786 3460766.857 3431042.5   3440722.286

Re: [PATCH V2 3/6] vDPA: implement IRQ offloading helpers in vDPA core

2020-07-20 Thread Jason Wang


On 2020/7/21 上午10:02, Zhu, Lingshan wrote:



On 7/20/2020 5:40 PM, Jason Wang wrote:


On 2020/7/20 下午5:07, Zhu, Lingshan wrote:


+}
+
+static void vdpa_unsetup_irq(struct vdpa_device *vdev, int qid)
+{
+    struct vdpa_driver *drv = drv_to_vdpa(vdev->dev.driver);
+
+    if (drv->unsetup_vq_irq)
+    drv->unsetup_vq_irq(vdev, qid);



Do you need to check the existence of drv before calling 
unset_vq_irq()?

Yes, we should check this when we take the releasing path into account.


And how can this synchronize with driver releasing and binding?

Will add a vdpa_unsetup_irq() call in vhost_vdpa_release().
For binding, I think it is a new dev bound to the driver,
so it should go through the vdpa_setup_irq() routine. Or if it is
a device re-bound to vhost_vdpa, I think we have cleaned up the
irq_bypass_producer for it, as we would call vdpa_unsetup_irq()
in the release function.



I meant can the following things happen?

1) some vDPA device driver probes the hardware and calls 
vdpa_request_irq() in its PCI probe function.

2) the vDPA device is probed by vhost-vDPA

Then irq bypass can't work, since when vdpa_unsetup_irq() is 
called, there's no driver bound. Or is there a requirement that 
vdpa_request/free_irq() must be called somewhere (e.g. in the 
set_status bus operation)? If yes, we need to document those requirements.

vdpa_unsetup_irq() is only called when we want to unregister the producer,



Typo, I meant vdpa_setup_irq().

Thanks



  now we have two code paths using it: free_irq and release(). I agree we can 
document these requirements for the helpers; these functions can only be called 
through status changes (DRIVER_OK and !DRIVER_OK).

Thanks,
BR
Zhu Lingshan


Thanks




Re: [PATCH V2 3/6] vDPA: implement IRQ offloading helpers in vDPA core

2020-07-20 Thread Jason Wang


On 2020/7/20 下午5:07, Zhu, Lingshan wrote:


+}
+
+static void vdpa_unsetup_irq(struct vdpa_device *vdev, int qid)
+{
+    struct vdpa_driver *drv = drv_to_vdpa(vdev->dev.driver);
+
+    if (drv->unsetup_vq_irq)
+    drv->unsetup_vq_irq(vdev, qid);



Do you need to check the existence of drv before calling unset_vq_irq()?

Yes, we should check this when we take the releasing path into account.


And how can this synchronize with driver releasing and binding?

Will add a vdpa_unsetup_irq() call in vhost_vdpa_release().
For binding, I think it is a new dev bound to the driver,
so it should go through the vdpa_setup_irq() routine. Or if it is
a device re-bound to vhost_vdpa, I think we have cleaned up the
irq_bypass_producer for it, as we would call vdpa_unsetup_irq()
in the release function.



I meant can the following things happen?

1) some vDPA device driver probes the hardware and calls 
vdpa_request_irq() in its PCI probe function.

2) the vDPA device is probed by vhost-vDPA

Then irq bypass can't work, since when vdpa_unsetup_irq() is called, 
there's no driver bound. Or is there a requirement that 
vdpa_request/free_irq() must be called somewhere (e.g. in the set_status 
bus operation)? If yes, we need to document those requirements.


Thanks


Re: [PATCH RFC v8 02/11] vhost: use batched get_vq_desc version

2020-07-20 Thread Jason Wang


On 2020/7/17 上午1:16, Eugenio Perez Martin wrote:

On Fri, Jul 10, 2020 at 7:58 AM Michael S. Tsirkin  wrote:

On Fri, Jul 10, 2020 at 07:39:26AM +0200, Eugenio Perez Martin wrote:

How about playing with the batch size? Make it a mod parameter instead
of the hard coded 64, and measure for all values 1 to 64 ...


Right, according to the test result, 64 seems to be too aggressive in
the case of TX.


Got it, thanks both!

In particular I wonder whether with batch size 1
we get same performance as without batching
(would indicate 64 is too aggressive)
or not (would indicate one of the code changes
affects performance in an unexpected way).

--
MST


Hi!

Varying batch_size as drivers/vhost/net.c:VHOST_NET_BATCH,



Did you mean varying the value of VHOST_NET_BATCH itself or the number 
of batched descriptors?




and testing
the pps as previous mail says. This means that we have either only
vhost_net batching (in base testing, like previously to apply this
patch) or both batching sizes the same.

I've checked that vhost process (and pktgen) goes 100% cpu also.

For tx: Batching decrements always the performance, in all cases. Not
sure why bufapi made things better the last time.

Batching makes improvements until 64 bufs, I see increments of pps but like 1%.

For rx: Batching always improves performance. It seems that if we
batch little, bufapi decreases performance, but beyond 64, bufapi is
much better. The bufapi version keeps improving until I set a batching
of 1024. So I guess it is super good to have a bunch of buffers to
receive.

Since with this test I cannot disable event_idx or things like that,
what would be the next step for testing?

Thanks!

--
Results:
# Buf size: 1,16,32,64,128,256,512

# Tx
# ===
# Base
2293304.308,3396057.769,3540860.615,3636056.077,3332950.846,3694276.154,3689820



What's the meaning of buf size in the context of "base"?

And I wonder maybe perf diff can help.

Thanks



# Batch
2286723.857,3307191.643,3400346.571,3452527.786,3460766.857,3431042.5,3440722.286
# Batch + Bufapi
2257970.769,3151268.385,3260150.538,3379383.846,3424028.846,3433384.308,3385635.231,3406554.538

# Rx
# ==
# pktgen results (pps)
1223275,1668868,1728794,1769261,1808574,1837252,1846436
1456924,1797901,1831234,1868746,1877508,1931598,1936402
1368923,1719716,1794373,1865170,1884803,1916021,1975160

# Testpmd pps results
1222698.143,1670604,1731040.6,1769218,1811206,1839308.75,1848478.75
1450140.5,1799985.75,1834089.75,1871290,1880005.5,1934147.25,1939034
1370621,1721858,1796287.75,1866618.5,1885466.5,1918670.75,1976173.5,1988760.75,1978316

pktgen was run again for rx with 1024 and 2048 buf size, giving
1988760.75 and 1978316 pps. Testpmd goes the same way.




[PATCH] vhost: vdpa: remove per device feature whitelist

2020-07-20 Thread Jason Wang
We used to have a per device feature whitelist to filter out the
unsupported virtio features. But this seems unnecessary since:

- the main idea behind the feature whitelist is to block the control vq
  feature until we finalize the control virtqueue API. But the current
  vhost-vDPA uAPI is sufficient to support a control virtqueue. For a
  device that has a hardware control virtqueue, the vDPA device driver
  can just set up the hardware virtqueue and let userspace use the
  hardware virtqueue directly. For a device that doesn't have a control
  virtqueue, the vDPA device driver needs to use e.g. vringh to emulate
  a software control virtqueue.
- we don't do it in the virtio-vDPA driver

So remove this limitation.

Signed-off-by: Jason Wang 
---
 drivers/vhost/vdpa.c | 37 -
 1 file changed, 37 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 77a0c9fb6cc3..f7f6ddd681ce 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -26,35 +26,6 @@
 
 #include "vhost.h"
 
-enum {
-   VHOST_VDPA_FEATURES =
-   (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) |
-   (1ULL << VIRTIO_F_ANY_LAYOUT) |
-   (1ULL << VIRTIO_F_VERSION_1) |
-   (1ULL << VIRTIO_F_IOMMU_PLATFORM) |
-   (1ULL << VIRTIO_F_RING_PACKED) |
-   (1ULL << VIRTIO_F_ORDER_PLATFORM) |
-   (1ULL << VIRTIO_RING_F_INDIRECT_DESC) |
-   (1ULL << VIRTIO_RING_F_EVENT_IDX),
-
-   VHOST_VDPA_NET_FEATURES = VHOST_VDPA_FEATURES |
-   (1ULL << VIRTIO_NET_F_CSUM) |
-   (1ULL << VIRTIO_NET_F_GUEST_CSUM) |
-   (1ULL << VIRTIO_NET_F_MTU) |
-   (1ULL << VIRTIO_NET_F_MAC) |
-   (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
-   (1ULL << VIRTIO_NET_F_GUEST_TSO6) |
-   (1ULL << VIRTIO_NET_F_GUEST_ECN) |
-   (1ULL << VIRTIO_NET_F_GUEST_UFO) |
-   (1ULL << VIRTIO_NET_F_HOST_TSO4) |
-   (1ULL << VIRTIO_NET_F_HOST_TSO6) |
-   (1ULL << VIRTIO_NET_F_HOST_ECN) |
-   (1ULL << VIRTIO_NET_F_HOST_UFO) |
-   (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
-   (1ULL << VIRTIO_NET_F_STATUS) |
-   (1ULL << VIRTIO_NET_F_SPEED_DUPLEX),
-};
-
 /* Currently, only network backend w/o multiqueue is supported. */
 #define VHOST_VDPA_VQ_MAX  2
 
@@ -79,10 +50,6 @@ static DEFINE_IDA(vhost_vdpa_ida);
 
 static dev_t vhost_vdpa_major;
 
-static const u64 vhost_vdpa_features[] = {
-   [VIRTIO_ID_NET] = VHOST_VDPA_NET_FEATURES,
-};
-
 static void handle_vq_kick(struct vhost_work *work)
 {
struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
@@ -255,7 +222,6 @@ static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user *featurep)
 	u64 features;
 
 	features = ops->get_features(vdpa);
-	features &= vhost_vdpa_features[v->virtio_id];
 
 	if (copy_to_user(featurep, &features, sizeof(features)))
 		return -EFAULT;
@@ -279,9 +245,6 @@ static long vhost_vdpa_set_features(struct vhost_vdpa *v, u64 __user *featurep)
 	if (copy_from_user(&features, featurep, sizeof(features)))
 		return -EFAULT;
 
-	if (features & ~vhost_vdpa_features[v->virtio_id])
-		return -EINVAL;
-
 	if (ops->set_features(vdpa, features))
 		return -EINVAL;
 
-- 
2.20.1
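
For context, a minimal userspace sketch of how feature negotiation already
works through the vhost-vDPA character device (assuming a /dev/vhost-vdpa-0
node; error handling trimmed). With the whitelist removed, VHOST_GET_FEATURES
simply reports whatever the parent vDPA driver's get_features() returns:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>
#include <linux/virtio_config.h>

/* Sketch only: feature negotiation against a vhost-vDPA device node. */
int main(void)
{
	uint64_t features;
	int fd = open("/dev/vhost-vdpa-0", O_RDWR);

	if (fd < 0)
		return 1;

	/* Take ownership of the device before configuring it. */
	if (ioctl(fd, VHOST_SET_OWNER, NULL))
		goto err;

	/* The full feature set of the parent device is now visible; no
	 * per-device whitelist is applied by vhost-vDPA anymore. */
	if (ioctl(fd, VHOST_GET_FEATURES, &features))
		goto err;
	printf("device features: 0x%llx\n", (unsigned long long)features);

	/* Acknowledge only the bits this application understands. */
	features &= 1ULL << VIRTIO_F_VERSION_1;
	if (ioctl(fd, VHOST_SET_FEATURES, &features))
		goto err;

	close(fd);
	return 0;
err:
	close(fd);
	return 1;
}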



Re: [PATCH V2 2/6] kvm: detect assigned device via irqbypass manager

2020-07-19 Thread Jason Wang


On 2020/7/18 上午2:08, Alex Williamson wrote:

On Thu, 16 Jul 2020 19:23:45 +0800
Zhu Lingshan  wrote:


vDPA devices have dedicated backend hardware, just like passthrough
devices, so it is possible to set up irq offloading to a vCPU for vDPA
devices. This patch therefore manipulates the assigned device counters
via the irqbypass manager.

We increase/decrease the assigned device counter in kvm/x86. Both vDPA
and VFIO go through this code path.

This code path only affects x86 for now.

Signed-off-by: Zhu Lingshan 
Suggested-by: Jason Wang 
---
  arch/x86/kvm/x86.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 00c88c2..20c07d3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10624,11 +10624,17 @@ int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
 {
 	struct kvm_kernel_irqfd *irqfd =
 		container_of(cons, struct kvm_kernel_irqfd, consumer);
+	int ret;
 
 	irqfd->producer = prod;
+	kvm_arch_start_assignment(irqfd->kvm);
+	ret = kvm_x86_ops.update_pi_irte(irqfd->kvm,
+					 prod->irq, irqfd->gsi, 1);
+
+	if (ret)
+		kvm_arch_end_assignment(irqfd->kvm);
 
-	return kvm_x86_ops.update_pi_irte(irqfd->kvm,
-					  prod->irq, irqfd->gsi, 1);
+	return ret;
 }
 
 void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,


Why isn't there a matching end-assignment in the del_producer path?  It
seems this only goes one way: what happens when a device is
hot-unplugged from the VM or the device interrupt configuration changes?
This will still break vfio if it's not guaranteed to be symmetric.
Thanks,

Alex



Yes, we need to add logic in the del_producer path.

Thanks
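
For completeness, a sketch of what the symmetric change could look like.
This is not the posted patch, only an illustration of keeping the
assigned-device counter balanced; it assumes the existing
kvm_arch_end_assignment() helper and a simplified del_producer body:

/* Sketch only: tear down in del_producer what add_producer set up, so
 * the assigned-device counter stays balanced across hot-unplug or IRQ
 * reconfiguration. */
void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
				      struct irq_bypass_producer *prod)
{
	struct kvm_kernel_irqfd *irqfd =
		container_of(cons, struct kvm_kernel_irqfd, consumer);

	WARN_ON(irqfd->producer != prod);
	irqfd->producer = NULL;

	/* Switch the posted-interrupt IRTE back to remapped mode, then
	 * drop the assignment reference taken in add_producer. */
	kvm_x86_ops.update_pi_irte(irqfd->kvm, prod->irq, irqfd->gsi, 0);
	kvm_arch_end_assignment(irqfd->kvm);
}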



Re: [PATCH vhost next 06/10] vdpa: Add means to communicate vq status on get_vq_state

2020-07-19 Thread Jason Wang


On 2020/7/16 下午6:25, Eli Cohen wrote:

On Thu, Jul 16, 2020 at 05:35:18PM +0800, Jason Wang wrote:

On 2020/7/16 下午4:21, Eli Cohen wrote:

On Thu, Jul 16, 2020 at 04:11:00PM +0800, Jason Wang wrote:

On 2020/7/16 下午3:23, Eli Cohen wrote:

Currently, get_vq_state() is used only to pass the available index value
of a vq. Extend the struct to return the status of the VQ to the caller.
For now, define VQ_STATE_NOT_READY. In the future it will be extended to
include other information.

Modify the current vdpa driver to update this field.

Reviewed-by: Parav Pandit
Signed-off-by: Eli Cohen

What's the difference between this and get_vq_ready()?

Thanks


There is no difference. It is just a way to communicate a problem with
the state of the VQ back to the caller, which is not possible now.
I think an asynchronous mechanism would be preferable, but that is not
available currently.


I still don't see the reason, maybe you can give me an example?



My intention was to provide a mechanism to return meaningful information
on the state of the vq, for example when you fail to get the state of
the VQ.

Maybe I could just change the prototype of the function to return int,
and the driver could return an error if it has trouble reporting the vq
state.



That's fine.

Thanks
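
To illustrate the direction agreed above, a rough sketch of a driver-side
get_vq_state() that returns an error instead of a state it cannot read.
The my_vdpa_* helpers and the exact state layout are hypothetical; the
final prototype is settled in later revisions of the series:

/* Sketch only: with an int-returning get_vq_state(), a driver can
 * report failure instead of handing back a stale available index. */
static int my_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx,
				struct vdpa_vq_state *state)
{
	struct my_vdpa_dev *dev = vdpa_to_my_dev(vdev);

	if (!my_vdpa_vq_is_initialized(dev, idx))
		return -EAGAIN;	/* caller sees the error, not a bogus index */

	/* Assuming the state carries the available index, as described
	 * in the commit message above. */
	state->avail_index = my_vdpa_read_avail_idx(dev, idx);
	return 0;
}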


