Re: [dpdk-dev] [PATCH v2 2/2] vhost: introduce async enqueue for split ring

2020-07-02 Thread Fu, Patrick
Thanks Marvin, my comments inline:

> -----Original Message-----
> From: Liu, Yong 
> Sent: Wednesday, July 1, 2020 4:51 PM
> To: Fu, Patrick ; dev@dpdk.org;
> maxime.coque...@redhat.com; Xia, Chenbo ; Wang,
> Zhihong 
> Cc: Fu, Patrick ; Wang, Yinan
> ; Jiang, Cheng1 ; Liang,
> Cunming 
> Subject: RE: [dpdk-dev] [PATCH v2 2/2] vhost: introduce async enqueue for
> split ring
> 
> >
> > +#define VHOST_ASYNC_BATCH_THRESHOLD 8
> > +
> 
> It is not clear why the batch threshold is 8. If the value comes from a
> hardware requirement, it would be better to store it in
> rte_vhost_async_features.
> 
We are still benchmarking how this value affects overall performance, and
we will handle this macro in a more reasonable way in a later revision.

> > +
> > +static __rte_noinline uint32_t
> > +virtio_dev_rx_async_submit_split(struct virtio_net *dev,
> > +   struct vhost_virtqueue *vq, uint16_t queue_id,
> > +   struct rte_mbuf **pkts, uint32_t count)
> > +{
> > +   uint32_t pkt_idx = 0, pkt_burst_idx = 0;
> > +   uint16_t num_buffers;
> > +   struct buf_vector buf_vec[BUF_VECTOR_MAX];
> > +   uint16_t avail_head, last_idx, shadow_idx;
> > +
> > +   struct rte_vhost_iov_iter *it_pool = vq->it_pool;
> > +   struct iovec *vec_pool = vq->vec_pool;
> > +   struct rte_vhost_async_desc tdes[MAX_PKT_BURST];
> > +   struct iovec *src_iovec = vec_pool;
> > +   struct iovec *dst_iovec = vec_pool + (VHOST_MAX_ASYNC_VEC >> 1);
> > +   struct rte_vhost_iov_iter *src_it = it_pool;
> > +   struct rte_vhost_iov_iter *dst_it = it_pool + 1;
> > +   uint16_t n_free_slot, slot_idx;
> > +   int n_pkts = 0;
> > +
> > +   avail_head = *((volatile uint16_t *)&vq->avail->idx);
> > +   last_idx = vq->last_avail_idx;
> > +   shadow_idx = vq->shadow_used_idx;
> > +
> > +   /*
> > +* The ordering between avail index and
> > +* desc reads needs to be enforced.
> > +*/
> > +   rte_smp_rmb();
> > +
> > +   rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
> > +
> > +   for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
> > +   uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;
> > +   uint16_t nr_vec = 0;
> > +
> > +   if (unlikely(reserve_avail_buf_split(dev, vq,
> > +   pkt_len, buf_vec, &num_buffers,
> > +   avail_head, &nr_vec) < 0)) {
> > +   VHOST_LOG_DATA(DEBUG,
> > +   "(%d) failed to get enough desc from vring\n",
> > +   dev->vid);
> > +   vq->shadow_used_idx -= num_buffers;
> > +   break;
> > +   }
> > +
> > +   VHOST_LOG_DATA(DEBUG, "(%d) current index %d | end index %d\n",
> > +   dev->vid, vq->last_avail_idx,
> > +   vq->last_avail_idx + num_buffers);
> > +
> > +   if (async_mbuf_to_desc(dev, vq, pkts[pkt_idx],
> > +   buf_vec, nr_vec, num_buffers,
> > +   src_iovec, dst_iovec, src_it, dst_it) < 0) {
> > +   vq->shadow_used_idx -= num_buffers;
> > +   break;
> > +   }
> > +
> > +   slot_idx = (vq->async_pkts_idx + pkt_idx) & (vq->size - 1);
> > +   if (src_it->count) {
> > +   async_fill_des(&tdes[pkt_burst_idx], src_it, dst_it);
> > +   pkt_burst_idx++;
> > +   vq->async_pending_info[slot_idx] =
> > +   num_buffers | (src_it->nr_segs << 16);
> > +   src_iovec += src_it->nr_segs;
> > +   dst_iovec += dst_it->nr_segs;
> > +   src_it += 2;
> > +   dst_it += 2;
> 
> Patrick,
> In my understanding, nr_segs can use the same type as nr_vec (uint16_t).
> That would shrink the data saved in async_pkts_pending from 64 bits to
> 32 bits. Since this information is used in the datapath, the smaller size
> should give better performance.
> 
> It would also be better to replace the integer 2 with a macro.
> 
Will update the code as you suggested.

Thanks,

Patrick



Re: [dpdk-dev] [PATCH v2 2/2] vhost: introduce async enqueue for split ring

2020-07-01 Thread Liu, Yong
> 
> +#define VHOST_ASYNC_BATCH_THRESHOLD 8
> +

It is not clear why the batch threshold is 8. If the value comes from a
hardware requirement, it would be better to store it in
rte_vhost_async_features.
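
To illustrate the suggestion, here is a minimal sketch of a threshold that
is carried in a settings struct instead of a compile-time macro. Everything
in it is hypothetical: struct async_batch_cfg, its batch_threshold field and
both helpers are made-up names, not the layout of rte_vhost_async_features,
and the "submit once the pending count reaches the threshold" rule is my
assumption about how the macro is meant to be used.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fallback used when the backend does not request a specific value. */
#define ASYNC_BATCH_THRESHOLD_DEFAULT 8

/* Hypothetical per-channel settings (illustration only). */
struct async_batch_cfg {
	uint16_t batch_threshold; /* 0 means "use the default" */
};

/* Resolve the effective threshold for one channel. */
static inline uint16_t
async_batch_threshold(const struct async_batch_cfg *cfg)
{
	return cfg->batch_threshold ? cfg->batch_threshold
				    : ASYNC_BATCH_THRESHOLD_DEFAULT;
}

/* Assumed policy: submit once enough descriptors are pending. */
static inline bool
async_should_submit(const struct async_batch_cfg *cfg, uint16_t pending)
{
	return pending >= async_batch_threshold(cfg);
}
```

With something like this, a backend with a hardware-imposed batch size could
set the field at registration time, and the rest keep the default of 8.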

> +
> +static __rte_noinline uint32_t
> +virtio_dev_rx_async_submit_split(struct virtio_net *dev,
> + struct vhost_virtqueue *vq, uint16_t queue_id,
> + struct rte_mbuf **pkts, uint32_t count)
> +{
> + uint32_t pkt_idx = 0, pkt_burst_idx = 0;
> + uint16_t num_buffers;
> + struct buf_vector buf_vec[BUF_VECTOR_MAX];
> + uint16_t avail_head, last_idx, shadow_idx;
> +
> + struct rte_vhost_iov_iter *it_pool = vq->it_pool;
> + struct iovec *vec_pool = vq->vec_pool;
> + struct rte_vhost_async_desc tdes[MAX_PKT_BURST];
> + struct iovec *src_iovec = vec_pool;
> + struct iovec *dst_iovec = vec_pool + (VHOST_MAX_ASYNC_VEC >> 1);
> + struct rte_vhost_iov_iter *src_it = it_pool;
> + struct rte_vhost_iov_iter *dst_it = it_pool + 1;
> + uint16_t n_free_slot, slot_idx;
> + int n_pkts = 0;
> +
> + avail_head = *((volatile uint16_t *)&vq->avail->idx);
> + last_idx = vq->last_avail_idx;
> + shadow_idx = vq->shadow_used_idx;
> +
> + /*
> +  * The ordering between avail index and
> +  * desc reads needs to be enforced.
> +  */
> + rte_smp_rmb();
> +
> + rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
> +
> + for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
> + uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;
> + uint16_t nr_vec = 0;
> +
> + if (unlikely(reserve_avail_buf_split(dev, vq,
> + pkt_len, buf_vec, &num_buffers,
> + avail_head, &nr_vec) < 0)) {
> + VHOST_LOG_DATA(DEBUG,
> + "(%d) failed to get enough desc from vring\n",
> + dev->vid);
> + vq->shadow_used_idx -= num_buffers;
> + break;
> + }
> +
> + VHOST_LOG_DATA(DEBUG, "(%d) current index %d | end index %d\n",
> + dev->vid, vq->last_avail_idx,
> + vq->last_avail_idx + num_buffers);
> +
> + if (async_mbuf_to_desc(dev, vq, pkts[pkt_idx],
> + buf_vec, nr_vec, num_buffers,
> + src_iovec, dst_iovec, src_it, dst_it) < 0) {
> + vq->shadow_used_idx -= num_buffers;
> + break;
> + }
> +
> + slot_idx = (vq->async_pkts_idx + pkt_idx) & (vq->size - 1);
> + if (src_it->count) {
> + async_fill_des(&tdes[pkt_burst_idx], src_it, dst_it);
> + pkt_burst_idx++;
> + vq->async_pending_info[slot_idx] =
> + num_buffers | (src_it->nr_segs << 16);
> + src_iovec += src_it->nr_segs;
> + dst_iovec += dst_it->nr_segs;
> + src_it += 2;
> + dst_it += 2;

Patrick,
In my understanding, nr_segs can use the same type as nr_vec (uint16_t).
That would shrink the data saved in async_pkts_pending from 64 bits to
32 bits. Since this information is used in the datapath, the smaller size
should give better performance.

It would also be better to replace the integer 2 with a macro.
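
As a rough sketch of both points, assuming nr_segs becomes uint16_t: the two
fields then fit in one 32-bit word, and the iterator stride gets a name. The
macro and helper names below are hypothetical, chosen only for illustration;
they are not from the patch.

```c
#include <stdint.h>

/* Hypothetical names, not taken from the patch. */
#define ASYNC_NR_SEGS_SHIFT	16
#define ASYNC_NUM_BUFFERS_MASK	0xffffu
/* One source plus one destination iterator per packet. */
#define ASYNC_IT_PER_PKT	2

/* Pack num_buffers in the low 16 bits and nr_segs in the high 16 bits. */
static inline uint32_t
async_pending_pack(uint16_t num_buffers, uint16_t nr_segs)
{
	return (uint32_t)num_buffers |
		((uint32_t)nr_segs << ASYNC_NR_SEGS_SHIFT);
}

/* Recover the number of used ring buffers from a packed word. */
static inline uint16_t
async_pending_num_buffers(uint32_t info)
{
	return (uint16_t)(info & ASYNC_NUM_BUFFERS_MASK);
}

/* Recover the number of iovec segments from a packed word. */
static inline uint16_t
async_pending_nr_segs(uint32_t info)
{
	return (uint16_t)(info >> ASYNC_NR_SEGS_SHIFT);
}
```

The loop would then advance with "src_it += ASYNC_IT_PER_PKT;" and
"dst_it += ASYNC_IT_PER_PKT;" instead of the bare 2.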

Thanks,
Marvin