Re: [PATCH net-next 2/3] virtio-net: batch dim request
On 2024/1/17 3:49 AM, Simon Horman wrote:
> On Tue, Jan 16, 2024 at 09:11:32PM +0800, Heng Qi wrote:
>> Currently, each time the driver attempts to update the coalescing
>> parameters for a vq, it needs to kick the device.
>> The following path is observed:
>> 1. Driver kicks the device;
>> 2. After the device receives the kick, CPU scheduling occurs and
>>    multiple buffers are DMAed multiple times;
>> 3. The device completes processing and replies with a response.
>>
>> When large-queue devices issue multiple requests and kick the device
>> frequently, this often interrupts the work of the device-side CPU.
>> In addition, each vq request is processed separately, causing more
>> delays for the CPU to wait for the DMA request to complete.
>>
>> These interruptions and overhead will strain the CPU responsible for
>> controlling the path of the DPU, especially in multi-device and
>> large-queue scenarios.
>>
>> To solve the above problems, we internally tried batching requests,
>> which merges requests from multiple queues and sends them at once.
>> We conservatively tested 8 queue commands and sent them together.
>> The DPU processing efficiency can be improved by 8 times, which
>> greatly eases the DPU's support for multi-device and multi-queue DIM.
>>
>> Suggested-by: Xiaoming Zhao
>> Signed-off-by: Heng Qi
>
> ...
>
>> @@ -3546,16 +3552,32 @@ static void virtnet_rx_dim_work(struct work_struct *work)
>>  		update_moder = net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
>>  		if (update_moder.usec != rq->intr_coal.max_usecs ||
>>  		    update_moder.pkts != rq->intr_coal.max_packets) {
>> -			err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
>> -							       update_moder.usec,
>> -							       update_moder.pkts);
>> -			if (err)
>> -				pr_debug("%s: Failed to send dim parameters on rxq%d\n",
>> -					 dev->name, qnum);
>> -			dim->state = DIM_START_MEASURE;
>> +			coal->coal_vqs[j].vqn = cpu_to_le16(rxq2vq(i));
>> +			coal->coal_vqs[j].coal.max_usecs = cpu_to_le32(update_moder.usec);
>> +			coal->coal_vqs[j].coal.max_packets = cpu_to_le32(update_moder.pkts);
>> +			rq->intr_coal.max_usecs = update_moder.usec;
>> +			rq->intr_coal.max_packets = update_moder.pkts;
>> +			j++;
>>  		}
>>  	}
>>
>> +	if (!j)
>> +		goto ret;
>> +
>> +	coal->num_entries = cpu_to_le32(j);
>> +	sg_init_one(&sgs, coal, sizeof(struct virtnet_batch_coal) +
>> +		    j * sizeof(struct virtio_net_ctrl_coal_vq));
>> +	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
>> +				  VIRTIO_NET_CTRL_NOTF_COAL_VQS_SET,
>> +				  &sgs))
>> +		dev_warn(&vi->vdev->dev, "Failed to add dim command\n.");
>> +
>> +	for (i = 0; i < j; i++) {
>> +		rq = &vi->rq[(coal->coal_vqs[i].vqn) / 2];
>
> Hi Heng Qi,
>
> The type of .vqn is __le16, but here it is used as an integer in
> host byte order. Perhaps this should be (completely untested!):
>
> 		rq = &vi->rq[le16_to_cpu(coal->coal_vqs[i].vqn) / 2];

Hi Simon,

Thanks for the catch, I will check this out.

>> +		rq->dim.state = DIM_START_MEASURE;
>> +	}
>> +
>> +ret:
>>  	rtnl_unlock();
>>  }
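
The batching described in the commit message can be illustrated outside the kernel. The following is only a rough userspace sketch, not the driver code: every `demo_*` name is hypothetical, the field layout of the batch header is assumed from how the patch uses `num_entries` and `coal_vqs[]`, and fields are kept in host byte order for brevity (the real structures use `__le16`/`__le32`). It shows the two ideas in the hunk: only queues whose parameters changed are collected, and the buffer length is the header plus `j` entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical host-order mirrors of the structures named in the patch. */
struct demo_coal {
    uint32_t max_packets;
    uint32_t max_usecs;
};

struct demo_coal_vq {
    uint16_t vqn;
    uint16_t reserved;
    struct demo_coal coal;
};

struct demo_batch_coal {
    uint32_t num_entries;
    struct demo_coal_vq coal_vqs[]; /* one entry per queue that changed */
};

/* Header plus n entries: the same size expression the patch gives sg_init_one(). */
static size_t demo_batch_size(uint32_t n)
{
    return sizeof(struct demo_batch_coal) + (size_t)n * sizeof(struct demo_coal_vq);
}

/* Zeroed buffer large enough for the worst case (every queue changed). */
static struct demo_batch_coal *demo_batch_alloc(uint32_t max_entries)
{
    return calloc(1, demo_batch_size(max_entries));
}

/*
 * Walk all rx queues, append an entry for each queue whose wanted
 * parameters differ from the current ones, and record the new values.
 * Returns the number of entries collected (the patch's "j").
 */
static uint32_t demo_collect(struct demo_batch_coal *batch,
                             struct demo_coal *cur,
                             const struct demo_coal *wanted,
                             uint32_t nqueues)
{
    uint32_t j = 0;

    assert(batch != NULL);
    for (uint32_t i = 0; i < nqueues; i++) {
        if (wanted[i].max_usecs == cur[i].max_usecs &&
            wanted[i].max_packets == cur[i].max_packets)
            continue; /* unchanged queue: nothing to send */

        batch->coal_vqs[j].vqn = (uint16_t)(2 * i); /* rx queue i -> vq 2 * i */
        batch->coal_vqs[j].coal = wanted[i];
        cur[i] = wanted[i];
        j++;
    }
    batch->num_entries = j;
    return j;
}
```

If `demo_collect()` returns 0 there is nothing to send, which corresponds to the `if (!j) goto ret;` early exit in the hunk; otherwise a single command of `demo_batch_size(j)` bytes would replace `j` separate per-queue commands.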
Re: [PATCH net-next 2/3] virtio-net: batch dim request
On Tue, Jan 16, 2024 at 09:11:32PM +0800, Heng Qi wrote:
> Currently, each time the driver attempts to update the coalescing
> parameters for a vq, it needs to kick the device.
> The following path is observed:
> 1. Driver kicks the device;
> 2. After the device receives the kick, CPU scheduling occurs and
>    multiple buffers are DMAed multiple times;
> 3. The device completes processing and replies with a response.
>
> When large-queue devices issue multiple requests and kick the device
> frequently, this often interrupts the work of the device-side CPU.
> In addition, each vq request is processed separately, causing more
> delays for the CPU to wait for the DMA request to complete.
>
> These interruptions and overhead will strain the CPU responsible for
> controlling the path of the DPU, especially in multi-device and
> large-queue scenarios.
>
> To solve the above problems, we internally tried batching requests,
> which merges requests from multiple queues and sends them at once.
> We conservatively tested 8 queue commands and sent them together.
> The DPU processing efficiency can be improved by 8 times, which
> greatly eases the DPU's support for multi-device and multi-queue DIM.
>
> Suggested-by: Xiaoming Zhao
> Signed-off-by: Heng Qi

...

> @@ -3546,16 +3552,32 @@ static void virtnet_rx_dim_work(struct work_struct *work)
>  		update_moder = net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
>  		if (update_moder.usec != rq->intr_coal.max_usecs ||
>  		    update_moder.pkts != rq->intr_coal.max_packets) {
> -			err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
> -							       update_moder.usec,
> -							       update_moder.pkts);
> -			if (err)
> -				pr_debug("%s: Failed to send dim parameters on rxq%d\n",
> -					 dev->name, qnum);
> -			dim->state = DIM_START_MEASURE;
> +			coal->coal_vqs[j].vqn = cpu_to_le16(rxq2vq(i));
> +			coal->coal_vqs[j].coal.max_usecs = cpu_to_le32(update_moder.usec);
> +			coal->coal_vqs[j].coal.max_packets = cpu_to_le32(update_moder.pkts);
> +			rq->intr_coal.max_usecs = update_moder.usec;
> +			rq->intr_coal.max_packets = update_moder.pkts;
> +			j++;
>  		}
>  	}
>
> +	if (!j)
> +		goto ret;
> +
> +	coal->num_entries = cpu_to_le32(j);
> +	sg_init_one(&sgs, coal, sizeof(struct virtnet_batch_coal) +
> +		    j * sizeof(struct virtio_net_ctrl_coal_vq));
> +	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
> +				  VIRTIO_NET_CTRL_NOTF_COAL_VQS_SET,
> +				  &sgs))
> +		dev_warn(&vi->vdev->dev, "Failed to add dim command\n.");
> +
> +	for (i = 0; i < j; i++) {
> +		rq = &vi->rq[(coal->coal_vqs[i].vqn) / 2];

Hi Heng Qi,

The type of .vqn is __le16, but here it is used as an integer in
host byte order. Perhaps this should be (completely untested!):

		rq = &vi->rq[le16_to_cpu(coal->coal_vqs[i].vqn) / 2];

> +		rq->dim.state = DIM_START_MEASURE;
> +	}
> +
> +ret:
>  	rtnl_unlock();
>  }
>
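
To make the byte-order concern concrete, here is a small userspace sketch. It is an illustration under stated assumptions, not kernel code: the `demo_*` helpers are hypothetical stand-ins for `cpu_to_le16()`/`le16_to_cpu()`, and `demo_raw_read_on_big_endian()` simulates what a big-endian host would get by reading the two little-endian bytes without converting. On a little-endian host the unconverted read happens to give the right value, which is why this kind of bug tends to go unnoticed in testing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's __le16: two bytes, least significant first. */
typedef struct {
    uint8_t b[2];
} demo_le16;

/* Stand-in for cpu_to_le16(): store the value LSB first, whatever the host order. */
static demo_le16 demo_cpu_to_le16(uint16_t v)
{
    demo_le16 le = { { (uint8_t)(v & 0xff), (uint8_t)(v >> 8) } };
    return le;
}

/* Stand-in for le16_to_cpu(): rebuild the host-order value from the LE bytes. */
static uint16_t demo_le16_to_cpu(demo_le16 le)
{
    return (uint16_t)(le.b[0] | (le.b[1] << 8));
}

/* Simulates a big-endian host using the field without any conversion. */
static uint16_t demo_raw_read_on_big_endian(demo_le16 le)
{
    return (uint16_t)((le.b[0] << 8) | le.b[1]);
}

/* Mirrors the driver's mapping: rx queue i uses virtqueue 2 * i. */
static uint16_t demo_rxq2vq(uint16_t rxq)
{
    return (uint16_t)(rxq * 2);
}

/* The suggested form: convert first, then map the virtqueue back to its rx queue. */
static uint16_t demo_rq_index(demo_le16 vqn)
{
    uint16_t vq = demo_le16_to_cpu(vqn);

    assert(vq % 2 == 0); /* rx virtqueues are even-numbered in this sketch */
    return (uint16_t)(vq / 2);
}
```

For rx queue 3 the stored field holds the bytes `06 00`. `demo_rq_index()` recovers index 3, whereas the simulated unconverted big-endian read yields 0x0600 / 2 = 768, which would index far outside a typical `rq` array.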