A few more comments inline.
On 10.05.2019 2:54, William Tu wrote:
> The patch introduces experimental AF_XDP support for OVS netdev.
> AF_XDP, Address Family of the eXpress Data Path, is a new Linux socket type
> built upon the eBPF and XDP technology. It aims to achieve performance
> comparable to DPDK while cooperating better with the existing kernel
> networking stack. An AF_XDP socket receives and sends packets through an
> eBPF/XDP program attached to the netdev, bypassing several of the Linux
> kernel's subsystems. As a result, an AF_XDP socket shows much better
> performance than AF_PACKET.
> For more details about AF_XDP, please see linux kernel's
> Documentation/networking/af_xdp.rst. Note that by default, this is not
> compiled in.
>
> Signed-off-by: William Tu <[email protected]>
>
> ---
<snip>
> +Setup AF_XDP netdev
> +-------------------
> +Before running OVS with AF_XDP, make sure libbpf and libelf are
> +set up correctly::
> +
> + ldd vswitchd/ovs-vswitchd
> +
> +Open vSwitch should be started using userspace datapath as described
> +in :doc:`general`::
> +
> + ovs-vswitchd --disable-system
> + ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
> +
> +.. note::
> +   The OVS AF_XDP netdev uses the userspace datapath, the same datapath
> +   used by OVS-DPDK, so it requires --disable-system for ovs-vswitchd
> +   and datapath_type=netdev when adding a new bridge.
I don't think that '--disable-system' is needed. It doesn't affect anything.
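For reference, adding an interface with this netdev would presumably look
like the following (the "afxdp" type name is my assumption based on this
series; the final name should be checked against the patch itself):

```
ovs-vsctl add-port br0 eth0 -- set interface eth0 type="afxdp"
```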
<snip>
> +int
> +netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
> + struct dp_packet_batch *batch)
> +{
One important issue here: netdev_linux_send() is thread-safe because all
the syscalls and memory allocations there are thread-safe. However, the
xsk_ring_* APIs are not thread-safe, so if two threads try to send
packets to the same Tx queue at the same time they may corrupt the
rings. It's necessary to start using the 'concurrent_txq' flag with
per-queue locks.
Note that 'concurrent_txq' == 'false' only if 'n_txq' > 'n_pmd_threads'.
> + struct umem_elem *elems_pop[BATCH_SIZE];
> + struct umem_elem *elems_push[BATCH_SIZE];
> + uint32_t tx_done, idx_cq = 0;
> + struct dp_packet *packet;
> + uint32_t idx = 0;
> + int j, ret, retry_count = 0;
> + const int max_retry = 4;
> +
> +    ret = umem_elem_pop_n(&xsk->umem->mpool, batch->count, (void **)elems_pop);
> + if (OVS_UNLIKELY(ret)) {
> + return EAGAIN;
> + }
> +
> + /* Make sure we have enough TX descs */
> + ret = xsk_ring_prod__reserve(&xsk->tx, batch->count, &idx);
> + if (OVS_UNLIKELY(ret == 0)) {
> +        umem_elem_push_n(&xsk->umem->mpool, batch->count, (void **)elems_pop);
> + return EAGAIN;
> + }
> +
> + DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
> + struct umem_elem *elem;
> + uint64_t index;
> +
> + elem = elems_pop[i];
> + /* Copy the packet to the umem we just pop from umem pool.
> + * We can avoid this copy if the packet and the pop umem
> + * are located in the same umem.
> + */
> + memcpy(elem, dp_packet_data(packet), dp_packet_size(packet));
> +
> + index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
> + xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->addr = index;
> + xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->len
> + = dp_packet_size(packet);
> + }
> + xsk_ring_prod__submit(&xsk->tx, batch->count);
> + xsk->outstanding_tx += batch->count;
> +
> + ret = kick_tx(xsk);
> + if (OVS_UNLIKELY(ret)) {
> +        umem_elem_push_n(&xsk->umem->mpool, batch->count, (void **)elems_pop);
> + VLOG_WARN_RL(&rl, "error sending AF_XDP packet: %s",
> + ovs_strerror(ret));
> + return ret;
> + }
> +
> +retry:
> + /* Process CQ */
> + tx_done = xsk_ring_cons__peek(&xsk->umem->cq, batch->count, &idx_cq);
> + if (tx_done > 0) {
> + xsk->outstanding_tx -= tx_done;
> + xsk->tx_npkts += tx_done;
> + }
> +
> + /* Recycle back to umem pool */
> + for (j = 0; j < tx_done; j++) {
> + struct umem_elem *elem;
> + uint64_t addr;
> +
> + addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
> +
> + elem = ALIGNED_CAST(struct umem_elem *,
> + (char *)xsk->umem->buffer + addr);
> + elems_push[j] = elem;
> + }
> +
> + ret = umem_elem_push_n(&xsk->umem->mpool, tx_done, (void **)elems_push);
> + ovs_assert(ret == 0);
> +
> + xsk_ring_cons__release(&xsk->umem->cq, tx_done);
> +
> + if (xsk->outstanding_tx > PROD_NUM_DESCS - (PROD_NUM_DESCS >> 2)) {
> + /* If there are still a lot not transmitted, try harder. */
> + if (retry_count++ > max_retry) {
> + return 0;
> + }
> + goto retry;
> + }
> +
> + return 0;
> +}
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev