Hi Hao,

The current design of the application is, very roughly, the following:
1. There is one main thread which pumps out the packets from the NIC queues
using rte_eth_rx_burst(), as you said. In the future we may need several
main threads to be able to scale the application. Each one of them will
work on separate groups of RX queues. The main thread distributes the
received packets to N other threads using single producer single consumer
rings provided by DPDK (rte_ring).
2. Each one of these N other threads runs a separate F-stack instance. As
I said, we use a networking-stack-per-thread, share-nothing design. Let's
call them worker threads for now.
3. Each worker thread has its own spsc_ring for the incoming packets and
uses a separate NIC queue to send the outgoing packets using
rte_eth_tx_burst.
The main loop of such a worker thread looks roughly like this
(pseudo code):
while (not stopped) {
    if (X_milliseconds_have_passed)
        call_fstack_tick_functionality();

    send_queued_fstack_packets(); // using rte_eth_tx_burst

    dequeue_incoming_packets_from_spsc_ring();

    enqueue_the_incoming_packets_to_the_fstack();

    if (Y_milliseconds_have_passed)
        process_fstack_socket_events_using_epoll();
}
You may note the following things from the above code.
- The packets are sent (rte_eth_tx_burst) in the same thread where the
socket events are processed. The outgoing packets are also sent if we queue
enough of them while processing socket write events, but going into that
would complicate the explanation here.
- The timer events and the socket events are not processed on each
iteration of the loop. The X and Y intervals come from a config file and
are measured using rte_rdtsc.
- The loop is very similar to the one present in the F-stack itself -
https://github.com/F-Stack/f-stack/blob/dev/lib/ff_dpdk_if.c#L1817. It's
just that in our case this loop is decoupled from the F-stack because we
removed the DPDK from the F-stack in order to use the latter as a separate
library and use a separate networking stack per thread.
4. The number of worker threads is configurable via the application config
file and the application sets up the NIC with the same number of RX/TX
queues as the number of worker threads. This way the main thread pumps out
packets from N RX queues and each worker thread enqueues packets to its
own TX queue, i.e. there is no sharing. So the application may run with a
single RX/TX queue, and then it'll have one main thread and one worker
thread. Or it may run with 10 RX/TX queues, and then it'll have 1 main
thread and 10 worker threads. It depends on the amount of traffic that we
expect to handle, the NIC capabilities, etc.
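
To make the structure above a bit more concrete, here is a minimal C sketch
of the share-nothing layout. It is not our actual code and it deliberately
does not use DPDK or F-stack: the small spsc_ring is only a stand-in for an
rte_ring created in single producer/single consumer mode, the counters
stand in for the F-stack calls from the pseudo code, and `now` stands in
for a timestamp such as the one returned by rte_rdtsc(), with the two
intervals already converted from milliseconds. The names (struct worker,
worker_init, worker_iteration, distribute) and the hash-modulo choice of
the target worker are assumptions made purely for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical capacity, must be a power of two */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be 2^n");

/* Stand-in for an SP/SC rte_ring: one producer (main), one consumer (worker). */
struct spsc_ring {
    _Atomic uint32_t head; /* advanced only by the producer */
    _Atomic uint32_t tail; /* advanced only by the consumer */
    void *slots[RING_SIZE];
};

static int ring_enqueue(struct spsc_ring *r, void *pkt)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return -1; /* ring full, caller decides whether to drop */
    r->slots[head & (RING_SIZE - 1)] = pkt;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 0;
}

static unsigned ring_dequeue_burst(struct spsc_ring *r, void **out, unsigned max)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    unsigned n = 0;
    while (n < max && tail != head)
        out[n++] = r->slots[tail++ & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail, memory_order_release);
    return n;
}

/* Per-worker state: nothing here is touched by any other worker. */
struct worker {
    struct spsc_ring rx_ring;      /* filled by the main thread */
    uint64_t tick_interval;        /* "X milliseconds" in clock units */
    uint64_t event_interval;       /* "Y milliseconds" in clock units */
    uint64_t last_tick, last_events;
    unsigned ticks, event_rounds, pkts_in; /* counters replacing real work */
};

static void worker_init(struct worker *w, uint64_t tick_interval,
                        uint64_t event_interval, uint64_t now)
{
    atomic_init(&w->rx_ring.head, 0);
    atomic_init(&w->rx_ring.tail, 0);
    w->tick_interval = tick_interval;
    w->event_interval = event_interval;
    w->last_tick = w->last_events = now;
    w->ticks = w->event_rounds = w->pkts_in = 0;
}

/* One pass of the worker loop from the pseudo code. */
static void worker_iteration(struct worker *w, uint64_t now)
{
    void *burst[4];

    if (now - w->last_tick >= w->tick_interval) {
        w->last_tick = now;
        w->ticks++;        /* call_fstack_tick_functionality() */
    }

    /* send_queued_fstack_packets() would call rte_eth_tx_burst() here. */

    unsigned n = ring_dequeue_burst(&w->rx_ring, burst, 4);
    w->pkts_in += n;       /* enqueue_the_incoming_packets_to_the_fstack() */

    if (now - w->last_events >= w->event_interval) {
        w->last_events = now;
        w->event_rounds++; /* process_fstack_socket_events_using_epoll() */
    }
}

/* Main-thread side: pick a worker (assumed: hash modulo N) and hand over. */
static int distribute(struct worker *workers, unsigned n_workers,
                      void *pkt, uint32_t flow_hash)
{
    return ring_enqueue(&workers[flow_hash % n_workers].rx_ring, pkt);
}
```

In this sketch the main thread would call distribute() for every packet it
gets from rte_eth_rx_burst(), and each worker thread would call
worker_iteration() in a loop with a fresh timestamp. The only state shared
between two threads is the ring between the main thread and one worker.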

Regards,
Pavel.
