Hi Ed,

On 05.04.2025 01:00, Lombardo, Ed wrote:

Hi,

I have an application that receives packets and transmits them. The packet data is inspected, and later the mbuf is freed to the mempool.

The pipeline is such that the Rx packet mbuf is saved to the Rx worker ring; then the application threads process the packets, decide whether to transmit each packet, and if so increment the mbuf refcnt to 2.

Do I understand the pipeline correctly?

Rx thread:

    receive mbuf
    put mbuf into the ring
    inspect mbuf
    free mbuf

Worker thread:

    take mbuf from the ring
    if decided to transmit it,
        increment refcnt
        transmit mbuf

If so, there's a problem: after the Rx thread puts the mbuf into the ring, the mbuf is owned by both the Rx thread and the ring, so its refcnt must already be 2 when it enters the ring:

Rx thread:

    receive mbuf
    increment refcnt
    put mbuf into the ring
    inspect mbuf
    free mbuf (just decrements refcnt if > 1)

Worker thread:

    take mbuf from the ring
    if decided to transmit it,
        transmit (or put into the bulk transmitted later)
    else
        free mbuf (just decrements refcnt if > 1)
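The two orderings can be compared with a toy, single-threaded replay of each pipeline. Everything here is hypothetical (struct toy_mbuf, toy_free() and the two replay functions are not DPDK API), and toy_free() only mimics the rte_pktmbuf_free() rule stated in this thread: return the mbuf to the pool when refcnt == 1, otherwise decrement. Every refcnt update in the sketch is atomic, yet in the original pipeline one interleaving still lets the worker touch an mbuf that is already back in the pool.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of an mbuf, not DPDK's struct rte_mbuf. */
struct toy_mbuf {
    _Atomic uint16_t refcnt; /* 16-bit counter, like the rte_mbuf field */
    bool in_pool;            /* true once "returned to the mempool" */
};

/* Mimics the free rule from this thread: return to the pool when
 * refcnt == 1, otherwise only decrement refcnt. */
static void toy_free(struct toy_mbuf *m)
{
    assert(!m->in_pool); /* freeing a pooled mbuf would be a double free */
    if (atomic_load(&m->refcnt) == 1)
        m->in_pool = true;
    else
        atomic_fetch_sub(&m->refcnt, 1);
}

/* Original pipeline, one fixed interleaving. Returns true if the worker
 * increments refcnt on an mbuf the Rx thread already returned to the pool. */
static bool original_pipeline_use_after_free(bool rx_frees_first)
{
    struct toy_mbuf m = { .refcnt = 1, .in_pool = false }; /* received */
    bool worker_saw_pooled;

    /* Rx: put mbuf into the ring without an extra reference, inspect it. */
    if (rx_frees_first) {
        toy_free(&m);                   /* Rx: free mbuf -> back to pool */
        worker_saw_pooled = m.in_pool;  /* Worker: take it from the ring */
        atomic_fetch_add(&m.refcnt, 1); /* ...and increment, too late */
    } else {
        atomic_fetch_add(&m.refcnt, 1); /* Worker: increment first (2) */
        toy_free(&m);                   /* Rx: free mbuf, 2 -> 1 */
        worker_saw_pooled = m.in_pool;
    }
    return worker_saw_pooled;
}

/* Corrected pipeline. Whether the worker drops the packet or the Tx
 * driver frees it after completion, that path releases the ring's
 * reference exactly once. Returns true if the mbuf stays live until its
 * last owner is done and then reaches the pool once. */
static bool corrected_pipeline_ok(bool rx_frees_first)
{
    struct toy_mbuf m = { .refcnt = 1, .in_pool = false }; /* received */

    atomic_fetch_add(&m.refcnt, 1); /* Rx: reference for the ring (2) */
    /* Rx: put mbuf into the ring, then inspect it. */

    if (rx_frees_first)
        toy_free(&m);               /* Rx: free mbuf, 2 -> 1 */

    if (m.in_pool)                  /* Worker: mbuf taken from the ring */
        return false;               /* must still be live here */
    toy_free(&m);                   /* Worker/Tx path: release its ref */

    if (!rx_frees_first)
        toy_free(&m);               /* Rx: free mbuf, last reference */

    return m.in_pool;
}
```

original_pipeline_use_after_free(true) returns true, while corrected_pipeline_ok() returns true for both orders, and the assert in toy_free() would fire on any double free. A replay like this checks individual interleavings only; it does not prove real multithreaded code correct.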

The batch of mbufs to transmit is put in a Tx ring queue, from which the Tx thread pulls and calls the DPDK rte_eth_tx_burst() with the batch of mbufs (limited to 400 mbufs). In theory the transmit operation will decrement the mbuf refcnt. In our application, the Tx of an mbuf may happen before or after another application thread calls free on the same mbuf. We have no way to synchronize these threads.

Are the mbuf refcnt updates thread-safe enough to allow nondeterministic handling of the mbufs among multiple threads? The decision to transmit the mbuf, the mbuf refcnt increment, and the enqueue to the Tx ring all complete before the application says it is finished and frees the mbufs.

Have you validated this assumption?
If my understanding above is correct, nothing orders the worker's increment against the Rx thread's free, so there is no such guarantee.

In my error-checking code I am seeing mbuf refcnt values as large as 65520, 65529, 65530, 65534, and 65535 in the early-pipeline-stage refcnt checks.
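For reference, the refcnt field of struct rte_mbuf is a 16-bit unsigned integer, so the values above are exactly what such a counter holds after being taken below zero: 65535 is -1 and 65520 is -16, modulo 65536. That would fit more releases than references somewhere, or writes into an mbuf that was already recycled, but it is an inference rather than a diagnosis. A small arithmetic sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the value a 16-bit counter holds after adding
 * `delta` to `start` (unsigned arithmetic wraps modulo 65536). */
static uint16_t refcnt_after(uint16_t start, int delta)
{
    return (uint16_t)(start + delta);
}

/* Hypothetical helper: how far below zero a wrapped 16-bit value is,
 * e.g. 65535 -> 1. Only meaningful for values near the top of the range. */
static int decrements_past_zero(uint16_t observed)
{
    assert(observed != 0); /* 0 is just zero, not a wrapped value */
    return 65536 - (int)observed;
}
```

By this arithmetic, 65534 corresponds to two decrements past zero, 65529 to seven, and 65520 to sixteen.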

I read online and in the DPDK code that mbuf refcnt updates are atomic and thread-safe, so this part is good.

What is unclear to me is this: when rte_eth_tx_burst() is called and returns the number of packets transmitted, does that mean transmission of those packets is complete and each mbuf refcnt has been decremented by 1 on return? Or is the Tx engine queue merely populated, with the refcnt not decremented until the packet is actually transmitted, or, worse, at some later time?

Is the DPDK Tx operation intended to be the last stage of a pipeline, the one that frees the mbuf once it has been transmitted successfully?

Return from rte_eth_tx_burst() means only that the mbufs are queued for transmission.
The hardware completes transmission asynchronously.
A subsequent call to rte_eth_tx_burst() polls the HW,
learns the status of mbufs queued *previously*,
and calls rte_pktmbuf_free() for those that have been transmitted.
The latter returns an mbuf to its mempool if and only if its refcnt == 1;
otherwise it only decrements refcnt.
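The sequence above can be modeled with a toy Tx queue. All names here (struct toy_txq, toy_tx_burst(), toy_hw_complete_all()) are hypothetical, and the model is deliberately simplified: it reclaims every completed slot on each call, whereas many real drivers reclaim completed descriptors in batches, so an mbuf may be freed several calls later. The point it illustrates is the ordering: an mbuf's refcnt is untouched when the burst call returns and drops only during a later call, after the hardware has completed it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TXQ_SIZE 8 /* hypothetical descriptor ring size */

/* Hypothetical toy model of an mbuf, not DPDK's struct rte_mbuf. */
struct toy_mbuf {
    _Atomic uint16_t refcnt;
    bool in_pool; /* true once "returned to the mempool" */
};

/* Free rule as described above: back to the pool iff refcnt == 1,
 * otherwise only decrement refcnt. */
static void toy_free(struct toy_mbuf *m)
{
    assert(!m->in_pool); /* would be a double free */
    if (atomic_load(&m->refcnt) == 1)
        m->in_pool = true;
    else
        atomic_fetch_sub(&m->refcnt, 1);
}

/* Hypothetical Tx queue: each slot holds a queued mbuf; done[i] is set by
 * the simulated hardware once slot i has been transmitted. */
struct toy_txq {
    struct toy_mbuf *slot[TOY_TXQ_SIZE];
    bool done[TOY_TXQ_SIZE];
};

/* First frees mbufs whose transmission the hardware has completed (they
 * came from EARLIER calls), then queues up to n new mbufs. Returns how
 * many were queued; newly queued mbufs are not freed or decremented here,
 * and any not queued remain the caller's responsibility. */
static int toy_tx_burst(struct toy_txq *q, struct toy_mbuf **pkts, int n)
{
    int queued = 0;

    for (int i = 0; i < TOY_TXQ_SIZE; i++) {
        if (q->slot[i] != NULL && q->done[i]) {
            toy_free(q->slot[i]);
            q->slot[i] = NULL;
            q->done[i] = false;
        }
    }
    for (int i = 0; i < TOY_TXQ_SIZE && queued < n; i++) {
        if (q->slot[i] == NULL)
            q->slot[i] = pkts[queued++];
    }
    return queued;
}

/* Simulated hardware: mark every queued slot as transmitted. */
static void toy_hw_complete_all(struct toy_txq *q)
{
    for (int i = 0; i < TOY_TXQ_SIZE; i++) {
        if (q->slot[i] != NULL)
            q->done[i] = true;
    }
}
```

With an mbuf at refcnt 2 (one reference for the Tx path, one still held by the Rx thread), refcnt stays 2 after the first toy_tx_burst() returns, drops to 1 in the next call once the simulated hardware has completed it, and the mbuf reaches the pool only when the Rx thread releases its own reference.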
