Hi Hui,

Sorry I couldn’t update you sooner. In my case the issue was caused by the NIC 
overheating (its temperature was close to 85 degrees Celsius). After setting 
the system fan to full speed, I was able to pump traffic at a rate of 40 Gbps 
for more than 4 days without any failure.

Regards,
Amar
From: Hui Liu <[email protected]>
Date: Thursday, 26 July 2018 at 8:01 AM
To: Amarnath Nallapothula <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: [dpdk-users] occasionally traffic stalls due to rx and tx 
descriptor not available

Hi Amar,

I finally reproduced my problem in my own testbed and, not surprisingly, saw 
almost the same problem as you, except that it is on an Intel 82599 ixgbe port:


 75 static __rte_always_inline int
 76 ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
 77 {
 78         struct ixgbe_tx_entry_v *txep;
 79         uint32_t status;
 80         uint32_t n;
 81         uint32_t i;
 82         int nb_free = 0;
 83         struct rte_mbuf *m, *free[RTE_IXGBE_TX_MAX_FREE_BUF_SZ];
 84
 85         /* check DD bit on threshold descriptor */
 86         status = txq->tx_ring[txq->tx_next_dd].wb.status;
 87         if (!(status & IXGBE_ADVTXD_STAT_DD))
 88                 return 0;
 89

(gdb) n
87      in /home/admin/hui/monitor_platform_new/Common/dpdk-stable-18.02.2/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h
(gdb) p/x status
$10 = 0x138000

and:

#define IXGBE_ADVTXD_STAT_DD IXGBE_TXD_STAT_DD
#define IXGBE_TXD_STAT_DD 0x00000001

When everything goes fine:

(gdb) p/x status
$5 = 0x1038001
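
To make the comparison of the two status words explicit, here is a minimal 
sketch of the same mask test in standalone C. The two macros are copied from 
the definitions quoted above; the helper names tx_desc_done() and 
check_gdb_samples() are hypothetical and not part of the ixgbe driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as quoted from the ixgbe headers in this thread. */
#define IXGBE_TXD_STAT_DD    0x00000001
#define IXGBE_ADVTXD_STAT_DD IXGBE_TXD_STAT_DD

/* Hypothetical helper: true when the DD (descriptor done) bit is set
 * in the write-back status word of a TX descriptor. */
static bool tx_desc_done(uint32_t status)
{
    return (status & IXGBE_ADVTXD_STAT_DD) != 0;
}

/* Hypothetical self-check using the two status words seen in gdb. */
static void check_gdb_samples(void)
{
    assert(!tx_desc_done(0x138000u));  /* stalled case: bit 0 is clear */
    assert(tx_desc_done(0x1038001u));  /* healthy case: bit 0 is set */
}
```

With status = 0x138000 the low bit is clear, so ixgbe_tx_free_bufs() takes the 
early "return 0" at line 88 and no mbufs are reclaimed.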


I assume we have hit the same problem. Some code path might consume these tx 
descriptors and never put them back into the "DONE" state, so that they are 
never freed, but I need some time to figure it out. Have you had any luck 
making progress on your case?


Regards,
Hui

On Fri, Jul 6, 2018 at 4:49 AM, Amarnath Nallapothula <[email protected]> 
wrote:
I debugged further by attaching gdb to my process. As I suspected, 
transmission is failing because no free descriptors are available, and the 
code is unable to free any because of the following condition in the fm10k 
driver.

static inline int __attribute__((always_inline))
fm10k_tx_free_bufs(struct fm10k_tx_queue *txq)
{
        struct rte_mbuf **txep;
        uint8_t flags;
        uint32_t n;
        uint32_t i;
        int nb_free = 0;
        struct rte_mbuf *m, *free[RTE_FM10K_TX_MAX_FREE_BUF_SZ];

        /* check DD bit on threshold descriptor */
        flags = txq->hw_ring[txq->next_dd].flags;
        if (!(flags & FM10K_TXD_FLAG_DONE))
                return 0; /* <== returns from here */

Breakpoint 5, fm10k_xmit_pkts_vec (tx_queue=0x7fde3e430040, 
tx_pkts=0x7fde913eda40, nb_pkts=32)
    at /src/dpdk/drivers/net/fm10k/fm10k_rxtx_vec.c:826
826     in /src/dpdk/drivers/net/fm10k/fm10k_rxtx_vec.c
(gdb) p *(struct fm10k_tx_queue *)tx_queue
$19 = {sw_ring = 0x7fde3e42f000, hw_ring = 0x7fde3e3aef80, hw_ring_phys_addr = 
14490988416,
  rs_tracker = {list = 0x7fde3e3aef00, head = 0x0, tail = 0x0, endp = 0x0},
  ops = 0x8adec8 <vec_txq_ops>, last_free = 0, next_free = 191, nb_free = 0, 
nb_used = 0,
  free_thresh = 32, rs_thresh = 32, next_rs = 191, next_dd = 223, tail_ptr = 
0x7fde52020014,
  txq_flags = 3841, nb_desc = 512, port_id = 0 '\000', tx_deferred_start = 0 
'\000', queue_id = 0,
  tx_ftag_en = 0}
(gdb) p /x ((struct fm10k_tx_queue *)tx_queue)->hw_ring[223].flags
$21 = 0x60
(gdb) p 0x80 & ((struct fm10k_tx_queue *)tx_queue)->hw_ring[223].flags
$22 = 0
(gdb)

It looks like the driver/NIC is unable to transmit the packet, and hence 
FM10K_TXD_FLAG_DONE is still not set in flags. But I am still not sure where 
the problem is.
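
To illustrate why a single descriptor that never completes stalls the whole 
queue, here is a simplified toy model of the threshold check, not the actual 
fm10k code. The 0x80 mask, nb_desc = 512, rs_thresh = 32, next_dd = 223 and 
flags = 0x60 are taken from the gdb session above; struct toy_txq, 
toy_tx_free_bufs() and the wrap-around rule are assumptions made for the 
sketch.

```c
#include <stdint.h>

#define TXD_FLAG_DONE 0x80  /* mask tested in the gdb session above */
#define NB_DESC       512   /* nb_desc from the queue dump */
#define RS_THRESH     32    /* rs_thresh from the queue dump */

/* Hypothetical, simplified stand-in for the driver's TX queue. */
struct toy_txq {
    uint8_t  flags[NB_DESC]; /* per-descriptor flags written back by the NIC */
    uint16_t next_dd;        /* threshold descriptor polled for completion */
    uint16_t nb_free;        /* descriptors currently available to the sender */
};

/* Reclaim one batch of RS_THRESH descriptors, but only if the threshold
 * descriptor is marked done. Returns the number of descriptors reclaimed. */
static int toy_tx_free_bufs(struct toy_txq *q)
{
    if (!(q->flags[q->next_dd] & TXD_FLAG_DONE))
        return 0;            /* same early return as in the excerpt above */

    q->nb_free += RS_THRESH;
    q->next_dd += RS_THRESH;
    if (q->next_dd >= NB_DESC)
        q->next_dd = RS_THRESH - 1;  /* assumed wrap-around rule */
    return RS_THRESH;
}
```

In this model, with next_dd = 223 and flags[223] = 0x60, every call returns 0 
and nb_free stays at 0, which matches the state shown in the dump: the sender 
has no descriptors left and cannot reclaim any until the NIC sets the done bit.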

Regards,
Amar

From: Hui Liu <[email protected]>
Date: Friday, 6 July 2018 at 9:06 AM
To: Amarnath Nallapothula <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: [dpdk-users] occasionally traffic stalls due to rx and tx 
descriptor not available

Hi Amar,

I'm a DPDK newbie and I recently saw a similar problem on one 82599 port. My 
app does the following:
1. The TX thread calls rte_pktmbuf_alloc() to allocate buffers from mbuf_pool, 
fills each one as an ICMP packet and sends it out, at around 400,000 
packets/sec, 1.6Gbps;
2. The RX thread receives ICMP responses and worker threads process the 
responses.

This app runs fine for some time, typically anywhere from 8 hours to 5 days, 
and then goes into a bad state in which the TX thread can no longer send 
packets out via rte_eth_tx_buffer() or rte_eth_tx_buffer_flush(), while 
rte_eth_tx_buffer_count_callback() is called for all packets on flush. I 
strongly suspect that the descriptors are exhausted, but I have not confirmed 
it yet.

In my app, I set the max packet burst to 256, rx descriptors to 2048 and tx 
descriptors to 4096, with a single rx/tx queue per port, to get good 
performance; I'm not sure this is the best combination. Just FYI. As for the 
descriptor problem, I'm still investigating what kind of behavior or condition 
takes descriptors and never releases them, just like your Query 2. If 
applicable, would you please let me know whether there is a way to get the 
number of available tx/rx descriptors on a port, so that I could see over time 
when descriptors are taken without being released?

Due to the limits of my system environment, I'm not able to attach gdb 
directly to debug. While I'm investigating this problem, would you please 
update me when you have any clue on your issue? I might get some inspiration 
from it.

Thank you very much!

Regards,
Hui

On Thu, Jul 5, 2018 at 4:34 AM, Amarnath Nallapothula <[email protected]> 
wrote:
Hi Experts,

I am testing the performance of my DPDK-based application, which forwards 
packets from port 1 to port 2 of a 40G NIC and vice versa. Occasionally we see 
that packet rx and tx stop on one of the ports. I looked through the code of 
DPDK’s fm10k driver and found that this could happen if rx/tx descriptors are 
not available.

To improve performance, I am using RSS and have created five rx and tx 
queues. Dedicated lcores are assigned to forward packets from port1 queue 0 to 
port2 queue 0 and vice versa.

During port initialization, the rx_queue is initialized with an Rx ring size 
of 128 descriptors and the tx_queue with a Tx ring size of 512 descriptors. 
Threshold values are left at their defaults.
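
As a rough way to put the 128-descriptor Rx ring in perspective, the sketch 
below estimates how long a ring takes to fill if the lcore polling it stops 
draining. It assumes minimum-size 64-byte frames, 20 bytes of per-frame wire 
overhead (preamble, start-of-frame delimiter and inter-frame gap), line-rate 
traffic and a perfectly even RSS spread; the helper ring_fill_time_ns() is 
hypothetical and none of these assumptions were measured in this setup.

```c
#include <stdint.h>

/* Per-frame wire overhead in bytes: 7 preamble + 1 SFD + 12 inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20

/* Hypothetical helper: nanoseconds until a ring of nb_desc descriptors is
 * full when nothing drains it, assuming the link runs at line rate and
 * traffic is split evenly across nb_queues queues. */
static double ring_fill_time_ns(double link_bps, unsigned frame_bytes,
                                unsigned nb_desc, unsigned nb_queues)
{
    double bits_per_frame = (frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8.0;
    double queue_pps = link_bps / bits_per_frame / nb_queues;

    return nb_desc / queue_pps * 1e9;
}
```

Under these assumptions, ring_fill_time_ns(40e9, 64, 128, 5) comes to roughly 
10.75 microseconds, so a 128-entry ring leaves very little slack if a 
forwarding lcore is delayed. Larger frames or lower load lengthen this 
proportionally.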

I have a few queries here:

  1.  Are the above rx and tx descriptor counts good values for each queue of 
a given port?
  2.  Under what conditions do rx and tx descriptors get exhausted?
  3.  Any suggestions or information you can provide to debug this issue?

Regards,
Amar

