On 1/14/2020 4:21 PM, Eelco Chaudron wrote:


On 14 Jan 2020, at 16:21, Stokes, Ian wrote:

On 1/14/2020 2:13 PM, Eelco Chaudron wrote:


On 14 Jan 2020, at 12:23, Stokes, Ian wrote:

On 1/13/2020 8:32 PM, Stokes, Ian wrote:

<SNIP>

Hi Eelco, I'm seeing a crash in OVS while running this with just a port and a default queue 0 (phy to phy setup). It seems related to the call to rte_meter_trtcm_rfc4115_color_blind_check. I've provided more detail below in the trtcm_policer_run_single_packet function, just wondering if you've come across it?

Here's the output for the QoS configuration I'm using:

-bash-4.4$ ovs-appctl -t ovs-vswitchd qos/show dpdk1
QoS: dpdk1 trtcm-policer
eir: 52000
cbs: 2048
ebs: 2048
cir: 52000

Default:
  eir: 52000
  cbs: 2048
  ebs: 2048
  cir: 52000
  tx_packets: 672150
  tx_bytes: 30918900
  tx_errors: 489562233

I'll try to investigate further with DPDK and GDB also.

I tried to replicate this, but I was not able to do so. How did you test? By reconfiguring it and then starting, etc.?

Starting a fresh instance of OVS (cleared previous OVSDB etc.).

dpdk-socket-mem="1024,0"
dpdk-lcore-mask="0x2"
pmd-cpu-mask="0xC"

2 phy ports only, 1 rxq per phy port.

Flow rules are basic (in port 1 out port 2)

Traffic profile is IPv4 UDP 64 byte packets at line rate (10G)

QoS Setup with the following

sudo $OVS_DIR/utilities/ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
--id=@myqos create qos type=trtcm-policer \
other-config:cir=52000 other-config:cbs=2048 \
other-config:eir=52000 other-config:ebs=2048

From there it's a case of letting traffic run (between 10 and 15 minutes) before the segfault occurs.

Tried a couple of runs, but no luck…

We've tried on a second system as well but were not able to reproduce it; it may be specific to the first test board I used in this case. It shouldn't be a blocker. I'll look at the v5, but I think we're close to merging.

Regards
Ian



<SNIP>


A few times during testing I have seen OVS crash with the following

./launch_vswitch.sh: line 66: 11694 Floating point exception sudo $OVS_DIR/vswitchd/ovs-vswitchd unix:$DB_SOCK --pidfile

Looking into it with GDB, it seems related to the rte_meter_trtcm_rfc4115_color_blind_check above. See GDB output below.

Thread 12 "pmd-c03/id:9" received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7f3dce734700 (LWP 26465)]
0x0000000000d4b92d in rte_meter_trtcm_rfc4115_color_blind_check (m=0x328e178, p=0x328e148, time=29107058565136113, pkt_len=46) at /opt/istokes/dpdk-19.11//x86_64-native-linuxapp-gcc/include/rte_meter.h:599
599             n_periods_te = time_diff_te / p->eir_period;
(gdb) bt
#0  0x0000000000d4b92d in rte_meter_trtcm_rfc4115_color_blind_check (m=0x328e178, p=0x328e148, time=29107058565136113, pkt_len=46) at /opt/istokes/dpdk-19.11//x86_64-native-linuxapp-gcc/include/rte_meter.h:599
#1  0x0000000000d4abc1 in trtcm_policer_run_single_packet (policer=0x2774200, pkt=0x1508fbd40, time=29107058565136113) at lib/netdev-dpdk.c:4649
#2  0x0000000000d4ad24 in trtcm_policer_run (conf=0x2774200, pkts=0x7f3db8005100, pkt_cnt=32, should_steal=true) at lib/netdev-dpdk.c:4691
#3  0x0000000000d45299 in netdev_dpdk_qos_run (dev=0x17fd68840, pkts=0x7f3db8005100, cnt=32, should_steal=true) at lib/netdev-dpdk.c:2421
#4  0x0000000000d45db0 in netdev_dpdk_send__ (dev=0x17fd68840, qid=1, batch=0x7f3db80050f0, concurrent_txq=false) at lib/netdev-dpdk.c:2683
#5  0x0000000000d45ee9 in netdev_dpdk_eth_send (netdev=0x17fd688c0, qid=1, batch=0x7f3db80050f0, concurrent_txq=false) at lib/netdev-dpdk.c:2710
#6  0x0000000000c342ba in netdev_send (netdev=0x17fd688c0, qid=1, batch=0x7f3db80050f0, concurrent_txq=false) at lib/netdev.c:814
#7  0x0000000000beb3de in dp_netdev_pmd_flush_output_on_port (pmd=0x7f3dce735010, p=0x7f3db80050c0) at lib/dpif-netdev.c:4224
#8  0x0000000000beb5c4 in dp_netdev_pmd_flush_output_packets (pmd=0x7f3dce735010, force=false) at lib/dpif-netdev.c:4264
#9  0x0000000000beb814 in dp_netdev_process_rxq_port (pmd=0x7f3dce735010, rxq=0x328d930, port_no=2) at lib/dpif-netdev.c:4319
#10 0x0000000000bef432 in pmd_thread_main (f_=0x7f3dce735010) at lib/dpif-netdev.c:5556
#11 0x0000000000cb24e5 in ovsthread_wrapper (aux_=0x326d220) at lib/ovs-thread.c:383
#12 0x00007f3de11d236d in start_thread (arg=0x7f3dce734700) at pthread_create.c:456
#13 0x00007f3de06bab4f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Looks like a divide by zero, but that was fixed, and the fix (commit ebe3a769911071450acb808153ec2a2496726906) seems to be in DPDK v19.11.

I've confirmed I'm testing against 19.11.0 and that commit is present.



So for some reason rte_meter_get_tb_params() might return 0 in eir_period. Looking at the code, I would say this could only really happen if rte_get_tsc_hz() returns 0, which seems odd… Could this happen in your system for some reason?

I don't think rte_get_tsc_hz is returning 0; at least the values for the function call in GDB don't seem to suggest this. Snippet below from rte_meter_trtcm_rfc4115_color_blind_check that I'm checking with GDB.

rte_meter_trtcm_rfc4115_color_blind_check(struct rte_meter_trtcm_rfc4115 *m, struct rte_meter_trtcm_rfc4115_profile *p, uint64_t time, uint32_t pkt_len)
{
    uint64_t time_diff_tc, time_diff_te, n_periods_tc, n_periods_te,
        tc, te;

    /* Bucket update */
    time_diff_tc = time - m->time_tc;
    time_diff_te = time - m->time_te;

    n_periods_tc = time_diff_tc / p->cir_period;
    n_periods_te = time_diff_te / p->eir_period;

Looking at values with GDB gives the following

(gdb) p *p
$1 = {cbs = 2048, ebs = 2048, cir_period = 44230, cir_bytes_per_period = 1, eir_period = 44230, eir_bytes_per_period = 1}
(gdb) p time_diff_tc
$2 = 29137292739849292
(gdb) p n_periods_tc
$3 = 13937399
(gdb) p time
$4 = 29137292739849292
(gdb) p m->time_tc
$5 = 29137292739858117
(gdb) p *m
$6 = {time_tc = 29137292739858117, time_te = 29137292739858117, tc = 2048, te = 2048}
(gdb) p time_diff_te
$7 = 29137292739849289
(gdb) p p->eir_period
$8 = 44230
(gdb) p n_periods_te
$9 = 140260075883992

I don't have another board to test on at the moment but will try.

Odd, I do not see how this would create a SIGFPE…
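
One detail in the GDB output above: the printed time (29137292739849292) is 8825 cycles smaller than m->time_tc (29137292739858117). Assuming both values belong to the same call (values printed from an optimized build may not be reliable), the unsigned subtraction time - m->time_tc would wrap modulo 2^64 rather than go negative. A minimal sketch of that arithmetic follows, with hypothetical helper names; it illustrates the wrap only and is not offered as an explanation for the SIGFPE.

```c
#include <stdint.h>

/* Elapsed cycles as computed with unsigned 64-bit arithmetic.
 * When now < last, the result wraps modulo 2^64. */
uint64_t cycles_elapsed(uint64_t now, uint64_t last)
{
    return now - last;
}

/* Number of whole periods in the elapsed time; the caller must make
 * sure period is non-zero. */
uint64_t periods_elapsed(uint64_t now, uint64_t last, uint64_t period)
{
    return cycles_elapsed(now, last) / period;
}
```

With the values from the GDB session, cycles_elapsed(29137292739849292, 29137292739858117) is 2^64 - 8825, so the number of periods computed from it would be very large instead of zero.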

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
