Hi all,


We found a core dump in our OVS+DPDK environment; the following is the call stack:



#0  0x00007f354d6eb237 in raise () from /lib64/libc.so.6
#1  0x00007f354d6ec928 in abort () from /lib64/libc.so.6
#2  0x00000000006a6d99 in PAT_abort ()
#3  0x00000000006a3edd in patchIllInsHandler ()
#4  <signal handler called>
#5  0x00000000004c2275 in rehash (hash=1661256324, impl=0x0) at lib/ccmap.c:135
#6  ccmap_find (ccmap=ccmap@entry=0x26b654b8, hash=1661256324) at lib/ccmap.c:229
#7  0x00000000004bf73b in find_match_wc (wc=0x7f350ce67370, n_tries=0, trie_ctx=0x7f350ce63e20, flow=0x7f350ce66dc0,
    version=18446744073709551614, subtable=<optimized out>) at lib/classifier.c:1676
#8  classifier_lookup__ (cls=cls@entry=0xee2360 <cls>, version=version@entry=18446744073709551614,
    flow=flow@entry=0x7f350ce66dc0, wc=wc@entry=0x7f350ce67370, allow_conjunctive_matches=allow_conjunctive_matches@entry=true)
    at lib/classifier.c:972
#9  0x00000000004c193b in classifier_lookup (cls=cls@entry=0xee2360 <cls>, version=version@entry=18446744073709551614,
    flow=flow@entry=0x7f350ce66dc0, wc=wc@entry=0x7f350ce67370) at lib/classifier.c:1166
#10 0x0000000000575acb in tnl_port_map_lookup (flow=flow@entry=0x7f350ce66dc0, wc=wc@entry=0x7f350ce67370)
    at lib/tnl_ports.c:287
#11 0x00000000004a7049 in compose_output_action__ (ctx=ctx@entry=0x7f350ce65d60, ofp_port=1, xr=0x0,
    check_stp=check_stp@entry=true) at ofproto/ofproto_dpif_xlate.c:3753
#12 0x00000000004a7967 in compose_output_action (xr=<optimized out>, ofp_port=<optimized out>, ctx=0x7f350ce65d60)
    at ofproto/ofproto_dpif_xlate.c:3832
#13 output_normal (ctx=ctx@entry=0x7f350ce65d60, out_xbundle=out_xbundle@entry=0x2cfe8120, xvlan=xvlan@entry=0x7f350ce655e0)
    at ofproto/ofproto_dpif_xlate.c:2288
#14 0x00000000004a7df5 in xlate_normal_flood (ctx=ctx@entry=0x7f350ce65d60, in_xbundle=in_xbundle@entry=0x9b547c0,
    xvlan=xvlan@entry=0x7f350ce655e0) at ofproto/ofproto_dpif_xlate.c:2757
#15 0x00000000004aaf39 in xlate_normal (ctx=0x7f350ce65d60) at ofproto/ofproto_dpif_xlate.c:2987
#16 xlate_output_action (ctx=ctx@entry=0x7f350ce65d60, port=<optimized out>, max_len=<optimized out>,
    may_packet_in=may_packet_in@entry=true) at ofproto/ofproto_dpif_xlate.c:4577
#17 0x00000000004a920e in do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=ctx@entry=0x7f350ce65d60)
    at ofproto/ofproto_dpif_xlate.c:5565
#18 0x00000000004ac25f in xlate_actions (xin=xin@entry=0x7f350ce66db0, xout=xout@entry=0x7f350ce675f0)
    at ofproto/ofproto_dpif_xlate.c:6456
#19 0x000000000049d4b7 in xlate_key (key=<optimized out>, len=<optimized out>, push=push@entry=0x7f350ce670f0,
    ctx=ctx@entry=0x7f350ce675d0, udpif=<optimized out>) at ofproto/ofproto_dpif_upcall.c:1974
#20 0x000000000049da48 in xlate_ukey (ukey=0x7f34ec032a20, ukey=0x7f34ec032a20, ctx=0x7f350ce675d0, tcp_flags=<optimized out>,
    udpif=0x6b6c550) at ofproto/ofproto_dpif_upcall.c:1986
#21 revalidate_ukey__ (udpif=udpif@entry=0x6b6c550, ukey=ukey@entry=0x7f34ec032a20, tcp_flags=<optimized out>,
    odp_actions=0x7f350ce679e0, recircs=recircs@entry=0x7f350ce679d0, xcache=<optimized out>)
    at ofproto/ofproto_dpif_upcall.c:2032
#22 0x000000000049dc8d in revalidate_ukey (udpif=udpif@entry=0x6b6c550, ukey=ukey@entry=0x7f34ec032a20,
    stats=stats@entry=0x7f350ce688b8, odp_actions=odp_actions@entry=0x7f350ce679e0, reval_seq=reval_seq@entry=268144954,
    recircs=recircs@entry=0x7f350ce679d0) at ofproto/ofproto_dpif_upcall.c:2128
#23 0x00000000004a0853 in revalidate (revalidator=0x75715d0) at ofproto/ofproto_dpif_upcall.c:2488
#24 0x00000000004a0a4b in udpif_revalidator (arg=0x75715d0) at ofproto/ofproto_dpif_upcall.c:958
#25 0x000000000054ca48 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs_thread.c:682
#26 0x00007f354f303dd5 in start_thread () from /lib64/libpthread.so.0
#27 0x00007f354d7b359d in clone () from /lib64/libc.so.6



It is caused by an incorrect use of ovsrcu_postpone() when deleting a subtable from classifier->subtables (a pvector). The following are the triggering steps (a minimal sketch of the timing follows the list):

1. The IP address of an hnic port is modified (e.g. by "ifconfig hnic x.x.x.x/24").

2. The OVS route reset flushes all entries in the tnl_ports classifier (cls).

3. The OVS main thread calls destroy_subtable(), which calls ovsrcu_postpone() to free the subtable; this postpone happens to trigger ovsrcu_flush_cbset(), so a grace period starts immediately. However, cls->subtables->impl still stores the deleted subtable's pointer; it is only removed by pvector_publish() (in step 7).

4. The RCU thread starts ovsrcu_synchronize() before freeing the subtable on the flush list. (Suppose target_seqno is 2 and the main thread's seqno is still smaller than 2, say 1.)

5. The PMD thread keeps working and advances the global seqno to 3.

6. The revalidator thread wakes up from poll_block() with seqno 3. It then acquires the subtable's pointer from cls->subtables, which should no longer be reachable.

7. The OVS main thread calls pvector_publish(), goes into poll_block(), and is moved out of ovsrcu_threads.

8. The ovsrcu_synchronize() from step 4 now finds that every thread's seqno is larger than 2, so the subtable is freed.

9. The revalidator is still using the subtable's pointer, which has just been freed, so it crashes and dumps core.
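

To make the timeline easier to reason about, here is a minimal, self-contained C model of the ordering above. It is only a sketch: everything is collapsed into one thread, the comments mark which thread would run each step in the real system, and the names (pub, tmp, deferred_free, struct subtable) are invented for the model rather than taken from the OVS code.

/*
 * Minimal single-threaded model of the ordering problem in steps 1-9.
 * Names are invented for the sketch; comments note the real thread.
 */
#include <stdio.h>
#include <stdlib.h>

struct subtable {
    int n_rules;                    /* Placeholder payload. */
};

/* Model of the pvector: "pub" is what readers see, "tmp" is the
 * writer's private copy that only becomes visible on publish(). */
static struct subtable *pub;
static struct subtable *tmp;

static void
publish(void)
{
    pub = tmp;                      /* Models pvector_publish(). */
}

int
main(void)
{
    struct subtable *s = calloc(1, sizeof *s);
    pub = tmp = s;

    /* Main thread, step 3: remove the subtable from the writer's
     * private copy and schedule the free, but do NOT publish yet. */
    tmp = NULL;                                  /* destroy_subtable() */
    struct subtable *deferred_free = s;          /* models ovsrcu_postpone(free, s) */

    /* Revalidator, step 6: it still finds the subtable through the
     * published copy, which was never updated, and holds on to it. */
    struct subtable *found = pub;

    /* Main thread, step 7: the removal is finally published and the
     * thread quiesces.  Because the grace period already started in
     * step 3, the lookup in step 6 is not covered by it. */
    publish();

    /* RCU thread, steps 4, 5, 8: every registered thread has now
     * quiesced, so the deferred free runs. */
    free(deferred_free);

    /* Revalidator, step 9: it uses the pointer it found, which is a
     * use-after-free; in the real system this shows up as the crash
     * in ccmap_find()/rehash() in the backtrace above. */
    if (found != NULL) {
        printf("n_rules = %d\n", found->n_rules);  /* reads freed memory */
    }
    return 0;
}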



Is there an existing patch that fixes this bug? Could we call ovsrcu_postpone() (to free the subtable) after pvector_publish()? A rough, untested sketch of that ordering follows.
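

For discussion only: this is roughly the ordering we have in mind. It is untested, would have to be built inside the OVS tree, and cls_subtable_destroy_cb / remove_and_destroy_subtable are invented names for this sketch; the real destroy_subtable() in lib/classifier.c does more work (e.g. removing the subtable from the cmap) than is shown here.

#include <stdlib.h>

#include "classifier-private.h"     /* OVS tree headers (sketch only). */
#include "ovs-rcu.h"
#include "pvector.h"

static void
cls_subtable_destroy_cb(struct cls_subtable *subtable)
{
    free(subtable);
}

static void
remove_and_destroy_subtable(struct classifier *cls,
                            struct cls_subtable *subtable)
{
    /* Unlink from the writer's copy of the priority vector... */
    pvector_remove(&cls->subtables, subtable);

    /* ...make the removal visible to readers... */
    pvector_publish(&cls->subtables);

    /* ...and only now let the grace period for the free begin, so a
     * reader can no longer find the subtable once the free is pending. */
    ovsrcu_postpone(cls_subtable_destroy_cb, subtable);
}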



Looking forward to your reply



Thanks.
