Hi,

Here are some updates:

1. We added a debug attribute `state_before` to the ukey to record
more life-cycle details of a ukey:
state_where = 0x55576027b868 "ofproto/ofproto-dpif-upcall.c:XXXX",
which is the transition at [1], i.e. the ukey is UKEY_DELETED.
state_before = 0x55576027b630 "ofproto/ofproto-dpif-upcall.c:XXXX",
which is the previous transition at [2], i.e. the ukey was UKEY_EVICTED.

[1] 
https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1897
[2] 
https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2470

Still, all of the ukeys had gone through the replace action
(try_ukey_replace).
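
For reference, the change is roughly the following sketch against
ofproto/ofproto-dpif-upcall.c (the `state_before` and `state_thread`
fields are from our local debug patch, not upstream OVS; the rest
follows the upstream shape):

struct udpif_key {
    /* ... existing fields ... */
    enum ukey_state state;
    const char *state_where;    /* Location of the latest transition. */
    const char *state_before;   /* Location of the previous transition
                                 * (debug field). */
    unsigned int state_thread;  /* ovsthread_id_self() of the thread that
                                 * made the latest transition (debug). */
};

static void
transition_ukey_at(struct udpif_key *ukey, enum ukey_state dst,
                   const char *where)
    OVS_REQUIRES(ukey->mutex)
{
    /* ... existing state checks ... */
    ukey->state_before = ukey->state_where;  /* Keep one step of history. */
    ukey->state = dst;
    ukey->state_where = where;
    ukey->state_thread = ovsthread_id_self();
}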

2. The ukey circular buffer [1] does not work well: the buffer still
contains {0} entries after a long run, and the number of recorded
entries is far less than `counter_upcall_ukey_free`.
[1]
https://github.com/gotostack/ovs/commit/939d88c3c5fcdb446b01f2afa8f1e80c3929db46
Also, we could not add an `allocate` entry to this buffer for the
"ukey xmalloc" case: the circular buffer mutex does not seem to work
well either, and we got cores many times at
`ovs_mutex_unlock(&ukey_free_buffer.mutex)`. A mutex-free variant is
sketched below.
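
In case the ovs_mutex itself is part of the problem, here is a minimal
mutex-free sketch of the recording buffer using C11 atomics (all names
are hypothetical, not upstream OVS code). Every caller reserves a
unique slot by atomically incrementing the index, so the PMD, RCU, and
revalidator threads never contend on a lock, and it carries the
allocate/free flag Eelco suggested so "ukey xmalloc" entries can go
into the same buffer:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define UKEY_RECORD_BUFFER_SIZE (1024 * 1024)

struct ukey_record {
    void *ukey_addr;     /* Address of the ukey. */
    int64_t timestamp;   /* Seconds since the epoch. */
    bool is_allocate;    /* true = xmalloc, false = free. */
};

static struct ukey_record ukey_records[UKEY_RECORD_BUFFER_SIZE];
static atomic_size_t ukey_record_index;

static void
ukey_record_event(void *ukey_addr, bool is_allocate)
{
    /* Reserve a slot lock-free; the buffer wraps like a ring. */
    size_t i = atomic_fetch_add(&ukey_record_index, 1)
               % UKEY_RECORD_BUFFER_SIZE;

    ukey_records[i].ukey_addr = ukey_addr;
    ukey_records[i].timestamp = (int64_t) time(NULL);
    ukey_records[i].is_allocate = is_allocate;
}

The slot contents are not written atomically, so a record can be torn
if the buffer wraps between the index increment and the stores, but
for post-mortem inspection from a core file that should be acceptable.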

3. Ilya's patch [2] was applied, but I have not seen the abort log so far.
[2] 
https://github.com/igsilya/ovs/commit/8268347a159b5afa884f5b3008897878b5b520f5

4. Dumping all ukeys from the core file, we noticed that almost all
UKEY_EVICTED ukeys had their state changed at `transition_ukey_at` by
the revalidator thread (per the `state_thread` attribute of the ukey).
But the core backtrace shows that the state of the related ukey was
changed by a PMD thread. For instance:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> normal ukey and the revalidator thread:
(struct umap *) 0x55cce9556140:
  (struct udpif_key *) 0x7f3aad584a80:
    state = UKEY_EVICTED
     state_thread = 5
  (struct udpif_key *) 0x7f3aac24ce20:
    state = UKEY_EVICTED
     state_thread = 5
  (struct udpif_key *) 0x7f3aac6526e0:
    state = UKEY_EVICTED
     state_thread = 5
  (struct udpif_key *) 0x7f3aad731970:
    state = UKEY_EVICTED
     state_thread = 5
  (struct udpif_key *) 0x7f3aac91ce50:
    state = UKEY_EVICTED
     state_thread = 5
  (struct udpif_key *) 0x7f3aadd69be0:
    state = UKEY_EVICTED
     state_thread = 5
  (struct udpif_key *) 0x7f3aad759040:
    state = UKEY_EVICTED
     state_thread = 5
  (struct udpif_key *) 0x7f3a8c0d6d50:
    state = UKEY_EVICTED
     state_thread = 5
  (struct udpif_key *) 0x7f3a8c851300:
    state = UKEY_EVICTED
     state_thread = 5

#8  0x000055cce5d7005f in ovsthread_wrapper (aux_=<optimized out>) at
lib/ovs-thread.c:422
        auxp = <optimized out>
        aux = {start = 0x55cce5c9c0d0 <udpif_revalidator>, arg =
0x55cce9595780, name = "revalidator\000\000\000\000"}
        id = 5
        subprogram_name = 0x7f3ad00008c0 "\020 "
#9  0x00007f3af2afee65 in start_thread (arg=0x7f3ae2986700) at
pthread_create.c:307
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

>>>>>>>>>>>>>>>> core ukey and PMD thread

p * (struct udpif_key *) 0x7f3aac156e80
$2 = {cmap_node = {next = {p = 0x7f3aaec2b3a0}}, key = 0x7f3aac402810,
key_len = 0, mask = 0x0, mask_len = 172, ufid = {u32 = {2563111187,
2445823588, 3143939231, 3011838433}, u64 = {lo = 10504732324808489235,
      hi = 12935747573714826399}}, ufid_present = true, hash =
2623373230, pmd_id = 35, mutex = {lock = {__data = {__lock = 0,
__count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0,
__elision = 0, __list = {
          __prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 16
times>, "\377\377\377\377", '\000' <repeats 19 times>, __align = 0},
where = 0x0}, stats = {n_packets = 3, n_bytes = 852, used = 871199854,
tcp_flags = 16},
  created = 871199014, dump_seq = 8822382946, reval_seq = 8822381178,
state = UKEY_DELETED, state_thread = 8

PMD thread ID:
#6  0x000055cce5d7005f in ovsthread_wrapper (aux_=<optimized out>) at
lib/ovs-thread.c:422
        auxp = <optimized out>
        aux = {start = 0x55cce5ce2460 <pmd_thread_main>, arg =
0x7f3ab2e6a010, name = "pmd-c35/id:\000:\177\000"}
        id = 8
        subprogram_name = 0x7f3aac0008c0 "p\v\"\255:\177"
#7  0x00007f3af2afee65 in start_thread (arg=0x7f3ae0582700) at
pthread_create.c:307
<<<<<<<<<<<<<<<<<<<<

The running threads are:

# ps -T -o spid,comm  $(pidof ovs-vswitchd)
  SPID COMMAND
100866 ovs-vswitchd
100867 eal-intr-thread
100868 rte_mp_handle
100872 ovs-vswitchd
100873 dpdk_watchdog1
100876 urcu2
100888 ct_clean7
100889 ipf_clean6
100890 hw_offload3
100891 handler4
100892 revalidator5     # 1 revalidator thread
100893 pmd-c03/id:9
100894 pmd-c35/id:8   # Mostly only this PMD thread is working; the other is always idle.
100925 vhost_reconn
100926 vhost-events

So, does this prove that two threads were trying to manipulate the
same ukey?

* The PMD thread replaced the old_ukey and transitioned its state.
* The RCU thread destroyed the ukey mutex and freed the ukey.
* The revalidator thread tried to lock the old_ukey mutex.

https://mail.openvswitch.org/pipermail/ovs-discuss/2024-February/052960.html
Any idea how to simulate the race condition? One attempt is sketched
below.
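
One idea: artificially widen the suspected window in try_ukey_replace()
so the bad interleaving becomes far more likely. A debug-only sketch,
not a fix (xnanosleep() is from lib/timeval.h):

    ovs_mutex_lock(&new_ukey->mutex);
    cmap_replace(&umap->cmap, &old_ukey->cmap_node,
                 &new_ukey->cmap_node, new_ukey->hash);
    ovsrcu_postpone(ukey_delete__, old_ukey);

    /* Debug only: enlarge the window between publishing new_ukey and
     * marking old_ukey UKEY_DELETED, so that a concurrent
     * revalidator_sweep__() iteration is more likely to meet old_ukey
     * around the postponed delete. */
    xnanosleep(10 * 1000 * 1000);   /* 10 ms. */

    transition_ukey(old_ukey, UKEY_DELETED);
    transition_ukey(new_ukey, UKEY_VISIBLE);

If RCU grace periods work as intended, ukey_delete__() still cannot run
before all threads quiesce, so this delay must be safe and the crash
should not reproduce. If it does reproduce much faster with the sleep
in place, that points at a premature quiescent state, which is exactly
what Ilya's patch [2] aborts on.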

Thank you.

Regards,
LIU Yulong


On Tue, Feb 27, 2024 at 6:14 PM Eelco Chaudron <echau...@redhat.com> wrote:
>
>
>
> On 27 Feb 2024, at 9:49, LIU Yulong wrote:
>
> > Yes, that makes sense.
> >
> > Another question is how to distinguish the core at line of
> > ovs_mutex_trylock in revalidator_sweep__ is after the free(ukey),
> > since the core trace has no timestamp.
>
> This is hard to figure out without adding a time variable (and make sure it’s 
> not optimized out) in revalidator_sweep__()
>
> As you are using OVS-DPDK, you can read the values from the pmd->ctx.now 
> context, which should be close.
>
> > This line in the function 'ukey_create__' should be the only place
> > where ovs allocated the memory for ukey:
> > https://github.com/openvswitch/ovs/blob/master/ofproto/ofproto-dpif-upcall.c#L1777
> >
> > Right?
>
> Yes, this should be the only place.
>
> >
> > If it is true, I will update the buffer structure, and a
> > counter_upcall_ukey_allocate as well.
> >
> > On Tue, Feb 27, 2024 at 3:34 PM Eelco Chaudron <echau...@redhat.com> wrote:
> >>
> >>
> >>
> >> On 27 Feb 2024, at 4:44, LIU Yulong wrote:
> >>
> >>> @Eelco, as you suggested, added such circular buffer to my local OVS:
> >>> https://github.com/gotostack/ovs/commit/939d88c3c5fcdb446b01f2afa8f1e80c3929db46
> >>
> >> I should also add allocate logging, or else you might not know if a buffer 
> >> was allocated at the same address.
> >> Maybe add a bool to the record structure to indicate if it’s an allocate 
> >> or free.
> >>
> >> //Eelco
> >>
> >>> gdb shows such data structure:
> >>> 2232     ukey_free_buffer.index = (ukey_free_buffer.index + 1) % (1024
> >>> * 1024);  // Circular buffer
> >>> (gdb) p ukey_free_buffer
> >>> $1 = {
> >>>   records = {{
> >>>       ukey_addr = 0x7f8a0d871700,
> >>>       timestamp = 1709003328
> >>>     }, {
> >>>       ukey_addr = 0x7f8a0f969120,
> >>>       timestamp = 1709003365
> >>>     }, {
> >>>       ukey_addr = 0x7f8a0defe190,
> >>>       timestamp = 1709003393
> >>>     }, {
> >>>       ukey_addr = 0x7f8a0984aea0,
> >>>       timestamp = 1709003452
> >>>     }...},
> >>>   index = 3,
> >>>   mutex = {
> >>>     lock = {
> >>>       __data = {
> >>>         __lock = 1,
> >>>         __count = 0,
> >>>         __owner = 45210,
> >>>         __nusers = 1,
> >>>         __kind = 2,
> >>>         __spins = 0,
> >>>         __elision = 0,
> >>>         __list = {
> >>>           __prev = 0x0,
> >>>           __next = 0x0
> >>>         }
> >>>       },
> >>>       __size = 
> >>> "\001\000\000\000\000\000\000\000\232\260\000\000\001\000\000\000\002",
> >>> '\000' <repeats 22 times>,
> >>>       __align = 1
> >>>     },
> >>>     where = 0x55c35a347d18 "ofproto/ofproto-dpif-upcall.c:2229"
> >>>   }
> >>> }
> >>>
> >>> and counter_upcall_ukey_free is:
> >>> $2 = {name = 0x5622b448f612 "upcall_ukey_free", count = 0x5622b41047f0
> >>> <upcall_ukey_free_count>, total = 79785, last_total = 79785, min = {0,
> >>> 0, 0, 0, 0, 55, 22681, 11703, 13877, 12750, 0, 18719}, hr = {79785,
> >>>     0 <repeats 59 times>}}
> >>>
> >>> Let's see how this goes.
> >>>
> >>> Thank you.
> >>>
> >>> On Tue, Feb 27, 2024 at 9:05 AM LIU Yulong <liuyulong...@gmail.com> wrote:
> >>>>
> >>>> @Ilya, thank you, I will add that patch.
> >>>>
> >>>> @Eelco, thank you again, I will add a RL log to the free(ukey). Hope
> >>>> we can get something useful.
> >>>>
> >>>>
> >>>> On Mon, Feb 26, 2024 at 7:55 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >>>>>
> >>>>> On 2/26/24 11:20, Eelco Chaudron wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 26 Feb 2024, at 11:10, LIU Yulong wrote:
> >>>>>>
> >>>>>>> Hi Eelco,
> >>>>>>>
> >>>>>>> Thank you for the quick response.
> >>>>>>>
> >>>>>>> I did not add those logs, because in order to reproduce the issue, we
> >>>>>>> have to send lots of packets to the host.
> >>>>>>> So there are too many ukeys created/deleted to do logging.
> >>>>>>
> >>>>>> Maybe a circular buffer with all alloc/free (+ ukey address, and 
> >>>>>> timestamp), 1 or 2 Mb of memory can hold a lot.
> >>>>>>
> >>>>>>> And can we ensure that this [1] is the only place for ovs to free the 
> >>>>>>> ukey?
> >>>>>>>
> >>>>>>> [1] 
> >>>>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2084
> >>>>>>
> >>>>>> Yes, this should be the only place, and should always be done through 
> >>>>>> an RCU delayed delete.
> >>>>>>
> >>>>>>> For last mail, can this issue be concurrent read-and-update/delete?
> >>>>>>> The revalidator_sweep__ is trying to lock the ukey->mutex, while
> >>>>>>> another thread is updating the ukey->mutex to NULL and free ukey.
> >>>>>>
> >>>>>> This should not happen as the delete should happen by the delayed RCU 
> >>>>>> delete, and if the ukey is still in the cmap after the delayed delete 
> >>>>>> (quiescent state) something is wrong.
> >>>>>
> >>>>>
> >>>>> I agree with Eelco and I don't see any obvious issues with the current
> >>>>> implementation.
> >>>>>
> >>>>> However, the usual suspect for RCU problems is entering quiescent state
> >>>>> while iterating RCU-protected structure.  Though I'm not sure how that 
> >>>>> can
> >>>>> happen in the revalidator, usually such issues are hiding somewhere way
> >>>>> down the call stack.  I made a small patch that can help to be sure that
> >>>>> this doesn't actually happen in your setup:
> >>>>> https://github.com/igsilya/ovs/commit/8268347a159b5afa884f5b3008897878b5b520f5
> >>>>>
> >>>>> Could you try it?
> >>>>>
> >>>>> The change will log an error message and abort the process if we happen
> >>>>> to enter quiescent state while iterating over the hash map.  Core dump
> >>>>> will point to a problematic call.
> >>>>>
> >>>>> Best regards, Ilya Maximets.
> >>>>>
> >>>>>>
> >>>>>>> LIU Yulong
> >>>>>>>
> >>>>>>>
> >>>>>>> On Mon, Feb 26, 2024 at 5:41 PM Eelco Chaudron <echau...@redhat.com> 
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 26 Feb 2024, at 9:33, LIU Yulong wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I have read the code by comparing the call stack of the core files
> >>>>>>>>> carefully, and found
> >>>>>>>>> a potential race condition. Please confirm whether the following 3 
> >>>>>>>>> threads
> >>>>>>>>> have a race condition. Just did some code trace, can such
> >>>>>>>>> race condition happen?
> >>>>>>>>>
> >>>>>>>>> * PMD thread1 ===================================:
> >>>>>>>>> -> pmd_thread_main
> >>>>>>>>> -> dp_netdev_process_rxq_port
> >>>>>>>>> -> dp_netdev_input
> >>>>>>>>> -> dp_netdev_input__
> >>>>>>>>> -> handle_packet_upcall
> >>>>>>>>> -> dp_netdev_upcall
> >>>>>>>>> -> upcall_cb
> >>>>>>>>> -> ukey_install
> >>>>>>>>> -> ukey_install__
> >>>>>>>>> -> try_ukey_replace:
> >>>>>>>>>             ovs_mutex_lock(&new_ukey->mutex);
> >>>>>>>>> <---------- the CMAP_FOR_EACH loop in the revalidator_sweep__ run a
> >>>>>>>>> bit earlier than the cmap_replace next line, so the old_ukey can be
> >>>>>>>>> iterated. [1]
> >>>>>>>>>             cmap_replace(&umap->cmap, &old_ukey->cmap_node,
> >>>>>>>>>                          &new_ukey->cmap_node, new_ukey->hash);
> >>>>>>>>>             ovsrcu_postpone(ukey_delete__, old_ukey);
> >>>>>>>>> <-------- delete the ukey asynchronously. [2]
> >>>>>>>>>             transition_ukey(old_ukey, UKEY_DELETED);       <--------
> >>>>>>>>> transition the ukey state to UKEY_DELETED, most core files show that
> >>>>>>>>> the ukey last state change was at this line. [3]
> >>>>>>>>>             transition_ukey(new_ukey, UKEY_VISIBLE);
> >>>>>>>>>
> >>>>>>>>> [1] 
> >>>>>>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1892
> >>>>>>>>> [2] 
> >>>>>>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1896
> >>>>>>>>> [3] 
> >>>>>>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1897
> >>>>>>>>>
> >>>>>>>>> This function try_ukey_replace was called many times, because the
> >>>>>>>>> `counter_upcall_ukey_replace` is not zero.
> >>>>>>>>> For instance:
> >>>>>>>>> {
> >>>>>>>>>   name = 0x55ba9755206b "upcall_ukey_replace",
> >>>>>>>>>   count = 0x55ba971c7610 <upcall_ukey_replace_count>,
> >>>>>>>>>   total = 2287997,
> >>>>>>>>>   last_total = 2287997,
> >>>>>>>>>   min = {221, 247, 444, 278, 324, 570, 379, 464, 283, 280, 0, 427},
> >>>>>>>>>   hr = {3300, 4378, 3557, 4554, 3748, 3710, 4340, 3559, 4296, 3759,
> >>>>>>>>> 3522, 4136, 3660, 4428, 3802, 3652, 3880, 3375, 4806, 4221, 4158,
> >>>>>>>>> 3816, 3750, 3846, 3761, 3653, 4293, 3816, 3723, 3691, 4033, 468, 
> >>>>>>>>> 4117,
> >>>>>>>>> 3659, 4007, 3536,
> >>>>>>>>>     3439, 4440, 3388, 4079, 3876, 3865, 4339, 3757, 3481, 4027, 
> >>>>>>>>> 3989,
> >>>>>>>>> 3633, 3737, 3564, 3403, 3992, 3793, 4390, 4124, 4354, 4164, 4383,
> >>>>>>>>> 4237, 3667}
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> * RCU thread2 ===================================:
> >>>>>>>>> -> ovsrcu_postpone_thread
> >>>>>>>>> -> ovsrcu_call_postponed
> >>>>>>>>> -> ukey_delete__                                <------------ This
> >>>>>>>>> function is not thread safe IMO, it has the mark
> >>>>>>>>> OVS_NO_THREAD_SAFETY_ANALYSIS. [4]
> >>>>>>>>>
> >>>>>>>>>         recirc_refs_unref(&ukey->recircs);
> >>>>>>>>>         xlate_cache_delete(ukey->xcache);
> >>>>>>>>>         ofpbuf_delete(ovsrcu_get(struct ofpbuf *, &ukey->actions));
> >>>>>>>>>         ovs_mutex_destroy(&ukey->mutex);         <-------------- 
> >>>>>>>>> Just
> >>>>>>>>> set ukey mutex to NULL. [5][6][7]
> >>>>>>>>>         free(ukey);
> >>>>>>>>>
> >>>>>>>>> [4] 
> >>>>>>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2074
> >>>>>>>>> [5] 
> >>>>>>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2083
> >>>>>>>>> [6] 
> >>>>>>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/lib/ovs-thread.c#L131
> >>>>>>>>> [7] 
> >>>>>>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/lib/ovs-thread.c#L124
> >>>>>>>>>
> >>>>>>>>> * revalidator thread3 ===================================:
> >>>>>>>>>
> >>>>>>>>> -> udpif_revalidator
> >>>>>>>>> -> revalidator_sweep
> >>>>>>>>> -> revalidator_sweep__
> >>>>>>>>>
> >>>>>>>>>         CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
> >>>>>>>>>             enum ukey_state ukey_state;
> >>>>>>>>>
> >>>>>>>>>             if (ovs_mutex_trylock(&ukey->mutex)) {  <--------------
> >>>>>>>>> Core at here, because of the NULL pointer. [8]
> >>>>>>>>>                 continue;
> >>>>>>>>>             }
> >>>>>>>>> [8] 
> >>>>>>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2900
> >>>>>>>>>
> >>>>>>>>> CMIIW, if this race condition can happen, IMO, it is mostly because
> >>>>>>>>> the umap is not locked during the sweep CMAP_FOR_EACH loop.
> >>>>>>>>> Or some RCU protection did not work properly.
> >>>>>>>>
> >>>>>>>> The code looks fine to me, we are taking the lock when we update the 
> >>>>>>>> cmap, which is a requirement, however iterating over the cmap as 
> >>>>>>>> done above does not require a lock.
> >>>>>>>>
> >>>>>>>> If the RCU protection would have been broken, we would have seen a 
> >>>>>>>> lot more errors. Did you have any luck adding the traces I suggested 
> >>>>>>>> earlier, to see if it’s a use after free?
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>>
> >>>>>>>> Eelco
> >>>>>>>>
> >>>>>>>>> On Wed, Feb 21, 2024 at 6:40 PM Eelco Chaudron 
> >>>>>>>>> <echau...@redhat.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 21 Feb 2024, at 4:26, LIU Yulong wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Thank you very much for your reply.
> >>>>>>>>>>>
> >>>>>>>>>>> The problem is not easy to reproduce, we have to wait a random 
> >>>>>>>>>>> long time to see
> >>>>>>>>>>> if the issue happens again. It can be more than one day or longer.
> >>>>>>>>>>> OVS 2.17 with dpdk 20.11 had run to core before, so it's hard to 
> >>>>>>>>>>> say
> >>>>>>>>>>> if it is related to DPDK.
> >>>>>>>>>>> I'm running the ovs without offload to see if the issue can 
> >>>>>>>>>>> happen in
> >>>>>>>>>>> recent days.
> >>>>>>>>>>>
> >>>>>>>>>>> And again, TLDR, paste more thread call stacks.
> >>>>>>>>>>> Most of the threads are in the state of sched_yield, nanosleep,
> >>>>>>>>>>> epoll_wait and  poll.
> >>>>>>>>>>
> >>>>>>>>>> If this looks like a memory trash issue, it might be hard to 
> >>>>>>>>>> figure out. Does the ukey show any kind of pattern, i.e. does the 
> >>>>>>>>>> trashed data look like anything known?
> >>>>>>>>>> Maybe it’s a use after free, so you could add some debugging code 
> >>>>>>>>>> logging/recording all free and xmalloc of the ukey structure, to 
> >>>>>>>>>> see that when it crashes it was actually allocated?
> >>>>>>>>>>
> >>>>>>>>>> Hope this helps you getting started.
> >>>>>>>>>>
> >>>>>>>>>> //Eelco
> >>>>>>>>>>
> >>>>>>>>>>> The following threads are in working state. So hope this can have
> >>>>>>>>>>> clues for investigation.
> >>>>>>>>>>>
> >>>>>>>>>>> Thread 14 (Thread 0x7fd34002b700 (LWP 91928)):
> >>>>>>>>>>> #0  0x00007fd344487b6d in recvmsg () at 
> >>>>>>>>>>> ../sysdeps/unix/syscall-template.S:81
> >>>>>>>>>>> #1  0x0000562773cb8d03 in mp_handle ()
> >>>>>>>>>>> #2  0x00007fd344480e65 in start_thread (arg=0x7fd34002b700) at
> >>>>>>>>>>> pthread_create.c:307
> >>>>>>>>>>> #3  0x00007fd34260988d in clone () at
> >>>>>>>>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >>>>>>>>>>>
> >>>>>>>>>>> Thread 13 (Thread 0x7fd3359d7700 (LWP 91929)):
> >>>>>>>>>>> #0  0x00007fd34448799d in accept () at 
> >>>>>>>>>>> ../sysdeps/unix/syscall-template.S:81
> >>>>>>>>>>> #1  0x0000562773cd8f3c in socket_listener ()
> >>>>>>>>>>> #2  0x00007fd344480e65 in start_thread (arg=0x7fd3359d7700) at
> >>>>>>>>>>> pthread_create.c:307
> >>>>>>>>>>> #3  0x00007fd34260988d in clone () at
> >>>>>>>>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >>>>>>>>>>>
> >>>>>>>>>>> Thread 6 (Thread 0x7fd304663700 (LWP 91965)):
> >>>>>>>>>>> #0  0x00007fd34448771d in read () at 
> >>>>>>>>>>> ../sysdeps/unix/syscall-template.S:81
> >>>>>>>>>>> #1  0x00007fd343b42bfb in _mlx5dv_devx_get_event () from 
> >>>>>>>>>>> /lib64/libmlx5.so.1
> >>>>>>>>>>> #2  0x0000562773936d86 in mlx5_vdpa_event_handle ()
> >>>>>>>>>>> #3  0x00007fd344480e65 in start_thread (arg=0x7fd304663700) at
> >>>>>>>>>>> pthread_create.c:307
> >>>>>>>>>>> #4  0x00007fd34260988d in clone () at
> >>>>>>>>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >>>>>>>>>>>
> >>>>>>>>>>> Thread 2 (Thread 0x7fd305730700 (LWP 91943)):
> >>>>>>>>>>> #0  ccmap_find_slot_protected (count=<synthetic pointer>,
> >>>>>>>>>>> hash=hash@entry=1669671676, b=b@entry=0x7fd2f8012a80) at
> >>>>>>>>>>> lib/ccmap.c:278
> >>>>>>>>>>> #1  ccmap_inc_bucket_existing (b=b@entry=0x7fd2f8012a80,
> >>>>>>>>>>> hash=hash@entry=1669671676, inc=inc@entry=1) at lib/ccmap.c:281
> >>>>>>>>>>> #2  0x0000562773d4b015 in ccmap_try_inc
> >>>>>>>>>>> (impl=impl@entry=0x7fd2f8012a40, hash=hash@entry=1669671676,
> >>>>>>>>>>> inc=inc@entry=1) at lib/ccmap.c:464
> >>>>>>>>>>> #3  0x0000562773d4b224 in ccmap_inc 
> >>>>>>>>>>> (ccmap=ccmap@entry=0x7fd2f802a7e8,
> >>>>>>>>>>> hash=1669671676) at lib/ccmap.c:485
> >>>>>>>>>>> #4  0x0000562773d4975a in classifier_replace (cls=<optimized out>,
> >>>>>>>>>>> rule=rule@entry=0x7fd2fac70e28, version=<optimized out>,
> >>>>>>>>>>> conjs=<optimized out>, n_conjs=<optimized out>)
> >>>>>>>>>>>     at lib/classifier.c:579
> >>>>>>>>>>> #5  0x0000562773d49e99 in classifier_insert (cls=<optimized out>,
> >>>>>>>>>>> rule=rule@entry=0x7fd2fac70e28, version=<optimized out>,
> >>>>>>>>>>> conj=<optimized out>, n_conj=<optimized out>)
> >>>>>>>>>>>     at lib/classifier.c:694
> >>>>>>>>>>> #6  0x0000562773d00fc8 in replace_rule_start
> >>>>>>>>>>> (ofproto=ofproto@entry=0x5627778cc420, 
> >>>>>>>>>>> ofm=ofm@entry=0x7fd3057235f0,
> >>>>>>>>>>> old_rule=<optimized out>, new_rule=new_rule@entry=0x7fd2fac70e20)
> >>>>>>>>>>>     at ofproto/ofproto.c:5645
> >>>>>>>>>>> #7  0x0000562773d010e4 in add_flow_start (ofproto=0x5627778cc420,
> >>>>>>>>>>> ofm=0x7fd3057235f0) at ofproto/ofproto.c:5256
> >>>>>>>>>>> #8  0x0000562773d0122d in modify_flows_start__
> >>>>>>>>>>> (ofproto=ofproto@entry=0x5627778cc420, 
> >>>>>>>>>>> ofm=ofm@entry=0x7fd3057235f0)
> >>>>>>>>>>> at ofproto/ofproto.c:5824
> >>>>>>>>>>> #9  0x0000562773d01eac in modify_flow_start_strict
> >>>>>>>>>>> (ofm=0x7fd3057235f0, ofproto=0x5627778cc420) at 
> >>>>>>>>>>> ofproto/ofproto.c:5953
> >>>>>>>>>>> #10 ofproto_flow_mod_start (ofproto=0x5627778cc420,
> >>>>>>>>>>> ofm=ofm@entry=0x7fd3057235f0) at ofproto/ofproto.c:8112
> >>>>>>>>>>> #11 0x0000562773d0225a in ofproto_flow_mod_learn_start
> >>>>>>>>>>> (ofm=ofm@entry=0x7fd3057235f0) at ofproto/ofproto.c:5491
> >>>>>>>>>>> #12 0x0000562773d040ad in ofproto_flow_mod_learn
> >>>>>>>>>>> (ofm=ofm@entry=0x7fd3057235f0, keep_ref=<optimized out>,
> >>>>>>>>>>> limit=<optimized out>, 
> >>>>>>>>>>> below_limitp=below_limitp@entry=0x7fd305723510)
> >>>>>>>>>>>     at ofproto/ofproto.c:5576
> >>>>>>>>>>> #13 0x0000562773d2641e in xlate_learn_action
> >>>>>>>>>>> (ctx=ctx@entry=0x7fd305729a60, learn=learn@entry=0x562777db4618) 
> >>>>>>>>>>> at
> >>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:5547
> >>>>>>>>>>> #14 0x0000562773d2aafb in do_xlate_actions (ofpacts=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> ofpacts_len=<optimized out>, ctx=0x7fd305729a60,
> >>>>>>>>>>> is_last_action=<optimized out>, group_bucket_action=<optimized 
> >>>>>>>>>>> out>)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:7232
> >>>>>>>>>>> #15 0x0000562773d26c85 in xlate_recursively
> >>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>,
> >>>>>>>>>>> is_last_action=true, deepens=false, rule=0x562777db4470,
> >>>>>>>>>>> ctx=0x7fd305729a60)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:4383
> >>>>>>>>>>> #16 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>,
> >>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized out>,
> >>>>>>>>>>>     is_last_action=true, xlator=0x562773d29490 
> >>>>>>>>>>> <do_xlate_actions>) at
> >>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512
> >>>>>>>>>>> #17 0x0000562773d2ab8d in xlate_ofpact_resubmit
> >>>>>>>>>>> (resubmit=0x56277781db28, resubmit=0x56277781db28,
> >>>>>>>>>>> resubmit=0x56277781db28, is_last_action=true, ctx=0x7fd305729a60)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:4823
> >>>>>>>>>>> #18 do_xlate_actions (ofpacts=<optimized out>, 
> >>>>>>>>>>> ofpacts_len=<optimized
> >>>>>>>>>>> out>, ctx=0x7fd305729a60, is_last_action=<optimized out>,
> >>>>>>>>>>> group_bucket_action=<optimized out>)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:7107
> >>>>>>>>>>> #19 0x0000562773d26c85 in xlate_recursively
> >>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>,
> >>>>>>>>>>> is_last_action=true, deepens=false, rule=0x562777ab9220,
> >>>>>>>>>>> ctx=0x7fd305729a60)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:4383
> >>>>>>>>>>> #20 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>,
> >>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized out>,
> >>>>>>>>>>>     is_last_action=true, xlator=0x562773d29490 
> >>>>>>>>>>> <do_xlate_actions>) at
> >>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512
> >>>>>>>>>>> #21 0x0000562773d2ab8d in xlate_ofpact_resubmit
> >>>>>>>>>>> (resubmit=0x562777b9f3c8, resubmit=0x562777b9f3c8,
> >>>>>>>>>>> resubmit=0x562777b9f3c8, is_last_action=true, ctx=0x7fd305729a60)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:4823
> >>>>>>>>>>> #22 do_xlate_actions (ofpacts=<optimized out>, 
> >>>>>>>>>>> ofpacts_len=<optimized
> >>>>>>>>>>> out>, ctx=0x7fd305729a60, is_last_action=<optimized out>,
> >>>>>>>>>>> group_bucket_action=<optimized out>)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:7107
> >>>>>>>>>>> #23 0x0000562773d26c85 in xlate_recursively
> >>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>,
> >>>>>>>>>>> is_last_action=true, deepens=false, rule=0x562777d6dc90,
> >>>>>>>>>>> ctx=0x7fd305729a60)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:4383
> >>>>>>>>>>> #24 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>,
> >>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized out>,
> >>>>>>>>>>> ---Type <return> to continue, or q <return> to quit---
> >>>>>>>>>>>     is_last_action=true, xlator=0x562773d29490 
> >>>>>>>>>>> <do_xlate_actions>) at
> >>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512
> >>>>>>>>>>> #25 0x0000562773d2ab8d in xlate_ofpact_resubmit
> >>>>>>>>>>> (resubmit=0x562777c13c58, resubmit=0x562777c13c58,
> >>>>>>>>>>> resubmit=0x562777c13c58, is_last_action=true, ctx=0x7fd305729a60)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:4823
> >>>>>>>>>>> #26 do_xlate_actions (ofpacts=<optimized out>, 
> >>>>>>>>>>> ofpacts_len=<optimized
> >>>>>>>>>>> out>, ctx=0x7fd305729a60, is_last_action=<optimized out>,
> >>>>>>>>>>> group_bucket_action=<optimized out>)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:7107
> >>>>>>>>>>> #27 0x0000562773d26c85 in xlate_recursively
> >>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>,
> >>>>>>>>>>> is_last_action=true, deepens=false, rule=0x562778046000,
> >>>>>>>>>>> ctx=0x7fd305729a60)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:4383
> >>>>>>>>>>> #28 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>,
> >>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized out>,
> >>>>>>>>>>>     is_last_action=true, xlator=0x562773d29490 
> >>>>>>>>>>> <do_xlate_actions>) at
> >>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512
> >>>>>>>>>>> #29 0x0000562773d2ac1c in do_xlate_actions (ofpacts=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> ofpacts_len=<optimized out>, ctx=0x7fd305729a60,
> >>>>>>>>>>> is_last_action=<optimized out>, group_bucket_action=<optimized 
> >>>>>>>>>>> out>)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:7110
> >>>>>>>>>>> #30 0x0000562773d26c85 in xlate_recursively
> >>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>,
> >>>>>>>>>>> is_last_action=true, deepens=false, rule=0x562777ad0ac0,
> >>>>>>>>>>> ctx=0x7fd305729a60)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:4383
> >>>>>>>>>>> #31 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>,
> >>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized out>,
> >>>>>>>>>>>     is_last_action=true, xlator=0x562773d29490 
> >>>>>>>>>>> <do_xlate_actions>) at
> >>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512
> >>>>>>>>>>> #32 0x0000562773d2ac1c in do_xlate_actions (ofpacts=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> ofpacts_len=<optimized out>, ctx=0x7fd305729a60,
> >>>>>>>>>>> is_last_action=<optimized out>, group_bucket_action=<optimized 
> >>>>>>>>>>> out>)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:7110
> >>>>>>>>>>> #33 0x0000562773d26c85 in xlate_recursively
> >>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>,
> >>>>>>>>>>> is_last_action=true, deepens=false, rule=0x56277785c0d0,
> >>>>>>>>>>> ctx=0x7fd305729a60)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:4383
> >>>>>>>>>>> #34 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>,
> >>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized out>,
> >>>>>>>>>>>     is_last_action=true, xlator=0x562773d29490 
> >>>>>>>>>>> <do_xlate_actions>) at
> >>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512
> >>>>>>>>>>> #35 0x0000562773d2ac1c in do_xlate_actions (ofpacts=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> ofpacts_len=<optimized out>, ctx=0x7fd305729a60,
> >>>>>>>>>>> is_last_action=<optimized out>, group_bucket_action=<optimized 
> >>>>>>>>>>> out>)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:7110
> >>>>>>>>>>> #36 0x0000562773d26c85 in xlate_recursively
> >>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>,
> >>>>>>>>>>> is_last_action=true, deepens=false, rule=0x562777a86ce0,
> >>>>>>>>>>> ctx=0x7fd305729a60)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:4383
> >>>>>>>>>>> #37 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>,
> >>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized out>,
> >>>>>>>>>>>     is_last_action=true, xlator=0x562773d29490 
> >>>>>>>>>>> <do_xlate_actions>) at
> >>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512
> >>>>>>>>>>> #38 0x0000562773d2ac1c in do_xlate_actions (ofpacts=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> ofpacts_len=<optimized out>, ctx=0x7fd305729a60,
> >>>>>>>>>>> is_last_action=<optimized out>, group_bucket_action=<optimized 
> >>>>>>>>>>> out>)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:7110
> >>>>>>>>>>> #39 0x0000562773d26c85 in xlate_recursively
> >>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>,
> >>>>>>>>>>> is_last_action=true, deepens=false, rule=0x56277781b710,
> >>>>>>>>>>> ctx=0x7fd305729a60)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:4383
> >>>>>>>>>>> #40 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>,
> >>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized out>,
> >>>>>>>>>>>     is_last_action=true, xlator=0x562773d29490 
> >>>>>>>>>>> <do_xlate_actions>) at
> >>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512
> >>>>>>>>>>> #41 0x0000562773d2ac1c in do_xlate_actions
> >>>>>>>>>>> (ofpacts=ofpacts@entry=0x562777833a38,
> >>>>>>>>>>> ofpacts_len=ofpacts_len@entry=32, ctx=ctx@entry=0x7fd305729a60,
> >>>>>>>>>>> is_last_action=is_last_action@entry=true,
> >>>>>>>>>>>     group_bucket_action=group_bucket_action@entry=false) at
> >>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:7110
> >>>>>>>>>>> #42 0x0000562773d30f68 in clone_xlate_actions 
> >>>>>>>>>>> (actions=0x562777833a38,
> >>>>>>>>>>> actions_len=32, ctx=0x7fd305729a60, is_last_action=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> group_bucket_action=<optimized out>)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:5809
> >>>>>>>>>>> #43 0x0000562773d26c85 in xlate_recursively
> >>>>>>>>>>> (actions_xlator=0x562773d30d70 <clone_xlate_actions>,
> >>>>>>>>>>> is_last_action=true, deepens=true, rule=0x562777ad5640,
> >>>>>>>>>>> ctx=0x7fd305729a60)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:4383
> >>>>>>>>>>> #44 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>,
> >>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized out>,
> >>>>>>>>>>>     is_last_action=true, xlator=0x562773d30d70 
> >>>>>>>>>>> <clone_xlate_actions>)
> >>>>>>>>>>> at ofproto/ofproto-dpif-xlate.c:4512
> >>>>>>>>>>> #45 0x0000562773d2cf20 in patch_port_output
> >>>>>>>>>>> (ctx=ctx@entry=0x7fd305729a60, out_dev=0x562777ebeec0,
> >>>>>>>>>>> is_last_action=is_last_action@entry=true, in_dev=0x562777f43ee0,
> >>>>>>>>>>> in_dev=0x562777f43ee0)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:3890
> >>>>>>>>>>> #46 0x0000562773d2d2f7 in compose_output_action__
> >>>>>>>>>>> (ctx=ctx@entry=0x7fd305729a60, ofp_port=2, xr=xr@entry=0x0,
> >>>>>>>>>>> check_stp=check_stp@entry=true,
> >>>>>>>>>>> is_last_action=is_last_action@entry=true,
> >>>>>>>>>>>     truncate=truncate@entry=false) at 
> >>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4205
> >>>>>>>>>>> #47 0x0000562773d2fdd0 in compose_output_action (truncate=false,
> >>>>>>>>>>> is_last_action=true, xr=0x0, ofp_port=<optimized out>,
> >>>>>>>>>>> ctx=0x7fd305729a60) at ofproto/ofproto-dpif-xlate.c:4360
> >>>>>>>>>>> #48 xlate_output_action (ctx=ctx@entry=0x7fd305729a60, 
> >>>>>>>>>>> port=<optimized
> >>>>>>>>>>> out>, controller_len=<optimized out>,
> >>>>>>>>>>> may_packet_in=may_packet_in@entry=true,
> >>>>>>>>>>> is_last_action=is_last_action@entry=true,
> >>>>>>>>>>>     truncate=truncate@entry=false,
> >>>>>>>>>>> group_bucket_action=group_bucket_action@entry=false) at
> >>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:5305
> >>>>>>>>>>> ---Type <return> to continue, or q <return> to quit---
> >>>>>>>>>>> #49 0x0000562773d2972f in do_xlate_actions (ofpacts=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> ofpacts_len=<optimized out>, ctx=0x7fd305729a60,
> >>>>>>>>>>> is_last_action=<optimized out>, group_bucket_action=<optimized 
> >>>>>>>>>>> out>)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:6960
> >>>>>>>>>>> #50 0x0000562773d26c85 in xlate_recursively
> >>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>,
> >>>>>>>>>>> is_last_action=true, deepens=false, rule=0x562777ad5430,
> >>>>>>>>>>> ctx=0x7fd305729a60)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:4383
> >>>>>>>>>>> #51 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>,
> >>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized out>,
> >>>>>>>>>>>     is_last_action=true, xlator=0x562773d29490 
> >>>>>>>>>>> <do_xlate_actions>) at
> >>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512
> >>>>>>>>>>> #52 0x0000562773d2ac1c in do_xlate_actions (ofpacts=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> ofpacts_len=<optimized out>, ctx=0x7fd305729a60,
> >>>>>>>>>>> is_last_action=<optimized out>, group_bucket_action=<optimized 
> >>>>>>>>>>> out>)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:7110
> >>>>>>>>>>> #53 0x0000562773d26c85 in xlate_recursively
> >>>>>>>>>>> (actions_xlator=0x562773d29490 <do_xlate_actions>,
> >>>>>>>>>>> is_last_action=true, deepens=false, rule=0x562777ac2fb0,
> >>>>>>>>>>> ctx=0x7fd305729a60)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-xlate.c:4383
> >>>>>>>>>>> #54 xlate_table_action (ctx=0x7fd305729a60, in_port=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> table_id=<optimized out>, may_packet_in=<optimized out>,
> >>>>>>>>>>> honor_table_miss=<optimized out>, with_ct_orig=<optimized out>,
> >>>>>>>>>>>     is_last_action=true, xlator=0x562773d29490 
> >>>>>>>>>>> <do_xlate_actions>) at
> >>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:4512
> >>>>>>>>>>> #55 0x0000562773d2ac1c in do_xlate_actions
> >>>>>>>>>>> (ofpacts=ofpacts@entry=0x56277797b318,
> >>>>>>>>>>> ofpacts_len=ofpacts_len@entry=8, ctx=ctx@entry=0x7fd305729a60,
> >>>>>>>>>>> is_last_action=is_last_action@entry=true,
> >>>>>>>>>>>     group_bucket_action=group_bucket_action@entry=false) at
> >>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:7110
> >>>>>>>>>>> #56 0x0000562773d330d6 in xlate_actions 
> >>>>>>>>>>> (xin=xin@entry=0x7fd30572a920,
> >>>>>>>>>>> xout=xout@entry=0x7fd30572ad38) at 
> >>>>>>>>>>> ofproto/ofproto-dpif-xlate.c:7924
> >>>>>>>>>>> #57 0x0000562773d2241b in upcall_xlate (wc=0x7fd30572bfe0,
> >>>>>>>>>>> odp_actions=0x7fd30572b7b0, upcall=0x7fd30572acd0,
> >>>>>>>>>>> udpif=0x562777850cf0) at ofproto/ofproto-dpif-upcall.c:1340
> >>>>>>>>>>> #58 process_upcall (udpif=udpif@entry=0x562777850cf0,
> >>>>>>>>>>> upcall=upcall@entry=0x7fd30572acd0,
> >>>>>>>>>>> odp_actions=odp_actions@entry=0x7fd30572b7b0,
> >>>>>>>>>>> wc=wc@entry=0x7fd30572bfe0)
> >>>>>>>>>>>     at ofproto/ofproto-dpif-upcall.c:1602
> >>>>>>>>>>> #59 0x0000562773d22c39 in upcall_cb (packet=<optimized out>,
> >>>>>>>>>>> flow=0x7fd30572bd40, ufid=<optimized out>, pmd_id=<optimized out>,
> >>>>>>>>>>> type=<optimized out>, userdata=<optimized out>,
> >>>>>>>>>>>     actions=0x7fd30572b7b0, wc=0x7fd30572bfe0,
> >>>>>>>>>>> put_actions=0x7fd30572b7f0, aux=0x562777850cf0) at
> >>>>>>>>>>> ofproto/ofproto-dpif-upcall.c:1461
> >>>>>>>>>>> #60 0x0000562773d52198 in dp_netdev_upcall
> >>>>>>>>>>> (pmd=pmd@entry=0x7fd305731010, packet_=packet_@entry=0x21152c180,
> >>>>>>>>>>> flow=flow@entry=0x7fd30572bd40, wc=wc@entry=0x7fd30572bfe0,
> >>>>>>>>>>>     ufid=ufid@entry=0x7fd30572b790, type=type@entry=DPIF_UC_MISS,
> >>>>>>>>>>> userdata=userdata@entry=0x0, actions=actions@entry=0x7fd30572b7b0,
> >>>>>>>>>>> put_actions=put_actions@entry=0x7fd30572b7f0)
> >>>>>>>>>>>     at lib/dpif-netdev.c:9141
> >>>>>>>>>>> #61 0x0000562773d66f13 in handle_packet_upcall
> >>>>>>>>>>> (put_actions=0x7fd30572b7f0, actions=0x7fd30572b7b0,
> >>>>>>>>>>> key=0x7fd30572cc40, packet=0x21152c180, pmd=0x7fd305731010) at
> >>>>>>>>>>> lib/dpif-netdev.c:11303
> >>>>>>>>>>> #62 fast_path_processing (pmd=pmd@entry=0x7fd305731010,
> >>>>>>>>>>> packets_=packets_@entry=0x7fd30572d0c0,
> >>>>>>>>>>> keys=keys@entry=0x7fd30572cc30,
> >>>>>>>>>>> flow_map=flow_map@entry=0x7fd30572cae0,
> >>>>>>>>>>>     index_map=index_map@entry=0x7fd30572cad0 "", 
> >>>>>>>>>>> in_port=<optimized
> >>>>>>>>>>> out>) at lib/dpif-netdev.c:11426
> >>>>>>>>>>> #63 0x0000562773d680c1 in dp_netdev_input__ (pmd=<optimized out>,
> >>>>>>>>>>> packets=<optimized out>, md_is_valid=md_is_valid@entry=false,
> >>>>>>>>>>> port_no=<optimized out>) at lib/dpif-netdev.c:11520
> >>>>>>>>>>> #64 0x0000562773d6a15d in dp_netdev_input (pmd=<optimized out>,
> >>>>>>>>>>> packets=<optimized out>, port_no=<optimized out>) at
> >>>>>>>>>>> lib/dpif-netdev.c:11558
> >>>>>>>>>>> #65 0x0000562773d6a2cf in dp_netdev_process_rxq_port
> >>>>>>>>>>> (pmd=pmd@entry=0x7fd305731010, rxq=0x56277796a5e0, port_no=3) at
> >>>>>>>>>>> lib/dpif-netdev.c:6660
> >>>>>>>>>>> #66 0x0000562773d6a759 in pmd_thread_main (f_=<optimized out>) at
> >>>>>>>>>>> lib/dpif-netdev.c:8267
> >>>>>>>>>>> #67 0x0000562773df805f in ovsthread_wrapper (aux_=<optimized 
> >>>>>>>>>>> out>) at
> >>>>>>>>>>> lib/ovs-thread.c:422
> >>>>>>>>>>> #68 0x00007fd344480e65 in start_thread (arg=0x7fd305730700) at
> >>>>>>>>>>> pthread_create.c:307
> >>>>>>>>>>> #69 0x00007fd34260988d in clone () at
> >>>>>>>>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >>>>>>>>>>>
> >>>>>>>>>>> Thread 1 (Thread 0x7fd334307700 (LWP 91942)):
> >>>>>>>>>>> #0  0x00007fd342541337 in __GI_raise (sig=sig@entry=6) at
> >>>>>>>>>>> ../nptl/sysdeps/unix/sysv/linux/raise.c:55
> >>>>>>>>>>> #1  0x00007fd342542a28 in __GI_abort () at abort.c:90
> >>>>>>>>>>> #2  0x0000562773e2c4ee in ovs_abort_valist (err_no=<optimized 
> >>>>>>>>>>> out>,
> >>>>>>>>>>> format=<optimized out>, args=args@entry=0x7fd334302340) at
> >>>>>>>>>>> lib/util.c:499
> >>>>>>>>>>> #3  0x0000562773e2c584 in ovs_abort (err_no=err_no@entry=0,
> >>>>>>>>>>> format=format@entry=0x5627740cad18 "%s: %s() passed uninitialized
> >>>>>>>>>>> ovs_mutex") at lib/util.c:491
> >>>>>>>>>>> #4  0x0000562773df72e1 in ovs_mutex_lock_at
> >>>>>>>>>>> (l_=l_@entry=0x7fd2f907df68, where=where@entry=0x5627740a7fc0
> >>>>>>>>>>> "ofproto/ofproto-dpif-upcall.c:2214") at lib/ovs-thread.c:75
> >>>>>>>>>>> #5  0x0000562773d1e98d in ukey_delete 
> >>>>>>>>>>> (umap=umap@entry=0x562777853400,
> >>>>>>>>>>> ukey=ukey@entry=0x7fd2f907df20) at 
> >>>>>>>>>>> ofproto/ofproto-dpif-upcall.c:2214
> >>>>>>>>>>> #6  0x0000562773d202da in revalidator_sweep__
> >>>>>>>>>>> (revalidator=revalidator@entry=0x562777897b00,
> >>>>>>>>>>> purge=purge@entry=false) at ofproto/ofproto-dpif-upcall.c:3048
> >>>>>>>>>>> #7  0x0000562773d241a6 in revalidator_sweep
> >>>>>>>>>>> (revalidator=0x562777897b00) at ofproto/ofproto-dpif-upcall.c:3072
> >>>>>>>>>>> #8  udpif_revalidator (arg=0x562777897b00) at 
> >>>>>>>>>>> ofproto/ofproto-dpif-upcall.c:1086
> >>>>>>>>>>> #9  0x0000562773df805f in ovsthread_wrapper (aux_=<optimized 
> >>>>>>>>>>> out>) at
> >>>>>>>>>>> lib/ovs-thread.c:422
> >>>>>>>>>>> #10 0x00007fd344480e65 in start_thread (arg=0x7fd334307700) at
> >>>>>>>>>>> pthread_create.c:307
> >>>>>>>>>>> #11 0x00007fd34260988d in clone () at
> >>>>>>>>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>> LIU Yulong
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Feb 19, 2024 at 8:12 PM Eelco Chaudron 
> >>>>>>>>>>> <echau...@redhat.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 19 Feb 2024, at 13:09, Ilya Maximets wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On 2/19/24 11:14, Eelco Chaudron wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 19 Feb 2024, at 10:34, LIU Yulong wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi OVS experts,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Our ovs-vswitchd runs to core at the 
> >>>>>>>>>>>>>>> ovs_mutex_trylock(&ukey->mutex) in the
> >>>>>>>>>>>>>>> function revalidator_sweep__.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I've sent the mail before but have no response.
> >>>>>>>>>>>>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-August/052604.html
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> So I'm trying to send this mail again. And I may apologize in 
> >>>>>>>>>>>>>>> advance because
> >>>>>>>>>>>>>>> I would like to post as much useful information as possible 
> >>>>>>>>>>>>>>> to help identify
> >>>>>>>>>>>>>>> potential issues. So this mail will have a really long text.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Compared to the mail 2023-August/052604.html, we upgrade the 
> >>>>>>>>>>>>>>> OVS to 2.17.8
> >>>>>>>>>>>>>>> and DPDK to 22.11 to pray for good luck that maybe the 
> >>>>>>>>>>>>>>> community has potential
> >>>>>>>>>>>>>>> fixes for this issue. But unfortunately, the ovs-vswitchd 
> >>>>>>>>>>>>>>> still runs to core.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> As you mentioned it looks like some memory corruption, which I 
> >>>>>>>>>>>>>> have not seen before.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Have you tried this without rte offload? This is the only 
> >>>>>>>>>>>>>> feature I never used.
> >>>>>>>>>>>>>> There is a 2.17.9 with DPDK 22.11.6 you could try.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> OVS 2.17 is not supposed to work with DPDK 22.11, it's supposed 
> >>>>>>>>>>>>> to work with 21.11.
> >>>>>>>>>>>>> See the compatibility table here:
> >>>>>>>>>>>>> https://docs.openvswitch.org/en/latest/faq/releases/
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Though it's hard to tell if DPDK version is anyhow related to 
> >>>>>>>>>>>>> the issue.
> >>>>>>>>>>>>
> >>>>>>>>>>>> My mistake, I was supposed to type 21.11.6 :( But yes if they 
> >>>>>>>>>>>> are using 22.11, that could also be the problem. I would suggest 
> >>>>>>>>>>>> using the supported version and see if the problem goes away.
> >>>>>>>>>>>>
> >>>>>>>>>>>> //Eelco
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Best regards, Ilya Maximets.
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>
>
