Re: [ovs-discuss] ovs-vswitchd core at revalidator_sweep__

2024-03-13 Thread LIU Yulong via discuss
Hi Eelco,

Thank you.

Patch sent to the mailing list:
https://mail.openvswitch.org/pipermail/ovs-dev/2024-March/412474.html



On Wed, Mar 13, 2024 at 5:34 PM Eelco Chaudron  wrote:
>
>
>
> On 13 Mar 2024, at 10:19, LIU Yulong wrote:
>
> > Hi guys,
> >
> > Send a pull request with that try_lock movement fix based on the former 
> > tests:
> > https://github.com/openvswitch/ovs/pull/421
> >
> > Does that make sense to you?
>
> I’m a bit behind emails, etc. so did not look at your emails yet. But for OVS 
> we use an email-based workflow, see here; 
> https://docs.openvswitch.org/en/latest/internals/contributing/submitting-patches/.
>  If you use this more people will see your patch and can review it.
>
> Cheers,
>
> Eelco
>
> >
> > On Tue, Mar 12, 2024 at 3:11 PM LIU Yulong  wrote:
> >>
> >> Updates:
> >>
> >> Ukey attributes we already have:
> >>
> >> long long int created OVS_GUARDED;/* Estimate of creation 
> >> time. */
> >> unsigned int state_thread OVS_GUARDED;/* Thread that transitions. 
> >> */
> >>
> >> Added more attributes [1] to the ukey:
> >>
> >> const char *state_before OVS_GUARDED;  /* locator state before
> >> (last) transition. */
> >> long long int modified; /* Time of last transition. */
> >> unsigned create_tid;/* Ukey created thread id. */
> >>
> >> [1] 
> >> https://github.com/gotostack/ovs/commit/8ddc4f512783e6b883b102b821e0f05916a9c255
> >>
> >> After that, a core file shows:
> >>
> >> 1) The pmd ctx-> now:
> >> p  ((struct dp_netdev_pmd_thread *) 0x7f804b733010)->ctx
> >> $10 = {now = 12529082556818, last_rxq = 0x55f009029720, emc_insert_min
> >> = 42949672, smc_enable_db = false}
> >>
> >> 2)ukey in the core code call stack
> >> p * (struct udpif_key *) 0x7f803c360710
> >> $11 = { created = 12529082056, modified = 12529082553, create_tid = 9}
> >>
> >> 3) Circular buffer same address for free action
> >> ukey_addr = 0x7f803c360710, timestamp = 12529082556703
> >>
> >> PMD cxt->now 12529082556818 is near the ukey free time 12529082556703,
> >> it's about 115us.
> >>
> >> Added more timestamps [2] to every ukey state to record the ukey state
> >> transitions:
> >> long long int ukey_create_time;/* Time of ukey creation. */
> >> long long int ukey_visible_time; /* Time of ukey visible. */
> >> long long int ukey_operational_time; /* Time of ukey operational. */
> >> long long int ukey_evicting_time;/* Time of ukey evicting. */
> >> long long int ukey_evicted_time; /* Time of ukey evicted. */
> >> long long int ukey_deleted_time; /* Time of ukey deleted. */
> >> long long int ukey_destroy_time; /* Time of ukey destroy. */
> >> long long int ukey_replace_time; /* Time of ukey replace. */
> >>
> >> [2] 
> >> https://github.com/gotostack/ovs/commit/38a2b73af4442aa741930b3e4cff32ab7b559050
> >>
> >> And a core file shows:
> >>
> >>   ukey_create_time = 13217283578366,
> >>   ukey_visible_time = 13217283578366,
> >>   ukey_operational_time = 13217283583044,
> >>   ukey_evicting_time = 13217289145192,
> >>   ukey_evicted_time = 13217289145245,
> >>   ukey_deleted_time = 13217289154654,
> >>   ukey_destroy_time = 13217289156490,  This is set just before the
> >> ovs_mutex_destroy(&ukey->mutex);
> >>   ukey_replace_time = 13217289154654
> >>
> >> pmd->ctx:
> >> $4 = {
> >>   now = 13217289156482,
> >>   last_rxq = 0x55b34db74f50,
> >>   emc_insert_min = 42949672,
> >>   smc_enable_db = false
> >> }
> >>
> >> ukey_replace_time and ukey_deleted_time are the same.
> >>
> >> ukey_destroy_time  - pmd-ctx.now = 8 (13217289156490 - 13217289156482)
> >>
> >> Also added a sweep_now just before the line where the core mostly happens:
> >> https://github.com/gotostack/ovs/commit/38a2b73af4442aa741930b3e4cff32ab7b559050#diff-be6e2339300cb2a7efa8eca531a668a94ce9f06dd717ba73bb1b508fee27e887R3030
> >> sweep_now = time_usec();
> >> if (ovs_mutex_trylock(&ukey->mutex)) {
> >> continue;
> >> }
> >>
> >> ukey_destroy_time  - sweep_now = -78 (13217289156490 - 13217289156568)
> >>
> >> Means that ukey_destroy happens a bit earlier than the revalidator_sweep__ try_lock.

Re: [ovs-discuss] ovs-vswitchd core at revalidator_sweep__

2024-03-13 Thread LIU Yulong via discuss
Hi guys,

Sent a pull request with the try_lock movement fix, based on the former tests:
https://github.com/openvswitch/ovs/pull/421

Does that make sense to you?

Thank you.

LIU Yulong

On Tue, Mar 12, 2024 at 3:11 PM LIU Yulong  wrote:
>
> Updates:
>
> Ukey attributes we already have:
>
> long long int created OVS_GUARDED;/* Estimate of creation time. */
> unsigned int state_thread OVS_GUARDED;/* Thread that transitions. */
>
> Added more attributes [1] to the ukey:
>
> const char *state_before OVS_GUARDED;  /* locator state before
> (last) transition. */
> long long int modified; /* Time of last transition. */
> unsigned create_tid;/* Ukey created thread id. */
>
> [1] 
> https://github.com/gotostack/ovs/commit/8ddc4f512783e6b883b102b821e0f05916a9c255
>
> After that, a core file shows:
>
> 1) The pmd ctx-> now:
> p  ((struct dp_netdev_pmd_thread *) 0x7f804b733010)->ctx
> $10 = {now = 12529082556818, last_rxq = 0x55f009029720, emc_insert_min
> = 42949672, smc_enable_db = false}
>
> 2)ukey in the core code call stack
> p * (struct udpif_key *) 0x7f803c360710
> $11 = { created = 12529082056, modified = 12529082553, create_tid = 9}
>
> 3) Circular buffer same address for free action
> ukey_addr = 0x7f803c360710, timestamp = 12529082556703
>
> PMD cxt->now 12529082556818 is near the ukey free time 12529082556703,
> it's about 115us.
>
> Added more timestamps [2] to every ukey state to record the ukey state
> transitions:
> long long int ukey_create_time;/* Time of ukey creation. */
> long long int ukey_visible_time; /* Time of ukey visible. */
> long long int ukey_operational_time; /* Time of ukey operational. */
> long long int ukey_evicting_time;/* Time of ukey evicting. */
> long long int ukey_evicted_time; /* Time of ukey evicted. */
> long long int ukey_deleted_time; /* Time of ukey deleted. */
> long long int ukey_destroy_time; /* Time of ukey destroy. */
> long long int ukey_replace_time; /* Time of ukey replace. */
>
> [2] 
> https://github.com/gotostack/ovs/commit/38a2b73af4442aa741930b3e4cff32ab7b559050
>
> And a core file shows:
>
>   ukey_create_time = 13217283578366,
>   ukey_visible_time = 13217283578366,
>   ukey_operational_time = 13217283583044,
>   ukey_evicting_time = 13217289145192,
>   ukey_evicted_time = 13217289145245,
>   ukey_deleted_time = 13217289154654,
>   ukey_destroy_time = 13217289156490,  This is set just before the
> ovs_mutex_destroy(&ukey->mutex);
>   ukey_replace_time = 13217289154654
>
> pmd->ctx:
> $4 = {
>   now = 13217289156482,
>   last_rxq = 0x55b34db74f50,
>   emc_insert_min = 42949672,
>   smc_enable_db = false
> }
>
> ukey_replace_time and ukey_deleted_time are the same.
>
> ukey_destroy_time  - pmd-ctx.now = 8 (13217289156490 - 13217289156482)
>
> Also added a sweep_now just before the line where the core mostly happens:
> https://github.com/gotostack/ovs/commit/38a2b73af4442aa741930b3e4cff32ab7b559050#diff-be6e2339300cb2a7efa8eca531a668a94ce9f06dd717ba73bb1b508fee27e887R3030
> sweep_now = time_usec();
> if (ovs_mutex_trylock(&ukey->mutex)) {
> continue;
> }
>
> ukey_destroy_time  - sweep_now = -78 (13217289156490 - 13217289156568)
>
> Means that ukey_destroy happens a bit earlier than the revalidator_sweep__ try_lock.
>
>
>
> According to this information, I assume that the umap and ukey iteration
> has a race condition between the PMD thread, the RCU thread and the
> revalidator thread. Based on the core/abort point in the call stack,
> I moved the umap lock outside of the CMAP_FOR_EACH loop [3].
> [3] 
> https://github.com/gotostack/ovs/commit/2919a242be7d0ee079c278a8488188694f20f827
>
> No more core was seen during that revalidator_sweep__ procedure for 4 days 
> now.
>
> But if I revert this lock movement, the core can show again in a few hours.
>
> So, please take a look at this lock movement patch, if it make sense to you.
>
>
> Regards,
>
> LIU Yulong
>
>
> On Fri, Mar 1, 2024 at 6:06 PM LIU Yulong  wrote:
> >
> > Hi,
> >
> > Add some updates:
> >
> > 1.
> > We added a debug attribute `state_before ` to the ukey to record more
> > life cycle details of  a ukey:
> > state_where = 0x55576027b868 "ofproto/ofproto-dpif-upcall.c:",
> > [1], it is UKEY_DELETED.
> > state_before = 0x55576027b630 "ofproto/ofproto-dpif-upcall.c:",
> > [2], it was UKEY_EVICTED.
> >
> > [1] 
> > https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1897
> > [2] 
>

Re: [ovs-discuss] ovs-vswitchd core at revalidator_sweep__

2024-03-12 Thread LIU Yulong via discuss
Updates:

Ukey attributes we already have:

long long int created OVS_GUARDED;/* Estimate of creation time. */
unsigned int state_thread OVS_GUARDED;/* Thread that transitions. */

Added more attributes [1] to the ukey:

const char *state_before OVS_GUARDED;  /* Locator of the state before the (last) transition. */
long long int modified; /* Time of last transition. */
unsigned create_tid;/* Ukey created thread id. */

[1] 
https://github.com/gotostack/ovs/commit/8ddc4f512783e6b883b102b821e0f05916a9c255

After that, a core file shows:

1) The pmd ctx-> now:
p  ((struct dp_netdev_pmd_thread *) 0x7f804b733010)->ctx
$10 = {now = 12529082556818, last_rxq = 0x55f009029720, emc_insert_min
= 42949672, smc_enable_db = false}

2) The ukey in the core dump call stack
p * (struct udpif_key *) 0x7f803c360710
$11 = { created = 12529082056, modified = 12529082553, create_tid = 9}

3) Circular buffer entry with the same address, for the free action
ukey_addr = 0x7f803c360710, timestamp = 12529082556703

PMD ctx->now 12529082556818 is near the ukey free time 12529082556703;
the gap is about 115 us.

Added more timestamps [2] to every ukey state to record the ukey state
transitions:
long long int ukey_create_time;/* Time of ukey creation. */
long long int ukey_visible_time; /* Time of ukey visible. */
long long int ukey_operational_time; /* Time of ukey operational. */
long long int ukey_evicting_time;/* Time of ukey evicting. */
long long int ukey_evicted_time; /* Time of ukey evicted. */
long long int ukey_deleted_time; /* Time of ukey deleted. */
long long int ukey_destroy_time; /* Time of ukey destroy. */
long long int ukey_replace_time; /* Time of ukey replace. */

[2] 
https://github.com/gotostack/ovs/commit/38a2b73af4442aa741930b3e4cff32ab7b559050
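Roughly, the instrumentation in [2] amounts to the sketch below. This is
illustrative only, not the exact diff: the helper name is made up, and the
create/destroy/replace times would be stamped in ukey_create__(),
ukey_delete__() and try_ukey_replace() rather than here.

/* Illustrative sketch, not the exact code in [2]. */
static void
ukey_stamp_transition(struct udpif_key *ukey, enum ukey_state dst)
    OVS_REQUIRES(ukey->mutex)
{
    long long int now = time_usec();

    switch (dst) {
    case UKEY_VISIBLE:     ukey->ukey_visible_time = now;     break;
    case UKEY_OPERATIONAL: ukey->ukey_operational_time = now; break;
    case UKEY_EVICTING:    ukey->ukey_evicting_time = now;    break;
    case UKEY_EVICTED:     ukey->ukey_evicted_time = now;     break;
    case UKEY_DELETED:     ukey->ukey_deleted_time = now;     break;
    default:               break;
    }
}
/* Called from transition_ukey_at() alongside the existing
 * state/state_where updates. */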

And a core file shows:

  ukey_create_time = 13217283578366,
  ukey_visible_time = 13217283578366,
  ukey_operational_time = 13217283583044,
  ukey_evicting_time = 13217289145192,
  ukey_evicted_time = 13217289145245,
  ukey_deleted_time = 13217289154654,
  ukey_destroy_time = 13217289156490,  This is set just before the
ovs_mutex_destroy(&ukey->mutex);
  ukey_replace_time = 13217289154654

pmd->ctx:
$4 = {
  now = 13217289156482,
  last_rxq = 0x55b34db74f50,
  emc_insert_min = 42949672,
  smc_enable_db = false
}

ukey_replace_time and ukey_deleted_time are the same.

ukey_destroy_time - pmd->ctx.now = 8 (13217289156490 - 13217289156482)

Also added a sweep_now just before the line where the core mostly happens:
https://github.com/gotostack/ovs/commit/38a2b73af4442aa741930b3e4cff32ab7b559050#diff-be6e2339300cb2a7efa8eca531a668a94ce9f06dd717ba73bb1b508fee27e887R3030
sweep_now = time_usec();
if (ovs_mutex_trylock(&ukey->mutex)) {
continue;
}

ukey_destroy_time  - sweep_now = -78 (13217289156490 - 13217289156568)

Means that ukey_destroy happens a bit earlier than the revalidator_sweep__ try_lock.



According to this information, I assume that the umap and ukey iteration
has a race condition between the PMD thread, the RCU thread and the
revalidator thread. Based on the core/abort point in the call stack,
I moved the umap lock outside of the CMAP_FOR_EACH loop [3], as sketched
below.
[3] 
https://github.com/gotostack/ovs/commit/2919a242be7d0ee079c278a8488188694f20f827
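For readers of the thread, the movement in [3] has roughly this shape in
revalidator_sweep__ (simplified sketch, not the exact diff):

/* Simplified sketch of the change in [3]: hold umap->mutex for the whole
 * iteration, so that try_ukey_replace() (which also runs under
 * umap->mutex) cannot swap out and postpone-free an old ukey while the
 * sweep is still walking the cmap. */
struct udpif *udpif = revalidator->udpif;
size_t i;

for (i = 0; i < N_UMAPS; i++) {
    struct umap *umap = &udpif->ukeys[i];
    struct udpif_key *ukey;

    ovs_mutex_lock(&umap->mutex);      /* moved outside of CMAP_FOR_EACH */
    CMAP_FOR_EACH (ukey, cmap_node, &umap->cmap) {
        if (ovs_mutex_trylock(&ukey->mutex)) {
            continue;
        }
        /* ... existing ukey_state checks and flow deletion ... */
        ovs_mutex_unlock(&ukey->mutex);
    }
    ovs_mutex_unlock(&umap->mutex);
}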

No core has been seen in the revalidator_sweep__ procedure for 4 days now.

But if I revert this lock movement, the core shows up again within a few hours.

So, please take a look at this lock-movement patch and see if it makes sense
to you.


Regards,

LIU Yulong


On Fri, Mar 1, 2024 at 6:06 PM LIU Yulong  wrote:
>
> Hi,
>
> Add some updates:
>
> 1.
> We added a debug attribute `state_before ` to the ukey to record more
> life cycle details of  a ukey:
> state_where = 0x55576027b868 "ofproto/ofproto-dpif-upcall.c:",
> [1], it is UKEY_DELETED.
> state_before = 0x55576027b630 "ofproto/ofproto-dpif-upcall.c:",
> [2], it was UKEY_EVICTED.
>
> [1] 
> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1897
> [2] 
> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2470
>
> Still, all of the ukeys did the replace action.
>
> 2. The ukey circular buffer [1] does not work well, the buffer still
> has {0} after a long time run, and the number is absolutely less than
> `counter_upcall_ukey_free`.
> [1] 
> https://github.com/gotostack/ovs/commit/939d88c3c5fcdb446b01f2afa8f1e80c3929db46
> And, can not add an `allocate` entry to this buffer for "ukey
> xmalloc". The circular buffer
> mutex seems not to work well, core many times at
> `ovs_mutex_unlock(&ukey_free_buffer.mutex)`.
>
> 3. Ilya's patch [2] was applied, but I have not seen the abort log for now.
> [2] 
> https://github.com/igsilya/ovs/commit/8268347a159b5afa884f5b3008897878b5b520f5
>
> 4.

Re: [ovs-discuss] ovs-vswitchd core at revalidator_sweep__

2024-03-01 Thread LIU Yulong via discuss
Hi,

Add some updates:

1.
We added a debug attribute `state_before` to the ukey to record more
life-cycle details of a ukey:
state_where = 0x55576027b868 "ofproto/ofproto-dpif-upcall.c:",
[1], it is UKEY_DELETED.
state_before = 0x55576027b630 "ofproto/ofproto-dpif-upcall.c:",
[2], it was UKEY_EVICTED.

[1] 
https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1897
[2] 
https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2470
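Conceptually the recording is just the following, done inside
transition_ukey_at() (sketch, not the exact diff):

/* Remember the previous transition location before overwriting it. */
ukey->state_before = ukey->state_where;
ukey->state_where = where;
ukey->state = dst;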

Still, all of the ukeys did the replace action.

2. The ukey circular buffer [1] does not work well: the buffer still
has {0} entries after a long run, and the number of records is clearly
less than `counter_upcall_ukey_free`.
[1] 
https://github.com/gotostack/ovs/commit/939d88c3c5fcdb446b01f2afa8f1e80c3929db46
Also, I could not add an `allocate` entry to this buffer for the ukey
xmalloc: the circular buffer mutex does not seem to work well, and
ovs-vswitchd cored many times at
`ovs_mutex_unlock(&ukey_free_buffer.mutex)`.

3. Ilya's patch [2] was applied, but I have not seen the abort log for now.
[2] 
https://github.com/igsilya/ovs/commit/8268347a159b5afa884f5b3008897878b5b520f5

4. Dumping all ukeys from the core file, we noticed that almost all
UKEY_EVICTED ukeys had their state changed at `transition_ukey_at` by the
revalidator thread (per the `state_thread` attribute of the ukey).
But the core backtrace shows the related ukey had its state changed in a
PMD thread.
For instance:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> normal ukey and the revalidator thread:
(struct umap *) 0x55cce9556140:
  (struct udpif_key *) 0x7f3aad584a80:
state = UKEY_EVICTED
 state_thread = 5
  (struct udpif_key *) 0x7f3aac24ce20:
state = UKEY_EVICTED
 state_thread = 5
  (struct udpif_key *) 0x7f3aac6526e0:
state = UKEY_EVICTED
 state_thread = 5
  (struct udpif_key *) 0x7f3aad731970:
state = UKEY_EVICTED
 state_thread = 5
  (struct udpif_key *) 0x7f3aac91ce50:
state = UKEY_EVICTED
 state_thread = 5
  (struct udpif_key *) 0x7f3aadd69be0:
state = UKEY_EVICTED
 state_thread = 5
  (struct udpif_key *) 0x7f3aad759040:
state = UKEY_EVICTED
 state_thread = 5
  (struct udpif_key *) 0x7f3a8c0d6d50:
state = UKEY_EVICTED
 state_thread = 5
  (struct udpif_key *) 0x7f3a8c851300:
state = UKEY_EVICTED
 state_thread = 5

#8  0x55cce5d7005f in ovsthread_wrapper (aux_=) at
lib/ovs-thread.c:422
auxp = 
aux = {start = 0x55cce5c9c0d0 , arg =
0x55cce9595780, name = "revalidator\000\000\000\000"}
id = 5
subprogram_name = 0x7f3ad8c0 "\020 "
#9  0x7f3af2afee65 in start_thread (arg=0x7f3ae2986700) at
pthread_create.c:307
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

>>>>>>>>>>>>>>>> core ukey and PMD thread

p * (struct udpif_key *) 0x7f3aac156e80
$2 = {cmap_node = {next = {p = 0x7f3aaec2b3a0}}, key = 0x7f3aac402810,
key_len = 0, mask = 0x0, mask_len = 172, ufid = {u32 = {256387,
2445823588, 3143939231, 3011838433}, u64 = {lo = 10504732324808489235,
  hi = 12935747573714826399}}, ufid_present = true, hash =
2623373230, pmd_id = 35, mutex = {lock = {__data = {__lock = 0,
__count = 0, __owner = 0, __nusers = 0, __kind = -1, __spins = 0,
__elision = 0, __list = {
  __prev = 0x0, __next = 0x0}}, __size = '\000' , "\377\377\377\377", '\000' , __align = 0},
where = 0x0}, stats = {n_packets = 3, n_bytes = 852, used = 871199854,
tcp_flags = 16},
  created = 871199014, dump_seq = 8822382946, reval_seq = 8822381178,
state = UKEY_DELETED, state_thread = 8

PMD thread ID:
#6  0x55cce5d7005f in ovsthread_wrapper (aux_=) at
lib/ovs-thread.c:422
auxp = 
aux = {start = 0x55cce5ce2460 , arg =
0x7f3ab2e6a010, name = "pmd-c35/id:\000:\177\000"}
id = 8
subprogram_name = 0x7f3aac0008c0 "p\v\"\255:\177"
#7  0x7f3af2afee65 in start_thread (arg=0x7f3ae0582700) at
pthread_create.c:307
>>>>>>>>>>>>>>>>>>>>

The running threads are:

# ps -T -o spid,comm  $(pidof ovs-vswitchd)
  SPID COMMAND
100866 ovs-vswitchd
100867 eal-intr-thread
100868 rte_mp_handle
100872 ovs-vswitchd
100873 dpdk_watchdog1
100876 urcu2
100888 ct_clean7
100889 ipf_clean6
100890 hw_offload3
100891 handler4
100892 revalidator5 # 1 revalidator thread
100893 pmd-c03/id:9
100894 pmd-c35/id:8   # Mostly one PMD thread is working; the other is idle forever.
100925 vhost_reconn
100926 vhost-events

So, does this prove that two threads were trying to
manipulate the same ukey?

* PMD thread replaced the old_ukey and transitioned the state.
* RCU thread freed the ukey mutex.
* The revalidator thread tries to lock the old_ukey mutex.

https://mail.openvswi

Re: [ovs-discuss] ovs-vswitchd core at revalidator_sweep__

2024-02-27 Thread LIU Yulong via discuss
Yes, that makes sense.

Another question is how to tell whether the core at the ovs_mutex_trylock
line in revalidator_sweep__ happened after the free(ukey),
since the core trace has no timestamp.

This line in the function 'ukey_create__' should be the only place
where OVS allocates the memory for a ukey:
https://github.com/openvswitch/ovs/blob/master/ofproto/ofproto-dpif-upcall.c#L1777

Right?

If that is true, I will update the buffer structure and add a
counter_upcall_ukey_allocate as well, roughly as sketched below.
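Roughly like this, assuming the usual OVS coverage-counter macros (a sketch;
the final names may differ):

COVERAGE_DEFINE(upcall_ukey_allocate);

/* In ukey_create__(), right after the ukey memory is allocated: */
COVERAGE_INC(upcall_ukey_allocate);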

On Tue, Feb 27, 2024 at 3:34 PM Eelco Chaudron  wrote:
>
>
>
> On 27 Feb 2024, at 4:44, LIU Yulong wrote:
>
> > @Eelco, as you suggested, added such circular buffer to my local OVS:
> > https://github.com/gotostack/ovs/commit/939d88c3c5fcdb446b01f2afa8f1e80c3929db46
>
> I should also add allocate logging, or else you might not know if a buffer 
> was allocated at the same address.
> Maybe add a bool to the record structure to indicate if it’s an allocate or 
> free.
>
> //Eelco
>
> > gdb shows such data structure:
> > 2232 ukey_free_buffer.index = (ukey_free_buffer.index + 1) % (1024
> > * 1024);  // Circular buffer
> > (gdb) p ukey_free_buffer
> > $1 = {
> >   records = {{
> >   ukey_addr = 0x7f8a0d871700,
> >   timestamp = 1709003328
> > }, {
> >   ukey_addr = 0x7f8a0f969120,
> >   timestamp = 1709003365
> > }, {
> >   ukey_addr = 0x7f8a0defe190,
> >   timestamp = 1709003393
> > }, {
> >   ukey_addr = 0x7f8a0984aea0,
> >   timestamp = 1709003452
> > }...},
> >   index = 3,
> >   mutex = {
> > lock = {
> >   __data = {
> > __lock = 1,
> > __count = 0,
> > __owner = 45210,
> > __nusers = 1,
> > __kind = 2,
> > __spins = 0,
> > __elision = 0,
> > __list = {
> >   __prev = 0x0,
> >   __next = 0x0
> > }
> >   },
> >   __size = 
> > "\001\000\000\000\000\000\000\000\232\260\000\000\001\000\000\000\002",
> > '\000' ,
> >   __align = 1
> > },
> > where = 0x55c35a347d18 "ofproto/ofproto-dpif-upcall.c:2229"
> >   }
> > }
> >
> > and counter_upcall_ukey_free is:
> > $2 = {name = 0x5622b448f612 "upcall_ukey_free", count = 0x5622b41047f0
> > , total = 79785, last_total = 79785, min = {0,
> > 0, 0, 0, 0, 55, 22681, 11703, 13877, 12750, 0, 18719}, hr = {79785,
> > 0 }}
> >
> > Let's see how this goes.
> >
> > Thank you.
> >
> > On Tue, Feb 27, 2024 at 9:05 AM LIU Yulong  wrote:
> >>
> >> @Ilya, thank you, I will add that patch.
> >>
> >> @Eelco, thank you again, I will add a RL log to the free(ukey). Hope
> >> we can get something useful.
> >>
> >>
> >> On Mon, Feb 26, 2024 at 7:55 PM Ilya Maximets  wrote:
> >>>
> >>> On 2/26/24 11:20, Eelco Chaudron wrote:
> >>>>
> >>>>
> >>>> On 26 Feb 2024, at 11:10, LIU Yulong wrote:
> >>>>
> >>>>> Hi Eelco,
> >>>>>
> >>>>> Thank you for the quick response.
> >>>>>
> >>>>> I did not add those logs, because in order to reproduce the issue, we
> >>>>> have to send lots of packets to the host.
> >>>>> So there are too many ukeys created/deleted to do logging.
> >>>>
> >>>> Maybe a circular buffer with all alloc/free (+ 1ukey address, and 
> >>>> timestamp), 1 or 2 Mb of memory can hold a lot.
> >>>>
> >>>>> And can we ensure that this [1] is the only place for ovs to free the 
> >>>>> ukey?
> >>>>>
> >>>>> [1] 
> >>>>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2084
> >>>>
> >>>> Yes, this should be the only place, and should always be done through an 
> >>>> RCU delayed delete.
> >>>>
> >>>>> For last mail, can this issue be concurrent read-and-update/delete?
> >>>>> The revalidator_sweep__ is trying to lock the ukey->mutex, while
> >>>>> another thread is updating the ukey->mutex to NULL and free ukey.
> >>>>
> >>>> This should not happen as the delete should happen by the delayed RCU 
> >>>> delete, and if the ukey is still in the cmap after the delayed delete 
> >>>> (quiescent state) something is wrong.

Re: [ovs-discuss] ovs-vswitchd core at revalidator_sweep__

2024-02-26 Thread LIU Yulong via discuss
@Eelco, as you suggested, I added such a circular buffer to my local OVS:
https://github.com/gotostack/ovs/commit/939d88c3c5fcdb446b01f2afa8f1e80c3929db46
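The general shape of the buffer is roughly the sketch below; names, sizes
and the record function are simplified, see the commit above for the real
code.

#define UKEY_EVENT_BUF_SIZE (1024 * 1024)

struct ukey_event {
    void *ukey_addr;            /* Address of the ukey. */
    long long int timestamp;    /* time_usec() at the event. */
    bool is_free;               /* true = free, false = allocate. */
};

static struct {
    struct ukey_event records[UKEY_EVENT_BUF_SIZE];
    unsigned int index;
    struct ovs_mutex mutex;
} ukey_event_buffer = { .mutex = OVS_MUTEX_INITIALIZER };

static void
ukey_event_record(void *addr, bool is_free)
{
    ovs_mutex_lock(&ukey_event_buffer.mutex);
    ukey_event_buffer.records[ukey_event_buffer.index] =
        (struct ukey_event) { addr, time_usec(), is_free };
    ukey_event_buffer.index =
        (ukey_event_buffer.index + 1) % UKEY_EVENT_BUF_SIZE;
    ovs_mutex_unlock(&ukey_event_buffer.mutex);
}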

gdb shows such data structure:
2232 ukey_free_buffer.index = (ukey_free_buffer.index + 1) % (1024
* 1024);  // Circular buffer
(gdb) p ukey_free_buffer
$1 = {
  records = {{
  ukey_addr = 0x7f8a0d871700,
  timestamp = 1709003328
}, {
  ukey_addr = 0x7f8a0f969120,
  timestamp = 1709003365
}, {
  ukey_addr = 0x7f8a0defe190,
  timestamp = 1709003393
}, {
  ukey_addr = 0x7f8a0984aea0,
  timestamp = 1709003452
}...},
  index = 3,
  mutex = {
lock = {
  __data = {
__lock = 1,
__count = 0,
__owner = 45210,
__nusers = 1,
__kind = 2,
__spins = 0,
__elision = 0,
__list = {
  __prev = 0x0,
  __next = 0x0
}
  },
  __size = 
"\001\000\000\000\000\000\000\000\232\260\000\000\001\000\000\000\002",
'\000' ,
  __align = 1
},
where = 0x55c35a347d18 "ofproto/ofproto-dpif-upcall.c:2229"
  }
}

and counter_upcall_ukey_free is:
$2 = {name = 0x5622b448f612 "upcall_ukey_free", count = 0x5622b41047f0
, total = 79785, last_total = 79785, min = {0,
0, 0, 0, 0, 55, 22681, 11703, 13877, 12750, 0, 18719}, hr = {79785,
0 }}

Let's see how this goes.

Thank you.

On Tue, Feb 27, 2024 at 9:05 AM LIU Yulong  wrote:
>
> @Ilya, thank you, I will add that patch.
>
> @Eelco, thank you again, I will add a RL log to the free(ukey). Hope
> we can get something useful.
>
>
> On Mon, Feb 26, 2024 at 7:55 PM Ilya Maximets  wrote:
> >
> > On 2/26/24 11:20, Eelco Chaudron wrote:
> > >
> > >
> > > On 26 Feb 2024, at 11:10, LIU Yulong wrote:
> > >
> > >> Hi Eelco,
> > >>
> > >> Thank you for the quick response.
> > >>
> > >> I did not add those logs, because in order to reproduce the issue, we
> > >> have to send lots of packets to the host.
> > >> So there are too many ukeys created/deleted to do logging.
> > >
> > > Maybe a circular buffer with all alloc/free (+ 1ukey address, and 
> > > timestamp), 1 or 2 Mb of memory can hold a lot.
> > >
> > >> And can we ensure that this [1] is the only place for ovs to free the 
> > >> ukey?
> > >>
> > >> [1] 
> > >> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2084
> > >
> > > Yes, this should be the only place, and should always be done through an 
> > > RCU delayed delete.
> > >
> > >> For last mail, can this issue be concurrent read-and-update/delete?
> > >> The revalidator_sweep__ is trying to lock the ukey->mutex, while
> > >> another thread is updating the ukey->mutex to NULL and free ukey.
> > >
> > > This should not happen as the delete should happen by the delayed RCU 
> > > delete, and if the ukey is still in the cmap after the delayed delete 
> > > (quiescent state) something is wrong.
> >
> >
> > I agree with Eelco and I don't see any obvious issues with the current
> > implementation.
> >
> > However, the usual suspect for RCU problems is entering quiescent state
> > while iterating RCU-protected structure.  Though I'm not sure how that can
> > happen in the revalidator, usually such issues are hiding somewhere way
> > down the call stack.  I made a small patch that can help to be sure that
> > this doesn't actually happen in your setup:
> >   
> > https://github.com/igsilya/ovs/commit/8268347a159b5afa884f5b3008897878b5b520f5
> >
> > Could you try it?
> >
> > The change will log an error message and abort the process if we happen
> > to enter quiescent state while iterating over the hash map.  Core dump
> > will point to a problematic call.
> >
> > Best regards, Ilya Maximets.
> >
> > >
> > >> LIU Yulong
> > >>
> > >>
> > >> On Mon, Feb 26, 2024 at 5:41 PM Eelco Chaudron  
> > >> wrote:
> > >>>
> > >>>
> > >>>
> > >>> On 26 Feb 2024, at 9:33, LIU Yulong wrote:
> > >>>
> > >>>> Hi,
> > >>>>
> > >>>> I have read the code by comparing the call stack of the core files
> > >>>> carefully, and found
> > >>>> a potential race condition. Please confirm whether the following 3 
> > >>>> threads
> > >>>> have a race condition. Just did some code trace, can such

Re: [ovs-discuss] ovs-vswitchd core at revalidator_sweep__

2024-02-26 Thread LIU Yulong via discuss
@Ilya, thank you, I will add that patch.

@Eelco, thank you again, I will add a RL log to the free(ukey). Hope
we can get something useful.


On Mon, Feb 26, 2024 at 7:55 PM Ilya Maximets  wrote:
>
> On 2/26/24 11:20, Eelco Chaudron wrote:
> >
> >
> > On 26 Feb 2024, at 11:10, LIU Yulong wrote:
> >
> >> Hi Eelco,
> >>
> >> Thank you for the quick response.
> >>
> >> I did not add those logs, because in order to reproduce the issue, we
> >> have to send lots of packets to the host.
> >> So there are too many ukeys created/deleted to do logging.
> >
> > Maybe a circular buffer with all alloc/free (+ 1ukey address, and 
> > timestamp), 1 or 2 Mb of memory can hold a lot.
> >
> >> And can we ensure that this [1] is the only place for ovs to free the ukey?
> >>
> >> [1] 
> >> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2084
> >
> > Yes, this should be the only place, and should always be done through an 
> > RCU delayed delete.
> >
> >> For last mail, can this issue be concurrent read-and-update/delete?
> >> The revalidator_sweep__ is trying to lock the ukey->mutex, while
> >> another thread is updating the ukey->mutex to NULL and free ukey.
> >
> > This should not happen as the delete should happen by the delayed RCU 
> > delete, and if the ukey is still in the cmap after the delayed delete 
> > (quiescent state) something is wrong.
>
>
> I agree with Eelco and I don't see any obvious issues with the current
> implementation.
>
> However, the usual suspect for RCU problems is entering quiescent state
> while iterating RCU-protected structure.  Though I'm not sure how that can
> happen in the revalidator, usually such issues are hiding somewhere way
> down the call stack.  I made a small patch that can help to be sure that
> this doesn't actually happen in your setup:
>   
> https://github.com/igsilya/ovs/commit/8268347a159b5afa884f5b3008897878b5b520f5
>
> Could you try it?
>
> The change will log an error message and abort the process if we happen
> to enter quiescent state while iterating over the hash map.  Core dump
> will point to a problematic call.
>
> Best regards, Ilya Maximets.
>
> >
> >> LIU Yulong
> >>
> >>
> >> On Mon, Feb 26, 2024 at 5:41 PM Eelco Chaudron  wrote:
> >>>
> >>>
> >>>
> >>> On 26 Feb 2024, at 9:33, LIU Yulong wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I have read the code by comparing the call stack of the core files
> >>>> carefully, and found
> >>>> a potential race condition. Please confirm whether the following 3 
> >>>> threads
> >>>> have a race condition. Just did some code trace, can such
> >>>> race condition happen?
> >>>>
> >>>> * PMD thread1 ===:
> >>>> -> pmd_thread_main
> >>>> -> dp_netdev_process_rxq_port
> >>>> -> dp_netdev_input
> >>>> -> dp_netdev_input__
> >>>> -> handle_packet_upcall
> >>>> -> dp_netdev_upcall
> >>>> -> upcall_cb
> >>>> -> ukey_install
> >>>> -> ukey_install__
> >>>> -> try_ukey_replace:
> >>>> ovs_mutex_lock(&new_ukey->mutex);
> >>>> <-- the CMAP_FOR_EACH loop in the revalidator_sweep__ run a
> >>>> bit earlier than the cmap_replace next line, so the old_ukey can be
> >>>> iterated. [1]
> >>>> cmap_replace(&umap->cmap, &old_ukey->cmap_node,
> >>>>  &new_ukey->cmap_node, new_ukey->hash);
> >>>> ovsrcu_postpone(ukey_delete__, old_ukey);
> >>>> < delete the ukey asynchronously. [2]
> >>>> transition_ukey(old_ukey, UKEY_DELETED);   <
> >>>> transition the ukey state to UKEY_DELETED, most core files show that
> >>>> the ukey last state change was at this line. [3]
> >>>> transition_ukey(new_ukey, UKEY_VISIBLE);
> >>>>
> >>>> [1] 
> >>>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1892
> >>>> [2] 
> >>>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1896
> >>>> [3] 
> >>>> https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1897

Re: [ovs-discuss] ovs-vswitchd core at revalidator_sweep__

2024-02-26 Thread LIU Yulong via discuss
Hi Eelco,

Thank you for the quick response.

I did not add those logs, because in order to reproduce the issue, we
have to send lots of packets to the host.
So there are too many ukeys created/deleted to do logging.

And can we ensure that this [1] is the only place for ovs to free the ukey?

[1] 
https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2084

Regarding my last mail: can this issue be a concurrent read-and-update/delete?
The revalidator_sweep__ is trying to lock the ukey->mutex, while
another thread is setting the ukey->mutex to NULL and freeing the ukey.

LIU Yulong


On Mon, Feb 26, 2024 at 5:41 PM Eelco Chaudron  wrote:
>
>
>
> On 26 Feb 2024, at 9:33, LIU Yulong wrote:
>
> > Hi,
> >
> > I have read the code by comparing the call stack of the core files
> > carefully, and found
> > a potential race condition. Please confirm whether the following 3 threads
> > have a race condition. Just did some code trace, can such
> > race condition happen?
> >
> > * PMD thread1 ===:
> > -> pmd_thread_main
> > -> dp_netdev_process_rxq_port
> > -> dp_netdev_input
> > -> dp_netdev_input__
> > -> handle_packet_upcall
> > -> dp_netdev_upcall
> > -> upcall_cb
> > -> ukey_install
> > -> ukey_install__
> > -> try_ukey_replace:
> > ovs_mutex_lock(&new_ukey->mutex);
> > <-- the CMAP_FOR_EACH loop in the revalidator_sweep__ run a
> > bit earlier than the cmap_replace next line, so the old_ukey can be
> > iterated. [1]
> > cmap_replace(&umap->cmap, &old_ukey->cmap_node,
> >  &new_ukey->cmap_node, new_ukey->hash);
> > ovsrcu_postpone(ukey_delete__, old_ukey);
> > < delete the ukey asynchronously. [2]
> > transition_ukey(old_ukey, UKEY_DELETED);   <
> > transition the ukey state to UKEY_DELETED, most core files show that
> > the ukey last state change was at this line. [3]
> > transition_ukey(new_ukey, UKEY_VISIBLE);
> >
> > [1] 
> > https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1892
> > [2] 
> > https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1896
> > [3] 
> > https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1897
> >
> > This function try_ukey_replace was called many times, because the
> > `counter_upcall_ukey_replace` is not zero.
> > For instance:
> > {
> >   name = 0x55ba9755206b "upcall_ukey_replace",
> >   count = 0x55ba971c7610 ,
> >   total = 2287997,
> >   last_total = 2287997,
> >   min = {221, 247, 444, 278, 324, 570, 379, 464, 283, 280, 0, 427},
> >   hr = {3300, 4378, 3557, 4554, 3748, 3710, 4340, 3559, 4296, 3759,
> > 3522, 4136, 3660, 4428, 3802, 3652, 3880, 3375, 4806, 4221, 4158,
> > 3816, 3750, 3846, 3761, 3653, 4293, 3816, 3723, 3691, 4033, 468, 4117,
> > 3659, 4007, 3536,
> > 3439, 4440, 3388, 4079, 3876, 3865, 4339, 3757, 3481, 4027, 3989,
> > 3633, 3737, 3564, 3403, 3992, 3793, 4390, 4124, 4354, 4164, 4383,
> > 4237, 3667}
> > }
> >
> > * RCU thread2 ===:
> > -> ovsrcu_postpone_thread
> > -> ovsrcu_call_postponed
> > -> ukey_delete__< This
> > function is not thread safe IMO, it has the mark
> > OVS_NO_THREAD_SAFETY_ANALYSIS. [4]
> >
> > recirc_refs_unref(&ukey->recircs);
> > xlate_cache_delete(ukey->xcache);
> > ofpbuf_delete(ovsrcu_get(struct ofpbuf *, &ukey->actions));
> > ovs_mutex_destroy(&ukey->mutex); <-- Just
> > sets the ukey mutex 'where' to NULL. [5][6][7]
> > free(ukey);
> >
> > [4] 
> > https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2074
> > [5] 
> > https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2083
> > [6] https://github.com/openvswitch/ovs/blob/v2.17.8/lib/ovs-thread.c#L131
> > [7] https://github.com/openvswitch/ovs/blob/v2.17.8/lib/ovs-thread.c#L124
> >
> > * revalidator thread3 ===:
> >
> > -> udpif_revalidator
> > -> revalidator_sweep
> > -> revalidator_sweep__
> >
> > CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
> > enum ukey_state ukey_state;
> >
> > if (ovs_mutex_trylock(&ukey->mutex)) {  <--
> > Core at here, because of the NULL pointer. [8]
> > 

Re: [ovs-discuss] ovs-vswitchd core at revalidator_sweep__

2024-02-26 Thread LIU Yulong via discuss
Hi,

I have read the code carefully, comparing it against the call stacks of the
core files, and found a potential race condition. Please confirm whether the
following 3 threads can race. I just did some code tracing; can such a
race condition happen?

* PMD thread1 ===:
-> pmd_thread_main
-> dp_netdev_process_rxq_port
-> dp_netdev_input
-> dp_netdev_input__
-> handle_packet_upcall
-> dp_netdev_upcall
-> upcall_cb
-> ukey_install
-> ukey_install__
-> try_ukey_replace:
ovs_mutex_lock(&new_ukey->mutex);
<-- the CMAP_FOR_EACH loop in revalidator_sweep__ can run a
bit earlier than the cmap_replace on the next line, so the old_ukey can
still be iterated. [1]
cmap_replace(&umap->cmap, &old_ukey->cmap_node,
             &new_ukey->cmap_node, new_ukey->hash);
ovsrcu_postpone(ukey_delete__, old_ukey);
<-- deletes the ukey asynchronously. [2]
transition_ukey(old_ukey, UKEY_DELETED);   <--
transitions the ukey state to UKEY_DELETED; most core files show that
the ukey's last state change was at this line. [3]
transition_ukey(new_ukey, UKEY_VISIBLE);

[1] 
https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1892
[2] 
https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1896
[3] 
https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L1897

This function try_ukey_replace was called many times, because the
`counter_upcall_ukey_replace` is not zero.
For instance:
{
  name = 0x55ba9755206b "upcall_ukey_replace",
  count = 0x55ba971c7610 ,
  total = 2287997,
  last_total = 2287997,
  min = {221, 247, 444, 278, 324, 570, 379, 464, 283, 280, 0, 427},
  hr = {3300, 4378, 3557, 4554, 3748, 3710, 4340, 3559, 4296, 3759,
3522, 4136, 3660, 4428, 3802, 3652, 3880, 3375, 4806, 4221, 4158,
3816, 3750, 3846, 3761, 3653, 4293, 3816, 3723, 3691, 4033, 468, 4117,
3659, 4007, 3536,
3439, 4440, 3388, 4079, 3876, 3865, 4339, 3757, 3481, 4027, 3989,
3633, 3737, 3564, 3403, 3992, 3793, 4390, 4124, 4354, 4164, 4383,
4237, 3667}
}

* RCU thread2 ===:
-> ovsrcu_postpone_thread
-> ovsrcu_call_postponed
-> ukey_delete__    <-- This function is not thread safe IMO; it is
marked OVS_NO_THREAD_SAFETY_ANALYSIS. [4]

recirc_refs_unref(&ukey->recircs);
xlate_cache_delete(ukey->xcache);
ofpbuf_delete(ovsrcu_get(struct ofpbuf *, &ukey->actions));
ovs_mutex_destroy(&ukey->mutex); <-- Sets the mutex 'where'
pointer to NULL and destroys it. [5][6][7]
free(ukey);

[4] 
https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2074
[5] 
https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2083
[6] https://github.com/openvswitch/ovs/blob/v2.17.8/lib/ovs-thread.c#L131
[7] https://github.com/openvswitch/ovs/blob/v2.17.8/lib/ovs-thread.c#L124
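For context on [5][6][7]: ovs_mutex_destroy() clears the mutex's 'where'
member, and the lock/trylock wrappers treat a NULL 'where' as an
uninitialized mutex, which matches the "passed uninitialized ovs_mutex"
abort seen earlier in this thread. Paraphrased (not verbatim) from
lib/ovs-thread.c:

/* Paraphrase of ovs_mutex_destroy(); see [6][7] for the exact code. */
void
ovs_mutex_destroy(const struct ovs_mutex *l_)
{
    struct ovs_mutex *l = CONST_CAST(struct ovs_mutex *, l_);

    l->where = NULL;                /* the "set to NULL" referred to above */
    pthread_mutex_destroy(&l->lock);
}

/* In ovs_mutex_trylock_at() (generated by a macro in ovs-thread.c): */
if (OVS_UNLIKELY(!l->where)) {
    ovs_abort(0, "%s: %s() passed uninitialized ovs_mutex", where, __func__);
}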

* revalidator thread3 ===:

-> udpif_revalidator
-> revalidator_sweep
-> revalidator_sweep__

CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
enum ukey_state ukey_state;

if (ovs_mutex_trylock(&ukey->mutex)) {  <--
Core happens here, because of the NULL 'where' pointer. [8]
continue;
}
[8] 
https://github.com/openvswitch/ovs/blob/v2.17.8/ofproto/ofproto-dpif-upcall.c#L2900

CMIIW: if this race condition can happen, IMO it is mostly because
the umap is not locked during the sweep CMAP_FOR_EACH loop,
or some RCU protection did not work properly.
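Spelled out, the suspected interleaving is (illustrative; RCU should
normally prevent the last two steps unless the iterating thread enters a
quiescent state mid-loop):

  revalidator: CMAP_FOR_EACH picks up a pointer to old_ukey
  PMD:         try_ukey_replace() -> cmap_replace() swaps in new_ukey,
               ovsrcu_postpone(ukey_delete__, old_ukey),
               transition_ukey(old_ukey, UKEY_DELETED)
  RCU:         grace period ends -> ukey_delete__(old_ukey) ->
               ovs_mutex_destroy(&old_ukey->mutex), free(old_ukey)
  revalidator: ovs_mutex_trylock(&old_ukey->mutex) on the stale pointer
               -> abort on the destroyed mutex, or use-after-free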


Regards,
LIU Yulong


On Wed, Feb 21, 2024 at 6:40 PM Eelco Chaudron  wrote:
>
>
>
> On 21 Feb 2024, at 4:26, LIU Yulong wrote:
>
> > Thank you very much for your reply.
> >
> > The problem is not easy to reproduce, we have to wait a random long time to 
> > see
> > if the issue happens again. It can be more than one day or longer.
> > OVS 2.17 with dpdk 20.11 had run to core before, so it's hard to say
> > if it is related to DPDK.
> > I'm running the ovs without offload to see if the issue can happen in
> > recent days.
> >
> > And again, TLDR, paste more thread call stacks.
> > Most of the threads are in the state of sched_yield, nanosleep,
> > epoll_wait and  poll.
>
> If this looks like a memory trash issue, it might be hard to figure out. Does 
> the ukey show any kind of pattern, i.e. does the trashed data look like 
> anything known?
> Maybe it’s a use after free, so you could add some debugging code 
> logging/recording all free and xmalloc of the ukey structure, to see that 
> when it crashes it was actually allocated?
>
> Hope this helps you getting started.
>
> //Eelco
>
> > The following threads are in working state. So hope this can have
> > clues f

Re: [ovs-discuss] ovs-vswitchd core at revalidator_sweep__

2024-02-20 Thread LIU Yulong via discuss
/ofproto-dpif-upcall.c:2214
#6  0x562773d202da in revalidator_sweep__
(revalidator=revalidator@entry=0x562777897b00,
purge=purge@entry=false) at ofproto/ofproto-dpif-upcall.c:3048
#7  0x562773d241a6 in revalidator_sweep
(revalidator=0x562777897b00) at ofproto/ofproto-dpif-upcall.c:3072
#8  udpif_revalidator (arg=0x562777897b00) at ofproto/ofproto-dpif-upcall.c:1086
#9  0x562773df805f in ovsthread_wrapper (aux_=) at
lib/ovs-thread.c:422
#10 0x7fd344480e65 in start_thread (arg=0x7fd334307700) at
pthread_create.c:307
#11 0x7fd34260988d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111


Thanks
LIU Yulong

On Mon, Feb 19, 2024 at 8:12 PM Eelco Chaudron  wrote:
>
>
>
> On 19 Feb 2024, at 13:09, Ilya Maximets wrote:
>
> > On 2/19/24 11:14, Eelco Chaudron wrote:
> >>
> >>
> >> On 19 Feb 2024, at 10:34, LIU Yulong wrote:
> >>
> >>> Hi OVS experts,
> >>>
> >>> Our ovs-vswitchd runs to core at the ovs_mutex_trylock(&ukey->mutex) in 
> >>> the
> >>> function revalidator_sweep__.
> >>>
> >>> I've sent the mail before but have no response.
> >>> https://mail.openvswitch.org/pipermail/ovs-discuss/2023-August/052604.html
> >>>
> >>> So I'm trying to send this mail again. And I may apologize in advance 
> >>> because
> >>> I would like to post as much useful information as possible to help 
> >>> identify
> >>> potential issues. So this mail will have a really long text.
> >>>
> >>> Compared to the mail 2023-August/052604.html, we upgrade the OVS to 2.17.8
> >>> and DPDK to 22.11 to pray for good luck that maybe the community has 
> >>> potential
> >>> fixes for this issue. But unfortunately, the ovs-vswitchd still runs to 
> >>> core.
> >>
> >> As you mentioned it looks like some memory corruption, which I have not 
> >> seen before.
> >>
> >> Have you tried this without rte offload? This is the only feature I never 
> >> used.
> >> There is a 2.17.9 with DPDK 22.11.6 you could try.
> >
> > OVS 2.17 is not supposed to work with DPDK 22.11, it's supposed to work 
> > with 21.11.
> > See the compatibility table here:
> >  https://docs.openvswitch.org/en/latest/faq/releases/
> >
> > Though it's hard to tell if DPDK version is anyhow related to the issue.
>
> My mistake, I was supposed to type 21.11.6 :( But yes if they are using 
> 22.11, that could also be the problem. I would suggest using the supported 
> version and see if the problem goes away.
>
> //Eelco
>
> > Best regards, Ilya Maximets.
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] ovs-vswitchd core at revalidator_sweep__

2024-02-19 Thread LIU Yulong via discuss
_key *) 0x7f871ba370d0: key_len = 168, mask_len = 172
   ufid =
7e5c26fd-af10-ba15-653c-454a828c068d
   hash = 0x9306ba27, pmd_id = 3
   state = UKEY_EVICTED
   state_where = 0x55c52eff5b48
"ofproto/ofproto-dpif-upcall.c:2608"
   n_packets = 5, n_bytes = 820
   used = 10324368550, tcp_flags = 0x
The length is 24.

The umap details:
(gdb) p *(struct umap *) 0x55c53301f998
$12 = {
  mutex = {
lock = {
  __data = {
__lock = 0,
__count = 0,
__owner = 0,
__nusers = 0,
__kind = 2,
__spins = 0,
__elision = 0,
__list = {
  __prev = 0x0,
  __next = 0x0
}
  },
  __size = '\000' , "\002", '\000' ,
  __align = 0
},
where = 0x55c52efef4be ""
  },
  cmap = {
impl = {
  p = 0x7f86f826c8c0
}
  }
}


As we can see the umap 0x55c53301f998 does not have a ukey 0x7f8718dcc050
(but bt full output has ukey = 0x7f8718dcc050). And this ukey =
0x7f8718dcc050 indeed
has a mutex with an uninitialized 'where' pointer. Maybe this pointer
is just invalid.

(gdb) p *(struct udpif_key *)0x7f8718dcc050
$11 = {
   ...
  mutex = {
lock = {
  __data = {
__lock = 0,
__count = 0,
__owner = 0,
__nusers = 0,
__kind = -1,
__spins = 0,
__elision = 0,
__list = {
  __prev = 0x0,
  __next = 0x0
}
  },
  __size = '\000' , "\377\377\377\377", '\000'
,
  __align = 0
},
    where = 0x0
  },
...
}

There seems to be an out-of-bounds access to the linked list of ukeys here.

So, I would greatly appreciate your help, as it is crucial for OVS to operate
in our production environment.

I can provide further debug related output information at any time.
Waiting for your response...
Thank you very much in advance.

Best regards,
LIU Yulong
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] [ovs-vswitchd] core dump of ovs-vswitchd during revalidator sweep

2023-08-07 Thread LIU Yulong via discuss
Hi there,

We have recently hit several ovs-vswitchd core dumps in our running cloud, and
the daemon has died many times.

The ovs version is 2.17.2. The ovs is running with dpdk 20.11. The
dpdk rte offload is enabled with Mellanox CX6-Dx.

The core dump trace is:
#0  0x7f3af0bc0337 in __bsd_signal (sig=23855, handler=0x5d5a) at
../sysdeps/posix/signal.c:50
#1  0x7f3af0bc1a28 in __GI_abort () at abort.c:79
#2  0x55cce5da44ee in ovs_abort_valist (err_no=,
format=, args=args@entry=0x7f3ae2981360) at
lib/util.c:499
#3  0x55cce5da4584 in ovs_abort (err_no=err_no@entry=0,
format=format@entry=0x55cce6042d18 "%s: %s() passed uninitialized
ovs_mutex") at lib/util.c:491
#4  0x55cce5d6f4a1 in ovs_mutex_trylock_at
(l_=l_@entry=0x7f3aac156ec8, where=where@entry=0x55cce6020318
"ofproto/ofproto-dpif-upcall.c:3014") at lib/ovs-thread.c:106
#5  0x55cce5c98181 in revalidator_sweep__
(revalidator=revalidator@entry=0x55cce9595780,
purge=purge@entry=false) at ofproto/ofproto-dpif-upcall.c:3014
#6  0x55cce5c9c1a6 in revalidator_sweep
(revalidator=0x55cce9595780) at ofproto/ofproto-dpif-upcall.c:3072
#7  udpif_revalidator (arg=0x55cce9595780) at ofproto/ofproto-dpif-upcall.c:1086
#8  0x55cce5d7005f in ovsthread_wrapper (aux_=) at
lib/ovs-thread.c:422
#9  0x7f3af2afee65 in start_thread (arg=0x0) at pthread_create.c:282
#10 0x7f3af0cd in __libc_ifunc_impl_list (name=, array=0x7f3ae2986700, max=) at
../sysdeps/x86_64/multiarch/ifunc-impl-list.c:329
#11 0x in ?? ()

(gdb) info threads
  Id   Target Id Frame
  22   LWP 23896 0x7f3af2b05e5d in msync () at
../sysdeps/unix/syscall-template.S:83
  21   LWP 1041650x7f3af2b0571d in write () at
../sysdeps/unix/syscall-template.S:83
  20   LWP 23897 0x7f3af0c7dbed in fts_read
(sp=0x7f3ac40008c0) at fts.c:459
  19   LWP 23886 0x7f3af0c7dbed in fts_read
(sp=0x7f3ad80008c0) at fts.c:459
  18   LWP 23885 0x7f3af0c7dbed in fts_read
(sp=0x7f3acc0008c0) at fts.c:459
  17   LWP 23873 0x7f3af0c4f80d in __sigaddset (__sig=17,
__set=0x7f3ae3e52100) at ../sysdeps/unix/sysv/linux/bits/sigset.h:118
  16   Thread 0x7f3af315c000 (LWP 23855) 0x7f3af0c7dbed in
fts_read (sp=0x55cce966db40) at fts.c:459
  15   LWP 8050  0x7f3af2b02da2 in
pthread_cond_timedwait@@GLIBC_2.3.2 () at
../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:193
  14   LWP 1041780x7f3af2b0571d in write () at
../sysdeps/unix/syscall-template.S:83
  13   LWP 1041770x7f3af2b0571d in write () at
../sysdeps/unix/syscall-template.S:83
  12   LWP 23904 0x7f3af0c7dbed in fts_read
(sp=0x7f3aa4004ae0) at fts.c:459
  11   LWP 1041710x7f3af2b0571d in write () at
../sysdeps/unix/syscall-template.S:83
  10   LWP 1037660x7f3af2b0571d in write () at
../sysdeps/unix/syscall-template.S:83
  9LWP 23872 0x7f3af2b0599d in do_fcntl
(arg=, cmd=7, fd=21964) at
../sysdeps/unix/sysv/linux/fcntl.c:39
  8LWP 67365 0x7f3af2b0571d in write () at
../sysdeps/unix/syscall-template.S:83
  7LWP 23857 0x7f3af2b05b6d in recvfrom () at
../sysdeps/unix/syscall-template.S:81
  6LWP 23876 0x7f3af0c7dbed in fts_read
(sp=0x7f3ad40008c0) at fts.c:459
  5LWP 23856 0x7f3af0c88e63 in arch_prctl () at
../sysdeps/unix/syscall-template.S:81
  4LWP 23905 0x7f3af2b056bd in vfork () at
../sysdeps/unix/sysv/linux/x86_64/vfork.S:57
  3LWP 67062 0x7f3af0c7dbed in fts_read (sp=0x0) at fts.c:459
  2LWP 67061 0x7f3af0c4f80d in __sigaddset (__sig=17,
__set=0x7f3ae031c160) at ../sysdeps/unix/sysv/linux/bits/sigset.h:118
* 1LWP 23898 0x7f3af0bc0337 in __bsd_signal
(sig=23855, handler=0x5d5a) at ../sysdeps/posix/signal.c:50


And gdb with the core file we have:
(gdb) print $22->mutex
$24 = {lock = {__data = {__lock = -866881024, __count = 2697380352,
__owner = 1100469760, __nusers = 2830690816, __kind = 0, __spins = 0,
__elision = 0, __list = {__prev = 0x0,
__next = 0x0}}, __size = "\000rT\314\000\302Ơ\000֗A\000긨",
'\000' , __align = -6861583673568497152}, where =
0x0}
(gdb) print &$22->mutex
$25 = (struct ovs_mutex *) 0x7f3aae6df548
(gdb) print *0x7f3aac156ec8
$26 = 0
(gdb) print (struct ovs_mutex *)0x7f3aac156ec8
$27 = (struct ovs_mutex *) 0x7f3aac156ec8
(gdb) print $27->where
$28 = 0x0

After some code searching, we can confirm that the mutex is initialized.
But it seems the mutex is destroyed/replaced/released during the
revalidator sweep phase.
We found some patches and discussions that may be related to this:
https://mail.openvswitch.org/pipermail/ovs-dev/2016-August/322128.html
https://mail.openvswitch.org/pipermail/ovs-dev/2016-August/322125.html

May I ask if anyone can provide a preliminary evaluation of the
problem? How could this happen? And what can we do to wo

[ovs-discuss] [HOW TO] get switch port information by openvswitch lldp

2023-04-24 Thread LIU Yulong via discuss
Hi team,

For OVS, we enabled LLDP for OVS-DPDK. After these settings, we can see the
host port information on the physical switch side.

But we have a use case that requires getting the switch information on the
host side. Is there something similar to the lldpcli commands?

# lldpcli show neighbors
---
LLDP neighbors:
---
Interface:eno1, via: LLDP, RID: 1, Time: 11 days, 02:41:02
  Chassis:
ChassisID:mac xx:de:e5:48:41:xx
SysName:  xxx
SysDescr: XX XX Routing Platform Software
  VRP (R) software, Version 8.191 (CE6881 V200R019C10SPC800)
  Copyright (C) 2012-2020 xxx Technologies Co., Ltd.
  xxx CE6881-48S6CQ

MgmtIP:   10.5.2.4
Capability:   Bridge, on
Capability:   Router, on
  Port:
PortID:   ifname 10GE1/0/11
PortDescr:TO_
TTL:  120

Thanks.

LIU Yulong
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] How to add multiple resubmit(, table) actions to one flow for OPENFLOW13 and upper?

2022-08-02 Thread LIU Yulong
Hi there,

For "ovs-ofctl add-flow" command, we can add such flow:
table=10, arp,in_port=4,arp_tpa=192.168.111.1,arp_op=1
actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],
mod_dl_src:fa:16:3e:67:13:fc,load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],
move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xc0a86f01->NXM_OF_ARP_SPA[],
load:0xfa163e6713fc->NXM_NX_ARP_SHA[], resubmit(,20),resubmit(,30)

But how can we add such flows via OPENFLOW13 from controllers to
ovs-vswitchd? We are using ryu as the controller, but in ovs-vswitchd
we hit this error:

2022-08-02T09:39:10.073Z|13069|connmgr|INFO|br-880ea359-4<->tcp:127.0.0.1:7733:
sending OFPBIC_DUP_INST error reply to OFPT_FLOW_MOD message
2022-08-02T09:39:16.168Z|13070|connmgr|INFO|br-b247f145-5<->tcp:127.0.0.1:7733:
sending OFPBIC_DUP_INST error reply to OFPT_FLOW_MOD message

Ryu controller instructions are:
instructions = [
ofpp.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions),
ofpp.OFPInstructionGotoTable(table_id=table1),
ofpp.OFPInstructionGotoTable(table_id=table2)]

Is this possible? Or am I missing something?

Thank you guys in advance.


LIU Yulong
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Action "mod_nw_tos:" for OPENFLOW13

2021-07-12 Thread LIU Yulong
Thank you guys, "set_field" works now, the input command line is:
"""
ovs-ofctl add-flow -O OpenFlow13 br-int
"hard_timeout=0,idle_timeout=0,priority=65535,cookie=17142212024999476593,in_port=2,table=0,ip,reg2=0,actions=set_field:16->ip_dscp,load:55->NXM_NX_REG2[0..5],resubmit(,0)"
"""

The command "ovs-ofctl dump-flows br-int -O OPENFLOW10" results:
"""
 cookie=0xede55f036e85fd71, duration=1.537s, table=0, n_packets=0,
n_bytes=0, priority=65535,ip,reg2=0,in_port="patch-tun"
actions=mod_nw_tos:64,load:0x37->NXM_NX_REG2[0..5],resubmit(,0)
"""

"ovs-ofctl dump-flows br-int -O OPENFLOW13" results:
"""
 cookie=0xede55f036e85fd71, duration=115.497s, table=0, n_packets=0,
n_bytes=0, priority=65535,ip,reg2=0,in_port="patch-tun"
actions=set_field:16->ip_dscp,load:0x37->NXM_NX_REG2[0..5],resubmit(,0)
"""


On Tue, Jul 13, 2021 at 12:14 AM Ben Pfaff  wrote:
>
> On Mon, Jul 12, 2021 at 03:13:13PM +0800, LIU Yulong wrote:
> > Recently, I tested OpenStack Neutron with DSCP marking functions. But, I
> > met some errors related to the action "mod_nw_tos:", the command line
> > input is:
> >
> > """
> > ovs-ofctl add-flow -O OpenFlow13 br-int
> > "hard_timeout=0,idle_timeout=0,priority=65535,cookie=17142212024999476593,in_port=2,table=0,reg2=0,actions=mod_nw_tos:64,load:55->NXM_NX_REG2[0..5],resubmit(,0)"
> > """
> >
> > The error output is:
> > """
> > ovs-ofctl: none of the usable flow formats (NXM+table_id) is among the
> > allowed flow formats (OXM-OpenFlow13)
> > """
> > Action "mod_nw_tos:" is causing the error here.
> >
> > So, I would ask which action for OPENFLOW13 is to do same work of
> > "mod_nw_tos:" for OPENFLOW10? These is no doc mentioned such action
> > for OPENFLOW13.
>
> OpenFlow 1.3 replaced all of the individual actions for setting fields
> with the more general "set_field" action.  You should be able to use
> that.  You will need to shift the TOS values two bits right to match the
> OF1.3 format, I believe.
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] Action "mod_nw_tos:" for OPENFLOW13

2021-07-12 Thread LIU Yulong
Hi there,

Recently, I tested OpenStack Neutron with DSCP marking functions. But, I
met some errors related to the action "mod_nw_tos:", the command line
input is:

"""
ovs-ofctl add-flow -O OpenFlow13 br-int
"hard_timeout=0,idle_timeout=0,priority=65535,cookie=17142212024999476593,in_port=2,table=0,reg2=0,actions=mod_nw_tos:64,load:55->NXM_NX_REG2[0..5],resubmit(,0)"
"""

The error output is:
"""
ovs-ofctl: none of the usable flow formats (NXM+table_id) is among the
allowed flow formats (OXM-OpenFlow13)
"""
Action "mod_nw_tos:" is causing the error here.

So, I would like to ask which OPENFLOW13 action does the same work as
"mod_nw_tos:" in OPENFLOW10? There is no doc that mentions such an action
for OPENFLOW13.


Regards,
LIU Yulong
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] Fwd: [ovs-dpdk] bandwidth issue of vhostuserclient virtio ovs-dpdk

2018-12-03 Thread LIU Yulong
On Sat, Dec 1, 2018 at 1:17 AM LIU Yulong  wrote:

>
>
> On Fri, Nov 30, 2018 at 5:36 PM Lam, Tiago  wrote:
>
>> On 30/11/2018 02:07, LIU Yulong wrote:
>> > Hi,
>> >
>> > Thanks for the reply, please see my inline comments below.
>> >
>> >
>> > On Thu, Nov 29, 2018 at 6:00 PM Lam, Tiago > > <mailto:tiago@intel.com>> wrote:
>> >
>> > On 29/11/2018 08:24, LIU Yulong wrote:
>> > > Hi,
>> > >
>> > > We recently tested ovs-dpdk, but we met some bandwidth issue. The
>> > bandwidth
>> > > from VM to VM was not close to the physical NIC, it's about
>> > 4.3Gbps on a
>> > > 10Gbps NIC. For no dpdk (virtio-net) VMs, the iperf3 test can
>> easily
>> > > reach 9.3Gbps. We enabled the virtio multiqueue for all guest VMs.
>> > In the
>> > > dpdk vhostuser guest, we noticed that the interrupts are
>> > centralized to
>> > > only one queue. But for no dpdk VM, interrupts can hash to all
>> queues.
>> > > For those dpdk vhostuser VMs, we also noticed that the PMD usages
>> were
>> > > also centralized to one no matter server(tx) or client(rx). And no
>> > matter
>> > > one PMD or multiple PMDs, this behavior always exists.
>> > >
>> > > Furthermore, my colleague added some systemtap hooks on the
>> openvswitch
>> > > function, he found something interesting. The function
>> > > __netdev_dpdk_vhost_send will send all the packets to one
>> > virtionet-queue.
>> > > Seems that there are some algorithm/hash table/logic does not do
>> > the hash
>> > > very well.
>> > >
>> >
>> > Hi,
>> >
>> > When you say "no dpdk VMs", you mean that within your VM you're
>> relying
>> > on the Kernel to get the packets, using virtio-net. And when you say
>> > "dpdk vhostuser guest", you mean you're using DPDK inside the VM to
>> get
>> > the packets. Is this correct?
>> >
>> >
>> > Sorry for the inaccurate description. I'm really new to DPDK.
>> > No DPDK inside VM, all these settings are for host only.
>> > (`host` means the hypervisor physical machine in the perspective of
>> > virtualization.
>> > On the other hand `guest` means the virtual machine.)
>> > "no dpdk VMs" means the host does not setup DPDK (ovs is working in
>> > traditional way),
>> > the VMs were boot on that. Maybe a new name `VMs-on-NO-DPDK-host`?
>>
>> Got it. Your "no dpdk VMs" really is referred to as OvS-Kernel, while
>> your "dpdk vhostuser guest" is referred to as OvS-DPDK.
>>
>> >
>> > If so, could you also tell us which DPDK app you're using inside of
>> > those VMs? Is it testpmd? If so, how are you setting the `--rxq` and
>> > `--txq` args? Otherwise, how are you setting those in your app when
>> > initializing DPDK?
>> >
>> >
>> > Inside VM, there is no DPDK app, VM kernel also
>> > does not set any config related to DPDK. `iperf3` is the tool for
>> > bandwidth testing.
>> >
>> > The information below is useful in telling us how you're setting
>> your
>> > configurations in OvS, but we are still missing the configurations
>> > inside the VM.
>> >
>> > This should help us in getting more information,
>> >
>> >
>> > Maybe you have noticed that, we only setup one PMD in the pasted
>> > configurations.
>> > But VM has 8 queues. Should the pmd quantity match the queues?
>>
>> It shouldn't match the queues inside the VM per say. But in this case,
>> since you have configured 8 rx queues on your physical NICs as well, and
>> since you're looking for higher throughputs, you should increase that
>> number of PMDs and pin those rxqs - take a look at [1] on how to do
>> that. Later on, increasing the size of your queues could also help.
>>
>>
> I'll test it.
> Yes, as you noticed that the vhostuserclient  port has n_rxq="8",
> options:
> {n_rxq="8",vhost-server-path="/var/lib/vhost_sockets/vhu76f9a623-9f"}.
> And the physical NIC has both n_rxq="8", n_txq="8".
> options: {dpdk-devargs=":01:00.0", n_rxq="8", n_txq="8"}
> options

Re: [ovs-discuss] Fwd: [ovs-dpdk] bandwidth issue of vhostuserclient virtio ovs-dpdk

2018-11-30 Thread LIU Yulong
On Fri, Nov 30, 2018 at 5:36 PM Lam, Tiago  wrote:

> On 30/11/2018 02:07, LIU Yulong wrote:
> > Hi,
> >
> > Thanks for the reply, please see my inline comments below.
> >
> >
> > On Thu, Nov 29, 2018 at 6:00 PM Lam, Tiago <mailto:tiago@intel.com> wrote:
> >
> > On 29/11/2018 08:24, LIU Yulong wrote:
> > > Hi,
> > >
> > > We recently tested ovs-dpdk, but we ran into a bandwidth issue. The
> > > bandwidth from VM to VM was not close to the physical NIC; it was about
> > > 4.3Gbps on a 10Gbps NIC. For the no-dpdk (virtio-net) VMs, the iperf3
> > > test can easily reach 9.3Gbps. We enabled virtio multiqueue for all
> > > guest VMs. In the dpdk vhostuser guest, we noticed that the interrupts
> > > are concentrated on only one queue, while for the no-dpdk VM the
> > > interrupts hash across all queues. For the dpdk vhostuser VMs we also
> > > noticed that the PMD usage was concentrated on one PMD, no matter
> > > whether the VM was the server (tx) or the client (rx), and no matter
> > > whether we used one PMD or multiple PMDs.
> > >
> > > Furthermore, my colleague added some systemtap hooks on the
> > > openvswitch functions and found something interesting: the function
> > > __netdev_dpdk_vhost_send sends all the packets to one virtionet queue.
> > > It seems that some algorithm/hash table/logic does not do the hash
> > > very well.
> > >
> >
> > Hi,
> >
> > When you say "no dpdk VMs", you mean that within your VM you're
> relying
> > on the Kernel to get the packets, using virtio-net. And when you say
> > "dpdk vhostuser guest", you mean you're using DPDK inside the VM to
> get
> > the packets. Is this correct?
> >
> >
> > Sorry for the inaccurate description. I'm really new to DPDK.
> > There is no DPDK inside the VM; all these settings are for the host only.
> > (`host` means the hypervisor physical machine, in the virtualization
> > sense; `guest` means the virtual machine.)
> > "no dpdk VMs" means the host does not set up DPDK (OVS is working the
> > traditional way), and the VMs were booted on that host. Maybe a new
> > name: `VMs-on-NO-DPDK-host`?
>
> Got it. Your "no dpdk VMs" setup is what is usually referred to as
> OvS-Kernel, while your "dpdk vhostuser guest" setup is OvS-DPDK.
>
> >
> > If so, could you also tell us which DPDK app you're using inside of
> > those VMs? Is it testpmd? If so, how are you setting the `--rxq` and
> > `--txq` args? Otherwise, how are you setting those in your app when
> > initializing DPDK?
> >
> >
> > Inside the VM there is no DPDK app, and the VM kernel does not set any
> > DPDK-related config. `iperf3` is the tool for bandwidth testing.
> >
> > The information below is useful in telling us how you're setting your
> > configurations in OvS, but we are still missing the configurations
> > inside the VM.
> >
> > This should help us in getting more information,
> >
> >
> > Maybe you have noticed that we only set up one PMD in the pasted
> > configurations, but the VM has 8 queues. Should the PMD quantity match
> > the queues?
>
> It doesn't necessarily have to match the queues inside the VM, per se. But
> in this case, since you have configured 8 rx queues on your physical NICs
> as well, and since you're looking for higher throughput, you should
> increase the number of PMDs and pin those rxqs - take a look at [1] on how
> to do that. Later on, increasing the size of your queues could also help.
>
>
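For reference, a minimal sketch of that tuning, assuming four PMD cores on
NUMA node 0 (2, 4, 6 and 8, i.e. mask 0x154), the port names above, and
`eth0` as the guest interface name; the actual cores and queue-to-PMD mapping
would have to match the real topology:

# spread PMD threads over cores 2, 4, 6 and 8
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x154
# pin the physical rxqs across those PMDs (repeat for nic-10G-2 and the vhu port)
ovs-vsctl set Interface nic-10G-1 \
    other_config:pmd-rxq-affinity="0:2,1:4,2:6,3:8,4:2,5:4,6:6,7:8"
# optionally enlarge the DPDK ring sizes
ovs-vsctl set Interface nic-10G-1 options:n_rxq_desc=4096 options:n_txq_desc=4096
# inside the guest, make sure virtio-net actually spreads over its 8 queues
ethtool -L eth0 combined 8

With pinning in place, `ovs-appctl dpif-netdev/pmd-rxq-show` should show the
rxqs distributed across the four cores instead of a single one.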
I'll test it.
Yes, as you noticed, the vhostuserclient port has n_rxq="8",
options:
{n_rxq="8",vhost-server-path="/var/lib/vhost_sockets/vhu76f9a623-9f"}.
And the physical NIC has both n_rxq="8", n_txq="8".
options: {dpdk-devargs=":01:00.0", n_rxq="8", n_txq="8"}
options: {dpdk-devargs=":05:00.1", n_rxq="8", n_txq="8"}
But, furthermore, when we remove such configuration from the vhostuserclient
port and the physical NIC, the bandwidth stays at the same 4.3Gbps, no matter
one PMD or multiple PMDs.


> Just out of curiosity, I see you have a configured MTU of 1500B on the
> physical interfaces. Is that the same MTU you're using inside the VM?
> And are you using the same configurations (including that 1500B MTU)
> when run

[ovs-discuss] Fwd: [ovs-dpdk] bandwidth issue of vhostuserclient virtio ovs-dpdk

2018-11-29 Thread LIU Yulong
Hi,

Thanks for the reply, please see my inline comments below.


On Thu, Nov 29, 2018 at 6:00 PM Lam, Tiago  wrote:

> On 29/11/2018 08:24, LIU Yulong wrote:
> > Hi,
> >
> > We recently tested ovs-dpdk, but we ran into a bandwidth issue. The
> > bandwidth from VM to VM was not close to the physical NIC; it was about
> > 4.3Gbps on a 10Gbps NIC. For the no-dpdk (virtio-net) VMs, the iperf3
> > test can easily reach 9.3Gbps. We enabled virtio multiqueue for all
> > guest VMs. In the dpdk vhostuser guest, we noticed that the interrupts
> > are concentrated on only one queue, while for the no-dpdk VM the
> > interrupts hash across all queues. For the dpdk vhostuser VMs we also
> > noticed that the PMD usage was concentrated on one PMD, no matter
> > whether the VM was the server (tx) or the client (rx), and no matter
> > whether we used one PMD or multiple PMDs.
> >
> > Furthermore, my colleague added some systemtap hooks on the openvswitch
> > functions and found something interesting: the function
> > __netdev_dpdk_vhost_send sends all the packets to one virtionet queue.
> > It seems that some algorithm/hash table/logic does not do the hash
> > very well.
> >
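For reference, the kind of SystemTap hook described above could look roughly
like the sketch below. It assumes ovs-vswitchd lives at
/usr/sbin/ovs-vswitchd, that the matching debuginfo package is installed,
and that the tx queue argument of __netdev_dpdk_vhost_send is named `qid`
as in the OVS 2.9 netdev-dpdk.c source:

# vhost-txq.stp -- count calls to __netdev_dpdk_vhost_send per tx queue
global qids

probe process("/usr/sbin/ovs-vswitchd").function("__netdev_dpdk_vhost_send") {
    qids[$qid] <<< 1          # $qid: the vhost tx queue chosen by the caller
}

probe end {
    foreach (q in qids)
        printf("qid %d: %d calls\n", q, @count(qids[q]))
}

Running `stap vhost-txq.stp` during an iperf3 run and stopping it with
Ctrl-C would show whether the calls really all land on a single queue id.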
>
> Hi,
>
> When you say "no dpdk VMs", you mean that within your VM you're relying
> on the Kernel to get the packets, using virtio-net. And when you say
> "dpdk vhostuser guest", you mean you're using DPDK inside the VM to get
> the packets. Is this correct?


Sorry for the inaccurate description. I'm really new to DPDK.
There is no DPDK inside the VM; all these settings are for the host only.
(`host` means the hypervisor physical machine, in the virtualization sense;
`guest` means the virtual machine.)
"no dpdk VMs" means the host does not set up DPDK (OVS is working the
traditional way), and the VMs were booted on that host. Maybe a new name:
`VMs-on-NO-DPDK-host`?

> If so, could you also tell us which DPDK app you're using inside of
> those VMs? Is it testpmd? If so, how are you setting the `--rxq` and
> `--txq` args? Otherwise, how are you setting those in your app when
> initializing DPDK?
>

Inside the VM there is no DPDK app, and the VM kernel does not set any
DPDK-related config. `iperf3` is the tool for bandwidth testing.

> The information below is useful in telling us how you're setting your
> configurations in OvS, but we are still missing the configurations
> inside the VM.
>
> This should help us in getting more information,
>
>
Maybe you have noticed that we only set up one PMD in the pasted
configurations, but the VM has 8 queues. Should the PMD quantity match the
queues?

Tiago.
>
> > So I'd like to find some help from the community. Maybe I'm missing some
> > configurations.
> >
> > Thanks.
> >
> >
> > Here is the list of the environment and some configurations:
> > # uname -r
> > 3.10.0-862.11.6.el7.x86_64
> > # rpm -qa|grep dpdk
> > dpdk-17.11-11.el7.x86_64
> > # rpm -qa|grep openvswitch
> > openvswitch-2.9.0-3.el7.x86_64
> > # ovs-vsctl list open_vswitch
> > _uuid   : a6a3d9eb-28a8-4bf0-a8b4-94577b5ffe5e
> > bridges : [531e4bea-ce12-402a-8a07-7074c31b978e,
> > 5c1675e2-5408-4c1f-88bc-6d9c9b932d47]
> > cur_cfg : 1305
> > datapath_types  : [netdev, system]
> > db_version  : "7.15.1"
> > external_ids: {hostname="cq01-compute-10e112e5e140",
> > rundir="/var/run/openvswitch",
> > system-id="e2cc84fe-a3c8-455f-8c64-260741c141ee"}
> > iface_types : [dpdk, dpdkr, dpdkvhostuser, dpdkvhostuserclient,
> > geneve, gre, internal, lisp, patch, stt, system, tap, vxlan]
> > manager_options : [43803994-272b-49cb-accc-ab672d1eefc8]
> > next_cfg: 1305
> > other_config: {dpdk-init="true", dpdk-lcore-mask="0x1",
> > dpdk-socket-mem="1024,1024", pmd-cpu-mask="0x10",
> > vhost-iommu-support="true"}
> > ovs_version : "2.9.0"
> > ssl : []
> > statistics  : {}
> > system_type : centos
> > system_version  : "7"
> > # lsmod |grep vfio
> > vfio_pci   41312  2
> > vfio_iommu_type1   22300  1
> > vfio   32695  7 vfio_iommu_type1,vfio_pci
> > irqbypass  13503  23 kvm,vfio_pci
> >
> > # ovs-appctl dpif/show
> > netdev@ovs-netdev: hit:759366335 missed:754283
> > br-ex:
> > bond1108 4/6: (tap)
> > br-ex 65534/3: (tap)
> > nic-10G-1 5/4: (dpdk: configured_rx_queues=8,
> > configured_rxq_descriptors=2048, configured_tx_queues=2,
>

[ovs-discuss] [ovs-dpdk] bandwidth issue of vhostuserclient virtio ovs-dpdk

2018-11-29 Thread LIU Yulong
d_txq_descriptors=2048, mtu=1500, requested_rx_queues=8,
requested_rxq_descriptors=2048, requested_tx_queues=2,
requested_txq_descriptors=2048, rx_csum_offload=true)
RX packets:5319466 errors:0 dropped:0 overruns:? frame:?
TX packets:0 errors:0 dropped:0 aborted:? carrier:?
collisions:?
RX bytes:344903551 (328.9 MiB)  TX bytes:0
port 6: bond1108 (tap)
RX packets:228 errors:0 dropped:0 overruns:0 frame:0
TX packets:5460 errors:0 dropped:18 aborted:0 carrier:0
collisions:0
RX bytes:21459 (21.0 KiB)  TX bytes:341087 (333.1 KiB)

# ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 0 core_id 20:
packets received: 760120690
packet recirculations: 0
avg. datapath passes per packet: 1.00
emc hits: 750787577
megaflow hits: 8578758
avg. subtable lookups per megaflow hit: 1.05
miss with success upcall: 754283
miss with failed upcall: 72
avg. packets per output batch: 2.21
idle cycles: 210648140144730 (99.13%)
processing cycles: 1846745927216 (0.87%)
avg cycles per packet: 279554.14 (212494886071946/760120690)
avg processing cycles per packet: 2429.54 (1846745927216/760120690)
main thread:
packets received: 0
packet recirculations: 0
avg. datapath passes per packet: 0.00
emc hits: 0
megaflow hits: 0
avg. subtable lookups per megaflow hit: 0.00
miss with success upcall: 0
miss with failed upcall: 0
avg. packets per output batch: 0.00

# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 20:
isolated : false
port: nic-10G-1queue-id:  0 pmd usage:  0 %
port: nic-10G-1queue-id:  1 pmd usage:  0 %
port: nic-10G-1queue-id:  2 pmd usage:  0 %
port: nic-10G-1queue-id:  3 pmd usage:  0 %
port: nic-10G-1queue-id:  4 pmd usage:  0 %
port: nic-10G-1queue-id:  5 pmd usage:  0 %
port: nic-10G-1queue-id:  6 pmd usage:  0 %
port: nic-10G-1queue-id:  7 pmd usage:  0 %
port: nic-10G-2queue-id:  0 pmd usage:  0 %
port: nic-10G-2queue-id:  1 pmd usage:  0 %
port: nic-10G-2queue-id:  2 pmd usage:  0 %
port: nic-10G-2queue-id:  3 pmd usage:  0 %
port: nic-10G-2queue-id:  4 pmd usage:  0 %
port: nic-10G-2queue-id:  5 pmd usage:  0 %
port: nic-10G-2queue-id:  6 pmd usage:  0 %
port: nic-10G-2queue-id:  7 pmd usage:  0 %
port: vhu76f9a623-9f  queue-id:  0 pmd usage:  0 %
port: vhu76f9a623-9f  queue-id:  1 pmd usage:  0 %
port: vhu76f9a623-9f  queue-id:  2 pmd usage:  0 %
port: vhu76f9a623-9f  queue-id:  3 pmd usage:  0 %
port: vhu76f9a623-9f  queue-id:  4 pmd usage:  0 %
port: vhu76f9a623-9f  queue-id:  5 pmd usage:  0 %
port: vhu76f9a623-9f  queue-id:  6 pmd usage:  0 %
port: vhu76f9a623-9f  queue-id:  7 pmd usage:  0 %


# virsh dumpxml instance-5c5191ff-c1a2-4429-9a8b-93ddd939583d
...
  [interface XML stripped by the mail archive]
...

# ovs-vsctl show
a6a3d9eb-28a8-4bf0-a8b4-94577b5ffe5e
Manager "ptcp:6640:127.0.0.1"
is_connected: true
Bridge br-int
Controller "tcp:127.0.0.1:6633"
is_connected: true
fail_mode: secure
Port int-br-ex
Interface int-br-ex
type: patch
options: {peer=phy-br-ex}
Port br-int
Interface br-int
type: internal
Port "vhu76f9a623-9f"
tag: 1
Interface "vhu76f9a623-9f"
type: dpdkvhostuserclient
options: {n_rxq="8",
vhost-server-path="/var/lib/vhost_sockets/vhu76f9a623-9f"}
Bridge br-ex
Controller "tcp:127.0.0.1:6633"
is_connected: true
fail_mode: secure
Port dpdkbond
Interface "nic-10G-1"
type: dpdk
options: {dpdk-devargs=":01:00.0", n_rxq="8", n_txq="8"}
Interface "nic-10G-2"
type: dpdk
options: {dpdk-devargs=":05:00.1", n_rxq="8", n_txq="8"}
Port phy-br-ex
Interface phy-br-ex
type: patch
options: {peer=int-br-ex}
Port br-ex
Interface br-ex
type: internal

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
node 0 size: 130978 MB
node 0 free: 7539 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
node 1 size: 131072 MB
node 1 free: 6886 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

# grep HugePages_ /proc/meminfo
HugePages_Total: 232
HugePages_Free:   10
HugePages_Rsvd:0
HugePages_Surp:0


# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-862.11.6.el7.x86_64
root=UUID=220ee106-5e00-4809-91a0-641e045a4c21 ro intel_idle.max_cstate=0
crashkernel=auto rhgb quiet default_hugepagesz=1G hugepagesz=1G
hugepages=232 iommu=pt intel_iommu=on


Best regards,
LIU Yulong


Re: [ovs-discuss] Failed to build Open vSwitch Kernel Modules on CentOS 7 (kernel 3.10.0-514.2.2.el7.x86_64)

2017-01-11 Thread LIU Yulong
Great, that patch works for me.
After cherry-picking that patch onto 2.6.1, we can also build the 2.6.1 OVS
kmod.
Thank you very much.
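For anyone hitting the same build failure, the backport amounts to roughly
this sketch (the commit id is the one Joe points to below; the branch name
and the rebuild step are assumptions):

git clone https://github.com/openvswitch/ovs.git && cd ovs
git checkout -b build-2.6.1 v2.6.1
git cherry-pick 6ccf21ca77ec092aa63b3daff66dc9f0d0e1be93
./boot.sh && ./configure && make dist    # regenerate the release tarball

The regenerated openvswitch-2.6.1.tar.gz can then go through the same
rpmbuild steps as before.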

On Thu, Jan 12, 2017 at 10:21 AM, Joe Stringer <j...@ovn.org> wrote:

> Does your master include this commit?
>
> https://github.com/openvswitch/ovs/commit/6ccf21ca77ec092aa6
> 3b3daff66dc9f0d0e1be93
>
> On 11/01/2017 18:14, "liu yulong" <liuyulong...@gmail.com> wrote:
>
> Thanks Joe and Ben; with the master branch we also got the same error.
> The doc we followed is "Fedora, RHEL 7.x Packaging for Open vSwitch" [1].
> The conflicts shown in [2] were basically the same.
>
> [1] https://github.com/openvswitch/ovs/blob/master/Documentation
> /intro/install/fedora.rst
> [2] http://paste.openstack.org/show/594350/
>
> On Thu, Jan 12, 2017 at 7:51 AM, Joe Stringer <j...@ovn.org> wrote:
>
>> On 11 January 2017 at 15:38, Ben Pfaff <b...@ovn.org> wrote:
>> > On Wed, Jan 11, 2017 at 03:03:45PM -0800, Joe Stringer wrote:
>> >> On 9 January 2017 at 19:01, liu yulong <liuyulong...@gmail.com> wrote:
>> >> > Hi experts,
>> >> >
>> >> > We have failed to build Open vSwitch Kernel Modules on CentOS 7
>> (kernel
>> >> > 3.10.0-514.2.2.el7.x86_64).
>> >> >
>> >> > Here are some traces we got:
>> >> > http://paste.openstack.org/show/594350/
>> >> >
>> >> > Steps:
>> >> > 1. download the current openvswitch release:
>> >> > http://openvswitch.org/releases/openvswitch-2.6.1.tar.gz
>> >> >
>> >> > 2. rpmbuild
>> >> > (1) prepare the SOURCE
>> >> > cp openvswitch-2.6.1.tar.gz ~/rpmbuild/SOURCES/
>> >> > tar -zxvf openvswitch-2.6.1.tar.gz
>> >> > cp ./openvswitch-2.6.1/rhel/* ~/rpmbuild/SOURCES/
>> >> > cp ./openvswitch-2.6.1/rhel/*.spec ~/rpmbuild/SPECS/
>> >> >
>> >> >
>> >> > (2) edit ~/rpmbuild/SPECS/openvswitch-kmod-fedora.spec
>> >> > change the #%define kernel to:
>> >> > #%define kernel 3.10.0-514.2.2.el7.x86_64
>> >> >
>> >> > (3) start build
>> >> > rpmbuild -bb --without check ~/rpmbuild/SPECS/openvswitch-k
>> mod-fedora.spec
>> >> >
>> >> > Then we get that error. So can anyone help solve this issue?
>> >> > Thank you.
>> >>
>> >> If you want to use the kernel module from the OVS tree, you need to
>> >> use master or wait for the next version of OVS. Alternatively you can
>> >> skip using the kernel module from OVS tree and only compile the
>> >> userspace programs, then use the kernel module that is provided with
>> >> Centos 7.
>> >
>> > Maybe liu is confused because the FAQ that comes with OVS 2.6.1 says
>> > that the kernel module should work with Linux 3.10.  Maybe it does not
>> > work because Centos kernels diverge from upstream.
>>
>> True, it's a bit confusing. The FAQ distributed with 2.6 specifically
>> states the supported versions, with this caveat:
>>
>> "The Linux kernel versions are upstream kernel versions, so Linux
>> kernels modified from the upstream sources may not build in some cases
>> even if they are based on a supported version. This is most notably
>> true of Red Hat Enterprise Linux (RHEL) kernels, which are extensively
>> modified from upstream."
>>
>> https://github.com/openvswitch/ovs/blob/branch-2.6/FAQ.md#q-
>> what-linux-kernel-versions-does-each-open-vswitch-release-work-with
>>
>
>
>
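For reference, the userspace-only route Joe suggests above boils down to
something like this sketch (install paths are assumptions; leaving out
--with-linux skips the out-of-tree kernel module entirely):

tar -zxvf openvswitch-2.6.1.tar.gz && cd openvswitch-2.6.1
./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
make -j"$(nproc)" && sudo make install
sudo modprobe openvswitch    # use the kernel's own openvswitch module

This pairs the 2.6.1 userspace with the CentOS-provided openvswitch.ko
instead of the tree's datapath module.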


[ovs-discuss] Failed to build Open vSwitch Kernel Modules on CentOS 7 (kernel 3.10.0-514.2.2.el7.x86_64)

2017-01-11 Thread liu yulong
Hi experts,

We have failed to build Open vSwitch Kernel Modules on CentOS 7 (kernel
3.10.0-514.2.2.el7.x86_64).

Here are some traces we got:
http://paste.openstack.org/show/594350/

Steps:
1. download the current openvswitch release:
http://openvswitch.org/releases/openvswitch-2.6.1.tar.gz

2. rpmbuild
(1) prepare the SOURCE
cp openvswitch-2.6.1.tar.gz ~/rpmbuild/SOURCES/
tar -zxvf openvswitch-2.6.1.tar.gz
cp ./openvswitch-2.6.1/rhel/* ~/rpmbuild/SOURCES/
cp ./openvswitch-2.6.1/rhel/*.spec ~/rpmbuild/SPECS/


(2) edit ~/rpmbuild/SPECS/openvswitch-kmod-fedora.spec
change the #%define kernel to:
#%define kernel 3.10.0-514.2.2.el7.x86_64

(3) start build
rpmbuild -bb --without check ~/rpmbuild/SPECS/openvswitch-kmod-fedora.spec

Then we get that error. So can anyone help solve this issue?
Thank you.