Re: [vpp-dev] process node suspended indefinitely

2023-03-12 Thread Sudhir CR via lists.fd.io
Hi Dave,
We are using VPP version *21.10*.

Thanks and regards,
Sudhir

On Fri, Mar 10, 2023 at 5:31 PM Dave Barach  wrote:

> I should have had the sense to ask this earlier: which version of vpp are
> you using?
>
>
>
> The line number in your debug snippet is more than 100 lines off from
> master/latest. The timer wheel code has been relatively untouched, but
> there have been several important fixes over the years...
>
>
>
> D.
>
>
>
> diff --git a/src/vlib/main.c b/src/vlib/main.c
> index af0fcd1cb..55c231d8b 100644
> --- a/src/vlib/main.c
> +++ b/src/vlib/main.c
> @@ -1490,6 +1490,9 @@ dispatch_suspended_process (vlib_main_t * vm,
>  }
>else
>  {
> +   if (strcmp((char *)node->name, "rtb-vpp-epoll-process") == 0) {
> +   ASSERT(0);
> +   }
>
>
>
> *From:* vpp-dev@lists.fd.io  *On Behalf Of *Sudhir
> CR via lists.fd.io
> *Sent:* Thursday, March 9, 2023 4:00 AM
> *To:* vpp-dev@lists.fd.io
> *Cc:* rtbrick@lists.fd.io
> *Subject:* Re: [vpp-dev] process node suspended indefinitely
>
>
>
> Hi Dave,
>
> Please excuse my delayed response. It took some time to recreate this
> issue.
>
> I made changes to our process node as per your suggestion. now our process
> node code looks like this
>
>
>
> while (1) {
>
> vlib_process_wait_for_event_or_clock (vm,
> RTB_VPP_EPOLL_PROCESS_NODE_TIMER);
> event_type = vlib_process_get_events (vm, &event_data);
> vec_reset_length(event_data);
>
> switch (event_type) {
> case ~0: /* handle timer expirations */
> rtb_event_loop_run_once ();
> break;
>
> default: /* bug! */
> ASSERT (0);
> }
> }
>
> After these changes we didn't observe any assertions but we hit the
> process node suspend issue. with this it is clear other than time out we
> are not getting any other events.
>
>
>
> In the issue state I have collected vlib_process node
> (rtb_vpp_epoll_process) flags value and it seems to be correct (flags = 11).
>
>
>
> Please find the vlib_process_t and vlib_node_t data structure values
> collected in the issue state below.
>
>
>
> vlib_process_t:
>
> 
>
> $38 = {
>   cacheline0 = 0x7f9b2da50380 "\200~\274+\233\177",
>   node_runtime = {
> cacheline0 = 0x7f9b2da50380 "\200~\274+\233\177",
> function = 0x7f9b2bbc7e80 ,
> errors = 0x7f9b3076a560,
> clocks_since_last_overflow = 0,
> max_clock = 3785970526,
> max_clock_n = 0,
> calls_since_last_overflow = 0,
> vectors_since_last_overflow = 0,
> next_frame_index = 1668,
> node_index = 437,
> input_main_loops_per_call = 0,
> main_loop_count_last_dispatch = 4147405645,
> main_loop_vector_stats = {0, 0},
> flags = 0,
> state = 0,
> n_next_nodes = 0,
> cached_next_index = 0,
> thread_index = 0,
> runtime_data = 0x7f9b2da503c6 ""
>   },
>   return_longjmp = {
> regs = {94502584873984, 140304430422064, 140306731463680,
> 94502584874048, 94502640552512, 0, 140304430422032, 140306703608766}
>   },
>   resume_longjmp = {
> regs = {94502584873984, 140304161734368, 140306731463680,
> 94502584874048, 94502640552512, 0, 140304161734272, 140304430441787}
>   },
>   *flags = 11, *
>   log2_n_stack_bytes = 16,
>   suspended_process_frame_index = 0,
>   n_suspends = 0,
>   pending_event_data_by_type_index = 0x7f9b307b8310,
>   non_empty_event_type_bitmap = 0x7f9b307b8390,
>   one_time_event_type_bitmap = 0x0,
>   event_type_index_by_type_opaque = 0x7f9b2dab8bd8,
>   event_type_pool = 0x7f9b2dcb5978,
>   resume_clock_interval = 1000,
>   stop_timer_handle = 3098,
>   output_function = 0x0,
>   output_function_arg = 0,
>   stack = 0x7f9b1bb78000
> }
>
>
>
> vlib_node_t:
>
>  (gdb) p *n
>
> $17 = {
>   function = 0x7f9b2bbc7e80 ,
>   name = 0x7f9b3076a3f0 "rtb-vpp-epoll-process",
>   name_elog_string = 11783,
>   stats_total = {
> calls = 0,
> vectors = 0,
> clocks = 1971244932732,
> suspends = 6847366,
> max_clock = 3785970526,
> max_clock_n = 0
>   },
>   stats_last_clear = {
> calls = 0,
> vectors = 0,
> clocks = 0,
> suspends = 0,
> max_clock = 0,
> max_clock_n = 0
>   },
>   type = VLIB_NODE_TYPE_PROCESS,
>   index = 437,
>   runtime_index = 40,
>   runtime_data = 0x0,
>   flags = 0,
>   state = 0 '\000',
>   runtime_data_bytes = 0 '\000',
>   protocol_hint = 0 '\000',
>   n

Re: [vpp-dev] process node suspended indefinitely

2023-03-10 Thread Dave Barach
I should have had the sense to ask this earlier: which version of vpp are you 
using? 

 

The line number in your debug snippet is more than 100 lines off from 
master/latest. The timer wheel code has been relatively untouched, but there 
have been several important fixes over the years...

 

D.

 

diff --git a/src/vlib/main.c b/src/vlib/main.c
index af0fcd1cb..55c231d8b 100644
--- a/src/vlib/main.c
+++ b/src/vlib/main.c
@@ -1490,6 +1490,9 @@ dispatch_suspended_process (vlib_main_t * vm,
 }
   else
 {
+   if (strcmp((char *)node->name, "rtb-vpp-epoll-process") == 0) {
+   ASSERT(0);
+   }

 

From: vpp-dev@lists.fd.io  On Behalf Of Sudhir CR via 
lists.fd.io
Sent: Thursday, March 9, 2023 4:00 AM
To: vpp-dev@lists.fd.io
Cc: rtbrick@lists.fd.io
Subject: Re: [vpp-dev] process node suspended indefinitely

 

Hi Dave,

Please excuse my delayed response. It took some time to recreate this issue.

I made changes to our process node as per your suggestion. Our process node 
code now looks like this:

 

while (1) {
    vlib_process_wait_for_event_or_clock (vm, RTB_VPP_EPOLL_PROCESS_NODE_TIMER);
    event_type = vlib_process_get_events (vm, &event_data);
    vec_reset_length(event_data);

    switch (event_type) {
    case ~0: /* handle timer expirations */
        rtb_event_loop_run_once ();
        break;

    default: /* bug! */
        ASSERT (0);
    }
}

After these changes we did not observe any assertions, but we still hit the 
process node suspend issue. This makes it clear that we are not receiving any 
events other than the timeout.

 

In the issue state I have collected vlib_process node (rtb_vpp_epoll_process) 
flags value and it seems to be correct (flags = 11).

 

Please find the vlib_process_t and vlib_node_t data structure values collected 
in the issue state below.

 

vlib_process_t:



$38 = {
  cacheline0 = 0x7f9b2da50380 "\200~\274+\233\177", 
  node_runtime = {
    cacheline0 = 0x7f9b2da50380 "\200~\274+\233\177",
    function = 0x7f9b2bbc7e80 ,
    errors = 0x7f9b3076a560,
    clocks_since_last_overflow = 0,
    max_clock = 3785970526,
    max_clock_n = 0,
    calls_since_last_overflow = 0,
    vectors_since_last_overflow = 0,
    next_frame_index = 1668,
    node_index = 437,
    input_main_loops_per_call = 0,
    main_loop_count_last_dispatch = 4147405645,
    main_loop_vector_stats = {0, 0},
    flags = 0,
    state = 0,
    n_next_nodes = 0,
    cached_next_index = 0,
    thread_index = 0,
    runtime_data = 0x7f9b2da503c6 ""
  }, 
  return_longjmp = {
    regs = {94502584873984, 140304430422064, 140306731463680, 94502584874048,
      94502640552512, 0, 140304430422032, 140306703608766}
  }, 
  resume_longjmp = {
    regs = {94502584873984, 140304161734368, 140306731463680, 94502584874048,
      94502640552512, 0, 140304161734272, 140304430441787}
  }, 
  flags = 11, 
  log2_n_stack_bytes = 16, 
  suspended_process_frame_index = 0, 
  n_suspends = 0, 
  pending_event_data_by_type_index = 0x7f9b307b8310, 
  non_empty_event_type_bitmap = 0x7f9b307b8390, 
  one_time_event_type_bitmap = 0x0, 
  event_type_index_by_type_opaque = 0x7f9b2dab8bd8, 
  event_type_pool = 0x7f9b2dcb5978, 
  resume_clock_interval = 1000, 
  stop_timer_handle = 3098, 
  output_function = 0x0, 
  output_function_arg = 0, 
  stack = 0x7f9b1bb78000
}
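
Two fields in this dump can be sanity-checked by hand: flags = 11 and resume_clock_interval = 1000. The sketch below decodes both. It assumes the flag bit positions and the 10 us timer-wheel tick (1e5 ticks per second) that I recall from src/vlib/node.h around VPP 21.10; neither value appears in this thread, so please verify them against your own tree. The macro and helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASSUMED copies of the vlib_process_t flag bits (check src/vlib/node.h). */
#define PROC_WAITING_FOR_CLOCK (1u << 0)
#define PROC_WAITING_FOR_EVENT (1u << 1)
#define PROC_RESUME_PENDING    (1u << 2)
#define PROC_IS_RUNNING        (1u << 3)

/* ASSUMED timer-wheel resolution: 1e5 ticks per second (10 us per tick). */
#define TW_TICKS_PER_SECOND 1e5

/* Hypothetical helper: write the names of the set bits into buf. */
static const char *
decode_process_flags (unsigned flags, char *buf, size_t len)
{
  assert (len > 0);
  buf[0] = '\0';
  if (flags & PROC_WAITING_FOR_CLOCK)
    strncat (buf, "clock-wait ", len - strlen (buf) - 1);
  if (flags & PROC_WAITING_FOR_EVENT)
    strncat (buf, "event-wait ", len - strlen (buf) - 1);
  if (flags & PROC_RESUME_PENDING)
    strncat (buf, "resume-pending ", len - strlen (buf) - 1);
  if (flags & PROC_IS_RUNNING)
    strncat (buf, "running ", len - strlen (buf) - 1);
  return buf;
}

/* Hypothetical helper: convert a resume_clock_interval in ticks to seconds. */
static double
ticks_to_seconds (unsigned ticks)
{
  return ticks / TW_TICKS_PER_SECOND;
}
```

Under these assumptions, flags = 11 decodes to waiting-for-clock, waiting-for-event and running, with the resume-pending bit clear, and 1000 ticks corresponds to the 10 ms timeout the node asked for. That would suggest the process is parked as expected and is simply waiting for a timer expiry.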

 

vlib_node_t:

 (gdb) p *n

$17 = {
  function = 0x7f9b2bbc7e80 , 
  name = 0x7f9b3076a3f0 "rtb-vpp-epoll-process", 
  name_elog_string = 11783, 
  stats_total = {
    calls = 0,
    vectors = 0,
    clocks = 1971244932732,
    suspends = 6847366,
    max_clock = 3785970526,
    max_clock_n = 0
  }, 
  stats_last_clear = {
    calls = 0,
    vectors = 0,
    clocks = 0,
    suspends = 0,
    max_clock = 0,
    max_clock_n = 0
  }, 
  type = VLIB_NODE_TYPE_PROCESS, 
  index = 437, 
  runtime_index = 40, 
  runtime_data = 0x0, 
  flags = 0, 
  state = 0 '\000', 
  runtime_data_bytes = 0 '\000', 
  protocol_hint = 0 '\000', 
  n_errors = 0, 
  scalar_size = 0, 
  vector_size = 0, 
  error_heap_handle = 0, 
  error_heap_index = 0, 
  error_counters = 0x0, 
  next_node_names = 0x7f9b3076a530, 
  next_nodes = 0x0, 
  sibling_of = 0x0, 
  sibling_bitmap = 0x0, 
  n_vectors_by_next_node = 0x0, 
  next_slot_by_node = 0x0, 
  prev_node_bitmap = 0x0, 
  owner_node_index = 4294967295, 
  owner_next_index = 4294967295, 
  format_buffer = 0x0, 
  unformat_buffer = 0x0, 
  format_trace = 0x0, 
  validate_frame = 0x0, 
  state_string = 0x0, 
  node_fn_registrations = 0x0
}

 

I added an assert statement before the VLIB_PROCESS_IS_RUNNING flag is cleared 
in the dispatch_suspended_process function, but this assert is not being hit.

 

diff --git a/src/vlib/main.c b/src/vlib/main.c
index af0fcd1cb..55c231d8b 100644
--- a/src/vlib/main.c
+++ b/src/vlib/main.c
@@ -1490,6 +1490,9 @@ dispatc

Re: [vpp-dev] process node suspended indefinitely

2023-03-10 Thread Sudhir CR via lists.fd.io
Hi jinsh,
Thanks for the help.
I placed an assert statement in the vlib_process_signal_event_helper function,
but the assert did not hit there.
When I debugged further, I found that my process node is not present in the
data_from_advancing_timing_wheel vector.
I believe this is why the process node is not getting called. I am now checking
why the rtb-vpp-epoll-process node entry is missing from the
data_from_advancing_timing_wheel vector.
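
To make the suspected failure mode concrete, here is a toy model in plain C. It is not VPP code and every name in it is hypothetical. A suspended process resumes only when its timer expires and its index is appended to a "ready" vector, which stands in for data_from_advancing_timing_wheel. If something stops that timer without queuing the process, for example a stop call on a stale handle that has since been reassigned, the process never reaches the vector. Whether this is what actually happens in 21.10 is not established in this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define N_TIMERS 8

typedef struct
{
  bool armed;
  int owner;            /* index of the process waiting on this timer */
  unsigned expires_at;  /* absolute tick at which the timer fires */
} toy_timer_t;

typedef struct
{
  toy_timer_t timers[N_TIMERS];
  int ready[N_TIMERS];  /* stand-in for data_from_advancing_timing_wheel */
  int n_ready;
  unsigned now;
} toy_wheel_t;

static void
toy_wheel_init (toy_wheel_t *w)
{
  w->n_ready = 0;
  w->now = 0;
  for (int h = 0; h < N_TIMERS; h++)
    w->timers[h].armed = false;
}

/* Arm the first free slot for `owner`; the slot index is the handle. */
static int
toy_timer_start (toy_wheel_t *w, int owner, unsigned interval)
{
  for (int h = 0; h < N_TIMERS; h++)
    if (!w->timers[h].armed)
      {
        w->timers[h].armed = true;
        w->timers[h].owner = owner;
        w->timers[h].expires_at = w->now + interval;
        return h;
      }
  return -1;
}

/* Stop by handle. There is no ownership check, so a stale handle
   silently cancels whichever process owns the slot now. */
static void
toy_timer_stop (toy_wheel_t *w, int handle)
{
  w->timers[handle].armed = false;
}

/* Advance the clock; owners of expired timers go onto the ready vector. */
static void
toy_wheel_advance (toy_wheel_t *w, unsigned ticks)
{
  w->now += ticks;
  for (int h = 0; h < N_TIMERS; h++)
    if (w->timers[h].armed && w->timers[h].expires_at <= w->now)
      {
        w->timers[h].armed = false;
        if (w->n_ready < N_TIMERS)
          w->ready[w->n_ready++] = w->timers[h].owner;
      }
}

static bool
toy_is_ready (const toy_wheel_t *w, int owner)
{
  for (int i = 0; i < w->n_ready; i++)
    if (w->ready[i] == owner)
      return true;
  return false;
}
```

In this model, once a second process has been handed a reused handle, a stop on the old handle leaves that process armed nowhere and absent from the ready vector, however far the clock advances.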

Thanks and regards,
Sudhir

On Thu, Mar 9, 2023 at 9:08 PM jinsh11  wrote:

>
>-
>
>I think you can query who stopped the current node's time wheel.
>
>always_inline void *
>
>vlib_process_signal_event_helper (vlib_node_main_t * nm,
>
>  vlib_node_t * n,
>
>  vlib_process_t * p,
>
>  uword t,
>
>  uword n_data_elts, uword n_data_elt_bytes)
>
>{
>
>
>
>if (add_to_pending)
>
>{
>
>  u32 x = vlib_timing_wheel_data_set_suspended_process
>(n->runtime_index);
>
>  p->flags = p_flags | VLIB_PROCESS_RESUME_PENDING;
>
>  vec_add1 (nm->data_from_advancing_timing_wheel, x);
>
>  if (delete_from_wheel){
>
>TW (tw_timer_stop) ((TWT (tw_timer_wheel) *) nm->timing_wheel,
>
>p->stop_timer_handle);
>
>   *vlib_process_t *p1 = vec_elt (nm->processes,
>vlib_get_node_by_name(vm,"rtb-vpp-epoll-process”)->runtime_index);*
>
>*  If ((p != p1 && (p-> stop_timer_handle ==
>p1->stop_timer_handle))*
>
>*  {*
>
>*  ASSERT();*
>
>*   }*
>}
>
>}
>
>}
>
>
> 
>
>

-- 
NOTICE TO
RECIPIENT This e-mail message and any attachments are 
confidential and may be
privileged. If you received this e-mail in error, 
any review, use,
dissemination, distribution, or copying of this e-mail is 
strictly
prohibited. Please notify us immediately of the error by return 
e-mail and
please delete this message from your system. For more 
information about Rtbrick, please visit us at www.rtbrick.com 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#22688): https://lists.fd.io/g/vpp-dev/message/22688
Mute This Topic: https://lists.fd.io/mt/97032803/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] process node suspended indefinitely

2023-03-09 Thread jinsh11
I think you can find out who stopped the current node's timer in the timing wheel.

always_inline void *
vlib_process_signal_event_helper (vlib_node_main_t * nm,
                                  vlib_node_t * n,
                                  vlib_process_t * p,
                                  uword t,
                                  uword n_data_elts, uword n_data_elt_bytes)
{
  ...
  if (add_to_pending)
    {
      u32 x = vlib_timing_wheel_data_set_suspended_process (n->runtime_index);
      p->flags = p_flags | VLIB_PROCESS_RESUME_PENDING;
      vec_add1 (nm->data_from_advancing_timing_wheel, x);
      if (delete_from_wheel)
        {
          TW (tw_timer_stop) ((TWT (tw_timer_wheel) *) nm->timing_wheel,
                              p->stop_timer_handle);

          /* Debug check: did some other process just stop the timer handle
             that rtb-vpp-epoll-process is waiting on?
             Note: vm is not a parameter of this helper and has to be
             obtained separately, e.g. via vlib_get_main (). */
          vlib_process_t *p1 = vec_elt (nm->processes,
            vlib_get_node_by_name (vm, (u8 *) "rtb-vpp-epoll-process")->runtime_index);
          if (p != p1 && p->stop_timer_handle == p1->stop_timer_handle)
            {
              ASSERT (0);
            }
        }
    }
}




Re: [vpp-dev] process node suspended indefinitely

2023-03-09 Thread jinsh11
I think you can assert in this function:

always_inline void *
vlib_process_signal_event_helper (vlib_node_main_t * nm,
                                  vlib_node_t * n,
                                  vlib_process_t * p,
                                  uword t,
                                  uword n_data_elts, uword n_data_elt_bytes)
{
  ...
  if (add_to_pending)
    {
      u32 x = vlib_timing_wheel_data_set_suspended_process (n->runtime_index);
      p->flags = p_flags | VLIB_PROCESS_RESUME_PENDING;
      vec_add1 (nm->data_from_advancing_timing_wheel, x);
      if (delete_from_wheel)
        TW (tw_timer_stop) ((TWT (tw_timer_wheel) *) nm->timing_wheel,
                            p->stop_timer_handle);

      /* Debug check: compare against the process behind
         rtb-vpp-epoll-process (vm must be in scope here). */
      vlib_process_t *p2 = vec_elt (nm->processes,
        vlib_get_node_by_name (vm, (u8 *) "rtb-vpp-epoll-process")->runtime_index);
      if (p != p2 && p->stop_timer_handle == p2->stop_timer_handle)
        {
          ASSERT (0);
        }
    }
}




Re: [vpp-dev] process node suspended indefinitely

2023-03-09 Thread Sudhir CR via lists.fd.io
 Soon I will update my results and findings in this mail thread.
>
> Thanks and Regards,
> Sudhir
>
> On Fri, Mar 3, 2023 at 12:37 PM chetan bhasin 
> wrote:
>
>> Hi Sudhir,
>>
>> Is your issue resolved?
>>
>> Actually we are facing same issue on vpp.2106.
>> In our case "api-rx-ring" is not getting called.
>> in our usecase workers are calling some functions in main-thread context
>> leading to RPC message and memory is allocated from api section.
>> This leads to Api-segment memory is used fully and leads to crash.
>>
>> Thanks,
>> Chetan
>>
>>
>> On Mon, Feb 20, 2023, 18:24 Sudhir CR via lists.fd.io > rtbrick@lists.fd.io> wrote:
>>
>>> Hi Dave,
>>> Thank you very much for your inputs. I will try this out and get back to
>>> you with the results.
>>>
>>> Regards,
>>> Sudhir
>>>
>>> On Mon, Feb 20, 2023 at 6:01 PM Dave Barach  wrote:
>>>
>>>> Please try something like this, to eliminate the possibility that some
>>>> bit of code is sending this process an event. It’s not a good idea to skip
>>>> the vec_reset_length (event_data) step.
>>>>
>>>>
>>>>
>>>> while (1)
>>>>
>>>> {
>>>>
>>>>uword event_type, * event_data = 0;
>>>>
>>>>int i;
>>>>
>>>>
>>>>
>>>>vlib_process_wait_for_event_or_clock (vm, 1e-2 /* 10 ms */);
>>>>
>>>>
>>>>
>>>>event_type = vlib_process_get_events (vm, &event_data);
>>>>
>>>>
>>>>
>>>>switch (event_type) {
>>>>
>>>>   case ~0: /* handle timer expirations */
>>>>
>>>>rtb_event_loop_run_once ();
>>>>
>>>>break;
>>>>
>>>>
>>>>
>>>>default: /* bug! */
>>>>
>>>>ASSERT (0);
>>>>
>>>>}
>>>>
>>>>
>>>>
>>>>vec_reset_length(event_data);
>>>>
>>>> }
>>>>
>>>>
>>>>
>>>> *From:* vpp-dev@lists.fd.io  *On Behalf Of *Sudhir
>>>> CR via lists.fd.io
>>>> *Sent:* Monday, February 20, 2023 4:02 AM
>>>> *To:* vpp-dev@lists.fd.io
>>>> *Subject:* Re: [vpp-dev] process node suspended indefinitely
>>>>
>>>>
>>>>
>>>> Hi Dave,
>>>> Thank you for your response and help.
>>>>
>>>>
>>>>
>>>> Please find the additional details below.
>>>>
>>>> VPP Version *21.10*
>>>>
>>>>
>>>> We are creating a process node* rtb-vpp-epoll-process *to handle
>>>> control plane events like interface add/delete, route add/delete.
>>>> This process node waits for *10ms* of time (Not Interested in any
>>>> events ) once 10ms is expired it will process control plane events
>>>> mentioned above.
>>>>
>>>> code snippet looks like below
>>>>
>>>>
>>>>
>>>> ```
>>>>
>>>> static uword
>>>> rtb_vpp_epoll_process (vlib_main_t *vm,
>>>>vlib_node_runtime_t  *rt,
>>>>vlib_frame_t *f)
>>>> {
>>>>
>>>> ...
>>>> ...
>>>> while (1) {
>>>> vlib_process_wait_for_event_or_clock (vm, 10e-3);
>>>> vlib_process_get_events (vm, NULL);
>>>>
>>>> rtb_event_loop_run_once();   *< controlplane events
>>>> handling*
>>>> }
>>>> }
>>>> ```
>>>>
>>>> What we observed is that sometimes (when there is a high controlplane
>>>> load like request to install more routes) "rtb-vpp-epoll-process" is
>>>> suspended and not scheduled forever. This we found by using "show runtime
>>>> rtb-vpp-epoll-process"*  (*in "show runtime rtb-vpp-epoll-process"
>>>> command output suspends counter is not incrementing.)
>>>>
>>>> *show runtime output in working case :*
>>>>
>>>>
>>>> ```
>>>> DBGvpp# show runtime rtb-vpp-epoll-process
>>>>  Name State Calls  V

Re: [vpp-dev] process node suspended indefinitely

2023-03-02 Thread Sudhir CR via lists.fd.io
Hi Chetan,
In our case we observe this issue only occasionally; the exact steps to
recreate it are not known.
I made changes to our process node as suggested by Dave, and with these
changes I am trying to recreate the issue.

Soon I will update my results and findings in this mail thread.

Thanks and Regards,
Sudhir

On Fri, Mar 3, 2023 at 12:37 PM chetan bhasin 
wrote:

> Hi Sudhir,
>
> Is your issue resolved?
>
> Actually we are facing same issue on vpp.2106.
> In our case "api-rx-ring" is not getting called.
> in our usecase workers are calling some functions in main-thread context
> leading to RPC message and memory is allocated from api section.
> This leads to Api-segment memory is used fully and leads to crash.
>
> Thanks,
> Chetan
>
>
> On Mon, Feb 20, 2023, 18:24 Sudhir CR via lists.fd.io  rtbrick@lists.fd.io> wrote:
>
>> Hi Dave,
>> Thank you very much for your inputs. I will try this out and get back to
>> you with the results.
>>
>> Regards,
>> Sudhir
>>
>> On Mon, Feb 20, 2023 at 6:01 PM Dave Barach  wrote:
>>
>>> Please try something like this, to eliminate the possibility that some
>>> bit of code is sending this process an event. It’s not a good idea to skip
>>> the vec_reset_length (event_data) step.
>>>
>>>
>>>
>>> while (1)
>>>
>>> {
>>>
>>>uword event_type, * event_data = 0;
>>>
>>>int i;
>>>
>>>
>>>
>>>vlib_process_wait_for_event_or_clock (vm, 1e-2 /* 10 ms */);
>>>
>>>
>>>
>>>event_type = vlib_process_get_events (vm, &event_data);
>>>
>>>
>>>
>>>switch (event_type) {
>>>
>>>   case ~0: /* handle timer expirations */
>>>
>>>        rtb_event_loop_run_once ();
>>>
>>>break;
>>>
>>>
>>>
>>>default: /* bug! */
>>>
>>>ASSERT (0);
>>>
>>>}
>>>
>>>
>>>
>>>vec_reset_length(event_data);
>>>
>>> }
>>>
>>>
>>>
>>> *From:* vpp-dev@lists.fd.io  *On Behalf Of *Sudhir
>>> CR via lists.fd.io
>>> *Sent:* Monday, February 20, 2023 4:02 AM
>>> *To:* vpp-dev@lists.fd.io
>>> *Subject:* Re: [vpp-dev] process node suspended indefinitely
>>>
>>>
>>>
>>> Hi Dave,
>>> Thank you for your response and help.
>>>
>>>
>>>
>>> Please find the additional details below.
>>>
>>> VPP Version *21.10*
>>>
>>>
>>> We are creating a process node* rtb-vpp-epoll-process *to handle
>>> control plane events like interface add/delete, route add/delete.
>>> This process node waits for *10ms* of time (Not Interested in any
>>> events ) once 10ms is expired it will process control plane events
>>> mentioned above.
>>>
>>> code snippet looks like below
>>>
>>>
>>>
>>> ```
>>>
>>> static uword
>>> rtb_vpp_epoll_process (vlib_main_t *vm,
>>>vlib_node_runtime_t  *rt,
>>>vlib_frame_t *f)
>>> {
>>>
>>> ...
>>> ...
>>> while (1) {
>>> vlib_process_wait_for_event_or_clock (vm, 10e-3);
>>> vlib_process_get_events (vm, NULL);
>>>
>>> rtb_event_loop_run_once();   *< controlplane events
>>> handling*
>>> }
>>> }
>>> ```
>>>
>>> What we observed is that sometimes (when there is a high controlplane
>>> load like request to install more routes) "rtb-vpp-epoll-process" is
>>> suspended and not scheduled forever. This we found by using "show runtime
>>> rtb-vpp-epoll-process"*  (*in "show runtime rtb-vpp-epoll-process"
>>> command output suspends counter is not incrementing.)
>>>
>>> *show runtime output in working case :*
>>>
>>>
>>> ```
>>> DBGvpp# show runtime rtb-vpp-epoll-process
>>>  Name State Calls  Vectors
>>>  *Suspends* Clocks   Vectors/Call
>>> rtb-vpp-epoll-process   any wait 0
>>> 0  *192246*  1.91e60.00
>>> DBGvpp#
>>>
>>> DBGvpp# show runtime rtb-vpp-epoll-process
>>>  Name  

Re: [vpp-dev] process node suspended indefinitely

2023-03-02 Thread chetan bhasin
Hi Sudhir,

Is your issue resolved?

We are facing the same issue on VPP 21.06.
In our case "api-rx-ring" is not getting called.
In our use case, workers call some functions in main-thread context, which
generates RPC messages whose memory is allocated from the API segment.
As a result the API segment memory is used up completely, which leads to a crash.

Thanks,
Chetan


On Mon, Feb 20, 2023, 18:24 Sudhir CR via lists.fd.io  wrote:

> Hi Dave,
> Thank you very much for your inputs. I will try this out and get back to
> you with the results.
>
> Regards,
> Sudhir
>
> On Mon, Feb 20, 2023 at 6:01 PM Dave Barach  wrote:
>
>> Please try something like this, to eliminate the possibility that some
>> bit of code is sending this process an event. It’s not a good idea to skip
>> the vec_reset_length (event_data) step.
>>
>>
>>
>> while (1)
>>
>> {
>>
>>uword event_type, * event_data = 0;
>>
>>int i;
>>
>>
>>
>>vlib_process_wait_for_event_or_clock (vm, 1e-2 /* 10 ms */);
>>
>>
>>
>>event_type = vlib_process_get_events (vm, &event_data);
>>
>>
>>
>>switch (event_type) {
>>
>>   case ~0: /* handle timer expirations */
>>
>>rtb_event_loop_run_once ();
>>
>>break;
>>
>>
>>
>>default: /* bug! */
>>
>>        ASSERT (0);
>>
>>}
>>
>>
>>
>>vec_reset_length(event_data);
>>
>> }
>>
>>
>>
>> *From:* vpp-dev@lists.fd.io  *On Behalf Of *Sudhir
>> CR via lists.fd.io
>> *Sent:* Monday, February 20, 2023 4:02 AM
>> *To:* vpp-dev@lists.fd.io
>> *Subject:* Re: [vpp-dev] process node suspended indefinitely
>>
>>
>>
>> Hi Dave,
>> Thank you for your response and help.
>>
>>
>>
>> Please find the additional details below.
>>
>> VPP Version *21.10*
>>
>>
>> We are creating a process node* rtb-vpp-epoll-process *to handle control
>> plane events like interface add/delete, route add/delete.
>> This process node waits for *10ms* of time (Not Interested in any events
>> ) once 10ms is expired it will process control plane events mentioned above.
>>
>> code snippet looks like below
>>
>>
>>
>> ```
>>
>> static uword
>> rtb_vpp_epoll_process (vlib_main_t *vm,
>>vlib_node_runtime_t  *rt,
>>vlib_frame_t *f)
>> {
>>
>> ...
>> ...
>> while (1) {
>> vlib_process_wait_for_event_or_clock (vm, 10e-3);
>> vlib_process_get_events (vm, NULL);
>>
>> rtb_event_loop_run_once();   *< controlplane events handling*
>>
>> }
>> }
>> ```
>>
>> What we observed is that sometimes (when there is a high controlplane
>> load like request to install more routes) "rtb-vpp-epoll-process" is
>> suspended and not scheduled forever. This we found by using "show runtime
>> rtb-vpp-epoll-process"*  (*in "show runtime rtb-vpp-epoll-process"
>> command output suspends counter is not incrementing.)
>>
>> *show runtime output in working case :*
>>
>>
>> ```
>> DBGvpp# show runtime rtb-vpp-epoll-process
>>  Name State Calls  Vectors
>>  *Suspends* Clocks   Vectors/Call
>> rtb-vpp-epoll-process   any wait 0
>> 0  *192246*  1.91e60.00
>> DBGvpp#
>>
>> DBGvpp# show runtime rtb-vpp-epoll-process
>>  Name State Calls  Vectors
>>  *Suspends* Clocks   Vectors/Call
>> rtb-vpp-epoll-process   any wait 0
>> 0  *193634*  1.89e60.00
>> DBGvpp#
>>
>> ```
>>
>>
>> *show runtime output in issue case :```*
>>
>> DBGvpp# show runtime rtb-vpp-epoll-process
>>
>>  Name State Calls  Vectors   
>>  *Suspends* Clocks   Vectors/Call
>>
>> rtb-vpp-epoll-process   any wait 0   0   
>> *81477*  7.08e60.00
>>
>> DBGvpp# show runtime rtb-vpp-epoll-process
>>
>>  Name State Calls  Vectors   
>>  *Suspends *Clocks   Vectors/Call
>>
>> rtb-vpp-epoll-process   any wait 

Re: [vpp-dev] process node suspended indefinitely

2023-02-20 Thread Sudhir CR via lists.fd.io
Hi Dave,
Thank you very much for your inputs. I will try this out and get back to
you with the results.

Regards,
Sudhir

On Mon, Feb 20, 2023 at 6:01 PM Dave Barach  wrote:

> Please try something like this, to eliminate the possibility that some bit
> of code is sending this process an event. It’s not a good idea to skip the
> vec_reset_length (event_data) step.
>
>
>
> while (1)
>
> {
>
>uword event_type, * event_data = 0;
>
>int i;
>
>
>
>vlib_process_wait_for_event_or_clock (vm, 1e-2 /* 10 ms */);
>
>
>
>event_type = vlib_process_get_events (vm, &event_data);
>
>
>
>switch (event_type) {
>
>   case ~0: /* handle timer expirations */
>
>rtb_event_loop_run_once ();
>
>break;
>
>
>
>default: /* bug! */
>
>ASSERT (0);
>
>}
>
>
>
>vec_reset_length(event_data);
>
> }
>
>
>
> *From:* vpp-dev@lists.fd.io  *On Behalf Of *Sudhir
> CR via lists.fd.io
> *Sent:* Monday, February 20, 2023 4:02 AM
> *To:* vpp-dev@lists.fd.io
> *Subject:* Re: [vpp-dev] process node suspended indefinitely
>
>
>
> Hi Dave,
> Thank you for your response and help.
>
>
>
> Please find the additional details below.
>
> VPP Version *21.10*
>
>
> We are creating a process node* rtb-vpp-epoll-process *to handle control
> plane events like interface add/delete, route add/delete.
> This process node waits for *10ms* of time (Not Interested in any events
> ) once 10ms is expired it will process control plane events mentioned above.
>
> code snippet looks like below
>
>
>
> ```
>
> static uword
> rtb_vpp_epoll_process (vlib_main_t *vm,
>vlib_node_runtime_t  *rt,
>vlib_frame_t *f)
> {
>
> ...
> ...
> while (1) {
> vlib_process_wait_for_event_or_clock (vm, 10e-3);
> vlib_process_get_events (vm, NULL);
>
> rtb_event_loop_run_once();   *< controlplane events handling*
> }
> }
> ```
>
> What we observed is that sometimes (when there is a high controlplane load
> like request to install more routes) "rtb-vpp-epoll-process" is suspended
> and not scheduled forever. This we found by using "show runtime
> rtb-vpp-epoll-process"*  (*in "show runtime rtb-vpp-epoll-process"
> command output suspends counter is not incrementing.)
>
> *show runtime output in working case :*
>
>
> ```
> DBGvpp# show runtime rtb-vpp-epoll-process
>  Name State Calls  Vectors
>*Suspends* Clocks   Vectors/Call
> rtb-vpp-epoll-process   any wait 0   0
>  *192246*  1.91e60.00
> DBGvpp#
>
> DBGvpp# show runtime rtb-vpp-epoll-process
>  Name State Calls  Vectors
>*Suspends* Clocks   Vectors/Call
> rtb-vpp-epoll-process   any wait 0   0
>  *193634*  1.89e60.00
> DBGvpp#
>
> ```
>
>
> *show runtime output in issue case :```*
>
> DBGvpp# show runtime rtb-vpp-epoll-process
>
>  Name State Calls  Vectors
> *Suspends* Clocks   Vectors/Call
>
> rtb-vpp-epoll-process   any wait 0   0
>*81477*  7.08e60.00
>
> DBGvpp# show runtime rtb-vpp-epoll-process
>
>  Name State Calls  Vectors
> *Suspends *Clocks   Vectors/Call
>
> rtb-vpp-epoll-process   any wait 0   0
>*81477*  7.08e60.00
>
> *```*
>
> Other process nodes like lldp-process,
> ip4-neighbor-age-process, ip6-ra-process running without any issue. only
> "rtb-vpp-epoll-process" process node suspended forever.
>
>
>
> Please let me know if any additional information is required.
>
> Hi Jinsh,
> Thanks for pointing me to the issue you faced. The issue I am facing looks
> similar.
> I will verify with the given patch.
>
>
> Thanks and Regards,
>
> Sudhir
>
>
>
> On Sun, Feb 19, 2023 at 6:19 AM jinsh11  wrote:
>
> HI:
>
>
>- I have the same problem,
>
> bfd process node stop running. I raised this issue,
>
> https://lists.fd.io/g/vpp-dev/message/22380
> I think there is a problem with the porcess scheduling module when using
> the time wheel.
>
>
>
>
>
> NOTICE TO RECIPIENT This e-mail message and any attac

Re: [vpp-dev] process node suspended indefinitely

2023-02-20 Thread Dave Barach
Please try something like this, to eliminate the possibility that some bit of 
code is sending this process an event. It’s not a good idea to skip the 
vec_reset_length (event_data) step.

 

while (1)
{
   uword event_type, * event_data = 0;
   int i;

   vlib_process_wait_for_event_or_clock (vm, 1e-2 /* 10 ms */);

   event_type = vlib_process_get_events (vm, &event_data);

   switch (event_type) {
   case ~0: /* handle timer expirations */
       rtb_event_loop_run_once ();
       break;

   default: /* bug! */
       ASSERT (0);
   }

   vec_reset_length(event_data);
}
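
The control flow of that loop can be exercised outside VPP with stand-ins. In the sketch below, mock_events_t, mock_get_events and run_loop are hypothetical names; the only thing taken from the snippet above is the convention that an event type of ~0 means "the clock expired and no event was pending". Where the real loop would ASSERT (0), the mock just counts the unexpected wakeup so the result can be inspected.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long uword;

/* ~0 as returned for a pure timeout in the loop above. */
#define TIMEOUT_EVENT (~(uword) 0)

/* Hypothetical replay queue: the event type seen on each wakeup. */
typedef struct
{
  const uword *types;
  size_t n;
  size_t next;
} mock_events_t;

/* Stand-in for "wait, then fetch the event type". */
static uword
mock_get_events (mock_events_t *q)
{
  return q->types[q->next++];
}

typedef struct
{
  unsigned timeouts;    /* wakeups caused by the clock */
  unsigned unexpected;  /* wakeups caused by somebody signalling an event */
} loop_stats_t;

static loop_stats_t
run_loop (mock_events_t *q)
{
  loop_stats_t s = { 0, 0 };
  while (q->next < q->n)
    {
      uword event_type = mock_get_events (q);
      switch (event_type)
        {
        case TIMEOUT_EVENT:  /* handle timer expirations */
          s.timeouts++;
          break;

        default:             /* the real loop asserts here */
          s.unexpected++;
          break;
        }
    }
  return s;
}
```

If the unexpected count stays at zero in a run that reproduces the hang, stray events are ruled out as the cause, which is the point of the experiment.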

 

From: vpp-dev@lists.fd.io  On Behalf Of Sudhir CR via 
lists.fd.io
Sent: Monday, February 20, 2023 4:02 AM
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] process node suspended indefinitely

 

Hi Dave,
Thank you for your response and help. 

 

Please find the additional details below.

VPP Version 21.10


We are creating a process node, rtb-vpp-epoll-process, to handle control plane 
events such as interface add/delete and route add/delete.
This process node waits for 10 ms (it is not interested in any events); once 
the 10 ms has expired, it processes the control plane events mentioned above.

The code snippet looks like this:

 

```

static uword
rtb_vpp_epoll_process (vlib_main_t *vm,
                       vlib_node_runtime_t *rt,
                       vlib_frame_t *f)
{
    ...
    ...
    while (1) {
        vlib_process_wait_for_event_or_clock (vm, 10e-3);
        vlib_process_get_events (vm, NULL);

        rtb_event_loop_run_once ();   /* <-- control plane events handling */
    }
}
``` 

What we observed is that sometimes (under high control plane load, such as a 
request to install more routes) "rtb-vpp-epoll-process" is suspended and never 
scheduled again. We found this using "show runtime rtb-vpp-epoll-process": in 
the command output, the Suspends counter stops incrementing.

show runtime output in working case :


```
DBGvpp# show runtime rtb-vpp-epoll-process
Name                      State        Calls   Vectors   Suspends    Clocks   Vectors/Call
rtb-vpp-epoll-process     any wait         0         0     192246    1.91e6           0.00
DBGvpp#

DBGvpp# show runtime rtb-vpp-epoll-process
Name                      State        Calls   Vectors   Suspends    Clocks   Vectors/Call
rtb-vpp-epoll-process     any wait         0         0     193634    1.89e6           0.00
DBGvpp#

``` 

show runtime output in issue case :
```

DBGvpp# show runtime rtb-vpp-epoll-process
Name                      State        Calls   Vectors   Suspends    Clocks   Vectors/Call
rtb-vpp-epoll-process     any wait         0         0      81477    7.08e6           0.00
DBGvpp# show runtime rtb-vpp-epoll-process
Name                      State        Calls   Vectors   Suspends    Clocks   Vectors/Call
rtb-vpp-epoll-process     any wait         0         0      81477    7.08e6           0.00

```

Other process nodes such as lldp-process, ip4-neighbor-age-process and 
ip6-ra-process are running without any issue. Only the "rtb-vpp-epoll-process" 
process node is suspended forever.
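
The counter comparison above can be written down as a small check. The sketch below is plain C with hypothetical helper names; it only does arithmetic on two samples of the Suspends column and assumes the counters were not cleared between the two samples. For a node that sleeps roughly 10 ms per iteration, the delta also gives a rough idea of how much time passed between the two commands.

```c
#include <assert.h>

/* Difference between two samples of the Suspends column.
   Assumes the counters were not cleared between the samples. */
static unsigned long
suspends_delta (unsigned long first, unsigned long second)
{
  assert (second >= first);
  return second - first;
}

/* A node that sleeps interval_s per iteration suspends about once per
   interval, so delta * interval_s approximates the elapsed time. */
static double
approx_elapsed_seconds (unsigned long delta, double interval_s)
{
  return (double) delta * interval_s;
}

/* No new suspends between two samples: the node was never rescheduled. */
static int
looks_stalled (unsigned long first, unsigned long second)
{
  return suspends_delta (first, second) == 0;
}
```

With the numbers from the outputs above, the working case advances by 1388 suspends (roughly 14 seconds at 10 ms each), while the issue case does not advance at all.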

 

Please let me know if any additional information is required.

Hi Jinsh,
Thanks for pointing me to the issue you faced. The issue I am facing looks 
similar.
I will verify with the given patch.


Thanks and Regards,

Sudhir

 

On Sun, Feb 19, 2023 at 6:19 AM jinsh11 <jins...@chinatelecom.cn> wrote:

Hi,

I have the same problem: the bfd process node stops running. I raised this issue at

https://lists.fd.io/g/vpp-dev/message/22380

I think there is a problem with the process scheduling module when using the 
timing wheel.





 






Re: [vpp-dev] process node suspended indefinitely

2023-02-20 Thread Sudhir CR via lists.fd.io
Hi Dave,
Thank you for your response and help.

Please find the additional details below.
VPP Version *21.10*

We are creating a process node, *rtb-vpp-epoll-process*, to handle control
plane events like interface add/delete and route add/delete.
This process node waits for *10 ms* (it is not interested in any events);
once the 10 ms have expired, it processes the control plane events mentioned
above.

The code snippet looks like this:

```
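static uword
rtb_vpp_epoll_process (vlib_main_t *vm,
                       vlib_node_runtime_t *rt,
                       vlib_frame_t *f)
{
...
...
    while (1) {
        /* Sleep until an event arrives or the 10 ms timer expires. */
        vlib_process_wait_for_event_or_clock (vm, 10e-3);
        /* Consume a pending event, if any; its type and data are ignored. */
        vlib_process_get_events (vm, NULL);

        rtb_event_loop_run_once ();   /* control plane event handling */
    }
}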
```
What we observed is that sometimes (when there is a high control plane load,
such as a request to install more routes) "rtb-vpp-epoll-process" is suspended
and never scheduled again. We found this using "show runtime
rtb-vpp-epoll-process": the Suspends counter in its output stops
incrementing.

*show runtime output in working case :*

```
DBGvpp# show runtime rtb-vpp-epoll-process
Name                   State     Calls  Vectors  Suspends  Clocks  Vectors/Call
rtb-vpp-epoll-process  any wait      0        0    192246  1.91e6          0.00
DBGvpp#

DBGvpp# show runtime rtb-vpp-epoll-process
Name                   State     Calls  Vectors  Suspends  Clocks  Vectors/Call
rtb-vpp-epoll-process  any wait      0        0    193634  1.89e6          0.00
DBGvpp#

```

*show runtime output in issue case :*

```
DBGvpp# show runtime rtb-vpp-epoll-process
Name                   State     Calls  Vectors  Suspends  Clocks  Vectors/Call
rtb-vpp-epoll-process  any wait      0        0     81477  7.08e6          0.00

DBGvpp# show runtime rtb-vpp-epoll-process
Name                   State     Calls  Vectors  Suspends  Clocks  Vectors/Call
rtb-vpp-epoll-process  any wait      0        0     81477  7.08e6          0.00
```
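
The check done by eye on the outputs above (does the Suspends counter advance between two samples?) could be scripted. Below is a minimal sketch in plain C, not VPP code. It assumes each data row has been joined onto one line and ends with the Suspends, Clocks and Vectors/Call columns as in this thread. The helper names `parse_suspends` and `looks_stalled` are made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Extract the Suspends value from one "show runtime" data row.
   Assumption: the row's last three whitespace-separated fields are
   Suspends, Clocks and Vectors/Call. Returns 0 on success, -1 on error. */
static int
parse_suspends(const char *row, unsigned long long *suspends)
{
    char buf[256];
    char *last[3] = { NULL, NULL, NULL };   /* sliding window of the last 3 tokens */
    char *end = NULL;

    assert(row != NULL && suspends != NULL);
    if (strlen(row) >= sizeof(buf))
        return -1;
    strcpy(buf, row);

    for (char *t = strtok(buf, " \t\n"); t != NULL; t = strtok(NULL, " \t\n")) {
        last[0] = last[1];
        last[1] = last[2];
        last[2] = t;
    }
    /* Fewer than three fields, or the Suspends field is not a plain number. */
    if (last[0] == NULL || !isdigit((unsigned char) last[0][0]))
        return -1;

    *suspends = strtoull(last[0], &end, 10);
    return (*end == '\0') ? 0 : -1;
}

/* Returns 1 if Suspends did not change between the two rows, 0 if it did,
   and -1 if either row could not be parsed. */
static int
looks_stalled(const char *earlier_row, const char *later_row)
{
    unsigned long long a, b;

    if (parse_suspends(earlier_row, &a) != 0 || parse_suspends(later_row, &b) != 0)
        return -1;
    return a == b;
}
```

Equal counters only suggest a stall when the two samples are taken further apart than the node's wait interval (10 ms here), so a script would want to sleep well beyond that between samples.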

Other process nodes such as lldp-process, ip4-neighbor-age-process, and
ip6-ra-process are running without any issue. Only the
"rtb-vpp-epoll-process" process node is suspended forever.

Please let me know if any additional information is required.

Hi Jinsh,
Thanks for pointing me to the issue you faced. The issue I am facing looks
similar.
I will verify with the given patch.

Thanks and Regards,
Sudhir

On Sun, Feb 19, 2023 at 6:19 AM jinsh11  wrote:

> HI:
>
> I have the same problem: the bfd process node stopped running. I raised
> this issue:
>
> https://lists.fd.io/g/vpp-dev/message/22380
>
> I think there is a problem with the process scheduling module when using
> the timer wheel.
>



View/Reply Online (#22604): https://lists.fd.io/g/vpp-dev/message/22604



Re: [vpp-dev] process node suspended indefinitely

2023-02-18 Thread jinsh11
HI:

I have the same problem: the bfd process node stopped running. I raised this issue:

https://lists.fd.io/g/vpp-dev/message/22380

I think there is a problem with the process scheduling module when using the 
timer wheel.

View/Reply Online (#22603): https://lists.fd.io/g/vpp-dev/message/22603



Re: [vpp-dev] process node suspended indefinitely

2023-02-18 Thread Dave Barach
Process is a bit of a misnomer; “cooperative multitasking thread” would be more 
accurate. src/vlib/main.c makes no effort to interrupt process nodes. If a 
process runs forever, vpp will have a bad day.

 

Recitations: which version of vpp is involved? Please send “show run” output 
for the process node in question.

 

The process might be waiting on an event which never happens, or for the clock 
to reach a time so far in the future it won’t happen in one’s lifetime. Worse 
luck might involve memory corruption or an issue with the timer wheel code.
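
To make the second case concrete, here is a toy model in plain C of a cooperatively scheduled process whose wake-up time lies far in the future. It is only a sketch for illustration, not VPP's dispatcher or timer wheel: the type `toy_process_t`, the function `toy_dispatch` and the tick-based clock are all made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a cooperatively scheduled process: it runs only when the
   dispatcher resumes it, and on each suspend it says when to wake it next. */
typedef struct {
    const char *name;
    uint64_t resume_tick;   /* tick at which the process asks to be resumed */
    uint64_t suspends;      /* how many times it has run and suspended again */
    uint64_t sleep_ticks;   /* interval it requests on every suspend */
} toy_process_t;

/* Advance the clock over [start_tick, start_tick + n_ticks). A process is
   resumed only once the clock has reached its resume_tick; nothing here
   preempts a process or wakes it early. */
static void
toy_dispatch(toy_process_t *procs, size_t n_procs,
             uint64_t start_tick, uint64_t n_ticks)
{
    assert(procs != NULL || n_procs == 0);

    for (uint64_t now = start_tick; now < start_tick + n_ticks; now++) {
        for (size_t i = 0; i < n_procs; i++) {
            toy_process_t *p = &procs[i];

            if (now < p->resume_tick)
                continue;           /* not due yet, leave it suspended */

            /* "Run" the process body, then suspend until the next deadline. */
            p->suspends++;
            p->resume_tick = now + p->sleep_ticks;
        }
    }
}
```

In this model a process whose `resume_tick` was set to something like `UINT64_MAX / 2` keeps a constant `suspends` count however long the dispatcher runs, while its neighbours keep counting. That matches the symptom of one node's Suspends counter freezing while other process nodes carry on, but it does not show which of the causes listed above is actually responsible.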

 

HTH... Dave 

 

From: vpp-dev@lists.fd.io  On Behalf Of Sudhir CR via 
lists.fd.io
Sent: Friday, February 17, 2023 12:12 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] process node suspended indefinitely

 

Hi Team,

We have a process node which we use for some control-plane-related 
activity. Sometimes we observe that this process node is suspended 
indefinitely.

I know that if a process node takes an unreasonably long time, it will not 
be scheduled further, but I am not able to find where in the code this is 
done.

Can anyone point me to the code that tracks the time taken by each process 
node and suspends a node indefinitely if it consumes too much time?

 

Thanks and regards,

Sudhir

 



View/Reply Online (#22601): https://lists.fd.io/g/vpp-dev/message/22601