Re: [vpp-dev] request-response between vlib processes

2022-09-13 Thread Florin Coras
Hi Vratko,

> On Sep 13, 2022, at 5:03 AM, Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES 
> at Cisco) via lists.fd.io  wrote:
> 
> In general, most of the “communication” between VPP components
> is done by directly calling C functions,
> so it makes sense that avf_flag_change is called within the vl_api_clnt_process process.
> It is avf_process_request (called directly by avf_flag_change)
> that decides to hand off the request to the avf_process process for async handling,
> so it should make sure to resume the API process correctly upon the response.
> 
> > just to set a mac address? 
>  
> In my particular test the async operation switches promiscuous mode on an interface,
> but I guess it does not really matter what a particular operation does.
> What matters is that there is a synchronous API call (l2_patch_add_del in my test)
> which only indirectly causes an asynchronous operation (as the interface uses the AVF driver).
>  

Didn’t have an issue with how the api ends up calling avf_process_request. I 
was just wondering why we ended up needing such a complicated procedure to 
apply what looked like simple updates.

> > Do we really need to block the binary api 
>  
> The l2_patch_add_del does block.
> Especially in the “del” case, the subsequent API calls
> need to know whether the interface is gone yet or not.

I’m pretty sure we could mark things as down and program an async cleanup from 
within the avf layer. That is, if async is necessary, for deletes we should be 
able to provide a return code as soon as we find that the device/state exists 
and program the removal.

But for adds, it would be good if we could avoid suspending the current process 
in avf because it can’t know all the ways in which the calling process could be 
signaled. 
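A minimal sketch of the delete path suggested above. It is written as standalone C rather than against the real AVF or vlib APIs: the device table, its states, and the functions `dev_delete_request`, `dev_exists` and `dev_cleanup_pending` are all hypothetical. The API side validates the device, marks it down and queues the removal, so a return code is available immediately. A later pass, standing in for a background cleanup process, performs the slow part. Treating a device that is pending deletion as already gone is one assumed way to keep subsequent API calls consistent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device table; none of these names exist in VPP or the AVF plugin. */
#define MAX_DEVS 8
#define RV_OK 0
#define RV_NO_SUCH_DEV (-1)

typedef enum { DEV_FREE = 0, DEV_UP, DEV_DOWN_PENDING_DELETE } dev_state_t;

typedef struct {
  dev_state_t state[MAX_DEVS];
  int cleanup_queue[MAX_DEVS]; /* devices waiting for the async removal */
  size_t n_pending;
} dev_table_t;

/* API side: validate, mark the device down, schedule removal, reply at once. */
int dev_delete_request (dev_table_t *t, int idx)
{
  if (idx < 0 || idx >= MAX_DEVS || t->state[idx] != DEV_UP)
    return RV_NO_SUCH_DEV;
  t->state[idx] = DEV_DOWN_PENDING_DELETE;
  /* Cannot overflow: a device is queued at most once, since it leaves DEV_UP. */
  t->cleanup_queue[t->n_pending++] = idx;
  return RV_OK;
}

/* Later API calls see a device pending deletion as already gone. */
bool dev_exists (const dev_table_t *t, int idx)
{
  return idx >= 0 && idx < MAX_DEVS && t->state[idx] == DEV_UP;
}

/* Background side (stand-in for a cleanup process): do the slow part later.
 * Returns how many devices were removed. */
size_t dev_cleanup_pending (dev_table_t *t)
{
  size_t n = t->n_pending;
  for (size_t i = 0; i < n; i++)
    t->state[t->cleanup_queue[i]] = DEV_FREE;
  t->n_pending = 0;
  return n;
}
```

A second delete of the same device fails with `RV_NO_SUCH_DEV` even before the cleanup has run, which is the property the “del” case needs.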

>  
> > pass opaques in requests
>  
> As usual, there are several ways to make it work,
> we just need to pick one (and put an example usage into the docs).

And I believe that’s what we’re discussing here :-) 

Florin

Re: [vpp-dev] benchmark for a patch

2022-09-13 Thread Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco) via lists.fd.io
> a specific patch which is not merged?

It depends whether the unmerged patch is for VPP or CSIT (or you have two).

1. If the VPP code is merged, various csit-verify-perf-* jobs can be triggered
using the csit-{node}-{arch}-perftest trigger word in a CSIT Gerrit comment.
Example: [0].
The VPP version used is specified in the CSIT code in VPP_STABLE_VER_UBUNTU_JAMMY,
and it must be available in packagecloud.

2. If the CSIT code is merged (it needs to be in the latest oper branch),
vpp-csit-perf-* jobs can be triggered using the perftest-{node}-{arch} trigger word
in a VPP Gerrit comment. Example: [1]. More detailed documentation: [2].

3. If both the VPP and CSIT code are unmerged, a vpp-csit-verify-perf-* job
can be started manually in the Jenkins WebUI, but only if you have access rights for that.
Check whether link [3] lets you (after logging into Jenkins) specify the CSIT_REF to be used.
I am not sure how to get access rights nowadays; I assume they can be requested somewhere here [4].
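The three cases above can be summarized as a small decision function. This is only an illustrative C sketch: `perf_trigger_hint` is a hypothetical helper that does not talk to Gerrit or Jenkins, and the node/arch values used with it (`2n`, `icx`) are placeholders borrowed from the job name in [3]. The valid combinations are listed in the CSIT documentation [2].

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper summarizing the three cases.
 * Writes a hint into buf and returns snprintf's result, or -1 if both are merged. */
int perf_trigger_hint (bool vpp_merged, bool csit_merged, const char *node,
                       const char *arch, char *buf, size_t len)
{
  if (vpp_merged && !csit_merged) /* case 1: comment on the CSIT change */
    return snprintf (buf, len, "CSIT Gerrit comment: csit-%s-%s-perftest", node, arch);
  if (!vpp_merged && csit_merged) /* case 2: comment on the VPP change */
    return snprintf (buf, len, "VPP Gerrit comment: perftest-%s-%s", node, arch);
  if (!vpp_merged && !csit_merged) /* case 3: manual Jenkins build, needs access rights */
    return snprintf (buf, len, "Jenkins WebUI: vpp-csit-verify-perf-*, set CSIT_REF");
  return -1; /* both merged: not covered by the reply above */
}
```

For example, `perf_trigger_hint (true, false, "2n", "icx", buf, sizeof buf)` writes `CSIT Gerrit comment: csit-2n-icx-perftest`.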

Vratko.

[0] https://gerrit.fd.io/r/c/csit/+/36787/1#message-ead2f6d770c8d741e5bf4933e5bf215702550467
[1] https://gerrit.fd.io/r/c/vpp/+/36707/6#message-b71fe7ef4d8f54a7c478f90aa5b50a9d6fd6718f
[2] https://github.com/FDio/csit/blob/8d85a976dc8bd2c960044b32e830eb97f63f5ffe/docs/report/introduction/methodology_rca/methodology_perpatch_performance_tests.rst
[3] https://jenkins.fd.io/view/vpp/job/vpp-csit-verify-perf-master-ubuntu2204-x86_64-2n-icx/build?delay=0sec
[4] http://support.linuxfoundation.org/

From: vpp-dev@lists.fd.io  On Behalf Of Stanislav Zaikin
Sent: Thursday, 2022-September-01 17:19
To: vpp-dev 
Subject: [vpp-dev] benchmark for a patch

Hello folks,

Is it possible to trigger a benchmark (csit, only 1 test case) for a specific 
patch which is not merged?

--
Best regards
Stanislav Zaikin

Re: [vpp-dev] request-response between vlib processes

2022-09-13 Thread Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco) via lists.fd.io
In general, most of the “communication” between VPP components
is done by directly calling C functions,
so it makes sense that avf_flag_change is called within the vl_api_clnt_process process.
It is avf_process_request (called directly by avf_flag_change)
that decides to hand off the request to the avf_process process for async handling,
so it should make sure to resume the API process correctly upon the response.

> just to set a mac address?

In my particular test the async operation switches promiscuous mode on an interface,
but I guess it does not really matter what a particular operation does.
What matters is that there is a synchronous API call (l2_patch_add_del in my test)
which only indirectly causes an asynchronous operation (as the interface uses the AVF driver).

> Do we really need to block the binary api

The l2_patch_add_del does block.
Especially in the “del” case, the subsequent API calls
need to know whether the interface is gone yet or not.

> pass opaques in requests

As usual, there are several ways to make it work,
we just need to pick one (and put an example usage into the docs).

Vratko.

From: vpp-dev@lists.fd.io  On Behalf Of Florin Coras
Sent: Monday, 2022-September-12 23:11
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] request-response between vlib processes

Hi Vratko,

Do we really need to block the binary api waiting for a reply from another vpp 
process just to set a mac address?

If setting up the mac (or similar) cannot be done synchronously, probably api 
handlers should hand over all those requests to another vpp process, 
vl_api_async_req_process, that takes care of async execution and generation of 
api replies. You could also pass opaques in requests and maybe expect backends, 
like avf_process, to bounce those opaques back for demuxing.
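A minimal sketch of the opaque-bouncing idea, as plain C without any vlib dependencies. The API side parks each pending request under a fresh opaque, the backend is assumed to echo that opaque in its response, and the async request process uses it to find the client context and build the reply. All names here (`pending_req_t`, `async_req_submit`, `async_req_complete`) are hypothetical, and vl_api_async_req_process is only the process proposed in the message above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PENDING 16

/* Context needed to send the API reply later (hypothetical fields). */
typedef struct {
  bool in_use;
  uint32_t client_index; /* which API client to answer */
  uint32_t context;      /* client's message context, echoed in the reply */
} pending_req_t;

typedef struct {
  pending_req_t slots[MAX_PENDING];
} async_req_table_t;

/* API handler side: park the request and return the opaque to pass to the
 * backend (here simply the slot index), or -1 if the table is full. */
int async_req_submit (async_req_table_t *t, uint32_t client_index, uint32_t context)
{
  for (int i = 0; i < MAX_PENDING; i++)
    if (!t->slots[i].in_use)
      {
        t->slots[i] = (pending_req_t){ true, client_index, context };
        return i;
      }
  return -1;
}

/* Async request process side: the backend bounced the opaque back.
 * Copies the parked request into *out for building the reply and frees the slot. */
bool async_req_complete (async_req_table_t *t, int opaque, pending_req_t *out)
{
  if (opaque < 0 || opaque >= MAX_PENDING || !t->slots[opaque].in_use)
    return false; /* unknown or already completed opaque */
  *out = t->slots[opaque];
  t->slots[opaque].in_use = false;
  return true;
}
```

Responses may complete in any order, since the opaque alone identifies the request. A production version would probably add a generation counter to the opaque so that a late response cannot match a reused slot.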

Regards,
Florin


On Sep 12, 2022, at 4:49 AM, Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES
at Cisco) via lists.fd.io <vrpolak=cisco@lists.fd.io> wrote:

[resending to the correct vpp-dev e-mail address]

Short version:
Vratko would appreciate something like 
vlib_current_process_wait_for_one_time_event_or_clock.

Medium version:
One instance of request-response interaction between vlib processes had a bug [11].
Vratko contributed a fix [9] for the immediate issue,
but the proper fix was left hinted in TODOs (and discussed in the long version).

Long version:

Vlib supports processes and signals, see corresponding sections in the docs [7].
Using the actor model vocabulary, a (vlib) process is an actor,
and (vlib) signaling a (vlib) event means sending a message between actors.
There is no vlib name for actor behavior [10].

The typical use of event signaling in VPP is “fire and forget”,
meaning a “request” without any need to respond.
That means a typical process has just one behavior;
the side effects of a process are given by event type (and data),
not directly by the sequence of previous events received.
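To make the “one behavior” point concrete, here is a standalone C model of a typical fire-and-forget process. It is not vlib code: the mailbox type and the event names are hypothetical, and in vlib the equivalent loop would be built around calls such as vlib_process_wait_for_event_or_clock. The process drains its mailbox and dispatches purely on event type, so the effect of each event does not depend on what was received before.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 32

/* Hypothetical event types; in vlib these would be process-specific uword values. */
typedef enum { EV_SOCKET_READ = 1, EV_TIMER = 2 } event_type_t;

/* Minimal FIFO mailbox standing in for a process's pending events. */
typedef struct {
  event_type_t type[QCAP];
  size_t head, len;
} mailbox_t;

void mailbox_signal (mailbox_t *m, event_type_t type)
{
  assert (m->len < QCAP);
  m->type[(m->head + m->len) % QCAP] = type;
  m->len++;
}

/* Counters standing in for the side effects of each handler. */
typedef struct { int reads, ticks; } effects_t;

/* One pass of a fire-and-forget loop: drain the mailbox, dispatch on type only.
 * Nothing is sent back, and no handler depends on earlier events. */
void process_run_once (mailbox_t *m, effects_t *fx)
{
  while (m->len > 0)
    {
      event_type_t t = m->type[m->head];
      m->head = (m->head + 1) % QCAP;
      m->len--;
      switch (t)
        {
        case EV_SOCKET_READ: fx->reads++; break;
        case EV_TIMER: fx->ticks++; break;
        }
    }
}
```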

But there is an exception (and in the future there may be more).
When the avf_process process handles AVF_PROCESS_EVENT_REQ
and detects that it was signaled by some other process,
it signals back a “response” event.
The main reason is that some operations may take an unreasonably long time,
and we prefer VPP to crash there (instead of getting stuck),
so we can see the backtrace.

A typical process that signals AVF_PROCESS_EVENT_REQ is vl_api_clnt_process,
whose loop usually handles SOCKET_READ_EVENT events.
That is, this socket API handling process has no idea about AVF-plugin-specific needs,
but it can call the avf_process_request function, which (upon detecting it is not called
from the avf_process process) does the signaling and waiting.
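The dispatch described above can be modelled with a standalone C sketch. If the caller already is the worker process, the request is executed directly; otherwise it is queued for the worker and the caller has to wait for a response. This is not the actual avf_process_request code: `sim_t`, `request_submit` and the two process ids are hypothetical, and the waiting itself is left out because it is the subject of the rest of this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical process identifiers. */
enum { PROC_API = 1, PROC_WORKER = 2 };

#define QCAP 8

typedef struct {
  int current_process;  /* which process is running right now */
  int queued_req[QCAP]; /* requests handed off to the worker */
  size_t n_queued;
  int last_done;        /* last request executed */
} sim_t;

static void execute_request (sim_t *s, int req) { s->last_done = req; }

/* Returns true if the request completed synchronously,
 * false if it was handed off and the caller must now wait for a response. */
bool request_submit (sim_t *s, int req)
{
  if (s->current_process == PROC_WORKER)
    {
      execute_request (s, req); /* already in the worker: no signaling needed */
      return true;
    }
  assert (s->n_queued < QCAP);
  s->queued_req[s->n_queued++] = req; /* "signal" the worker */
  return false;                       /* caller now has to wait */
}
```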

But this means vl_api_clnt_process is the first process (that I know of) with two behaviors.
The first one focuses on handling new API messages,
the second one focuses on handling the AVF response (especially the lack thereof in time).
As clib_panic is called when the response does not arrive
(and I hope there are never two requests at the same time),
the first behavior never encounters the AVF response.
But the second behavior can encounter SOCKET_READ_EVENT.
The VPP-2033 [11] bug is what happens in that case.
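The general shape of this failure can be reproduced in a few lines of standalone C. The model is hypothetical and only mimics the situation described above, not the actual VPP-2033 code or its exact symptoms. A process waiting for “any event or timeout” treats whatever arrives first as the AVF response, so a SOCKET_READ_EVENT that shows up during the wait is consumed by the wrong behavior and the real response is left behind.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 8
/* Hypothetical event types; the response uses type 0, as described in the text. */
enum { EV_RESPONSE = 0, EV_SOCKET_READ = 1 };

typedef struct { int type[QCAP]; size_t len; } mailbox_t;

void mailbox_signal (mailbox_t *m, int type)
{
  assert (m->len < QCAP);
  m->type[m->len++] = type;
}

/* Pops the oldest event; returns -1 if the mailbox is empty (the timeout case). */
int mailbox_pop (mailbox_t *m)
{
  if (m->len == 0)
    return -1;
  int t = m->type[0];
  for (size_t i = 1; i < m->len; i++)
    m->type[i - 1] = m->type[i];
  m->len--;
  return t;
}

/* Naive "second behavior": any event counts as the awaited response.
 * Returns the event type it consumed, or -1 on timeout. */
int naive_wait_for_response (mailbox_t *m)
{
  return mailbox_pop (m);
}
```

With a socket read queued ahead of the response, `naive_wait_for_response` returns `EV_SOCKET_READ`: the API message is lost to the first behavior, and the response stays queued for nobody.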

A minor issue is that the “response” event is identified only by
its event type being zero, which would not work in (hypothetical future) scenarios
where a single process waits for two different responses.
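The limitation of a fixed response type can be shown with a standalone C sketch. If every response uses event type 0, a process waiting for two responses cannot tell them apart, whereas tagging each request with a unique event type (roughly the role one-time events play in vlib) makes the match unambiguous. The allocator below is hypothetical and much simpler than anything in vlib.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IDS 16

/* Hypothetical allocator of per-request response event types. */
typedef struct {
  unsigned char used[MAX_IDS];
  int base; /* first id, chosen above the process's fixed event types */
} event_id_pool_t;

int event_id_alloc (event_id_pool_t *p)
{
  for (int i = 0; i < MAX_IDS; i++)
    if (!p->used[i])
      {
        p->used[i] = 1;
        return p->base + i;
      }
  return -1; /* pool exhausted */
}

void event_id_free (event_id_pool_t *p, int id)
{
  int i = id - p->base;
  assert (i >= 0 && i < MAX_IDS && p->used[i]);
  p->used[i] = 0;
}

/* Maps a received event type to the index of the outstanding request it answers,
 * or -1. With a fixed type 0 for every response, the first waiter always wins. */
int which_request (const int *awaited, size_t n, int received_type)
{
  for (size_t i = 0; i < n; i++)
    if (awaited[i] == received_type)
      return (int) i;
  return -1;
}
```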

Reading through node_funcs.h I found vlib_current_process_wait_for_one_time_event [12],
which looks suited for waiting for “single response” events,
but it lacks the time awareness of vlib_process_wait_for_event_or_clock.
If we had something like vlib_current_process_wait_for_one_time_event_or_clock
(and its example usage in the docs), handling the response would become easier.
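To illustrate the requested semantics without claiming how vlib would implement them, here is a standalone C model of a hypothetical `wait_for_one_time_event_or_clock`. It consumes only the event the caller is waiting for, reports a timeout if that event has not arrived in time, and leaves every other event (such as SOCKET_READ_EVENT) queued for the process's normal behavior. Time is simulated: each event records its arrival tick, and real vlib code would suspend the process instead of scanning a list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QCAP 16

typedef struct { int type; unsigned arrival_tick; } event_t;
typedef struct { event_t ev[QCAP]; size_t len; } mailbox_t;

void mailbox_signal (mailbox_t *m, int type, unsigned tick)
{
  assert (m->len < QCAP);
  m->ev[m->len++] = (event_t){ type, tick };
}

/* Hypothetical: wait (in simulated time) from `now` for at most `timeout` ticks
 * for an event of type `awaited`. On success removes only that event and
 * returns true; otherwise returns false (timeout). Other events stay queued. */
bool wait_for_one_time_event_or_clock (mailbox_t *m, int awaited,
                                       unsigned now, unsigned timeout)
{
  for (size_t i = 0; i < m->len; i++)
    {
      if (m->ev[i].type != awaited || m->ev[i].arrival_tick > now + timeout)
        continue;
      for (size_t j = i + 1; j < m->len; j++) /* remove event i, keep order */
        m->ev[j - 1] = m->ev[j];
      m->len--;
      return true;
    }
  return false;
}
```

On a `false` result the caller would call clib_panic as described above, while any SOCKET_READ_EVENT that arrived during the wait is still there for the API-handling behavior.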

Vratko.

[7] https://github.com/FDio/vpp/blob/9ad39c026c8a3c945a7003c4aa4f5cb1d4c80160/docs/developer/corearchitecture/vlib.rst
[9] https://gerrit.fd.io/r/c/vpp/+/37083
[10]