Re: [vpp-dev] Segmentation fault in VPP 20.05 release when using VCL VPPCOM_PROTO_UDPC #vpp-hoststack

2020-05-16 Thread Florin Coras
Hi Raj, 

Inline.

> On May 16, 2020, at 2:30 PM, Raj Kumar  wrote:
> 
>  Hi Florin,
> 
> I am using VPP 20.05 rc0 . Should I upgrade it ? 

FC: Not necessarily, as long as it’s relatively recent, i.e., it includes all 
of the recent udp updates. 

> 
> Thanks! for providing the patch, i will try it on Monday. Actually, I am 
> testing in a controlled environment where I can not change the VPP libraries. 
> I will try it on my server.

FC: Sounds good. Let me know how it goes!

> 
>  On the UDP connection; yes, the error  EINPROGRESS was there because I am 
> using non-blocking connection. Now, I am ignoring this error. 
>  Sometime , VPP crashes when I kills my application ( not gracefully) even 
> when there is  a single connection .

FC: That might have to do with the app dying such that 1) it does not detach 
from vpp (e.g., sigkill and atexit function is not executed) 2) it dies with 
the message queue mutex held and 3) vpp tries to enqueue more events before 
detecting that it crashed (~30s).

> 
> The good part is that now I am able to move connections on different cores by 
> connecting it on receiving the first packet and then re-binding the socket to 
> listen.
> Basically, this approach works but I have not tested it thoroughly. However , 
> I am still in favor of using the UDPC connection.

FC: If you have enough logic in your app to emulate a handshake, i.e., always 
have the client send a few bytes and wait for a reply from the server before 
opening a new connection, then this approach is probably more flexible from 
core placement perspective.

The patch tries to emulate the old udpc with udp (udpc in vpp was confusing for 
consumers). You might get away with listening from multiple vcl workers on the 
same udp ip:port pair and have vpp load balance accepts between them, but I’ve 
never tested that. You can do this only with udp listeners that have been 
initialized as connected (so only with the patch). 

> 
> Btw, in trace logs I see some ssvm_delete related error when re-binding the 
> connection.

FC: I think it’s fine. Going over the interactions step by step to see if 
understand what you’re doing (and hopefully help you understand what vpp does 
underneath). 

> 
> vpp# sh session verbose
> ConnectionState  Rx-f  
> Tx-f
> [0:0][U] 2001:5b0::700:b883:31f:29e:9880:6677-LISTEN 0 0
> Thread 0: active sessions 1
> 
> ConnectionState  Rx-f  
> Tx-f
> [1:0][U] 2001:5b0::700:b883:31f:29e:9880:6677-OPENED 0 0
> Thread 1: active sessions 1
> 
> ConnectionState  Rx-f  
> Tx-f
> [2:0][U] 2001:5b0::700:b883:31f:29e:9880:6677-OPENED 0 0
> Thread 2: active sessions 1
> Thread 3: no sessions
> Thread 4: no sessions
> 
> VCL<24434>: configured VCL debug level (2) from VCL_DEBUG!
> VCL<24434>: using default heapsize 268435456 (0x1000)
> VCL<24434>: allocated VCL heap = 0x7f7f18d1b010, size 268435456 (0x1000)
> VCL<24434>: using default configuration.
> vppcom_connect_to_vpp:487: vcl<24434:0>: app (udp6_rx) connecting to VPP api 
> (/vpe-api)...
> vppcom_connect_to_vpp:502: vcl<24434:0>: app (udp6_rx) is connected to VPP!
> vppcom_app_create:1200: vcl<24434:0>: sending session enable
> vppcom_app_create:1208: vcl<24434:0>: sending app attach
> vppcom_app_create:1217: vcl<24434:0>: app_name 'udp6_rx', my_client_index 0 
> (0x0)

FC: Added worker 0

> vppcom_connect_to_vpp:487: vcl<24434:1>: app (udp6_rx-wrk-1) connecting to 
> VPP api (/vpe-api)...
> vppcom_connect_to_vpp:502: vcl<24434:1>: app (udp6_rx-wrk-1) is connected to 
> VPP!
> vcl_worker_register_with_vpp:262: vcl<24434:1>: added worker 1

> vl_api_app_worker_add_del_reply_t_handler:235: vcl<94:-1>: worker 1 
> vpp-worker 1 added

FC: Adding worker 1

> vppcom_epoll_create:2558: vcl<24434:1>: Created vep_idx 0
> vppcom_session_create:1279: vcl<24434:1>: created session 1
> vppcom_session_bind:1426: vcl<24434:1>: session 1 handle 16777217: binding to 
> local IPv6 address 2001:5b0::700:b883:31f:29e:9880 port 6677, proto UDP
> vppcom_session_listen:1458: vcl<24434:1>: session 16777217: sending vpp 
> listen request...
> vcl_session_bound_handler:607: vcl<24434:1>: session 1 [0x0]: listen 
> succeeded!
> vppcom_epoll_ctl:2658: vcl<24434:1>: EPOLL_CTL_ADD: vep_sh 16777216, sh 
> 16777217, events 0x1, data 0x!

FC: Listened on session 1 and added it to epoll session 0

>  udpRxThread started!!!  ... rx port  = 6677
> Waiting for a client to connect on port 6677 ...
> vppcom_session_connect:1742: vcl<24434:1>: session handle 16777217 
> (STATE_CLOSED): connecting to peer IPv6 2001:5b0::700:b883:31f:29e:9886 
> port 40300 proto UDP

FC: I guess at this point you got data on the listener so you now try to 
connect it to the peer. 

> vppcom_epoll_ctl:2696: vcl<24434:1>: EPOLL_CTL_MOD: vep_sh 16777216, sh 
> 

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-16 Thread Ole Troan
>>> API messages in network byte order made sense 10 years ago when I worked
>>> with a mixed x86_64 / ppc32 system. As Damjan points out, API
>>> interoperability between big-endian and little-endian systems is a boutique
>>> use-case these days.
>>> 
>>> Timing is key. We won’t be able to cherry-pick API message handler fixes
>>> across an endian-order flag-day. If we decide to switch to native byte
>>> order, we’d better switch right before we pull our next LTS release.
>> 
>> Do I understand it right that the idea here would be to flip the
>> endianness right before pulling stable/2009 branch ?
> 
> If this is done right before (or near) the branch pull for 2009 then I think 
> the release candidate should be left to bake longer than normal, and API 
> users encouraged to port/test to the release candidate. Obviously any bugs 
> found would then be fixed on the RC. Maybe ask for a couple/few stakeholders 
> to sign up to do this and then wait a reasonable amount of time for them to 
> sign-off?
> 
> I know we use the binary APIs, I believe Netgate does as well. I'm sure there 
> are others too (might be good to collect a list of these folks if one doesn't 
> exist yet).
> 
> If these changes were coupled with the (more important IMO) change i've been 
> (ceaselessly :) asking for in the API (non-callback sync functions) I would 
> be more than willing to sign up for this.

I am a little concerned about the benefits of switching the endian 
representation in the wire format.
The endianness and representation on the wire should not be exposed to the 
client-side or VPP API handler code.
The language bindings I am familiar with hide endian issues from the user code. 
Python and Go. I think VAPI does too.

VPP API action handlers do currently need to deal with endianness. That should 
be a relatively simple change.
I can add a new flag in the message definition in the API, and then wrap those 
action handlers with endian functions. Then we can avoid having VPP developers 
manually do endian conversions for API functions.

The current on-the-wire representation is that of packed C structures, and we 
are somewhat a victim of that. E.g. how the compiler puts various members of a 
union in memory is as far as I know not part of the C standard.
I would suggests we move towards isolating user-side and VPP-side code from 
endianess and encoding (and we can do that evolutionary) and then consider if 
we can move towards a better specified encoding format. If that's CBOR, 
Flatbuffers or something else.

Christian for your VAPI C callback issue, that I believe is purely under the 
control of the language binding.
Can I volunteer you ti fix that?

Best regards,
Ole-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16435): https://lists.fd.io/g/vpp-dev/message/16435
Mute This Topic: https://lists.fd.io/mt/74228149/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Segmentation fault in VPP 20.05 release when using VCL VPPCOM_PROTO_UDPC #vpp-hoststack

2020-05-16 Thread Raj Kumar
 Hi Florin,

I am using VPP 20.05 rc0 . Should I upgrade it ?

Thanks! for providing the patch, i will try it on Monday. Actually, I am
testing in a controlled environment where I can not change the VPP
libraries. I will try it on my server.

 On the UDP connection; yes, the error  EINPROGRESS was there because I am
using non-blocking connection. Now, I am ignoring this error.
 Sometime , VPP crashes when I kills my application ( not gracefully) even
when there is  a single connection .

The good part is that now I am able to move connections on different cores
by connecting it on receiving the first packet and then re-binding the
socket to listen.
Basically, this approach works but I have not tested it thoroughly. However
, I am still in favor of using the UDPC connection.

Btw, in trace logs I see some ssvm_delete related error when re-binding the
connection.

vpp# sh session verbose
ConnectionState  Rx-f
 Tx-f
[0:0][U] 2001:5b0::700:b883:31f:29e:9880:6677-LISTEN 0 0
Thread 0: active sessions 1

ConnectionState  Rx-f
 Tx-f
[1:0][U] 2001:5b0::700:b883:31f:29e:9880:6677-OPENED 0 0
Thread 1: active sessions 1

ConnectionState  Rx-f
 Tx-f
[2:0][U] 2001:5b0::700:b883:31f:29e:9880:6677-OPENED 0 0
Thread 2: active sessions 1
Thread 3: no sessions
Thread 4: no sessions

VCL<24434>: configured VCL debug level (2) from VCL_DEBUG!
VCL<24434>: using default heapsize 268435456 (0x1000)
VCL<24434>: allocated VCL heap = 0x7f7f18d1b010, size 268435456 (0x1000)
VCL<24434>: using default configuration.
vppcom_connect_to_vpp:487: vcl<24434:0>: app (udp6_rx) connecting to VPP
api (/vpe-api)...
vppcom_connect_to_vpp:502: vcl<24434:0>: app (udp6_rx) is connected to VPP!
vppcom_app_create:1200: vcl<24434:0>: sending session enable
vppcom_app_create:1208: vcl<24434:0>: sending app attach
vppcom_app_create:1217: vcl<24434:0>: app_name 'udp6_rx', my_client_index 0
(0x0)
vppcom_connect_to_vpp:487: vcl<24434:1>: app (udp6_rx-wrk-1) connecting to
VPP api (/vpe-api)...
vppcom_connect_to_vpp:502: vcl<24434:1>: app (udp6_rx-wrk-1) is connected
to VPP!
vcl_worker_register_with_vpp:262: vcl<24434:1>: added worker 1
vl_api_app_worker_add_del_reply_t_handler:235: vcl<94:-1>: worker 1
vpp-worker 1 added
vppcom_epoll_create:2558: vcl<24434:1>: Created vep_idx 0
vppcom_session_create:1279: vcl<24434:1>: created session 1
vppcom_session_bind:1426: vcl<24434:1>: session 1 handle 16777217: binding
to local IPv6 address 2001:5b0::700:b883:31f:29e:9880 port 6677, proto
UDP
vppcom_session_listen:1458: vcl<24434:1>: session 16777217: sending vpp
listen request...
vcl_session_bound_handler:607: vcl<24434:1>: session 1 [0x0]: listen
succeeded!
vppcom_epoll_ctl:2658: vcl<24434:1>: EPOLL_CTL_ADD: vep_sh 16777216, sh
16777217, events 0x1, data 0x!
 udpRxThread started!!!  ... rx port  = 6677
Waiting for a client to connect on port 6677 ...
vppcom_session_connect:1742: vcl<24434:1>: session handle 16777217
(STATE_CLOSED): connecting to peer IPv6 2001:5b0::700:b883:31f:29e:9886
port 40300 proto UDP
vppcom_epoll_ctl:2696: vcl<24434:1>: EPOLL_CTL_MOD: vep_sh 16777216, sh
16777217, events 0x2011, data 0x101!
vppcom_session_create:1279: vcl<24434:1>: created session 2
vppcom_session_bind:1426: vcl<24434:1>: session 2 handle 16777218: binding
to local IPv6 address 2001:5b0::700:b883:31f:29e:9880 port 6677, proto
UDP
vppcom_session_listen:1458: vcl<24434:1>: session 16777218: sending vpp
listen request...
vcl_session_app_add_segment_handler:855: vcl<24434:1>: mapped new segment
'24177-2' size 134217728
vcl_session_connected_handler:505: vcl<24434:1>: session 1 [0x1]
connected! rx_fifo 0x224051c80, refcnt 1, tx_fifo 0x224051b80, refcnt 1
vcl_session_app_add_segment_handler:855: vcl<24434:1>: mapped new segment
'24177-3' size 134217728
vcl_session_bound_handler:607: vcl<24434:1>: session 2 [0x0]: listen
succeeded!
vppcom_epoll_ctl:2658: vcl<24434:1>: EPOLL_CTL_ADD: vep_sh 16777216, sh
16777218, events 0x1, data 0x!
vcl_session_migrated_handler:674: vcl<24434:1>: Migrated 0x1 to
thread 2 0x2
 new connecton
vppcom_session_connect:1742: vcl<24434:1>: session handle 16777218
(STATE_CLOSED): connecting to peer IPv6 2001:5b0::700:b883:31f:29e:9886
port 60725 proto UDP
vppcom_epoll_ctl:2696: vcl<24434:1>: EPOLL_CTL_MOD: vep_sh 16777216, sh
16777218, events 0x2011, data 0x102!
vppcom_session_create:1279: vcl<24434:1>: created session 3
vppcom_session_bind:1426: vcl<24434:1>: session 3 handle 16777219: binding
to local IPv6 address 2001:5b0::700:b883:31f:29e:9880 port 6677, proto
UDP
vppcom_session_listen:1458: vcl<24434:1>: session 16777219: sending vpp
listen request...
*ssvm_delete_shm:205: unlink segment '24177-3': No such file or directory
(errno 2)*
vcl_segment_detach:467: vcl<24434:1>: 

Re: [vpp-dev] Segmentation fault in VPP 20.05 release when using VCL VPPCOM_PROTO_UDPC #vpp-hoststack

2020-05-16 Thread Florin Coras
Hi Raj, 

Assuming you are trying to open more than one connected udp session, does this 
[1] solve the problem (note it's untested)?

To reproduce legacy behavior, this allows you to listen on VPPCOM_PROTO_UDPC 
but that is now converted by vcl into a udp listen that propagates with a 
“connected” flag to vpp. That should result in a udp listener that behaves like 
an “old” udpc listener. 

Regards,
Florin

[1] https://gerrit.fd.io/r/c/vpp/+/27111

> On May 16, 2020, at 10:56 AM, Florin Coras via lists.fd.io 
>  wrote:
> 
> Hi Raj, 
> 
> Are you using master latest/20.05 rc1 or something older? The fact that 
> you’re getting a -115 (EINPROGRESS) suggests you might’ve marked the 
> connection as “non-blocking” although you created it as blocking. If that’s 
> so, the return value is not an error. 
> 
> Also, how if vpp crashing? Are you by chance trying to open a lot of udp 
> connections back to back? 
> 
> Regards,
> Florin
> 
>> On May 16, 2020, at 10:23 AM, Raj Kumar > > wrote:
>> 
>> Hi Florin,
>> I tried to connect on receiving the first UDP packet . But, it did not work. 
>> I am getting error -115 in the application and VPP is crashing.
>> 
>> This is something I tried in the code (udp receiver) -
>> sockfd = vppcom_session_create(VPPCOM_PROTO_UDP, 0);
>> rv_vpp = vppcom_session_bind (sockfd, );
>> if (FD_ISSET(session_idx, ))
>> {
>> n = vppcom_session_recvfrom(sockfd, (char *)buffer, MAXLINE, 0, );
>> if(first_pkt) 
>> rv_vpp = vppcom_session_connect (sockfd, );
>> //Here getting rv_vpp as -115 
>> }
>> Please let me know if I am doing something wrong.
>> 
>> Here are the traces - 
>> 
>> VCL<16083>: configured VCL debug level (2) from VCL_DEBUG!
>> VCL<16083>: using default heapsize 268435456 (0x1000)
>> VCL<16083>: allocated VCL heap = 0x7fd255ed2010, size 268435456 (0x1000)
>> VCL<16083>: using default configuration.
>> vppcom_connect_to_vpp:487: vcl<16083:0>: app (udp6_rx) connecting to VPP api 
>> (/vpe-api)...
>> vppcom_connect_to_vpp:502: vcl<16083:0>: app (udp6_rx) is connected to VPP!
>> vppcom_app_create:1200: vcl<16083:0>: sending session enable
>> vppcom_app_create:1208: vcl<16083:0>: sending app attach
>> vppcom_app_create:1217: vcl<16083:0>: app_name 'udp6_rx', my_client_index 0 
>> (0x0)
>> 
>> vppcom_connect_to_vpp:487: vcl<16083:1>: app (udp6_rx-wrk-1) connecting to 
>> VPP api (/vpe-api)...
>> vppcom_connect_to_vpp:502: vcl<16083:1>: app (udp6_rx-wrk-1) is connected to 
>> VPP!
>> vl_api_app_worker_add_del_reply_t_handler:235: vcl<94:-1>: worker 1 
>> vpp-worker 1 added
>> vcl_worker_register_with_vpp:262: vcl<16083:1>: added worker 1
>> vppcom_session_create:1279: vcl<16083:1>: created session 0
>> vppcom_session_bind:1426: vcl<16083:1>: session 0 handle 16777216: binding 
>> to local IPv6 address 2001:5b0::700:b883:31f:29e:9880 port 6677, proto 
>> UDP
>> vppcom_session_listen:1458: vcl<16083:1>: session 16777216: sending vpp 
>> listen request...
>> vcl_session_bound_handler:607: vcl<16083:1>: session 0 [0x0]: listen 
>> succeeded!
>> vppcom_session_connect:1742: vcl<16083:1>: session handle 16777216 
>> (STATE_CLOSED): connecting to peer IPv6 2001:5b0::700:b883:31f:29e:9886 
>> port 51190 proto UDP
>>  udpRxThread started!!!  ... rx port  = 6677vppcom_session_connect() failed 
>> ... -115
>> vcl_session_cleanup:1300: vcl<16083:1>: session 0 [0x0] closing
>> vcl_worker_cleanup_cb:190: vcl<94:-1>: cleaned up worker 1
>> vl_client_disconnect:309: peer unresponsive, give up
>> 
>> thanks,
>> -Raj
>> 
>> 
>> On Fri, May 15, 2020 at 8:10 PM Florin Coras > > wrote:
>> Hi Raj, 
>> 
>> There are no explicit vcl apis that allow a udp listener to be switched to 
>> connected mode. We might decide to do this at one point through a new bind 
>> api (non-posix like) since we do support this for builtin applications. 
>> 
>> However, you now have the option of connecting a bound session. That is, on 
>> the first received packet on a udp listener, you can grab the peer’s address 
>> and connect it. Iperf3 in udp mode, which is part of our make test infra, 
>> does exactly that. Subsequently, it re-binds the port to accept more 
>> connections. Would that work for you?
>> 
>> Regards, 
>> Florin
>> 
>>> On May 15, 2020, at 4:06 PM, Raj Kumar >> > wrote:
>>> 
>>> Thanks! Florin,
>>> 
>>> OK, I understood that I need to change my application to use UDP socket and 
>>> then use vppcom_session_connect(). 
>>> This is fine for the UDP client ( sender) .
>>> 
>>> But ,in  UDP Server ( receiver) , I am not sure how to use the 
>>> vppcom_session_connect(). .
>>> I am using vppcom_session_listen() to listen on the connections and then 
>>> calling vppcom_session_accept() to accept a new connection.
>>> 
>>> With UDPC, I was able to utilize the RSS ( receiver side scaling) feature 
>>> to move the received connections on the different cores /threads.
>>> 
>>> Just 

Re: [vpp-dev] Segmentation fault in VPP 20.05 release when using VCL VPPCOM_PROTO_UDPC #vpp-hoststack

2020-05-16 Thread Florin Coras
Hi Raj, 

Are you using master latest/20.05 rc1 or something older? The fact that you’re 
getting a -115 (EINPROGRESS) suggests you might’ve marked the connection as 
“non-blocking” although you created it as blocking. If that’s so, the return 
value is not an error. 

Also, how if vpp crashing? Are you by chance trying to open a lot of udp 
connections back to back? 

Regards,
Florin

> On May 16, 2020, at 10:23 AM, Raj Kumar  wrote:
> 
> Hi Florin,
> I tried to connect on receiving the first UDP packet . But, it did not work. 
> I am getting error -115 in the application and VPP is crashing.
> 
> This is something I tried in the code (udp receiver) -
> sockfd = vppcom_session_create(VPPCOM_PROTO_UDP, 0);
> rv_vpp = vppcom_session_bind (sockfd, );
> if (FD_ISSET(session_idx, ))
> {
> n = vppcom_session_recvfrom(sockfd, (char *)buffer, MAXLINE, 0, );
> if(first_pkt) 
> rv_vpp = vppcom_session_connect (sockfd, );
> //Here getting rv_vpp as -115 
> }
> Please let me know if I am doing something wrong.
> 
> Here are the traces - 
> 
> VCL<16083>: configured VCL debug level (2) from VCL_DEBUG!
> VCL<16083>: using default heapsize 268435456 (0x1000)
> VCL<16083>: allocated VCL heap = 0x7fd255ed2010, size 268435456 (0x1000)
> VCL<16083>: using default configuration.
> vppcom_connect_to_vpp:487: vcl<16083:0>: app (udp6_rx) connecting to VPP api 
> (/vpe-api)...
> vppcom_connect_to_vpp:502: vcl<16083:0>: app (udp6_rx) is connected to VPP!
> vppcom_app_create:1200: vcl<16083:0>: sending session enable
> vppcom_app_create:1208: vcl<16083:0>: sending app attach
> vppcom_app_create:1217: vcl<16083:0>: app_name 'udp6_rx', my_client_index 0 
> (0x0)
> 
> vppcom_connect_to_vpp:487: vcl<16083:1>: app (udp6_rx-wrk-1) connecting to 
> VPP api (/vpe-api)...
> vppcom_connect_to_vpp:502: vcl<16083:1>: app (udp6_rx-wrk-1) is connected to 
> VPP!
> vl_api_app_worker_add_del_reply_t_handler:235: vcl<94:-1>: worker 1 
> vpp-worker 1 added
> vcl_worker_register_with_vpp:262: vcl<16083:1>: added worker 1
> vppcom_session_create:1279: vcl<16083:1>: created session 0
> vppcom_session_bind:1426: vcl<16083:1>: session 0 handle 16777216: binding to 
> local IPv6 address 2001:5b0::700:b883:31f:29e:9880 port 6677, proto UDP
> vppcom_session_listen:1458: vcl<16083:1>: session 16777216: sending vpp 
> listen request...
> vcl_session_bound_handler:607: vcl<16083:1>: session 0 [0x0]: listen 
> succeeded!
> vppcom_session_connect:1742: vcl<16083:1>: session handle 16777216 
> (STATE_CLOSED): connecting to peer IPv6 2001:5b0::700:b883:31f:29e:9886 
> port 51190 proto UDP
>  udpRxThread started!!!  ... rx port  = 6677vppcom_session_connect() failed 
> ... -115
> vcl_session_cleanup:1300: vcl<16083:1>: session 0 [0x0] closing
> vcl_worker_cleanup_cb:190: vcl<94:-1>: cleaned up worker 1
> vl_client_disconnect:309: peer unresponsive, give up
> 
> thanks,
> -Raj
> 
> 
> On Fri, May 15, 2020 at 8:10 PM Florin Coras  > wrote:
> Hi Raj, 
> 
> There are no explicit vcl apis that allow a udp listener to be switched to 
> connected mode. We might decide to do this at one point through a new bind 
> api (non-posix like) since we do support this for builtin applications. 
> 
> However, you now have the option of connecting a bound session. That is, on 
> the first received packet on a udp listener, you can grab the peer’s address 
> and connect it. Iperf3 in udp mode, which is part of our make test infra, 
> does exactly that. Subsequently, it re-binds the port to accept more 
> connections. Would that work for you?
> 
> Regards, 
> Florin
> 
>> On May 15, 2020, at 4:06 PM, Raj Kumar > > wrote:
>> 
>> Thanks! Florin,
>> 
>> OK, I understood that I need to change my application to use UDP socket and 
>> then use vppcom_session_connect(). 
>> This is fine for the UDP client ( sender) .
>> 
>> But ,in  UDP Server ( receiver) , I am not sure how to use the 
>> vppcom_session_connect(). .
>> I am using vppcom_session_listen() to listen on the connections and then 
>> calling vppcom_session_accept() to accept a new connection.
>> 
>> With UDPC, I was able to utilize the RSS ( receiver side scaling) feature to 
>> move the received connections on the different cores /threads.
>> 
>> Just want to confirm if I can achieve the same with UDP.
>> 
>> I will change my application and will update you about the result.
>> 
>> Thanks,
>> -Raj
>> 
>> 
>> On Fri, May 15, 2020 at 5:17 PM Florin Coras > > wrote:
>> Hi Raj, 
>> 
>> We removed udpc transport in vpp. I’ll push a patch that removes it from vcl 
>> as well. 
>> 
>> Calling connect on a udp connection will give you connected semantics now. 
>> Let me know if that solves the issue for you.
>> 
>> Regards,
>> Florin
>> 
>> 
>>> On May 15, 2020, at 12:15 PM, Raj Kumar >> > wrote:
>>> 
>>> Hi,
>>> I am getting segmentation fault in VPP when using VCL 

Re: [vpp-dev] Segmentation fault in VPP 20.05 release when using VCL VPPCOM_PROTO_UDPC #vpp-hoststack

2020-05-16 Thread Raj Kumar
Hi Florin,
I tried to connect on receiving the first UDP packet . But, it did not
work. I am getting error -115 in the application and VPP is crashing.

This is something I tried in the code (udp receiver) -
sockfd = vppcom_session_create(VPPCOM_PROTO_UDP, 0);
rv_vpp = vppcom_session_bind (sockfd, );
if (FD_ISSET(session_idx, ))
{
n = vppcom_session_recvfrom(sockfd, (char *)buffer, MAXLINE, 0,
);
if(first_pkt)
rv_vpp = vppcom_session_connect (sockfd, );
//Here getting rv_vpp as -115
}
Please let me know if I am doing something wrong.

Here are the traces -

VCL<16083>: configured VCL debug level (2) from VCL_DEBUG!
VCL<16083>: using default heapsize 268435456 (0x1000)
VCL<16083>: allocated VCL heap = 0x7fd255ed2010, size 268435456 (0x1000)
VCL<16083>: using default configuration.
vppcom_connect_to_vpp:487: vcl<16083:0>: app (udp6_rx) connecting to VPP
api (/vpe-api)...
vppcom_connect_to_vpp:502: vcl<16083:0>: app (udp6_rx) is connected to VPP!
vppcom_app_create:1200: vcl<16083:0>: sending session enable
vppcom_app_create:1208: vcl<16083:0>: sending app attach
vppcom_app_create:1217: vcl<16083:0>: app_name 'udp6_rx', my_client_index 0
(0x0)

vppcom_connect_to_vpp:487: vcl<16083:1>: app (udp6_rx-wrk-1) connecting to
VPP api (/vpe-api)...
vppcom_connect_to_vpp:502: vcl<16083:1>: app (udp6_rx-wrk-1) is connected
to VPP!
vl_api_app_worker_add_del_reply_t_handler:235: vcl<94:-1>: worker 1
vpp-worker 1 added
vcl_worker_register_with_vpp:262: vcl<16083:1>: added worker 1
vppcom_session_create:1279: vcl<16083:1>: created session 0
vppcom_session_bind:1426: vcl<16083:1>: session 0 handle 16777216: binding
to local IPv6 address 2001:5b0::700:b883:31f:29e:9880 port 6677, proto
UDP
vppcom_session_listen:1458: vcl<16083:1>: session 16777216: sending vpp
listen request...
vcl_session_bound_handler:607: vcl<16083:1>: session 0 [0x0]: listen
succeeded!
vppcom_session_connect:1742: vcl<16083:1>: session handle 16777216
(STATE_CLOSED): connecting to peer IPv6 2001:5b0::700:b883:31f:29e:9886
port 51190 proto UDP
 udpRxThread started!!!  ... rx port  = 6677vppcom_session_connect() failed
... -115
vcl_session_cleanup:1300: vcl<16083:1>: session 0 [0x0] closing
vcl_worker_cleanup_cb:190: vcl<94:-1>: cleaned up worker 1
vl_client_disconnect:309: peer unresponsive, give up

thanks,
-Raj


On Fri, May 15, 2020 at 8:10 PM Florin Coras  wrote:

> Hi Raj,
>
> There are no explicit vcl apis that allow a udp listener to be switched to
> connected mode. We might decide to do this at one point through a new bind
> api (non-posix like) since we do support this for builtin applications.
>
> However, you now have the option of connecting a bound session. That is,
> on the first received packet on a udp listener, you can grab the peer’s
> address and connect it. Iperf3 in udp mode, which is part of our make test
> infra, does exactly that. Subsequently, it re-binds the port to accept more
> connections. Would that work for you?
>
> Regards,
> Florin
>
> On May 15, 2020, at 4:06 PM, Raj Kumar  wrote:
>
> Thanks! Florin,
>
> OK, I understood that I need to change my application to use UDP socket
> and then use vppcom_session_connect().
> This is fine for the UDP client ( sender) .
>
> But ,in  UDP Server ( receiver) , I am not sure how to use the
> vppcom_session_connect(). .
> I am using vppcom_session_listen() to listen on the connections and then
> calling vppcom_session_accept() to accept a new connection.
>
> With UDP*C,* I was able to utilize the RSS ( receiver side scaling)
> feature to move the received connections on the different cores /threads.
>
> Just want to confirm if I can achieve the same with UDP.
>
> I will change my application and will update you about the result.
>
> Thanks,
> -Raj
>
>
> On Fri, May 15, 2020 at 5:17 PM Florin Coras 
> wrote:
>
>> Hi Raj,
>>
>> We removed udpc transport in vpp. I’ll push a patch that removes it from
>> vcl as well.
>>
>> Calling connect on a udp connection will give you connected semantics
>> now. Let me know if that solves the issue for you.
>>
>> Regards,
>> Florin
>>
>>
>> On May 15, 2020, at 12:15 PM, Raj Kumar  wrote:
>>
>> Hi,
>> I am getting segmentation fault in VPP when using VCL VPPCOM_PROTO_UDP*C*
>> socket. This issue is observed with both UDP sender and UDP receiver
>> application.
>>
>> However, both UDP sender and receiver works fine with VPPCOM_PROTO_UDP.
>>
>> Here is the stack trace -
>>
>> (gdb) bt
>> #0  0x in ?? ()
>> #1  0x7775da59 in session_open_vc (app_wrk_index=1,
>> rmt=0x7fffb5e34cc0, opaque=0)
>> at
>> /usr/src/debug/vpp-20.05-rc0~748_g83d129837.x86_64/src/vnet/session/session.c:1217
>> #2  0x77779257 in session_mq_connect_handler (data=0x7fffb676e7a8)
>> at
>> /usr/src/debug/vpp-20.05-rc0~748_g83d129837.x86_64/src/vnet/session/session_node.c:138
>> #3  0x77780f48 in session_event_dispatch_ctrl
>> (elt=0x7fffb643f51c, wrk=0x7fffb650a640)
>> at
>> 

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-16 Thread Christian Hopps


> On May 16, 2020, at 9:52 AM, Andrew  Yourtchenko  wrote:
> 
> On 5/16/20, Dave Barach (dbarach)  wrote:
>> API messages in network byte order made sense 10 years ago when I worked
>> with a mixed x86_64 / ppc32 system. As Damjan points out, API
>> interoperability between big-endian and little-endian systems is a boutique
>> use-case these days.
>> 
>> Timing is key. We won’t be able to cherry-pick API message handler fixes
>> across an endian-order flag-day. If we decide to switch to native byte
>> order, we’d better switch right before we pull our next LTS release.
> 
> Do I understand it right that the idea here would be to flip the
> endianness right before pulling stable/2009 branch ?

If this is done right before (or near) the branch pull for 2009 then I think 
the release candidate should be left to bake longer than normal, and API users 
encouraged to port/test to the release candidate. Obviously any bugs found 
would then be fixed on the RC. Maybe ask for a couple/few stakeholders to sign 
up to do this and then wait a reasonable amount of time for them to sign-off?

I know we use the binary APIs, I believe Netgate does as well. I'm sure there 
are others too (might be good to collect a list of these folks if one doesn't 
exist yet).

If these changes were coupled with the (more important IMO) change i've been 
(ceaselessly :) asking for in the API (non-callback sync functions) I would be 
more than willing to sign up for this.

Thanks,
Chris.

> 
> --a
> 
> 
>> 
>> FWIW... Dave
>> 
>> From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion
>> via lists.fd.io
>> Sent: Saturday, May 16, 2020 7:23 AM
>> To: Andrew Yourtchenko 
>> Cc: Christian Hopps ; Ole Troan ;
>> vpp-dev ; Jerome Tollet (jtollet) 
>> Subject: Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev]
>> Proposal for VPP binary API stability
>> 
>> 
>> Knowing that even hard-core big-endian systems like PowerPC decided to
>> switch to Little endian and that situation where binary api will be
>> exchanged between BE and LE systems is extremely small, maybe we should
>> consider removing endian conversion from APIs and simply represent data
>> in the native form. Looks like this is fertile ground for new bugs…
>> 
>> Thoughts?
>> 
>> —
>> Damjan
>> 
>> 
>>> On 15 May 2020, at 16:20, Andrew Yourtchenko
>>> mailto:ayour...@gmail.com>> wrote:
>>> 
>>> There's a very interesting couple of gerrit changes that just came in
>>> today that is worth discussing,
>>> and they are a perfect case study to further clarify the process - so
>>> I tweaked the subject accordingly..
>>> The API message itself is relatively minor, but regardless of what is
>>> agreed that makes a good case study.
>>> 
>>> Backstory:
>>> 
>>> Once upon a time on Aug 20 2019, commit 053204ab changed
>>> sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
>>> conversion
>>> function didn't make it there (enums are u32 by default as far as I can
>>> see).
>>> 
>>> This was after the 19.08 branch pull, and it wasn't ever tackled, so
>>> this (buggy) behavior ended being in 20.01, 20.01.1, and in the
>>> current 20.05-rc1.
>>> 
>>> Fast forward a bit, today I was pinged about the two changes - one for
>>> master, one for stable/2001:
>>> 
>>> https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a
>>> u8
>>> 
>>> https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
>>> htonl() call and changes the existing ("buggy")
>>> behavior in the 20.01.2 - thus would silently break any API consumers
>>> that coded against the previous "buggy" behavior.
>>> 
>>> And then we have a question open about stable/2005, which "by the
>>> letter" can potentially accept only the second approach, since it is
>>> past the API freeze.
>>> 
>>> Additional bit: this API is not touched in make test, so this bug had
>>> slipped through.
>>> 
>>> So there are the following options:
>>> 
>>> 1) Merge both patches, and treat the 20.05 similar to 20.01, thus
>>> making a "silent change" in both, but making the master look closer to
>>> a 19.08 format.
>>> 
>>> 2) Leave the 20.05 and 20.01 alone with the "buggy" behavior, and
>>> merge the master patch that shrinks the enum down to 1 byte
>>> 
>>> 3) Merge the 20.01 and cherry-pick it to master and 2005 - fixing the
>>> endianness of the u32 enum everywhere, but making an effective "silent
>>> change" in 20.01&20.05
>>> 
>>> 4)  merge the patch in master that shrinks the enum down to one byte,
>>> and cherry-pick it to 20.01 and 20.05 - thus breaking the contract of
>>> "no api changes" but at least this gets to be explicitly visible early
>>> on.
>>> 
>>> 5) under the current proposal, if the API message is defined as
>>> "production" then there would need to be a new "in-progress" message
>>> in master with either of the two fixes, the buggy message would need
>>> to be marked as "deprecated".  And under the current proposal by the
>>> 20.09 the "in-progress" message would become "production", the current

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-16 Thread Andrew Yourtchenko
Very valid point. Endianness is only aspect. Accidental size changes on the 
wire is different.

But this drifted very far astray from my original question: if we keep 
shuffling the bits on wire regardless, how much of a value does a meticulous 
CRC tracking process provides ? i see it as actively harmful in that case, 
since it creates false expectations.

--a

> On 16 May 2020, at 14:02, Christian Hopps  wrote:
> 
> As a user of these APIs I would hugely appreciate work be done on the API 
> framework to make it more usable (don't require callbacks) fix the automatic 
> endian conversion it already generates, improve support for argument 
> additions (these are easy to support in a backward compatible way), and maybe 
> arg size changes, into non-issues, rather than punting on this particular 
> endian aspect and thinking everything is all-good.
> 
> Thanks,
> Chris.
> 
>> On May 16, 2020, at 7:49 AM, Andrew Yourtchenko  wrote:
>> 
>> I was hesitating to write exactly that - so at least I am not alone in 
>> thinking this... :-)
>> 
>> With my RelMgr hat on I would be seeing this as to have a -2’d  patch in 
>> gerrit whose existence  I can announce in 20.05 release notes, which would 
>> apply to both stable/2005 branch and to master branch, with an optional 
>> add-on patch that was master specific - so that the change can be 
>> concentrated. Probably even a separate make verify job(s) that would 
>> cherrypick this on top of whatever patch is being verified - such that we 
>> have some assurance that things don’t break.
>> 
>> This patch would add a single boolean “api is big endian”, and create 
>> api_htonX/api_ntohX versions of the endianness functions which will be 
>> no-ops if that boolean is false.
>> 
>> This patch would also add a new api call “is_vpp_api_is_big_endian”, which 
>> would return a the value of the above Boolean.
>> 
>> Its presence means that the whatever abstraction layer is there needs to 
>> query it and to if it returns true, to  make htonX()/ntohX calls in the APIs 
>> to be a a conditional no-op in the same fashion as in VPP.
>> 
>> This will make for a backwards compatible change that can even be even 
>> backported to earlier branches if needs to... 
>> 
>> Sometime in July this could be going into master.
>> 
>> Then that change goes into 2009, with the default still being big-endian, as 
>> well as is committed to 2005 (it’s an api addition that doesn’t break 
>> anything else by default, and by then the folks must have seen it on master).
>> 
>> Then somewhere before 2101, say in November, we flip the default.
>> 
>> This gives two releases to iron out the kinks, and 2109 LTS ships with the 
>> default little endian API.
>> 
>> 2201 then removes the wrappers and makes is_vpp_api_is_big_endian always 
>> return 0.
>> 
>> Granted, it’s a long roadmap, but it’s not exactly a simple thing that we’d 
>> be doing and doing so in a graceful fashion will allow to avoid annoying the 
>> downstream users..
>> 
>> Also of course this implies that we MUST start having a longer term road map 
>> for strategic stuff like this, that should be published and vetted by the 
>> community.
>> 
>> Thoughts ?
>> 
>> --a
>> 
 On 16 May 2020, at 13:22, Damjan Marion  wrote:
>>> 
>>> 
>>> Knowing that even hard-core big-endian systems like PowerPC decided to 
>>> switch to Little endian and that situation where binary api will be 
>>> exchanged between BE and LE systems is extremely small, maybe we should 
>>> consider removing endian conversion from APIs and simply represent data
>>> in the native form. Looks like this is fertile ground for new bugs…
>>> 
>>> Thoughts?
>>> 
>>> — 
>>> Damjan
>>> 
>>> 
 On 15 May 2020, at 16:20, Andrew Yourtchenko  wrote:
 
 There's a very interesting couple of gerrit changes that just came in
 today that is worth discussing,
 and they are a perfect case study to further clarify the process - so
 I tweaked the subject accordingly..
 The API message itself is relatively minor, but regardless of what is
 agreed that makes a good case study.
 
 Backstory:
 
 Once upon a time on Aug 20 2019, commit 053204ab changed
 sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
 conversion
 function didn't make it there (enums are u32 by default as far as I can 
 see).
 
 This was after the 19.08 branch pull, and it wasn't ever tackled, so
 this (buggy) behavior ended being in 20.01, 20.01.1, and in the
 current 20.05-rc1.
 
 Fast forward a bit, today I was pinged about the two changes - one for
 master, one for stable/2001:
 
 https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a 
 u8
 
 https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
 htonl() call and changes the existing ("buggy")
 behavior in the 20.01.2 - thus would silently break any API consumers
 that coded against the previous "buggy" 

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-16 Thread Andrew Yourtchenko
On 5/16/20, Dave Barach (dbarach)  wrote:
> API messages in network byte order made sense 10 years ago when I worked
> with a mixed x86_64 / ppc32 system. As Damjan points out, API
> interoperability between big-endian and little-endian systems is a boutique
> use-case these days.
>
> Timing is key. We won’t be able to cherry-pick API message handler fixes
> across an endian-order flag-day. If we decide to switch to native byte
> order, we’d better switch right before we pull our next LTS release.

Do I understand it right that the idea here would be to flip the
endianness right before pulling stable/2009 branch ?

--a


>
> FWIW... Dave
>
> From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion
> via lists.fd.io
> Sent: Saturday, May 16, 2020 7:23 AM
> To: Andrew Yourtchenko 
> Cc: Christian Hopps ; Ole Troan ;
> vpp-dev ; Jerome Tollet (jtollet) 
> Subject: Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev]
> Proposal for VPP binary API stability
>
>
> Knowing that even hard-core big-endian systems like PowerPC decided to
> switch to Little endian and that situation where binary api will be
> exchanged between BE and LE systems is extremely small, maybe we should
> consider removing endian conversion from APIs and simply represent data
> in the native form. Looks like this is fertile ground for new bugs…
>
> Thoughts?
>
> —
> Damjan
>
>
>> On 15 May 2020, at 16:20, Andrew Yourtchenko
>> mailto:ayour...@gmail.com>> wrote:
>>
>> There's a very interesting couple of gerrit changes that just came in
>> today that is worth discussing,
>> and they are a perfect case study to further clarify the process - so
>> I tweaked the subject accordingly..
>> The API message itself is relatively minor, but regardless of what is
>> agreed that makes a good case study.
>>
>> Backstory:
>>
>> Once upon a time on Aug 20 2019, commit 053204ab changed
>> sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
>> conversion
>> function didn't make it there (enums are u32 by default as far as I can
>> see).
>>
>> This was after the 19.08 branch pull, and it wasn't ever tackled, so
>> this (buggy) behavior ended being in 20.01, 20.01.1, and in the
>> current 20.05-rc1.
>>
>> Fast forward a bit, today I was pinged about the two changes - one for
>> master, one for stable/2001:
>>
>> https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a
>> u8
>>
>> https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
>> htonl() call and changes the existing ("buggy")
>> behavior in the 20.01.2 - thus would silently break any API consumers
>> that coded against the previous "buggy" behavior.
>>
>> And then we have a question open about stable/2005, which "by the
>> letter" can potentially accept only the second approach, since it is
>> past the API freeze.
>>
>> Additional bit: this API is not touched in make test, so this bug had
>> slipped through.
>>
>> So there are the following options:
>>
>> 1) Merge both patches, and treat the 20.05 similar to 20.01, thus
>> making a "silent change" in both, but making the master look closer to
>> a 19.08 format.
>>
>> 2) Leave the 20.05 and 20.01 alone with the "buggy" behavior, and
>> merge the master patch that shrinks the enum down to 1 byte
>>
>> 3) Merge the 20.01 and cherry-pick it to master and 2005 - fixing the
>> endianness of the u32 enum everywhere, but making an effective "silent
>> change" in 20.01&20.05
>>
>> 4)  merge the patch in master that shrinks the enum down to one byte,
>> and cherry-pick it to 20.01 and 20.05 - thus breaking the contract of
>> "no api changes" but at least this gets to be explicitly visible early
>> on.
>>
>> 5) under the current proposal, if the API message is defined as
>> "production" then there would need to be a new "in-progress" message
>> in master with either of the two fixes, the buggy message would need
>> to be marked as "deprecated".  And under the current proposal by the
>> 20.09 the "in-progress" message would become "production", the current
>> message would be shown as "deprecated",  to be deleted in 21.01.
>>
>> So, the feedback that I would appreciate on the above:
>>
>> 1) the least worst course of action "right now", for this particular
>> issue. I discussed this with Ole and we have different opinions, so I
>> would like the actual API users to chime in. Please tell me which is
>> the least worst option from your point of view :-)
>>
>> 2) What is the best course of action in the future. Note, this is also
>> the simpler case in that there is a way to trigger a CRC-affecting
>> change by forcing the enum to be a u8. What would have been the best
>> course of action if it was simply a missing ntohl() for a field that
>> noone complained about for 1.5 releases. Can we assume that no-one
>> used that API and "fix the representation" ?
>>
>> 3) would there be an interest in having a sort of registry of "who
>> wants to know about things related to each API" - so that if a bug
>> like this arises that 

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-16 Thread Dave Barach via lists.fd.io
API messages in network byte order made sense 10 years ago when I worked with a 
mixed x86_64 / ppc32 system. As Damjan points out, API interoperability between 
big-endian and little-endian systems is a boutique use-case these days.

Timing is key. We won’t be able to cherry-pick API message handler fixes across 
an endian-order flag-day. If we decide to switch to native byte order, we’d 
better switch right before we pull our next LTS release.

FWIW... Dave

From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion via 
lists.fd.io
Sent: Saturday, May 16, 2020 7:23 AM
To: Andrew Yourtchenko 
Cc: Christian Hopps ; Ole Troan ; 
vpp-dev ; Jerome Tollet (jtollet) 
Subject: Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] 
Proposal for VPP binary API stability


Knowing that even hard-core big-endian systems like PowerPC decided to switch 
to Little endian and that situation where binary api will be exchanged between 
BE and LE systems is extremely small, maybe we should consider removing endian 
conversion from APIs and simply represent data
in the native form. Looks like this is fertile ground for new bugs…

Thoughts?

—
Damjan


> On 15 May 2020, at 16:20, Andrew Yourtchenko 
> mailto:ayour...@gmail.com>> wrote:
>
> There's a very interesting couple of gerrit changes that just came in
> today that is worth discussing,
> and they are a perfect case study to further clarify the process - so
> I tweaked the subject accordingly..
> The API message itself is relatively minor, but regardless of what is
> agreed that makes a good case study.
>
> Backstory:
>
> Once upon a time on Aug 20 2019, commit 053204ab changed
> sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
> conversion
> function didn't make it there (enums are u32 by default as far as I can see).
>
> This was after the 19.08 branch pull, and it wasn't ever tackled, so
> this (buggy) behavior ended being in 20.01, 20.01.1, and in the
> current 20.05-rc1.
>
> Fast forward a bit, today I was pinged about the two changes - one for
> master, one for stable/2001:
>
> https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a u8
>
> https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
> htonl() call and changes the existing ("buggy")
> behavior in the 20.01.2 - thus would silently break any API consumers
> that coded against the previous "buggy" behavior.
>
> And then we have a question open about stable/2005, which "by the
> letter" can potentially accept only the second approach, since it is
> past the API freeze.
>
> Additional bit: this API is not touched in make test, so this bug had
> slipped through.
>
> So there are the following options:
>
> 1) Merge both patches, and treat the 20.05 similar to 20.01, thus
> making a "silent change" in both, but making the master look closer to
> a 19.08 format.
>
> 2) Leave the 20.05 and 20.01 alone with the "buggy" behavior, and
> merge the master patch that shrinks the enum down to 1 byte
>
> 3) Merge the 20.01 and cherry-pick it to master and 2005 - fixing the
> endianness of the u32 enum everywhere, but making an effective "silent
> change" in 20.01&20.05
>
> 4)  merge the patch in master that shrinks the enum down to one byte,
> and cherry-pick it to 20.01 and 20.05 - thus breaking the contract of
> "no api changes" but at least this gets to be explicitly visible early
> on.
>
> 5) under the current proposal, if the API message is defined as
> "production" then there would need to be a new "in-progress" message
> in master with either of the two fixes, the buggy message would need
> to be marked as "deprecated".  And under the current proposal by the
> 20.09 the "in-progress" message would become "production", the current
> message would be shown as "deprecated",  to be deleted in 21.01.
>
> So, the feedback that I would appreciate on the above:
>
> 1) the least worst course of action "right now", for this particular
> issue. I discussed this with Ole and we have different opinions, so I
> would like the actual API users to chime in. Please tell me which is
> the least worst option from your point of view :-)
>
> 2) What is the best course of action in the future. Note, this is also
> the simpler case in that there is a way to trigger a CRC-affecting
> change by forcing the enum to be a u8. What would have been the best
> course of action if it was simply a missing ntohl() for a field that
> noone complained about for 1.5 releases. Can we assume that no-one
> used that API and "fix the representation" ?
>
> 3) would there be an interest in having a sort of registry of "who
> wants to know about things related to each API" - so that if a bug
> like this arises that requires a behavior change to fix, I could poll
> the affected folks and maybe be able to get away with (1) or (3) from
> above ?
>
> And a tweak to the process (and potentially tooling) with regards to
> going to "production API status":
>
> An API that is not touched during a "make test" can not 

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-16 Thread Christian Hopps
As a user of these APIs I would hugely appreciate work be done on the API 
framework to make it more usable (don't require callbacks) fix the automatic 
endian conversion it already generates, improve support for argument additions 
(these are easy to support in a backward compatible way), and maybe arg size 
changes, into non-issues, rather than punting on this particular endian aspect 
and thinking everything is all-good.

Thanks,
Chris.

> On May 16, 2020, at 7:49 AM, Andrew Yourtchenko  wrote:
> 
> I was hesitating to write exactly that - so at least I am not alone in 
> thinking this... :-)
> 
> With my RelMgr hat on I would be seeing this as to have a -2’d  patch in 
> gerrit whose existence  I can announce in 20.05 release notes, which would 
> apply to both stable/2005 branch and to master branch, with an optional 
> add-on patch that was master specific - so that the change can be 
> concentrated. Probably even a separate make verify job(s) that would 
> cherrypick this on top of whatever patch is being verified - such that we 
> have some assurance that things don’t break.
> 
> This patch would add a single boolean “api is big endian”, and create 
> api_htonX/api_ntohX versions of the endianness functions which will be no-ops 
> if that boolean is false.
> 
> This patch would also add a new api call “is_vpp_api_is_big_endian”, which 
> would return a the value of the above Boolean.
> 
> Its presence means that the whatever abstraction layer is there needs to 
> query it and to if it returns true, to  make htonX()/ntohX calls in the APIs 
> to be a a conditional no-op in the same fashion as in VPP.
> 
> This will make for a backwards compatible change that can even be even 
> backported to earlier branches if needs to... 
> 
> Sometime in July this could be going into master.
> 
> Then that change goes into 2009, with the default still being big-endian, as 
> well as is committed to 2005 (it’s an api addition that doesn’t break 
> anything else by default, and by then the folks must have seen it on master).
> 
> Then somewhere before 2101, say in November, we flip the default.
> 
> This gives two releases to iron out the kinks, and 2109 LTS ships with the 
> default little endian API.
> 
> 2201 then removes the wrappers and makes is_vpp_api_is_big_endian always 
> return 0.
> 
> Granted, it’s a long roadmap, but it’s not exactly a simple thing that we’d 
> be doing and doing so in a graceful fashion will allow to avoid annoying the 
> downstream users..
> 
> Also of course this implies that we MUST start having a longer term road map 
> for strategic stuff like this, that should be published and vetted by the 
> community.
> 
> Thoughts ?
> 
> --a
> 
>> On 16 May 2020, at 13:22, Damjan Marion  wrote:
>> 
>> 
>> Knowing that even hard-core big-endian systems like PowerPC decided to 
>> switch to Little endian and that situation where binary api will be 
>> exchanged between BE and LE systems is extremely small, maybe we should 
>> consider removing endian conversion from APIs and simply represent data
>> in the native form. Looks like this is fertile ground for new bugs…
>> 
>> Thoughts?
>> 
>> — 
>> Damjan
>> 
>> 
>>> On 15 May 2020, at 16:20, Andrew Yourtchenko  wrote:
>>> 
>>> There's a very interesting couple of gerrit changes that just came in
>>> today that is worth discussing,
>>> and they are a perfect case study to further clarify the process - so
>>> I tweaked the subject accordingly..
>>> The API message itself is relatively minor, but regardless of what is
>>> agreed that makes a good case study.
>>> 
>>> Backstory:
>>> 
>>> Once upon a time on Aug 20 2019, commit 053204ab changed
>>> sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
>>> conversion
>>> function didn't make it there (enums are u32 by default as far as I can 
>>> see).
>>> 
>>> This was after the 19.08 branch pull, and it wasn't ever tackled, so
>>> this (buggy) behavior ended being in 20.01, 20.01.1, and in the
>>> current 20.05-rc1.
>>> 
>>> Fast forward a bit, today I was pinged about the two changes - one for
>>> master, one for stable/2001:
>>> 
>>> https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a 
>>> u8
>>> 
>>> https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
>>> htonl() call and changes the existing ("buggy")
>>> behavior in the 20.01.2 - thus would silently break any API consumers
>>> that coded against the previous "buggy" behavior.
>>> 
>>> And then we have a question open about stable/2005, which "by the
>>> letter" can potentially accept only the second approach, since it is
>>> past the API freeze.
>>> 
>>> Additional bit: this API is not touched in make test, so this bug had
>>> slipped through.
>>> 
>>> So there are the following options:
>>> 
>>> 1) Merge both patches, and treat the 20.05 similar to 20.01, thus
>>> making a "silent change" in both, but making the master look closer to
>>> a 19.08 format.
>>> 
>>> 2) Leave the 20.05 and 20.01 alone 

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-16 Thread Andrew Yourtchenko
I was hesitating to write exactly that - so at least I am not alone in thinking 
this... :-)

With my RelMgr hat on I would be seeing this as to have a -2’d  patch in gerrit 
whose existence  I can announce in 20.05 release notes, which would apply to 
both stable/2005 branch and to master branch, with an optional add-on patch 
that was master specific - so that the change can be concentrated. Probably 
even a separate make verify job(s) that would cherrypick this on top of 
whatever patch is being verified - such that we have some assurance that things 
don’t break.

This patch would add a single boolean “api is big endian”, and create 
api_htonX/api_ntohX versions of the endianness functions which will be no-ops 
if that boolean is false.

This patch would also add a new api call “is_vpp_api_is_big_endian”, which 
would return a the value of the above Boolean.

Its presence means that the whatever abstraction layer is there needs to query 
it and to if it returns true, to  make htonX()/ntohX calls in the APIs to be a 
a conditional no-op in the same fashion as in VPP.

This will make for a backwards compatible change that can even be even 
backported to earlier branches if needs to... 

Sometime in July this could be going into master.

Then that change goes into 2009, with the default still being big-endian, as 
well as is committed to 2005 (it’s an api addition that doesn’t break anything 
else by default, and by then the folks must have seen it on master).

Then somewhere before 2101, say in November, we flip the default.

This gives two releases to iron out the kinks, and 2109 LTS ships with the 
default little endian API.

2201 then removes the wrappers and makes is_vpp_api_is_big_endian always return 
0.

Granted, it’s a long roadmap, but it’s not exactly a simple thing that we’d be 
doing and doing so in a graceful fashion will allow to avoid annoying the 
downstream users..

Also of course this implies that we MUST start having a longer term road map 
for strategic stuff like this, that should be published and vetted by the 
community.

Thoughts ?

--a

> On 16 May 2020, at 13:22, Damjan Marion  wrote:
> 
> 
> Knowing that even hard-core big-endian systems like PowerPC decided to switch 
> to Little endian and that situation where binary api will be exchanged 
> between BE and LE systems is extremely small, maybe we should consider 
> removing endian conversion from APIs and simply represent data
> in the native form. Looks like this is fertile ground for new bugs…
> 
> Thoughts?
> 
> — 
> Damjan
> 
> 
>> On 15 May 2020, at 16:20, Andrew Yourtchenko  wrote:
>> 
>> There's a very interesting couple of gerrit changes that just came in
>> today that is worth discussing,
>> and they are a perfect case study to further clarify the process - so
>> I tweaked the subject accordingly..
>> The API message itself is relatively minor, but regardless of what is
>> agreed that makes a good case study.
>> 
>> Backstory:
>> 
>> Once upon a time on Aug 20 2019, commit 053204ab changed
>> sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
>> conversion
>> function didn't make it there (enums are u32 by default as far as I can see).
>> 
>> This was after the 19.08 branch pull, and it wasn't ever tackled, so
>> this (buggy) behavior ended being in 20.01, 20.01.1, and in the
>> current 20.05-rc1.
>> 
>> Fast forward a bit, today I was pinged about the two changes - one for
>> master, one for stable/2001:
>> 
>> https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a u8
>> 
>> https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
>> htonl() call and changes the existing ("buggy")
>> behavior in the 20.01.2 - thus would silently break any API consumers
>> that coded against the previous "buggy" behavior.
>> 
>> And then we have a question open about stable/2005, which "by the
>> letter" can potentially accept only the second approach, since it is
>> past the API freeze.
>> 
>> Additional bit: this API is not touched in make test, so this bug had
>> slipped through.
>> 
>> So there are the following options:
>> 
>> 1) Merge both patches, and treat the 20.05 similar to 20.01, thus
>> making a "silent change" in both, but making the master look closer to
>> a 19.08 format.
>> 
>> 2) Leave the 20.05 and 20.01 alone with the "buggy" behavior, and
>> merge the master patch that shrinks the enum down to 1 byte
>> 
>> 3) Merge the 20.01 and cherry-pick it to master and 2005 - fixing the
>> endianness of the u32 enum everywhere, but making an effective "silent
>> change" in 20.01&20.05
>> 
>> 4)  merge the patch in master that shrinks the enum down to one byte,
>> and cherry-pick it to 20.01 and 20.05 - thus breaking the contract of
>> "no api changes" but at least this gets to be explicitly visible early
>> on.
>> 
>> 5) under the current proposal, if the API message is defined as
>> "production" then there would need to be a new "in-progress" message
>> in master with either of the 

Re: (Q about fixing endianness bugs in handlers) Re: [vpp-dev] Proposal for VPP binary API stability

2020-05-16 Thread Damjan Marion via lists.fd.io

Knowing that even hard-core big-endian systems like PowerPC decided to switch 
to Little endian and that situation where binary api will be exchanged between 
BE and LE systems is extremely small, maybe we should consider removing endian 
conversion from APIs and simply represent data
in the native form. Looks like this is fertile ground for new bugs…

Thoughts?

— 
Damjan


> On 15 May 2020, at 16:20, Andrew Yourtchenko  wrote:
> 
> There's a very interesting couple of gerrit changes that just came in
> today that is worth discussing,
> and they are a perfect case study to further clarify the process - so
> I tweaked the subject accordingly..
> The API message itself is relatively minor, but regardless of what is
> agreed that makes a good case study.
> 
> Backstory:
> 
> Once upon a time on Aug 20 2019, commit 053204ab changed
> sw_interface_set_rx_mode.mode from u8 to an enum, but an htonl
> conversion
> function didn't make it there (enums are u32 by default as far as I can see).
> 
> This was after the 19.08 branch pull, and it wasn't ever tackled, so
> this (buggy) behavior ended being in 20.01, 20.01.1, and in the
> current 20.05-rc1.
> 
> Fast forward a bit, today I was pinged about the two changes - one for
> master, one for stable/2001:
> 
> https://gerrit.fd.io/r/c/vpp/+/26879 - in master - forces the enum to be a u8
> 
> https://gerrit.fd.io/r/c/vpp/+/26915 - in stable/2001 - adds the
> htonl() call and changes the existing ("buggy")
> behavior in the 20.01.2 - thus would silently break any API consumers
> that coded against the previous "buggy" behavior.
> 
> And then we have a question open about stable/2005, which "by the
> letter" can potentially accept only the second approach, since it is
> past the API freeze.
> 
> Additional bit: this API is not touched in make test, so this bug had
> slipped through.
> 
> So there are the following options:
> 
> 1) Merge both patches, and treat the 20.05 similar to 20.01, thus
> making a "silent change" in both, but making the master look closer to
> a 19.08 format.
> 
> 2) Leave the 20.05 and 20.01 alone with the "buggy" behavior, and
> merge the master patch that shrinks the enum down to 1 byte
> 
> 3) Merge the 20.01 and cherry-pick it to master and 2005 - fixing the
> endianness of the u32 enum everywhere, but making an effective "silent
> change" in 20.01&20.05
> 
> 4)  merge the patch in master that shrinks the enum down to one byte,
> and cherry-pick it to 20.01 and 20.05 - thus breaking the contract of
> "no api changes" but at least this gets to be explicitly visible early
> on.
> 
> 5) under the current proposal, if the API message is defined as
> "production" then there would need to be a new "in-progress" message
> in master with either of the two fixes, the buggy message would need
> to be marked as "deprecated".  And under the current proposal by the
> 20.09 the "in-progress" message would become "production", the current
> message would be shown as "deprecated",  to be deleted in 21.01.
> 
> So, the feedback that I would appreciate on the above:
> 
> 1) the least worst course of action "right now", for this particular
> issue. I discussed this with Ole and we have different opinions, so I
> would like the actual API users to chime in. Please tell me which is
> the least worst option from your point of view :-)
> 
> 2) What is the best course of action in the future. Note, this is also
> the simpler case in that there is a way to trigger a CRC-affecting
> change by forcing the enum to be a u8. What would have been the best
> course of action if it was simply a missing ntohl() for a field that
> noone complained about for 1.5 releases. Can we assume that no-one
> used that API and "fix the representation" ?
> 
> 3) would there be an interest in having a sort of registry of "who
> wants to know about things related to each API" - so that if a bug
> like this arises that requires a behavior change to fix, I could poll
> the affected folks and maybe be able to get away with (1) or (3) from
> above ?
> 
> And a tweak to the process (and potentially tooling) with regards to
> going to "production API status":
> 
> An API that is not touched during a "make test" can not be moved
> beyond the "in-progress" status. Adding the unit test implies the
> commitment from the API contributor to follow the option (5) above
> even for "trivial endianness bugs".
> 
> Thoughts ?
> 
> --a
> 
> 
> 
> 
> On 5/14/20, Christian Hopps  wrote:
>> API stability is borderline critical for a project like VPP I think. Yes, it
>> can be used stand-alone, but real value is added by building products on top
>> of it.
>> 
>> Also important is having an API framework that allows for
>> backward-compatible changes to the API for making improvements.
>> 
>> I'm not sure if VPP has the latter, but if there's too much pain with your
>> proposed rules I think the latter should addressed rather than sacrificing
>> the former. So I support your rules.
>> 
>> Thanks,
>> Chris.
>> 

Re: [vpp-dev] vpp-merge-master-centos7 jobs are broken

2020-05-16 Thread Andrew Yourtchenko

1) “jobs are broken” is a bit of a strong assessment for this case - it would 
be more precise to say “sporadic failures” - there are plenty of blue bullets 
after and before that job:
https://jenkins.fd.io/view/vpp/job/vpp-merge-master-centos7/ 

(the trend overall is not stellar - 
https://jenkins.fd.io/view/vpp/job/vpp-merge-master-centos7/buildTimeTrend - 
but you see the big red dip when we had been debugging the issue last weekend 
and that was rather broken :-)

2) About the nature of the failure: this is what I see in 
https://jenkins.fd.io/view/vpp/job/vpp-merge-master-centos7/9496/console:

05:14:22 ***
05:14:22 + echo '* VPP BUILD SUCCESSFULLY COMPLETED'
05:14:22 * VPP BUILD SUCCESSFULLY COMPLETED
05:14:22 + echo 
'***'
05:14:22 ***

...

05:15:22 Pushing 
./build-root/vpp-selinux-policy-20.09-rc0~22_gf3a522f~b9496.x86_64.rpm... 
success!
05:15:22 Pushing ./build/external/vpp-ext-deps-20.09-0.x86_64.rpm... error:
05:15:22 
05:15:22filename: has already been taken
05:15:22 
05:15:22 Build step 'Execute shell' marked build as failure
05:15:22 $ ssh-agent -k
05:15:22 unset SSH_AUTH_SOCK;
05:15:22 unset SSH_AGENT_PID;
05:15:22 echo Agent pid 72 killed;
05:15:22 [ssh-agent] Stopped.

It means that the job had expected to push the VPP-ext-deps (which is updated 
every once in a while only), but found that the file with that name is already 
there in the repo. If the file is right content, it’s not the end of the world 
at all. Though odd that it happens only once in a while, I will try to look at 
it - probably after the 20.05 release is done.

In the memif issues you mention, there are some *warnings* that don’t affect 
the state of the job.

worth having a person who is the expert in that area looking at them and seeing 
if they are real or bogus.. I suppose got blame will tell who that is - 
probably Damjan ?

--a

>> On 16 May 2020, at 06:14, Paul Vinciguerra  
>> wrote:
> I'm not sure who this should go to, nor the impact, so I'm posting it here.
> 
> vpp-merge-master-centos7 is failing due to libmemif and 
> [-Wstringop-overflow=] see [0].
> 
> [0] 
> https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-merge-master-centos7/9496/console.log.gz
>  
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16423): https://lists.fd.io/g/vpp-dev/message/16423
Mute This Topic: https://lists.fd.io/mt/74243105/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-