Hi Matt,

I tried reproducing this with vpp + vat. Is this a fair equivalent scenario?

1. Start vpp and attach vpp_api_test and send some msg
2. Restart vpp and stop vat
3. Restart vat and send message.

The thing is, off of master, this works for me.

Thanks,
Florin

> On Jan 26, 2018, at 2:31 PM, Matthew Smith <mgsm...@netgate.com> wrote:
>
> Hi all,
>
> I have a few applications that use the shared memory API. I'm running these
> on CentOS 7.4, and starting VPP using systemd. If VPP happens to crash or be
> intentionally restarted, those applications never seem to recover their API
> connection. They notice that the original VPP process died and try to call
> vl_client_disconnect_from_vlib(). That call tries to send API messages to
> cleanly shut down its connection. The application will time out waiting for a
> response, write a message like:
>
> vl_client_disconnect:301: peer unresponsive, give up
>
> and eventually consider itself disconnected. When it tries to reconnect, it
> hangs for a while (100 seconds on the last occurrence I checked on) and then
> prints messages like:
>
> vl_map_shmem:619: region init fail
> connect_to_vlib_internal:394: vl_client_api map rv -2
>
> The client keeps on trying and continues seeing those same errors. If the
> client is restarted, it sees the same errors after restart. It doesn't
> recover until VPP is restarted with the client stopped. Once that happens,
> the client can be started again and successfully connect.
>
> The VPP systemd service file that is installed with RPMs built via 'make
> pkg-rpm' has the following:
>
> [Service]
> ExecStartPre=-/bin/rm -f /dev/shm/db /dev/shm/global_vm /dev/shm/vpe-api
>
> When systemd starts VPP, it removes these files which the still-running
> client applications have run shm_open/mmap on. I am guessing that when those
> clients try to disconnect with vl_client_disconnect_from_vlib(), they are
> stomping on something in shared memory that subsequently keeps them from
> being able to connect.
>
> If I comment that command from the systemd service definition, the problem
> behavior I described above disappears. The applications write one 'peer
> unresponsive' message and then they reconnect to the API successfully and
> all is (relatively) well. This also is the case if I don't start VPP with
> systemd/systemctl and just run /usr/bin/vpp directly.
>
> Does anyone have any thoughts on whether it would be ok to remove that
> command from the systemd service file? Or is there some other better way to
> deal with VPP crashing from the perspective of a client to the shared memory
> API?
>
> Thanks!
> -Matt
>
> _______________________________________________
> vpp-dev mailing list
> vpp-dev@lists.fd.io
> https://lists.fd.io/mailman/listinfo/vpp-dev
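
For readers following the thread, here is a minimal sketch of the general effect the quoted message points at: a process that has already mapped a shared-memory file keeps its mapping after the file is removed, and a file recreated under the same name is a different object. This is not VPP code. It assumes Linux/POSIX unlink semantics, uses an ordinary temporary directory instead of /dev/shm, and the file name `vpe-api-demo` and the function `stale_mapping_demo` are made up for illustration. It does not show what vl_client_disconnect_from_vlib() actually does to the segment.

```python
import mmap
import os
import tempfile


def stale_mapping_demo(directory):
    """Map a file, unlink it, recreate it, and compare old and new views."""
    # Hypothetical name; stands in for a segment such as /dev/shm/vpe-api.
    path = os.path.join(directory, "vpe-api-demo")
    size = mmap.PAGESIZE

    # "Server" creates the segment and a "client" maps it (shared mapping).
    fd_old = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    os.ftruncate(fd_old, size)
    client_map = mmap.mmap(fd_old, size)
    old_inode = os.fstat(fd_old).st_ino

    # Equivalent of the `rm -f` in ExecStartPre: the name disappears,
    # but the client's mapping of the old object stays valid.
    os.unlink(path)

    # Restarted "server" creates a fresh segment under the same name.
    fd_new = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    os.ftruncate(fd_new, size)
    server_map = mmap.mmap(fd_new, size)
    new_inode = os.fstat(fd_new).st_ino

    # The server writes into the new segment; the client reads its old one.
    server_map[:5] = b"hello"
    client_view = bytes(client_map[:5])

    client_map.close()
    server_map.close()
    os.close(fd_old)
    os.close(fd_new)
    os.unlink(path)
    return old_inode != new_inode, client_view


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        differs, view = stale_mapping_demo(tmp)
        print("different objects:", differs)
        print("client still sees:", view)
```

In this sketch the client never observes the server's write, because its mapping refers to the removed file rather than the recreated one. A client in that state would need to unmap and reopen the segment by name before it could talk to the restarted server.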