Re: [vpp-dev] recovering from a crash with the C shared memory API

Florin Coras Fri, 26 Jan 2018 14:54:46 -0800

Hi Matt, 

I tried reproducing this with vpp + vat. Is this a fair equivalent scenario?


1. Start vpp, attach vpp_api_test (vat), and send some messages
2. Restart vpp and stop vat
3. Restart vat and send a message.

The thing is, with a build off of master, this works for me.

Thanks, 
Florin

> On Jan 26, 2018, at 2:31 PM, Matthew Smith <mgsm...@netgate.com> wrote:
> 
> 
> Hi all,
> 
> I have a few applications that use the shared memory API. I’m running these 
> on CentOS 7.4, and starting VPP using systemd. If VPP happens to crash or be 
> intentionally restarted, those applications never seem to recover their API 
> connection. They notice that the original VPP process died and try to call 
> vl_client_disconnect_from_vlib(). That call tries to send API messages to 
> cleanly shut down its connection. The application will time out waiting for a 
> response, write a message like:
> 
> vl_client_disconnect:301: peer unresponsive, give up
> 
> and eventually consider itself disconnected. When it tries to reconnect, it 
> hangs for a while (100 seconds on the last occurrence I checked on) and then 
> prints messages like:
> 
> vl_map_shmem:619: region init fail
> connect_to_vlib_internal:394: vl_client_api map rv -2
> 
> The client keeps on trying and continues seeing those same errors. If the 
> client is restarted, it sees the same errors after restart. It doesn’t 
> recover until VPP is restarted with the client stopped. Once that happens, 
> the client can be started again and successfully connect.
> 
> The VPP systemd service file that is installed with RPMs built via 'make 
> pkg-rpm' has the following:
> 
> [Service]
> ExecStartPre=-/bin/rm -f /dev/shm/db /dev/shm/global_vm /dev/shm/vpe-api
> 
> When systemd starts VPP, it removes these files which the still-running 
> client applications have run shm_open/mmap on. I am guessing that when those 
> clients try to disconnect with vl_client_disconnect_from_vlib(), they are 
> stomping on something in shared memory that subsequently keeps them from 
> being able to connect. If I comment out that command in the systemd service 
> definition, the problem behavior I described above disappears. The 
> applications write one ‘peer unresponsive’ message and then they reconnect to 
> the API successfully and all is (relatively) well. This also is the case if I 
> don’t start VPP with systemd/systemctl and just run /usr/bin/vpp directly.
> 
> Does anyone have any thoughts on whether it would be ok to remove that 
> command from the systemd service file? Or is there some other better way to 
> deal with VPP crashing from the perspective of a client to the shared memory 
> API?
> 
> Thanks!
> -Matt
> 
> _______________________________________________
> vpp-dev mailing list
> vpp-dev@lists.fd.io
> https://lists.fd.io/mailman/listinfo/vpp-dev
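
For illustration only, here is a minimal sketch of the generic POSIX behaviour the quoted message guesses at: a process that has already mapped a shared memory object keeps its mapping after the name is unlinked, and a later shm_open() of the same name creates an unrelated object. This is not VPP code. The helper names (map_region, stale_mapping_demo), the region size, and the object name are hypothetical, and the sketch does not claim to reproduce the actual failure in vl_map_shmem.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define REGION_SIZE 4096 /* arbitrary size chosen for the demo */

/* Hypothetical helper: open (creating if needed) a POSIX shm object,
 * size it, map it, and report the inode backing the mapping. */
static void *map_region(const char *name, ino_t *inode_out)
{
    struct stat st;
    void *p;
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);

    if (fd < 0)
        return MAP_FAILED;
    if (ftruncate(fd, REGION_SIZE) < 0 || fstat(fd, &st) < 0) {
        close(fd);
        return MAP_FAILED;
    }
    p = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p != MAP_FAILED)
        *inode_out = st.st_ino;
    return p;
}

/* Returns 1 if the old mapping is detached from the recreated object,
 * 0 if both mappings share the same memory, -1 on error. */
int stale_mapping_demo(const char *name)
{
    ino_t old_ino = 0, new_ino = 0;
    char *old_map, *new_map;
    int detached;

    assert(name[0] == '/'); /* portable shm names start with a slash */
    shm_unlink(name);       /* start clean; failure here is harmless */

    /* "Client" maps the region while the "server" is still running. */
    old_map = map_region(name, &old_ino);
    if (old_map == MAP_FAILED)
        return -1;
    strcpy(old_map, "written before unlink");

    /* Stand-in for the service file's rm -f on the /dev/shm entry. */
    if (shm_unlink(name) < 0) {
        munmap(old_map, REGION_SIZE);
        return -1;
    }

    /* "Restarted server" recreates the region under the same name. */
    new_map = map_region(name, &new_ino);
    if (new_map == MAP_FAILED) {
        munmap(old_map, REGION_SIZE);
        return -1;
    }

    /* The client writes through its stale mapping; the new region,
     * which starts zero-filled, does not see the write. */
    strcpy(old_map, "written after unlink");
    detached = (old_ino != new_ino) && (new_map[0] == '\0');

    printf("old inode %llu, new inode %llu, new region sees write: %s\n",
           (unsigned long long)old_ino, (unsigned long long)new_ino,
           new_map[0] == '\0' ? "no" : "yes");

    munmap(old_map, REGION_SIZE);
    munmap(new_map, REGION_SIZE);
    shm_unlink(name);
    return detached;
}
```

Calling stale_mapping_demo("/stale_demo") from a small main() on Linux should return 1, meaning the two mappings are backed by different objects. On older glibc versions the program may need to be linked with -lrt.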
