Ow, I’m guilty of having “manually restarted” vpp so I completely avoided the segment cleanup ...
I can’t yet figure out why, but it seems that doing vl_client_api_unmap when vpp
does not respond leads to breakage. Could you try the quick fix here [1] and see
if it fixes your issue?

Cheers,
Florin

[1] https://gerrit.fd.io/r/#/c/10315/

> On Jan 29, 2018, at 11:26 AM, Matthew Smith <mgsm...@netgate.com> wrote:
>
> Hi Florin,
>
> If I repeat that test exactly as you ran it, I see the same results as you
> did. With a slight modification, the situation I described shows up:
>
> 1. systemctl start vpp
> 2. start vat, execute sw_interface_dump
> 3. leave vat running, in another terminal run systemctl restart vpp
> 4. in the still-running vat, execute ip_address_dump ipv4 sw_if_index 1
> 5. quit vat
> 6. start vat
>
> Basically, get vat to send a message after vpp has been restarted.
>
> Step 4 shows this error and then the vat prompt returns:
>
> ip_address_dump error: Misc
>
> Step 5 shows this and returns me to the shell:
>
> main:446: BUG: message reply spin-wait timeout
> vl_client_disconnect:301: peer unresponsive, give up
>
> Step 6 hangs for a couple of minutes and then prints:
>
> vl_map_shmem:639: region init fail
> connect_to_vlib_internal:398: vl_client_api map rv -2
> Couldn't connect to vpe, exiting…
>
> Are you able to reproduce this?
>
> Thanks!
> -Matt
>
>> On Jan 26, 2018, at 4:54 PM, Florin Coras <fcoras.li...@gmail.com> wrote:
>>
>> Hi Matt,
>>
>> I tried reproducing this with vpp + vat. Is this a fair equivalent scenario?
>>
>> 1. Start vpp, attach vpp_api_test, and send some messages.
>> 2. Restart vpp and stop vat.
>> 3. Restart vat and send a message.
>>
>> The thing is, off of master, this works for me.
>>
>> Thanks,
>> Florin
>>
>>> On Jan 26, 2018, at 2:31 PM, Matthew Smith <mgsm...@netgate.com> wrote:
>>>
>>> Hi all,
>>>
>>> I have a few applications that use the shared memory API. I’m running
>>> these on CentOS 7.4 and starting VPP using systemd. If VPP happens to
>>> crash or be intentionally restarted, those applications never seem to
>>> recover their API connection. They notice that the original VPP process
>>> died and call vl_client_disconnect_from_vlib(). That call tries to send
>>> API messages to cleanly shut down the connection. The application times
>>> out waiting for a response, writes a message like:
>>>
>>> vl_client_disconnect:301: peer unresponsive, give up
>>>
>>> and eventually considers itself disconnected. When it tries to reconnect,
>>> it hangs for a while (100 seconds on the last occurrence I checked) and
>>> then prints messages like:
>>>
>>> vl_map_shmem:619: region init fail
>>> connect_to_vlib_internal:394: vl_client_api map rv -2
>>>
>>> The client keeps on trying and continues to see those same errors. If the
>>> client is restarted, it sees the same errors after restart. It doesn’t
>>> recover until VPP is restarted with the client stopped. Once that happens,
>>> the client can be started again and connect successfully.
>>>
>>> The VPP systemd service file installed with RPMs built via ‘make pkg-rpm’
>>> has the following:
>>>
>>> [Service]
>>> ExecStartPre=-/bin/rm -f /dev/shm/db /dev/shm/global_vm /dev/shm/vpe-api
>>>
>>> When systemd starts VPP, it removes these files, which the still-running
>>> client applications have called shm_open()/mmap() on. I am guessing that
>>> when those clients try to disconnect with vl_client_disconnect_from_vlib(),
>>> they stomp on something in shared memory that subsequently keeps them from
>>> being able to connect.
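(For reference, the disconnect/reconnect cycle described above corresponds
roughly to the client calls sketched below. vl_client_disconnect_from_vlib()
and vl_client_connect_to_vlib() are the actual memory-client API entry points;
the wrapper function, its name, and the client name are made up for
illustration.)

    #include <vlibapi/api.h>
    #include <vlibmemory/api.h>

    /* Rough sketch of a client-side reconnect after vpp dies; the
     * wrapper and client name are hypothetical, the vl_* calls are not. */
    static int
    example_reconnect (void)
    {
      /* Try to tear down the old session. With vpp gone, this spin-waits
       * on a reply, logs "peer unresponsive, give up", and returns anyway. */
      vl_client_disconnect_from_vlib ();

      /* Re-map /dev/shm/vpe-api and register with the restarted vpp.
       * In the scenario above, this is the call that fails with
       * "region init fail" / map rv -2. */
      return vl_client_connect_to_vlib ("/vpe-api", "example-client", 32);
    }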
>>> If I comment that command out of the systemd service definition, the
>>> problem behavior I described above disappears. The applications write one
>>> ‘peer unresponsive’ message and then reconnect to the API successfully,
>>> and all is (relatively) well. This is also the case if I don’t start VPP
>>> with systemd/systemctl and just run /usr/bin/vpp directly.
>>>
>>> Does anyone have any thoughts on whether it would be ok to remove that
>>> command from the systemd service file? Or is there some other, better way
>>> to deal with VPP crashing from the perspective of a client of the shared
>>> memory API?
>>>
>>> Thanks!
>>> -Matt
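(In case it helps anyone experimenting with this: rather than editing the
packaged unit file, the cleanup command can be neutralized with a systemd
drop-in, since an empty ExecStartPre= line resets the list inherited from the
packaged unit. A sketch, with a made-up drop-in file name:)

    # /etc/systemd/system/vpp.service.d/no-shm-cleanup.conf
    # Clears the packaged ExecStartPre list, so systemd no longer
    # unlinks /dev/shm/db, /dev/shm/global_vm and /dev/shm/vpe-api
    # out from under still-running API clients.
    [Service]
    ExecStartPre=

Run systemctl daemon-reload afterwards for the override to take effect.
Whether skipping the cleanup is safe in general is exactly the open question
above.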
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev