> So something in this openstack driver is broken, because it does not respond to server probes.
I must specify, it’s not broken ALL the time. It seems to start breaking when traffic
increases on the openstack setup. That increase is not accompanied by a CPU overload,
though: it’s happening right now and the load is barely over 1 on a 16-core Xeon. The
client trying to connect also appears to be ovs-vswitchd.

Jean-Philippe Méthot
Openstack system administrator
Administrateur système Openstack
PlanetHoster inc.

> On Sept 27, 2018, at 15:47, Paul Greenberg <[email protected]> wrote:
>
> This specific error is triggered by the following. When a client connects to
> the ovsdb JSON-RPC server, it has to follow a certain protocol. In this case,
> the server sends probes, and the client must acknowledge them by sending the
> exact message it received from the server back to the server. If a client
> does not do that in time, the server drops the client.
>
> So something in this openstack driver is broken, because it does not respond
> to server probes.
>
> Best Regards,
> Paul Greenberg
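[A note on the probe Paul describes: in the OVSDB management protocol (RFC 7047) it is a
JSON-RPC "echo" request, and the client keeps the connection alive by returning a matching
"echo" reply. The sketch below only illustrates that acknowledgment; it is not how
ovs-vswitchd implements it, and the 127.0.0.1:6640 address and one-message-per-recv
framing are simplifying assumptions.]

import json
import socket

# Illustration only: connect to a local ovsdb-server (assumed to listen on
# ptcp:6640) and answer its "echo" inactivity probes as RFC 7047 requires.
sock = socket.create_connection(("127.0.0.1", 6640))
buf = b""
while True:
    data = sock.recv(4096)
    if not data:
        break
    buf += data
    try:
        # Naive framing assumption: each recv() completes one JSON message.
        msg = json.loads(buf.decode("utf-8"))
    except ValueError:
        continue  # message not complete yet, keep reading
    buf = b""
    if isinstance(msg, dict) and msg.get("method") == "echo":
        # Send the probe back with the same id and params; if this reply
        # arrives late, the server logs "no response to inactivity probe"
        # and drops the connection, which is the error in the subject line.
        reply = {"id": msg["id"], "result": msg["params"], "error": None}
        sock.sendall(json.dumps(reply).encode("utf-8"))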
>
> From: … on behalf of …
> Sent: Thursday, September 27, 2018 3:40 PM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: [ovs-discuss] "ovs|01253|reconnect|ERR|tcp:127.0.0.1:50814: no response to inactivity probe after 5.01 seconds, disconnecting" messages and lost packets
>
> ovs-vswitchd is multi-threaded. ovsdb-server is single-threaded.
> (You did not answer my question about the file from which the logs were
> printed in your email.)
>
> Who is at 127.0.0.1:45928 and 127.0.0.1:45930?
>
> On Thu, 27 Sep 2018 at 11:14, Jean-Philippe Méthot
> <[email protected]> wrote:
> Thank you for your reply.
>
> This is Openstack with the ml2 plugin. There’s no other third-party application
> used with our network, so no OVN or anything of the sort. Essentially, to give
> a quick idea of the topology, we have our VMs on our compute nodes going
> through GRE tunnels toward network nodes, where they are routed in network
> namespaces toward a flat external network.
>
>> Generally, the above indicates that a daemon fronting an Open vSwitch
>> database hasn't been able to connect to its client. Usually happens when CPU
>> consumption is very high.
>
> Our network nodes’ CPUs are literally sleeping. Is openvswitch single-threaded
> or multi-threaded, though? If ovs overloaded a single thread, it’s possible I
> may have missed it.
>
> Jean-Philippe Méthot
> Openstack system administrator
> Administrateur système Openstack
> PlanetHoster inc.
>
>> On Sept 27, 2018, at 14:04, Guru Shetty <[email protected]> wrote:
>>
>> On Wed, 26 Sep 2018 at 12:59, Jean-Philippe Méthot via discuss
>> <[email protected]> wrote:
>> Hi,
>>
>> I’ve been using openvswitch as my networking backend on openstack for several
>> years now. Lately, as our network has grown, we’ve started noticing some
>> intermittent packet drops accompanied by the following error messages in
>> openvswitch:
>>
>> 2018-09-26T04:15:20.676Z|00005|reconnect|ERR|tcp:127.0.0.1:45928: no response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-26T04:15:20.677Z|00006|reconnect|ERR|tcp:127.0.0.1:45930: no response to inactivity probe after 5 seconds, disconnecting
>>
>> Open vSwitch is a project with multiple daemons. Since you are using
>> OpenStack, it is not clear from your message what type of networking plugin
>> you are using. Do you use OVN?
>> Also, you did not mention from which file you have gotten the above errors.
>>
>> Generally, the above indicates that a daemon fronting an Open vSwitch
>> database hasn't been able to connect to its client. Usually happens when CPU
>> consumption is very high.
>>
>> 2018-09-26T04:15:30.409Z|00007|reconnect|ERR|tcp:127.0.0.1:45874: no response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-26T04:15:33.661Z|00008|reconnect|ERR|tcp:127.0.0.1:45934: no response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-26T04:15:33.847Z|00009|reconnect|ERR|tcp:127.0.0.1:45894: no response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-26T04:16:03.247Z|00010|reconnect|ERR|tcp:127.0.0.1:45958: no response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-26T04:16:21.534Z|00011|reconnect|ERR|tcp:127.0.0.1:45956: no response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-26T04:16:21.786Z|00012|reconnect|ERR|tcp:127.0.0.1:45974: no response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-26T04:16:47.085Z|00013|reconnect|ERR|tcp:127.0.0.1:45988: no response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-26T04:16:49.618Z|00014|reconnect|ERR|tcp:127.0.0.1:45982: no response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-26T04:16:53.321Z|00015|reconnect|ERR|tcp:127.0.0.1:45964: no response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-26T04:17:15.543Z|00016|reconnect|ERR|tcp:127.0.0.1:45986: no response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-26T04:17:24.767Z|00017|reconnect|ERR|tcp:127.0.0.1:45990: no response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-26T04:17:31.735Z|00018|reconnect|ERR|tcp:127.0.0.1:45998: no response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-26T04:20:12.593Z|00019|reconnect|ERR|tcp:127.0.0.1:46014: no response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-26T04:23:51.996Z|00020|reconnect|ERR|tcp:127.0.0.1:46028: no response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-26T04:25:12.187Z|00021|reconnect|ERR|tcp:127.0.0.1:46022: no response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-26T04:25:28.871Z|00022|reconnect|ERR|tcp:127.0.0.1:46056: no response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-26T04:27:11.663Z|00023|reconnect|ERR|tcp:127.0.0.1:46046: no response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-26T04:29:56.161Z|00024|jsonrpc|WARN|tcp:127.0.0.1:46018: receive error: Connection reset by peer
>> 2018-09-26T04:29:56.161Z|00025|reconnect|WARN|tcp:127.0.0.1:46018: connection dropped (Connection reset by peer)
>>
>> This definitely kills the connection for a few seconds before it reconnects.
>> So, I’ve been wondering, what is this probe and what is really happening here?
>> What’s the cause and is there a way to fix this?
>>
>> Openvswitch version is 2.9.0-3 on CentOS 7 with Openstack Pike running on it
>> (but the issues show up on Queens too).
>>
>> Jean-Philippe Méthot
>> Openstack system administrator
>> Administrateur système Openstack
>> PlanetHoster inc.
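[Regarding the "is there a way to fix this?" question above: the knob most often pointed at
for this symptom is the optional inactivity_probe column (in milliseconds) on the OVSDB
Manager records, which typically back the ptcp listener that OpenStack configures on
ovsdb-server. Raising it from the 5-second default gives a busy client more time to answer
the probe, though it works around the slow client rather than explaining it. The Python
sketch below assumes the Manager table is the relevant one on these nodes; verify with
`ovs-vsctl list Manager` first. The Controller table carries the same column if the
disconnects turn out to be on the OpenFlow side instead.]

import subprocess

# Hedged sketch: raise the OVSDB manager inactivity probe from the 5 s
# default to 30 s (the column is in milliseconds). Assumes the Manager
# records back the tcp:127.0.0.1 connections shown in the logs.
out = subprocess.check_output(["ovs-vsctl", "list", "Manager"],
                              universal_newlines=True)

# Default `ovs-vsctl list` output is "column : value" lines; pull the UUIDs.
uuids = [line.split(":", 1)[1].strip()
         for line in out.splitlines()
         if line.startswith("_uuid")]

for uuid in uuids:
    subprocess.check_call(
        ["ovs-vsctl", "set", "Manager", uuid, "inactivity_probe=30000"])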
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
