> 1. Who is at 127.0.0.1:6633? This is likely an OpenFlow controller.
That would be neutron-openvswitch-agent, i.e. the OpenStack service that
manages Open vSwitch.
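
For what it's worth, here is a quick way to double-check which process owns
that listener; a rough sketch, assuming ss and lsof are available on the node
(the PID should map back to the neutron-openvswitch-agent process):

# show the process listening on the OpenFlow port
ss -lntp | grep ':6633'
# alternatively
lsof -iTCP:6633 -sTCP:LISTEN
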
> 2. What does `ovs-vsctl list controller` say?
_uuid : ff2dca74-9628-43c8-b89c-8d2f1242dd3f
connection_mode : out-of-band
controller_burst_limit: []
controller_rate_limit: []
enable_async_messages: []
external_ids : {}
inactivity_probe : []
is_connected : false
local_gateway : []
local_ip : []
local_netmask : []
max_backoff : []
other_config : {}
role : other
status : {last_error="Connection timed out",
sec_since_connect="22", sec_since_disconnect="1", state=BACKOFF}
target : "tcp:127.0.0.1:6633"
_uuid : 4f9ae2d1-4f1b-460c-b2bc-c96d24f445bb
connection_mode : out-of-band
controller_burst_limit: []
controller_rate_limit: []
enable_async_messages: []
external_ids : {}
inactivity_probe : []
is_connected : false
local_gateway : []
local_ip : []
local_netmask : []
max_backoff : []
other_config : {}
role : other
status : {last_error="Connection timed out",
sec_since_connect="1284", sec_since_disconnect="14", state=CONNECTING}
target : "tcp:127.0.0.1:6633"
_uuid : 1b503dbf-3117-45c2-9e2b-0f50cb48554b
connection_mode : out-of-band
controller_burst_limit: []
controller_rate_limit: []
enable_async_messages: []
external_ids : {}
inactivity_probe : []
is_connected : false
local_gateway : []
local_ip : []
local_netmask : []
max_backoff : []
other_config : {}
role : other
status : {last_error="Connection timed out",
sec_since_connect="22", sec_since_disconnect="1", state=BACKOFF}
target : "tcp:127.0.0.1:6633 »
> 3. What does `ovs-vsctl list manager` say?
_uuid : 7f6c413f-972e-4ef2-89dd-1fa6078abcfe
connection_mode : []
external_ids : {}
inactivity_probe : []
is_connected : false
max_backoff : []
other_config : {}
status : {bound_port="6640", sec_since_connect="0",
sec_since_disconnect="0"}
target : "ptcp:6640:127.0.0.1"
> 4. ovs-appctl -t ovsdb-server ovsdb-server/list-remotes
db:Open_vSwitch,Open_vSwitch,manager_options
punix:/var/run/openvswitch/db.sock
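
So the TCP listener on 6640 comes from the Manager table (that is what the
db:Open_vSwitch,Open_vSwitch,manager_options remote expands to), plus the
local unix socket. As a sanity check, this should print the same target as the
Manager record above:

ovs-vsctl get-manager
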
> 5. What does 'ps -ef | grep ovs' say?
openvsw+ 939 1 0 19:45 ? 00:00:49 ovsdb-server
/etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info
--remote=punix:/var/run/openvswitch/db.sock
--private-key=db:Open_vSwitch,SSL,private_key
--certificate=db:Open_vSwitch,SSL,certificate
--bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user openvswitch:hugetlbfs
--no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log
--pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
openvsw+ 1013 1 11 19:45 ? 00:17:46 ovs-vswitchd
unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info
--mlockall --user openvswitch:hugetlbfs --no-chdir
--log-file=/var/log/openvswitch/ovs-vswitchd.log
--pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
neutron 25792 2414 0 22:16 ? 00:00:00 ovsdb-client monitor
tcp:127.0.0.1:6640 Bridge name --format=json
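
Since ovsdb-server is single-threaded (as you mention below), the node's
overall CPU numbers can hide one saturated thread, so here is roughly what I
would watch, using the PIDs from the ps output above:

# per-thread CPU for ovsdb-server (939) and ovs-vswitchd (1013)
top -b -H -n 1 -p 939 -p 1013 | head -40
# ovsdb-server monitor/session counts and vswitchd event counters
ovs-appctl -t ovsdb-server memory/show
ovs-appctl -t ovs-vswitchd coverage/show | head -20
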
Jean-Philippe Méthot
Openstack system administrator
Administrateur système Openstack
PlanetHoster inc.
> On 27 Sept 2018, at 16:09, Guru Shetty <[email protected]> wrote:
>
>
>
> On Thu, 27 Sep 2018 at 12:52, Jean-Philippe Méthot
> <[email protected]> wrote:
> Sorry, the log file is ovsdb-server.log. It seems ovs-vswitchd.log has the
> counterpart of this error:
>
> 2018-09-27T19:38:01.217Z|00783|rconn|ERR|br-tun<->tcp:127.0.0.1:6633: no response to inactivity probe after 5 seconds, disconnecting
> 2018-09-27T19:38:01.218Z|00784|rconn|ERR|br-ex<->tcp:127.0.0.1:6633: no response to inactivity probe after 5 seconds, disconnecting
>
> 1. Who is at 127.0.0.1:6633? This is likely an OpenFlow controller.
> 2. What does `ovs-vsctl list controller` say?
> 3. What does `ovs-vsctl list manager` say?
> 4. ovs-appctl -t ovsdb-server ovsdb-server/list-remotes
> 5. What does 'ps -ef | grep ovs' say?
>
> I am asking these simple questions because I am not familiar with OpenStack
> ml2.
>
>
>
>
> 2018-09-27T19:38:02.218Z|00785|rconn|INFO|br-tun<->tcp:127.0.0.1:6633: connecting...
> 2018-09-27T19:38:02.218Z|00786|rconn|INFO|br-ex<->tcp:127.0.0.1:6633: connecting...
> 2018-09-27T19:38:03.218Z|00787|rconn|INFO|br-tun<->tcp:127.0.0.1:6633: connection timed out
> 2018-09-27T19:38:03.218Z|00788|rconn|INFO|br-tun<->tcp:127.0.0.1:6633: waiting 2 seconds before reconnect
> 2018-09-27T19:38:03.218Z|00789|rconn|INFO|br-ex<->tcp:127.0.0.1:6633: connection timed out
> 2018-09-27T19:38:03.218Z|00790|rconn|INFO|br-ex<->tcp:127.0.0.1:6633: waiting 2 seconds before reconnect
> 2018-09-27T19:38:05.218Z|00791|rconn|INFO|br-tun<->tcp:127.0.0.1:6633: connecting...
> 2018-09-27T19:38:05.218Z|00792|rconn|INFO|br-ex<->tcp:127.0.0.1:6633: connecting...
> 2018-09-27T19:38:06.221Z|00793|rconn|INFO|br-tun<->tcp:127.0.0.1:6633: connected
> 2018-09-27T19:38:06.222Z|00794|rconn|INFO|br-ex<->tcp:127.0.0.1:6633: connected
>
>> Who is at 127.0.0.1:45928 and 127.0.0.1:45930?
>
> That seems to be ovs-vswitchd in that range. Of course, these ports change
> all the time, but I think ovs-vswitchd tends to stay in that range.
>
> Here’s an example of "ss -anp | grep ovs" output so you can get an idea of
> the port mapping.
>
> tcp   LISTEN  0  10  127.0.0.1:6640    *:*                users:(("ovsdb-server",pid=939,fd=19))
> tcp   ESTAB   0  0   127.0.0.1:6640    127.0.0.1:28720    users:(("ovsdb-server",pid=939,fd=20))
> tcp   ESTAB   0  0   127.0.0.1:6640    127.0.0.1:28734    users:(("ovsdb-server",pid=939,fd=18))
> tcp   ESTAB   0  0   127.0.0.1:6640    127.0.0.1:28754    users:(("ovsdb-server",pid=939,fd=25))
> tcp   ESTAB   0  0   127.0.0.1:6640    127.0.0.1:28730    users:(("ovsdb-server",pid=939,fd=24))
> tcp   ESTAB   0  0   127.0.0.1:28754   127.0.0.1:6640     users:(("ovsdb-client",pid=20965,fd=3))
> tcp   ESTAB   0  0   127.0.0.1:6640    127.0.0.1:28752    users:(("ovsdb-server",pid=939,fd=23))
> tcp   ESTAB   0  0   127.0.0.1:46917   127.0.0.1:6633     users:(("ovs-vswitchd",pid=1013,fd=214))
> tcp   ESTAB   0  0   127.0.0.1:6640    127.0.0.1:28750    users:(("ovsdb-server",pid=939,fd=22))
> tcp   ESTAB   0  0   127.0.0.1:6640    127.0.0.1:28722    users:(("ovsdb-server",pid=939,fd=21))
> tcp   ESTAB   0  0   127.0.0.1:28752   127.0.0.1:6640     users:(("ovsdb-client",pid=20363,fd=3))
>
> Jean-Philippe Méthot
> Openstack system administrator
> Administrateur système Openstack
> PlanetHoster inc.
>
>
>
>
>> On 27 Sept 2018, at 15:39, Guru Shetty <[email protected]> wrote:
>>
>>
>> ovs-vswitchd is multi-threaded. ovsdb-server is single threaded.
>> (You did not answer my question about the file from which the logs were
>> printed in your email)
>>
>> Who is at 127.0.0.1:45928 and 127.0.0.1:45930?
>>
>> On Thu, 27 Sep 2018 at 11:14, Jean-Philippe Méthot
>> <[email protected]> wrote:
>> Thank you for your reply.
>>
>> This is OpenStack with the ml2 plugin. There is no other third-party
>> application used with our network, so no OVN or anything of the sort.
>> Essentially, to give a quick idea of the topology, our VMs on the compute
>> nodes go through GRE tunnels toward the network nodes, where they are
>> routed in network namespaces toward a flat external network.
>>
>>> Generally, the above indicates that a daemon fronting an Open vSwitch
>>> database hasn't been able to connect to its client. This usually happens
>>> when CPU consumption is very high.
>>
>> Our network nodes' CPUs are practically idle. Is Open vSwitch
>> single-threaded or multi-threaded, though? If OVS were overloading a single
>> thread, it's possible I missed it.
>>
>> Jean-Philippe Méthot
>> Openstack system administrator
>> Administrateur système Openstack
>> PlanetHoster inc.
>>
>>
>>
>>
>>> On 27 Sept 2018, at 14:04, Guru Shetty <[email protected]> wrote:
>>>
>>>
>>>
>>> On Wed, 26 Sep 2018 at 12:59, Jean-Philippe Méthot via discuss
>>> <[email protected]> wrote:
>>> Hi,
>>>
>>> I’ve been using Open vSwitch as my networking backend on OpenStack for
>>> several years now. Lately, as our network has grown, we’ve started noticing
>>> intermittent packet drops accompanied by the following error messages from
>>> Open vSwitch:
>>>
>>> 2018-09-26T04:15:20.676Z|00005|reconnect|ERR|tcp:127.0.0.1:45928: no response to inactivity probe after 5 seconds, disconnecting
>>> 2018-09-26T04:15:20.677Z|00006|reconnect|ERR|tcp:127.0.0.1:45930: no response to inactivity probe after 5 seconds, disconnecting
>>>
>>> Open vSwitch is a project with multiple daemons. Since you are using
>>> OpenStack, it is not clear from your message what type of networking
>>> plugin you are using. Do you use OVN?
>>> Also, you did not mention which file the above errors came from.
>>>
>>> Generally, the above indicates that a daemon fronting an Open vSwitch
>>> database hasn't been able to connect to its client. This usually happens
>>> when CPU consumption is very high.
>>>
>>>
>>> 2018-09-26T04:15:30.409Z|00007|reconnect|ERR|tcp:127.0.0.1:45874: no response to inactivity probe after 5 seconds, disconnecting
>>> 2018-09-26T04:15:33.661Z|00008|reconnect|ERR|tcp:127.0.0.1:45934: no response to inactivity probe after 5 seconds, disconnecting
>>> 2018-09-26T04:15:33.847Z|00009|reconnect|ERR|tcp:127.0.0.1:45894: no response to inactivity probe after 5 seconds, disconnecting
>>> 2018-09-26T04:16:03.247Z|00010|reconnect|ERR|tcp:127.0.0.1:45958: no response to inactivity probe after 5 seconds, disconnecting
>>> 2018-09-26T04:16:21.534Z|00011|reconnect|ERR|tcp:127.0.0.1:45956: no response to inactivity probe after 5 seconds, disconnecting
>>> 2018-09-26T04:16:21.786Z|00012|reconnect|ERR|tcp:127.0.0.1:45974: no response to inactivity probe after 5 seconds, disconnecting
>>> 2018-09-26T04:16:47.085Z|00013|reconnect|ERR|tcp:127.0.0.1:45988: no response to inactivity probe after 5 seconds, disconnecting
>>> 2018-09-26T04:16:49.618Z|00014|reconnect|ERR|tcp:127.0.0.1:45982: no response to inactivity probe after 5 seconds, disconnecting
>>> 2018-09-26T04:16:53.321Z|00015|reconnect|ERR|tcp:127.0.0.1:45964: no response to inactivity probe after 5 seconds, disconnecting
>>> 2018-09-26T04:17:15.543Z|00016|reconnect|ERR|tcp:127.0.0.1:45986: no response to inactivity probe after 5 seconds, disconnecting
>>> 2018-09-26T04:17:24.767Z|00017|reconnect|ERR|tcp:127.0.0.1:45990: no response to inactivity probe after 5 seconds, disconnecting
>>> 2018-09-26T04:17:31.735Z|00018|reconnect|ERR|tcp:127.0.0.1:45998: no response to inactivity probe after 5 seconds, disconnecting
>>> 2018-09-26T04:20:12.593Z|00019|reconnect|ERR|tcp:127.0.0.1:46014: no response to inactivity probe after 5 seconds, disconnecting
>>> 2018-09-26T04:23:51.996Z|00020|reconnect|ERR|tcp:127.0.0.1:46028: no response to inactivity probe after 5 seconds, disconnecting
>>> 2018-09-26T04:25:12.187Z|00021|reconnect|ERR|tcp:127.0.0.1:46022: no response to inactivity probe after 5 seconds, disconnecting
>>> 2018-09-26T04:25:28.871Z|00022|reconnect|ERR|tcp:127.0.0.1:46056: no response to inactivity probe after 5 seconds, disconnecting
>>> 2018-09-26T04:27:11.663Z|00023|reconnect|ERR|tcp:127.0.0.1:46046: no response to inactivity probe after 5 seconds, disconnecting
>>> 2018-09-26T04:29:56.161Z|00024|jsonrpc|WARN|tcp:127.0.0.1:46018: receive error: Connection reset by peer
>>> 2018-09-26T04:29:56.161Z|00025|reconnect|WARN|tcp:127.0.0.1:46018: connection dropped (Connection reset by peer)
>>>
>>> This definitely kills the connection for a few seconds before it
>>> reconnects. So, I’ve been wondering, what is this probe and what is really
>>> happening here? What’s the cause and is there a way to fix this?
>>>
>>> The Open vSwitch version is 2.9.0-3 on CentOS 7 with OpenStack Pike running
>>> on it (but the issues show up on Queens too).
>>>
>>>
>>> Jean-Philippe Méthot
>>> Openstack system administrator
>>> Administrateur système Openstack
>>> PlanetHoster inc.
>>>
>>>
>>>
>>>
>
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss