On Thu, 27 Sep 2018 at 13:17, Jean-Philippe Méthot <
[email protected]> wrote:

> 1. Who is at 127.0.0.1:6633? This is likely an OpenFlow controller.
>
>
> That would be the neutron-openvswitch-agent, i.e. the openstack service
> managing openvswitch.
>
> 2. What does `ovs-vsctl list controller` say?
>
> _uuid               : ff2dca74-9628-43c8-b89c-8d2f1242dd3f
> connection_mode     : out-of-band
> controller_burst_limit: []
> controller_rate_limit: []
> enable_async_messages: []
> external_ids        : {}
> inactivity_probe    : []
> is_connected        : false
> local_gateway       : []
> local_ip            : []
> local_netmask       : []
> max_backoff         : []
> other_config        : {}
> role                : other
> status              : {last_error="Connection timed out",
> sec_since_connect="22", sec_since_disconnect="1", state=BACKOFF}
> target              : "tcp:127.0.0.1:6633"
>

The above tells us that ovs-vswitchd is complaining in its logs because it
cannot connect to your agent at 6633. So I would start by looking at what the
agent is doing at that time, and probably ask more questions about this on an
OpenStack mailing list. Maybe, at the scale you are running, the OpenStack
agent is struggling.

The ovsdb-server error likely has the same cause.
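
If you want to confirm that on the network node while the drops are happening,
here is a minimal sketch, assuming the agent process matches
"neutron-openvswitch-agent":

  # Per-thread CPU of the neutron agent while ovs-vswitchd logs the disconnects
  top -H -p $(pgrep -f neutron-openvswitch-agent | head -n1)

  # Watch whether the bridge controllers ever reach is_connected=true and for how long
  watch -n1 'ovs-vsctl --columns=target,is_connected,status list controller'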





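If the agent turns out to be healthy but simply too slow to answer within the
5 seconds seen in your logs, one possible mitigation, only a sketch using one
of the Controller UUIDs from your output above, is to lengthen the OpenFlow
inactivity probe; note that if the neutron agent manages these records it may
reapply its own value:

  # inactivity_probe is in milliseconds; 30000 = 30 s (repeat for each Controller record)
  ovs-vsctl set Controller ff2dca74-9628-43c8-b89c-8d2f1242dd3f inactivity_probe=30000
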
>
> _uuid               : 4f9ae2d1-4f1b-460c-b2bc-c96d24f445bb
> connection_mode     : out-of-band
> controller_burst_limit: []
> controller_rate_limit: []
> enable_async_messages: []
> external_ids        : {}
> inactivity_probe    : []
> is_connected        : false
> local_gateway       : []
> local_ip            : []
> local_netmask       : []
> max_backoff         : []
> other_config        : {}
> role                : other
> status              : {last_error="Connection timed out",
> sec_since_connect="1284", sec_since_disconnect="14", state=CONNECTING}
> target              : "tcp:127.0.0.1:6633"
>
> _uuid               : 1b503dbf-3117-45c2-9e2b-0f50cb48554b
> connection_mode     : out-of-band
> controller_burst_limit: []
> controller_rate_limit: []
> enable_async_messages: []
> external_ids        : {}
> inactivity_probe    : []
> is_connected        : false
> local_gateway       : []
> local_ip            : []
> local_netmask       : []
> max_backoff         : []
> other_config        : {}
> role                : other
> status              : {last_error="Connection timed out",
> sec_since_connect="22", sec_since_disconnect="1", state=BACKOFF}
> target              : "tcp:127.0.0.1:6633 »
>
> 3. What does `ovs-vsctl list manager` say?
>
>
> _uuid               : 7f6c413f-972e-4ef2-89dd-1fa6078abcfe
> connection_mode     : []
> external_ids        : {}
> inactivity_probe    : []
> is_connected        : false
> max_backoff         : []
> other_config        : {}
> status              : {bound_port="6640", sec_since_connect="0",
> sec_since_disconnect="0"}
> target              : "ptcp:6640:127.0.0.1"
>
> 4. ovs-appctl -t ovsdb-server ovsdb-server/list-remotes
>
>
> db:Open_vSwitch,Open_vSwitch,manager_options
> punix:/var/run/openvswitch/db.sock
>
> 5. What does 'ps -ef | grep ovs' say?
>
>
> openvsw+   939     1  0 19:45 ?        00:00:49 ovsdb-server
> /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info
> --remote=punix:/var/run/openvswitch/db.sock
> --private-key=db:Open_vSwitch,SSL,private_key
> --certificate=db:Open_vSwitch,SSL,certificate
> --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --user
> openvswitch:hugetlbfs --no-chdir
> --log-file=/var/log/openvswitch/ovsdb-server.log
> --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
> openvsw+  1013     1 11 19:45 ?        00:17:46 ovs-vswitchd
> unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info
> --mlockall --user openvswitch:hugetlbfs --no-chdir
> --log-file=/var/log/openvswitch/ovs-vswitchd.log
> --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
> neutron  25792  2414  0 22:16 ?        00:00:00 ovsdb-client monitor tcp:
> 127.0.0.1:6640 Bridge name --format=json
>
> Jean-Philippe Méthot
> Openstack system administrator
> Administrateur système Openstack
> PlanetHoster inc.
>
>
>
>
> On 27 Sep 2018, at 16:09, Guru Shetty <[email protected]> wrote:
>
>
>
> On Thu, 27 Sep 2018 at 12:52, Jean-Philippe Méthot <
> [email protected]> wrote:
>
>> Sorry, the log file is ovsdb-server.log. It seems ovs-vswitchd.log has the
>> other counterpart of this error:
>>
>> 2018-09-27T19:38:01.217Z|00783|rconn|ERR|br-tun<->tcp:127.0.0.1:6633: no
>> response to inactivity probe after 5 seconds, disconnecting
>> 2018-09-27T19:38:01.218Z|00784|rconn|ERR|br-ex<->tcp:127.0.0.1:6633: no
>> response to inactivity probe after 5 seconds, disconnecting
>>
>
> 1. Who is at 127.0.0.1:6633? This is likely an OpenFlow controller.
> 2. What does `ovs-vsctl list controller` say?
> 3. What does `ovs-vsctl list manager` say?
> 4. ovs-appctl -t ovsdb-server ovsdb-server/list-remotes
> 5. What does 'ps -ef | grep ovs' say?
>
> I am asking these simple questions because I am not familiar with
> OpenStack ml2.
>
>
>
>
>
>> 2018-09-27T19:38:02.218Z|00785|rconn|INFO|br-tun<->tcp:127.0.0.1:6633:
>> connecting...
>> 2018-09-27T19:38:02.218Z|00786|rconn|INFO|br-ex<->tcp:127.0.0.1:6633:
>> connecting...
>> 2018-09-27T19:38:03.218Z|00787|rconn|INFO|br-tun<->tcp:127.0.0.1:6633:
>> connection timed out
>> 2018-09-27T19:38:03.218Z|00788|rconn|INFO|br-tun<->tcp:127.0.0.1:6633:
>> waiting 2 seconds before reconnect
>> 2018-09-27T19:38:03.218Z|00789|rconn|INFO|br-ex<->tcp:127.0.0.1:6633:
>> connection timed out
>> 2018-09-27T19:38:03.218Z|00790|rconn|INFO|br-ex<->tcp:127.0.0.1:6633:
>> waiting 2 seconds before reconnect
>> 2018-09-27T19:38:05.218Z|00791|rconn|INFO|br-tun<->tcp:127.0.0.1:6633:
>> connecting...
>> 2018-09-27T19:38:05.218Z|00792|rconn|INFO|br-ex<->tcp:127.0.0.1:6633:
>> connecting...
>> 2018-09-27T19:38:06.221Z|00793|rconn|INFO|br-tun<->tcp:127.0.0.1:6633:
>> connected
>> 2018-09-27T19:38:06.222Z|00794|rconn|INFO|br-ex<->tcp:127.0.0.1:6633:
>> connected
>>
>> Who is at 127.0.0.1:45928 and 127.0.0.1:45930?
>>
>>
>> That seems to be ovs-vswitchd in that range. Of course, these ports seem to
>> change all the time, but I think ovs-vswitchd tends to stay in that range.
>>
>> Here’s an example of "ss -anp |grep ovs" so you can have an idea of the
>> port mapping.
>>
>> tcp    LISTEN     0      10     127.0.0.1:6640                  *:*
>>               users:(("ovsdb-server",pid=939,fd=19))
>> tcp    ESTAB      0      0      127.0.0.1:6640
>> 127.0.0.1:28720               users:(("ovsdb-server",pid=939,fd=20))
>> tcp    ESTAB      0      0      127.0.0.1:6640
>> 127.0.0.1:28734               users:(("ovsdb-server",pid=939,fd=18))
>> tcp    ESTAB      0      0      127.0.0.1:6640
>> 127.0.0.1:28754               users:(("ovsdb-server",pid=939,fd=25))
>> tcp    ESTAB      0      0      127.0.0.1:6640
>> 127.0.0.1:28730               users:(("ovsdb-server",pid=939,fd=24))
>> tcp    ESTAB      0      0      127.0.0.1:28754
>> 127.0.0.1:6640                users:(("ovsdb-client",pid=20965,fd=3))
>> tcp    ESTAB      0      0      127.0.0.1:6640
>> 127.0.0.1:28752               users:(("ovsdb-server",pid=939,fd=23))
>> tcp    ESTAB      0      0      127.0.0.1:46917
>> 127.0.0.1:6633                users:(("ovs-vswitchd",pid=1013,fd=214))
>> tcp    ESTAB      0      0      127.0.0.1:6640
>> 127.0.0.1:28750               users:(("ovsdb-server",pid=939,fd=22))
>> tcp    ESTAB      0      0      127.0.0.1:6640
>> 127.0.0.1:28722               users:(("ovsdb-server",pid=939,fd=21))
>> tcp    ESTAB      0      0      127.0.0.1:28752
>> 127.0.0.1:6640                users:(("ovsdb-client",pid=20363,fd=3))
>>
>> Jean-Philippe Méthot
>>
>>
>> Openstack system administrator
>> Administrateur système Openstack
>> PlanetHoster inc.
>>
>>
>>
>>
>> On 27 Sep 2018, at 15:39, Guru Shetty <[email protected]> wrote:
>>
>>
>> ovs-vswitchd is multi-threaded. ovsdb-server is single-threaded.
>> (You did not answer my question about the file from which the logs were
>> printed in your email)
>>
>> Who is at 127.0.0.1:45928 and 127.0.0.1:45930?
>>
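
A minimal way to check whether one thread is the bottleneck, assuming sysstat
is installed for pidstat (otherwise top -H alone is enough):

  # Per-thread CPU of both daemons; ovsdb-server does its work on a single main thread
  top -H -p $(pidof ovs-vswitchd) -p $(pidof ovsdb-server)
  # or, sampled once per second:
  pidstat -t -p $(pidof ovsdb-server) 1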
>> On Thu, 27 Sep 2018 at 11:14, Jean-Philippe Méthot <
>> [email protected]> wrote:
>>
>>> Thank you for your reply.
>>>
>>> This is Openstack with the ml2 plugin. There’s no other third-party
>>> application used with our network, so no OVN or anything of the sort.
>>> Essentially, to give a quick idea of the topology, the VMs on our compute
>>> nodes go through GRE tunnels toward the network nodes, where they are
>>> routed in network namespaces toward a flat external network.
>>>
>>> Generally, the above indicates that a daemon fronting an Open vSwitch
>>> database hasn't been able to connect to its client. This usually happens
>>> when CPU consumption is very high.
>>>
>>>
>>> Our network nodes’ CPUs are literally sleeping. Is openvswitch
>>> single-threaded or multi-threaded, though? If ovs overloaded a single
>>> thread, it’s possible I may have missed it.
>>>
>>> Jean-Philippe Méthot
>>> Openstack system administrator
>>> Administrateur système Openstack
>>> PlanetHoster inc.
>>>
>>>
>>>
>>>
>>> On 27 Sep 2018, at 14:04, Guru Shetty <[email protected]> wrote:
>>>
>>>
>>>
>>> On Wed, 26 Sep 2018 at 12:59, Jean-Philippe Méthot via discuss <
>>> [email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I’ve been using openvswitch for my networking backend on openstack for
>>>> several years now. Lately, as our network has grown, we’ve started noticing
>>>> some intermittent packet drops accompanied by the following error messages
>>>> in openvswitch:
>>>>
>>>> 2018-09-26T04:15:20.676Z|00005|reconnect|ERR|tcp:127.0.0.1:45928: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>> 2018-09-26T04:15:20.677Z|00006|reconnect|ERR|tcp:127.0.0.1:45930: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>>
>>>
>>> Open vSwitch is a project with multiple daemons. Since you are using
>>> OpenStack, it is not clear from your message what type of networking
>>> plugin you are using. Do you use OVN?
>>> Also, you did not mention from which file you got the above
>>> errors.
>>>
>>> Generally, the above indicates that a daemon fronting an Open vSwitch
>>> database hasn't been able to connect to its client. This usually happens
>>> when CPU consumption is very high.
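
A quick way to check which daemon is logging these and when, assuming the
default /var/log/openvswitch locations visible elsewhere in this thread:

  grep -c 'no response to inactivity probe' \
      /var/log/openvswitch/ovsdb-server.log /var/log/openvswitch/ovs-vswitchd.log
  grep 'no response to inactivity probe' /var/log/openvswitch/ovs-vswitchd.log | tail -n 20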
>>>
>>>
>>>
>>>> 2018-09-26T04:15:30.409Z|00007|reconnect|ERR|tcp:127.0.0.1:45874: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>> 2018-09-26T04:15:33.661Z|00008|reconnect|ERR|tcp:127.0.0.1:45934: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>> 2018-09-26T04:15:33.847Z|00009|reconnect|ERR|tcp:127.0.0.1:45894: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>> 2018-09-26T04:16:03.247Z|00010|reconnect|ERR|tcp:127.0.0.1:45958: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>> 2018-09-26T04:16:21.534Z|00011|reconnect|ERR|tcp:127.0.0.1:45956: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>> 2018-09-26T04:16:21.786Z|00012|reconnect|ERR|tcp:127.0.0.1:45974: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>> 2018-09-26T04:16:47.085Z|00013|reconnect|ERR|tcp:127.0.0.1:45988: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>> 2018-09-26T04:16:49.618Z|00014|reconnect|ERR|tcp:127.0.0.1:45982: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>> 2018-09-26T04:16:53.321Z|00015|reconnect|ERR|tcp:127.0.0.1:45964: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>> 2018-09-26T04:17:15.543Z|00016|reconnect|ERR|tcp:127.0.0.1:45986: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>> 2018-09-26T04:17:24.767Z|00017|reconnect|ERR|tcp:127.0.0.1:45990: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>> 2018-09-26T04:17:31.735Z|00018|reconnect|ERR|tcp:127.0.0.1:45998: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>> 2018-09-26T04:20:12.593Z|00019|reconnect|ERR|tcp:127.0.0.1:46014: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>> 2018-09-26T04:23:51.996Z|00020|reconnect|ERR|tcp:127.0.0.1:46028: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>> 2018-09-26T04:25:12.187Z|00021|reconnect|ERR|tcp:127.0.0.1:46022: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>> 2018-09-26T04:25:28.871Z|00022|reconnect|ERR|tcp:127.0.0.1:46056: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>> 2018-09-26T04:27:11.663Z|00023|reconnect|ERR|tcp:127.0.0.1:46046: no
>>>> response to inactivity probe after 5 seconds, disconnecting
>>>> 2018-09-26T04:29:56.161Z|00024|jsonrpc|WARN|tcp:127.0.0.1:46018:
>>>> receive error: Connection reset by peer
>>>> 2018-09-26T04:29:56.161Z|00025|reconnect|WARN|tcp:127.0.0.1:46018:
>>>> connection dropped (Connection reset by peer)
>>>>
>>>> This definitely kills the connection for a few seconds before it
>>>> reconnects. So I’ve been wondering: what is this probe, and what is really
>>>> happening here? What’s the cause, and is there a way to fix this?
>>>>
>>>> Openvswitch version is 2.9.0-3 on CentOS 7 with Openstack Pike running
>>>> on it (but the issues show up on Queens too).
>>>>
>>>>
>>>> Jean-Philippe Méthot
>>>> Openstack system administrator
>>>> Administrateur système Openstack
>>>> PlanetHoster inc.
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
