Reviewed: https://review.opendev.org/c/openstack/neutron/+/869770
Committed: https://opendev.org/openstack/neutron/commit/ed048638f4ef6a1799a1dcbd8819627b3fa8e75e
Submitter: "Zuul (22348)"
Branch: master

commit ed048638f4ef6a1799a1dcbd8819627b3fa8e75e
Author: Brian Haley <haleyb....@gmail.com>
Date: Tue Jan 10 16:40:29 2023 -0500

Add text to WSGI config guide for debugging

Add "broken pipe" example error message to WSGI guide in rpc_workers section, could help diagnose agent issues.

Trivialfix

Change-Id: I6e95614bce42124f125be4c23fff0d923ea9ebcc
Closes-bug: #1855919

** Changed in: neutron
Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1855919

Title:
broken pipe errors cause neutron metadata agent to fail

Status in neutron:
Fix Released

Bug description:
After we increased computes to 200, we started seeing "broken pipe" errors in neutron-metadata-agent.log on the controllers. After a neutron restart the errors are reduced, then they increase until the log is mostly errors, and the neutron metadata service fails, and VMs cannot boot. Another symptom is that unacked RMQ messages build up in the q-plugin queue.
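
The progression described here (few errors after a restart, then a log that is mostly errors) can be quantified with a small script. The following is only a sketch: the function name `tally` and the pattern labels are hypothetical, and the regular expression assumes the timestamp/PID/level prefix seen in the log excerpts of this report. Traceback lines without that prefix, such as `error: [Errno 32] Broken pipe`, are counted against the minute of the most recent prefixed entry.

```python
import re
from collections import Counter, defaultdict

# Prefix as it appears in the excerpts: "YYYY-MM-DD HH:MM:SS.mmm PID LEVEL ".
# The capture group keeps the timestamp truncated to the minute.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d+ \d+ "
    r"(?:DEBUG|INFO|WARNING|ERROR|CRITICAL) "
)

# Substrings taken from the messages quoted in this report; the labels on
# the left are hypothetical names chosen for this sketch.
PATTERNS = {
    "broken_pipe": "[Errno 32] Broken pipe",
    "rpc_timeout": "Timeout in RPC method",
    "call_queue_warning": "Number of call queues is",
}


def tally(lines):
    """Count log entries and known error patterns per minute."""
    per_minute = defaultdict(Counter)
    minute = None
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            minute = match.group(1)
            per_minute[minute]["entries"] += 1
        if minute is None:
            # Nothing with a timestamp seen yet; cannot attribute the line.
            continue
        for label, needle in PATTERNS.items():
            if needle in line:
                per_minute[minute][label] += 1
    return per_minute


if __name__ == "__main__":
    demo = [
        "2019-12-10 10:56:21.138 1838538 INFO eventlet.wsgi.server [-] "
        "Traceback (most recent call last):",
        "error: [Errno 32] Broken pipe",
        "2019-12-10 10:57:02.059 1838534 ERROR neutron.common.rpc [-] "
        "Timeout in RPC method get_ports.",
    ]
    for minute, counts in sorted(tally(demo).items()):
        print(minute, dict(counts))
```

To run it against a real log, pass an open file object to `tally`, for example `tally(open(path))`, where `path` points at neutron-metadata-agent.log on a controller (the location depends on the deployment).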

This is the first error we see; this one occurs as the server is starting:

2019-12-10 10:56:01.942 1838536 INFO eventlet.wsgi.server [-] (1838536) wsgi starting up on http:/var/lib/neutron/metadata_proxy
2019-12-10 10:56:01.943 1838538 INFO eventlet.wsgi.server [-] (1838538) wsgi starting up on http:/var/lib/neutron/metadata_proxy
2019-12-10 10:56:01.945 1838539 INFO eventlet.wsgi.server [-] (1838539) wsgi starting up on http:/var/lib/neutron/metadata_proxy
2019-12-10 10:56:21.138 1838538 INFO eventlet.wsgi.server [-] Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 521, in handle_one_response
    write(b''.join(towrite))
  File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 462, in write
    wfile.flush()
  File "/usr/lib/python2.7/socket.py", line 307, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 390, in sendall
    tail = self.send(data, flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 384, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 371, in _send_loop
    return send_method(data, *args)
error: [Errno 32] Broken pipe
2019-12-10 10:56:21.138 1838538 INFO eventlet.wsgi.server [-] 10.195.74.25,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 0 time: 19.0296111
2019-12-10 10:56:25.059 1838516 INFO eventlet.wsgi.server [-] 10.195.74.28,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.2840948
2019-12-10 10:56:25.181 1838529 INFO eventlet.wsgi.server [-] 10.195.74.68,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.2695429
2019-12-10 10:56:25.259 1838518 INFO eventlet.wsgi.server [-] 10.195.74.28,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.1980510

Then we see some "call queues" warnings and the threshold increases to 40:

2019-12-10 10:56:31.414 1838515 WARNING oslo_messaging._drivers.amqpdriver [-] Number of call queues is 11, greater than warning threshold: 10. There could be a leak. Increasing threshold to: 20

Next we see RPC timeout errors:

2019-12-10 10:57:02.043 1838520 WARNING oslo_messaging._drivers.amqpdriver [-] Number of call queues is 11, greater than warning threshold: 10. There could be a leak. Increasing threshold to: 20
2019-12-10 10:57:02.059 1838534 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 37 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 1ed3e021607e466f8b9b84cd3b05b188
2019-12-10 10:57:02.059 1838534 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 1ed3e021607e466f8b9b84cd3b05b188
2019-12-10 10:57:02.285 1838521 INFO eventlet.wsgi.server [-] 10.195.74.27,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.7959940
2019-12-10 10:57:16.215 1838531 WARNING oslo_messaging._drivers.amqpdriver [-] Number of call queues is 21, greater than warning threshold: 20. There could be a leak. Increasing threshold to: 40
2019-12-10 10:57:17.339 1838539 WARNING oslo_messaging._drivers.amqpdriver [-] Number of call queues is 11, greater than warning threshold: 10. There could be a leak. Increasing threshold to: 20
2019-12-10 10:57:24.838 1838524 INFO eventlet.wsgi.server [-] 10.195.73.242,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.6842020
2019-12-10 10:57:24.882 1838524 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 3 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 2bb5faa3ec8d4f5b9d3bd3e2fe095f9e
2019-12-10 10:57:24.883 1838524 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 2bb5faa3ec8d4f5b9d3bd3e2fe095f9e
2019-12-10 10:57:24.887 1838525 INFO eventlet.wsgi.server [-] 10.195.74.26,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.9827850
2019-12-10 10:57:24.903 1838518 INFO eventlet.wsgi.server [-] 10.195.74.43,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 3.5630379
2019-12-10 10:57:25.045 1838529 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 21 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID b38361bf9906482b8b24c5b534a6652b
2019-12-10 10:57:25.046 1838529 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID b38361bf9906482b8b24c5b534a6652b
2019-12-10 10:57:25.055 1838537 INFO eventlet.wsgi.server [-] 10.195.73.247,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.7542410
2019-12-10 10:57:25.119 1838523 INFO eventlet.wsgi.server [-] 10.195.74.2,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.7057869
2019-12-10 10:57:25.185 1838524 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 47 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID f1a268d937f94def97bd238916715744
2019-12-10 10:57:25.261 1838529 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 26 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 93c31cf4f5d34bd1a5ba90165e89cb79
2019-12-10 10:57:25.284 1838536 INFO eventlet.wsgi.server [-] 10.195.73.207,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.4315739
2019-12-10 10:57:25.319 1838520 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 50 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 37c1b168536e4c70b522c330209b11ec
2019-12-10 10:57:25.319 1838520 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 37c1b168536e4c70b522c330209b11ec
2019-12-10 10:57:25.374 1838530 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 30 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID fb837fc73c664209bfbada0fb32886ad
2019-12-10 10:57:25.375 1838530 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID fb837fc73c664209bfbada0fb32886ad
2019-12-10 10:57:25.388 1838526 INFO eventlet.wsgi.server [-] 10.195.65.7,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 3.5798080
2019-12-10 10:57:25.446 1838520 INFO eventlet.wsgi.server [-] 10.195.74.104,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 3.6868739
2019-12-10 10:57:25.448 1838528 INFO eventlet.wsgi.server [-] 10.195.74.202,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 3.7513518
2019-12-10 10:57:25.452 1838519 WARNING oslo_messaging._drivers.amqpdriver [-] Number of call queues is 21, greater than warning threshold: 20. There could be a leak. Increasing threshold to: 40
2019-12-10 10:57:25.504 1838535 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 15 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 7b677a7d40274b0ea22510dcf3865cf6
2019-12-10 10:57:25.505 1838535 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 7b677a7d40274b0ea22510dcf3865cf6
2019-12-10 10:57:25.609 1838539 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 20 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 378f11ce14334be38ffaa95ec3fc26f2
2019-12-10 10:57:25.610 1838539 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 378f11ce14334be38ffaa95ec3fc26f2
2019-12-10 10:57:25.661 1838524 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 28 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 0c911a0ac95f42209cfa8b265d4d5c3d
2019-12-10 10:57:25.787 1838525 INFO eventlet.wsgi.server [-] 10.195.74.86,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.7191069
2019-12-10 10:57:25.831 1838522 INFO eventlet.wsgi.server [-] 10.195.64.185,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.5980189
2019-12-10 10:57:25.837 1838532 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 51 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 9a7cfd81ba714d2680aa223ba96798f0
2019-12-10 10:57:25.837 1838532 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 9a7cfd81ba714d2680aa223ba96798f0
2019-12-10 10:57:25.903 1838536 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 28 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID da2628209cde4562bf47dc0bdfecbf1d
2019-12-10 10:57:25.904 1838536 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID da2628209cde4562bf47dc0bdfecbf1d
2019-12-10 10:57:25.914 1838521 INFO eventlet.wsgi.server [-] 10.195.74.44,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 3.8841410
2019-12-10 10:57:25.936 1838524 INFO eventlet.wsgi.server [-] 10.195.73.231,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.3305228

A minute or two after starting the server we get more errors. At this point VMs are unable to build. If I try to pull metadata from a VM, I get a 503 or 504 and nothing is logged in neutron-metadata-agent.log. HAProxy logs the 503/504 response.

albertb@<html><body><h1>503:~ $ curl -s http://169.254.169.254/2009-04-04/meta-data/hostname
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>

Now the log is almost all errors:

2019-12-10 10:57:27.666 1838530 INFO eventlet.wsgi.server [-] 10.195.73.174,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 3.7101889
2019-12-10 10:57:27.719 1838537 INFO eventlet.wsgi.server [-] 10.195.65.6,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.3497119
2019-12-10 10:57:27.720 1838525 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 60 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 138877a7326e40f38de23b05fb97127a
2019-12-10 10:57:27.720 1838525 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 138877a7326e40f38de23b05fb97127a
2019-12-10 10:57:27.741 1838523 INFO eventlet.wsgi.server [-] 10.195.74.86,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 4.7329929
2019-12-10 10:57:27.820 1838525 INFO eventlet.wsgi.server [-] 10.195.73.206,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 1.4146030
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent [-] Unexpected error.: MessagingTimeout: Timed out waiting for a reply to message ID 2bb5faa3ec8d4f5b9d3bd3e2fe095f9e
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/agent.py", line 89, in __call__
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     instance_id, tenant_id = self._get_instance_and_tenant_id(req)
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/agent.py", line 162, in _get_instance_and_tenant_id
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     ports = self._get_ports(remote_address, network_id, router_id)
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/agent.py", line 155, in _get_ports
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     return self._get_ports_for_remote_address(remote_address, networks)
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/common/cache_utils.py", line 116, in __call__
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     return self.func(target_self, *args, **kwargs)
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/agent.py", line 137, in _get_ports_for_remote_address
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     ip_address=remote_address)
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/agent.py", line 106, in _get_ports_from_server
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     return self.plugin_rpc.get_ports(self.context, filters)
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/agent/metadata/agent.py", line 72, in get_ports
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     return cctxt.call(context, 'get_ports', filters=filters)
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/common/rpc.py", line 173, in call
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     time.sleep(wait)
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     self.force_reraise()
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     six.reraise(self.type_, self.value, self.tb)
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/neutron/common/rpc.py", line 150, in call
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     return self._original_context.call(ctxt, method, **kwargs)
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 179, in call
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     retry=self.retry)
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 133, in _send
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     retry=retry)
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 584, in send
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     call_monitor_timeout, retry=retry)
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 573, in _send
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     call_monitor_timeout)
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 459, in wait
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     message = self.waiters.get(msg_id, timeout=timeout)
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 336, in get
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent     'to message ID %s' % msg_id)
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent MessagingTimeout: Timed out waiting for a reply to message ID 2bb5faa3ec8d4f5b9d3bd3e2fe095f9e
2019-12-10 10:57:27.824 1838524 ERROR neutron.agent.metadata.agent
2019-12-10 10:57:27.828 1838524 INFO eventlet.wsgi.server [-] Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 521, in handle_one_response
    write(b''.join(towrite))
  File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 462, in write
    wfile.flush()
  File "/usr/lib/python2.7/socket.py", line 307, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 390, in sendall
    tail = self.send(data, flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 384, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 371, in _send_loop
    return send_method(data, *args)
error: [Errno 32] Broken pipe
2019-12-10 10:57:27.828 1838524 INFO eventlet.wsgi.server [-] 10.195.73.248,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 500 len: 0 time: 63.0060959
2019-12-10 10:57:27.873 1838528 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 41 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 7d73358e2fe841e4a4b818395e2e5b2d
2019-12-10 10:57:27.877 1838524 INFO eventlet.wsgi.server [-] 10.195.73.238,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 4.4559531
2019-12-10 10:57:27.921 1838538 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 6 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 297db0c16653413cabc868027f9e6abb
2019-12-10 10:57:27.921 1838538 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 297db0c16653413cabc868027f9e6abb
2019-12-10 10:57:27.967 1838520 INFO eventlet.wsgi.server [-] 10.195.74.29,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.4040241
2019-12-10 10:57:28.006 1838517 INFO eventlet.wsgi.server [-] 10.195.74.202,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 3.6681471
2019-12-10 10:57:28.026 1838522 INFO eventlet.wsgi.server [-] 10.195.74.202,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.3529530
2019-12-10 10:57:28.058 1838519 INFO eventlet.wsgi.server [-] 10.195.74.121,<local> "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 3.5390451

To reproduce this issue: build an OpenStack cluster on Rocky and add 200 computes. Our setup has 3 controllers with 48 CPUs (Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz) and 92G RAM.

This bug seems severe to us. It is ruining our production cluster and we cannot build VMs.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1855919/+subscriptions