[Yahoo-eng-team] [Bug 1855902] Re: Inefficient Security Group listing
*** This bug is a duplicate of bug 1830679 *** https://bugs.launchpad.net/bugs/1830679 This was fixed in https://review.opendev.org/#/c/665566/ and backported to stable/stein (15.0.0), it wasn't backported further. Closing as a duplicate of https://bugs.launchpad.net/neutron/+bug/1830679 ** This bug has been marked a duplicate of bug 1830679 Security groups RBAC cause a major performance degradation
[Yahoo-eng-team] [Bug 1855945] [NEW] Network Config Version 2 Device Configuration ID used as interface name when set-name is not specified
Public bug reported: With cloud-init Network Config Version 2, the Device Configuration ID is used as the Ethernet interface device name when set-name is not specified.

Example - Version 2 metadata:

instance-id: "management-cluster-controlplane-0"
network:
  version: 2
  ethernets:
    id0:
      match:
        macaddress: "00:50:56:a5:1a:78"

When 'set-name' is not defined within the version 2 config, cloud-init's network_state.py retrieves the name from the key of the ethernets dict; in the above case it would be "id0". network_state.py code [https://github.com/canonical/cloud-init/blob/ec6924ea1d321cc87e7414bee7734074590045b8/cloudinit/net/network_state.py#L645]:

for eth, cfg in command.items():
    phy_cmd = {
        'type': 'physical',
        'name': cfg.get('set-name', eth),
    }

See debug output where set-name is not specified and the ethernet device config id = id0:

2019-12-05 01:50:14,692 - network_state.py[DEBUG]: v2(ethernets) -> v1(physical): {'subnets': [{'dns_nameservers': ['10.10.10.10'], 'type': 'static', 'gateway': '10.7.7.254', 'address': '10.7.5.102/21'}], 'name': 'id0', 'mac_address': '00:50:56:a5:75:b3', 'type': 'physical', 'wakeonlan': True, 'match': {'macaddress': '00:50:56:a5:75:b3'}}

Within CentOS, the sysconfig renderer then later uses this Device Config ID for the /etc/sysconfig/network-scripts/ifcfg-<name> file name and the DEVICE= parameter. sysconfig.py code [https://github.com/canonical/cloud-init/blob/ec6924ea1d321cc87e7414bee7734074590045b8/cloudinit/net/sysconfig.py#L701-L702]

cloud-init.log - without set-name configured:

2019-12-05 21:11:38,913 - stages.py[DEBUG]: Using distro class
2019-12-05 21:11:38,914 - __init__.py[DEBUG]: no interfaces to rename
2019-12-05 21:11:38,914 - __init__.py[DEBUG]: Datasource DataSourceVMwareGuestInfo not updated for events: System boot
2019-12-05 21:11:38,914 - stages.py[DEBUG]: No network config applied. Neither a new instance nor datasource network update on 'System boot' event
2019-12-05 21:11:38,914 - handlers.py[DEBUG]: start: init-network/setup-datasource: setting up datasource
2019-12-05 21:11:38,914 - DataSourceVMwareGuestInfo.py[INFO]: got host-info: {'network': {'interfaces': {'by-mac': OrderedDict([('00:50:56:a5:b2:b2', {'ipv6': [{'addr': 'fe80::250:56ff:fea5:b2b2%eth0', 'netmask': ':::::/64'}]})]), 'by-ipv4': OrderedDict(), 'by-ipv6': OrderedDict([('fe80::250:56ff:fea5:b2b2%eth0', {'netmask': ':::::/64', 'mac': '00:50:56:a5:b2:b2'})])}}, 'hostname': 'localhost', 'local-hostname': 'localhost'}

cloud-init.log - with set-name: eth0 configured:

2019-12-05 21:57:19,179 - util.py[DEBUG]: Running command ['ip', '-6', 'addr', 'show', 'permanent', 'scope', 'global'] with allowed return codes [0] (shell=False, capture=True)
2019-12-05 21:57:19,190 - util.py[DEBUG]: Running command ['ip', '-4', 'addr', 'show'] with allowed return codes [0] (shell=False, capture=True)
2019-12-05 21:57:19,198 - __init__.py[DEBUG]: no work necessary for renaming of [['00:50:56:a5:b2:b2', 'eth0', 'vmxnet3', '0x07b0']]
2019-12-05 21:57:19,198 - stages.py[INFO]: Applying network configuration from system_cfg bringup=False: {'version': 2, 'ethernets': {'id0': {'match': {'macaddress': '00:50:56:a5:b2:b2'}, 'wakeonlan': True, 'set-name': 'eth0', 'dhcp4': False, 'dhcp6': False, 'addresses': ['10.7.5.101/21'], 'gateway4': '10.7.7.254', 'nameservers': {'addresses': ['10.10.10.10']
2019-12-05 21:57:19,199 - __init__.py[WARNING]: apply_network_config is not currently implemented for distribution ''.
Attempting to use apply_network 2019-12-05 21:57:19,199 - network_state.py[DEBUG]: v2(ethernets) -> v1(physical): {'type': 'physical', 'name': 'eth0', 'mac_address': '00:50:56:a5:b2:b2', 'match': {'macaddress': '00:50:56:a5:b2:b2'}, 'wakeonlan': True, 'subnets': [{'type': 'static', 'address': '10.7.5.101/21', 'gateway': '10.7.7.254', 'dns_nameservers': ['10.10.10.10']}]} 2019-12-05 21:57:19,206 - network_state.py[DEBUG]: v2_common: handling config: {'id0': {'match': {'macaddress': '00:50:56:a5:b2:b2'}, 'wakeonlan': True, 'set-name': 'eth0', 'dhcp4': False, 'dhcp6': False, 'addresses': ['10.7.5.101/21'], 'gateway4': '10.7.7.254', 'nameservers': {'addresses': ['10.10.10.10']}}} 2019-12-05 21:57:19,207 - photon.py[DEBUG]: Translated ubuntu style network settings # Converted from network_config for distro Implementation of _write_network_config is needed. auto lo iface lo inet loopback auto eth0 iface eth0 inet static hwaddress 00:50:56:a5:b2:b2 address 10.7.5.101/21 dns-nameservers 10.10.10.10 gateway 10.7.7.254 into {'lo': {'ipv6': {}, 'auto': True}, 'eth0': {'ipv6': {}, 'bootproto': 'static', 'address': '10.7.5.101', 'gateway': '10.7.7.254', 'netmask': '255.255.248.0', 'broadcast': '10.7.7.255', 'dns-nameservers': ['10.10.10.10'], 'auto': True}} 2019-12-05 21:57:19,208 - util.py[DEBUG]: Writing to
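The fallback quoted above can be illustrated with a short, self-contained sketch (simplified, not the actual cloud-init function signature): when 'set-name' is absent, the device-configuration key itself becomes the interface name.

```python
# Minimal sketch of the v2 -> v1 translation quoted from network_state.py;
# structure simplified for illustration.
def v2_ethernets_to_v1(ethernets):
    """Translate a netplan-style 'ethernets' dict into v1 physical entries."""
    v1 = []
    for eth, cfg in ethernets.items():
        v1.append({
            'type': 'physical',
            # Falls back to the device-configuration ID ('id0') when
            # 'set-name' is not provided -- the behaviour reported here.
            'name': cfg.get('set-name', eth),
            'mac_address': cfg.get('match', {}).get('macaddress'),
        })
    return v1

print(v2_ethernets_to_v1(
    {'id0': {'match': {'macaddress': '00:50:56:a5:1a:78'}}}))
# -> [{'type': 'physical', 'name': 'id0', 'mac_address': '00:50:56:a5:1a:78'}]
```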
[Yahoo-eng-team] [Bug 1855934] [NEW] new versions of flake8 parse typing comments
Public bug reported: While playing with pre-commit I noticed that new versions of flake8 parse type annotation comments. If you have not imported the relevant typing module then it fails with F821 undefined name:

nova/virt/hardware.py:1396:5: F821 undefined name 'Optional'
nova/virt/hardware.py:1396:5: F821 undefined name 'List'
nova/virt/hardware.py:1396:5: F821 undefined name 'Set'
nova/virt/hardware.py:1426:5: F821 undefined name 'Optional'
nova/virt/hardware.py:1426:5: F821 undefined name 'List'
nova/virt/hardware.py:1426:5: F821 undefined name 'Set'
nova/virt/hardware.py:1456:5: F821 undefined name 'Optional'
nova/virt/hardware.py:1483:5: F821 undefined name 'Optional'
nova/virt/hardware.py:1525:5: F821 undefined name 'Optional'
nova/virt/hardware.py:1624:5: F821 undefined name 'Tuple'
nova/virt/hardware.py:1646:5: F821 undefined name 'Optional'
nova/virt/hardware.py:1658:5: F821 undefined name 'Optional'
nova/virt/hardware.py:1674:5: F821 undefined name 'List'
nova/virt/hardware.py:1696:5: F821 undefined name 'Optional'
nova/virt/hardware.py:1920:29: F821 undefined name 'List'
nova/virt/hardware.py:1939:31: F821 undefined name 'Set'

While this is not an issue today, because we pin to an old version of flake8, we should still fix this as a code hygiene issue. Given this has no impact on the running code I'm going to triage this as Low and push a trivial patch.

** Affects: nova Importance: Low Assignee: sean mooney (sean-k-mooney) Status: New
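A minimal reproducer of the lint failure (hypothetical file, not taken from nova): a newer flake8/pyflakes parses the type comment and reports F821 for the typing names unless they are imported, which is the kind of trivial fix proposed above.

```python
# Without the import below, a recent flake8 reports on the type comment:
#   F821 undefined name 'List' / 'Optional'
# mirroring the hardware.py hits listed in this bug.
from typing import List, Optional  # adding this import is the fix


def pick_cpu(ids):
    # type: (List[int]) -> Optional[int]
    """Return the first CPU id, if any."""
    return ids[0] if ids else None
```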
[Yahoo-eng-team] [Bug 1855927] [NEW] _poll_unconfirmed_resizes may not retry later if confirm_resize fails in API
Public bug reported: This is based on code inspection but let's say I have configured my computes to set resize_confirm_window=3600 to automatically confirm a resized server after 1 hour. Within that hour, let's say the source compute service is down. The periodic task gets the unconfirmed migrations with status='finished' which have been updated some time older than the given configurable window: https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/compute/manager.py#L8793 https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/db/sqlalchemy/api.py#L4342 The periodic task then calls the compute API code to confirm the resize: https://github.com/openstack/nova/blob/c295e395d/nova/compute/manager.py#L7160 which changes the migration status to 'confirming': https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/compute/api.py#L3684 And casts off to the source compute: https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/compute/rpcapi.py#L600 Now if the source compute is down and that fails, the compute manager task code will handle it and say it will retry later: https://github.com/openstack/nova/blob/c295e395d/nova/compute/manager.py#L7163 However, because the migration status was changed from 'finished' to 'confirming' the task will not retry because it won't find the migration given the DB query. And trying to confirm the resize via the API will fail as well because we'll get MigrationNotFoundByStatus since the migration status is no longer 'finished': https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/compute/api.py#L3681 The compute manager code should probably mark the migration status as 'finished' again if it's really going to try later, or mark the migration status as 'error'. Note that the confirm_resize method in the compute manager doesn't mark the migration status as 'error' if something fails there either: https://github.com/openstack/nova/blob/c295e395d/nova/compute/manager.py#L3807 ** Affects: nova Importance: Low Status: New ** Tags: error-handling migrate resize
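A toy model of the sequence described above (illustrative only, not nova code) shows why the retry never happens: the periodic task selects only 'finished' migrations, but the API flips the status to 'confirming' before the RPC cast, so a failed cast leaves the migration invisible to the next run.

```python
# Simplified model of the reported flow; statuses and names mirror the bug
# description, everything else is illustrative.
migrations = [{'id': 1, 'status': 'finished'}]


def confirm_resize(migration, compute_up=False):
    migration['status'] = 'confirming'   # API side effect happens first
    if not compute_up:
        # The cast to the (down) source compute fails; status stays
        # 'confirming' and is never reset to 'finished' or 'error'.
        print('cast failed; confirm will supposedly be retried later')


def poll_unconfirmed_resizes():
    candidates = [m for m in migrations if m['status'] == 'finished']
    print('found %d unconfirmed migrations' % len(candidates))
    for m in candidates:
        confirm_resize(m)


poll_unconfirmed_resizes()  # finds 1, cast fails, status -> 'confirming'
poll_unconfirmed_resizes()  # finds 0: nothing is ever retried
```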
[Yahoo-eng-team] [Bug 1855919] [NEW] Broken pipe errors cause neutron metadata agent to fail
Public bug reported: After we increased computes to 200, we started seeing "broken pipe" errors in neutron-metadata-agent.log on the controllers. After a neutron restart the errors are reduced, then they increase until the log is mostly errors, and the neutron metadata service fails, and VMs cannot boot. Another symptom is that unacked RMQ messages build up in the q-plugin queue. This is the first error we see; this one occurs as the server is starting: 2019-12-10 10:56:01.942 1838536 INFO eventlet.wsgi.server [-] (1838536) wsgi starting up on http:/var/lib/neutron/metadata_proxy 2019-12-10 10:56:01.943 1838538 INFO eventlet.wsgi.server [-] (1838538) wsgi starting up on http:/var/lib/neutron/metadata_proxy 2019-12-10 10:56:01.945 1838539 INFO eventlet.wsgi.server [-] (1838539) wsgi starting up on http:/var/lib/neutron/metadata_proxy 2019-12-10 10:56:21.138 1838538 INFO eventlet.wsgi.server [-] Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 521, in handle_one_response write(b''.join(towrite)) File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 462, in write wfile.flush() File "/usr/lib/python2.7/socket.py", line 307, in flush self._sock.sendall(view[write_offset:write_offset+buffer_size]) File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 390, in sendall tail = self.send(data, flags) File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 384, in send return self._send_loop(self.fd.send, data, flags) File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 371, in _send_loop return send_method(data, *args) error: [Errno 32] Broken pipe 2019-12-10 10:56:21.138 1838538 INFO eventlet.wsgi.server [-] 10.195.74.25, "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 0 time: 19.0296111 2019-12-10 10:56:25.059 1838516 INFO eventlet.wsgi.server [-] 10.195.74.28, "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.2840948 2019-12-10 10:56:25.181 1838529 INFO eventlet.wsgi.server [-] 10.195.74.68, "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.2695429 2019-12-10 10:56:25.259 1838518 INFO eventlet.wsgi.server [-] 10.195.74.28, "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.1980510 Then we see some "call queues" warnings and the threshold increases to 40: 2019-12-10 10:56:31.414 1838515 WARNING oslo_messaging._drivers.amqpdriver [-] Number of call queues is 11, greater than warning threshold: 10. There could be a leak. Increasing threshold to: 20 Next we see RPC timeout errors: 2019-12-10 10:57:02.043 1838520 WARNING oslo_messaging._drivers.amqpdriver [-] Number of call queues is 11, greater than warning threshold: 10. There could be a leak. Increasing threshold to: 20 2019-12-10 10:57:02.059 1838534 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 37 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 1ed3e021607e466f8b9b84cd3b05b188 2019-12-10 10:57:02.059 1838534 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. 
Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 1ed3e021607e466f8b9b84cd3b05b188 2019-12-10 10:57:02.285 1838521 INFO eventlet.wsgi.server [-] 10.195.74.27, "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.7959940 2019-12-10 10:57:16.215 1838531 WARNING oslo_messaging._drivers.amqpdriver [-] Number of call queues is 21, greater than warning threshold: 20. There could be a leak. Increasing threshold to: 40 2019-12-10 10:57:17.339 1838539 WARNING oslo_messaging._drivers.amqpdriver [-] Number of call queues is 11, greater than warning threshold: 10. There could be a leak. Increasing threshold to: 20 2019-12-10 10:57:24.838 1838524 INFO eventlet.wsgi.server [-] 10.195.73.242, "GET /latest/meta-data/instance-id HTTP/1.0" status: 200 len: 146 time: 0.6842020 2019-12-10 10:57:24.882 1838524 ERROR neutron.common.rpc [-] Timeout in RPC method get_ports. Waiting for 3 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: MessagingTimeout: Timed out waiting for a reply to message ID 2bb5faa3ec8d4f5b9d3bd3e2fe095f9e 2019-12-10 10:57:24.883 1838524 WARNING neutron.common.rpc [-] Increasing timeout for get_ports calls to 120 seconds. Restart the agent to restore it to the default value.: MessagingTimeout: Timed out waiting for a reply to message ID 2bb5faa3ec8d4f5b9d3bd3e2fe095f9e 2019-12-10 10:57:24.887 1838525 INFO eventlet.wsgi.server [-] 10.195.74.26, "GET
[Yahoo-eng-team] [Bug 1855912] [NEW] MariaDB 10.1 fails during alembic migration
Public bug reported: New CI job running with MariaDB [1] fails during the alembic migration. According to [2] the problem seems to be solved in v10.2.2. LOG: https://b12f79f00ace923cb903-227be9d6f8442281010ef49b8394f34d.ssl.cf5.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-tempest-mariadb-full/18fecee/job-output.txt SNIPPET: http://paste.openstack.org/show/787390/ [1] https://review.opendev.org/#/c/681202/ [2] https://laracasts.com/discuss/channels/general-discussion/specified-key-was-too-long-max-key-length-is-767-bytes-1 ** Affects: neutron Importance: Undecided Status: New
[Yahoo-eng-team] [Bug 1839009] Re: os-server-external-events does not behave correctly for failed single events
*** This bug is a duplicate of bug 1855752 *** https://bugs.launchpad.net/bugs/1855752 Sorry, I didn't know about this bug when we opened 1855752. The issue has been fixed under that bug. ** This bug has been marked a duplicate of bug 1855752 Inappropriate HTTP error status from os-server-external-events -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1839009 Title: os-server-external-events does not behave correctly for failed single events Status in OpenStack Compute (nova): New Bug description: The "os-server-external-events" API does not behave correctly when the request body contains a list of one event: if that event ends up in a non-200 state, i.e. if the event ends up in a 400, 404 or 422 state, the function executes all the way to L147 (https://github.com/openstack/nova/blob/433b1662e48db57aaa42e11756fa4a6d8722b386/nova/api/openstack/compute/server_external_events.py#L147) and overall returns a 404 HTTP response without any body. This is wrong since, as per the documentation, it should return the respective code (422/404/400) to the client. In fact, if at least one of the provided events doesn't get into the "accepted_events" list, the rest of them are discarded without returning the correct response against each event.
[Yahoo-eng-team] [Bug 1855902] [NEW] Inefficient Security Group listing
Public bug reported:

Issue: Fetching a large Security Group list takes relatively long as several database queries are made for each Security Group.

Context: Listing SGs takes around 9 seconds with ~500 existing SGs, 16 seconds with ~1000 SGs and around 30 seconds with ~1500 existing SGs, so this time seems to grow at least linearly with the number of SGs. We've looked at flamegraphs of the neutron controller which show that the stack frame `/usr/lib/python2.7/site-packages/neutron/db/securitygroups_db.py:get_security_groups:166` splits into two long-running functions, each taking about half of the time (one at line 112 and the other at 115).

```python
103     @classmethod
104     def get_objects(cls, context, _pager=None, validate_filters=True,
105                     **kwargs):
106         # We want to get the policy regardless of its tenant id. We'll make
107         # sure the tenant has permission to access the policy later on.
108         admin_context = context.elevated()
109         with cls.db_context_reader(admin_context):
110             objs = super(RbacNeutronDbObjectMixin,
111                          cls).get_objects(admin_context, _pager,
112                                           validate_filters, **kwargs)
113             result = []
114             for obj in objs:
115                 if not cls.is_accessible(context, obj):
116                     continue
117                 result.append(obj)
118             return result
```

We've also seen that the number of database queries also seems to grow linearly:
* Listing ~500 SGs performs ~2100 queries
* Listing ~1000 SGs performs ~3500 queries
* Listing ~1500 SGs performs ~5200 queries

This does not scale well; we're expecting a negligible increase in listing time.

Reproduction:
* Create 1000 SGs
* Execute `time openstack security group list`
* Create 500 more SGs
* Execute `time openstack security group list`

Version: We're using neutron 14.0.2-1 on CentOS 7.7.1908.

Perceived Severity: MEDIUM

** Affects: neutron Importance: Undecided Status: New ** Tags: group list security time
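For reference, the reproduction steps above can be scripted with openstacksdk; the `mycloud` clouds.yaml entry and the batch size are assumptions, and the timing loop simply mirrors the measurements reported in the bug.

```python
# Hedged reproduction sketch: create security groups in batches of 500 and
# time the listing after each batch, as described in the reproduction steps.
import time

import openstack

conn = openstack.connect(cloud='mycloud')  # assumed clouds.yaml entry

for batch in range(3):
    for i in range(500):
        conn.network.create_security_group(name='perf-test-%d-%d' % (batch, i))
    start = time.monotonic()
    count = len(list(conn.network.security_groups()))
    print('listed %d security groups in %.1fs'
          % (count, time.monotonic() - start))
```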
[Yahoo-eng-team] [Bug 1855875] Re: When creating a new server instance this error occurred.
This looks like a configuration issue. What is the value of your transport_url config option in both the nova config and cell_mappings table? ** Changed in: nova Status: New => Invalid
[Yahoo-eng-team] [Bug 1855888] [NEW] ovs-offload with vxlan is broken due to adding skb mark
Public bug reported: The following patch [1] adds use of egress_pkt_mark, which is not supported with OVS hardware offload. This causes a regression in OpenStack when using OVS hardware offload with VXLAN. [1] - https://review.opendev.org/#/c/675054/ ** Affects: neutron Importance: High Assignee: Moshe Levi (moshele) Status: In Progress
[Yahoo-eng-team] [Bug 1855883] [NEW] cannot migrate server on aarch64
Public bug reported:

Description
===========
We set up an OpenStack env on aarch64 KylinOS. Live migrating an instance fails because of 'This operating system kernel does not support vITS migration'.

Steps to reproduce
==================
1. Set up OpenStack on aarch64 servers with openstack-helm
2. Live migrate an instance from compute02 to compute03

Expected result
===============
Success, instance located on compute03

Actual result
=============
Failed, instance located on compute02

Environment
===========
1. Exact version of OpenStack you are running: stable/rocky. See the following:

# apt list --installed |egrep "libvirt|qemu"
ipxe-qemu/now 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2~cloud0 all [installed,local]
ipxe-qemu-256k-compat-efi-roms/now 1.0.0+git-20150424.a25a16d-0ubuntu2~cloud0 all [installed,local]
libvirt-bin/now 4.0.0-1ubuntu8.11~cloud0 arm64 [installed,local]
libvirt-clients/now 4.0.0-1ubuntu8.11~cloud0 arm64 [installed,local]
libvirt-daemon/now 4.0.0-1ubuntu8.11~cloud0 arm64 [installed,local]
libvirt-daemon-system/now 4.0.0-1ubuntu8.11~cloud0 arm64 [installed,local]
libvirt0/now 4.0.0-1ubuntu8.11~cloud0 arm64 [installed,local]
qemu/now 1:2.11+dfsg-1ubuntu7.15~cloud1 arm64 [installed,local]

# uname -a
Linux compute03 4.4.131-20190726.kylin.server-generic #kylin SMP Tue Jul 30 16:44:09 CST 2019 aarch64 aarch64 aarch64 GNU/Linux

2. Which hypervisor did you use? libvirt+kvm

Logs & Configs
==============
nova-compute:
File "/var/lib/openstack/local/lib/python2.7/site-packages/libvirt.py", line 1745, in migrateToURI3 if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self) libvirtError: internal error: unable to execute QEMU command 'migrate': This operating system kernel does not support vITS migration

libvirt:
2019-12-07 05:34:34.820+: 57546: error : qemuMonitorJSONCheckError:392 : internal error: unable to execute QEMU command 'migrate': This operating system kernel does not support vITS migration
2019-12-07 05:34:35.226+: 57546: error : virNetClientProgramDispatchError:177 : internal error: qemu unexpectedly closed the monitor: 2019-12-07T05:34:29.355638Z qemu-system-aarch64: Not a migration stream 2019-12-07T05:34:29.355781Z qemu-system-aarch64: load of migration failed: Invalid argument

** Affects: nova Importance: Undecided Assignee: Eric Xie (eric-xie) Status: New
[Yahoo-eng-team] [Bug 1804502] Re: Rebuild server with NUMATopologyFilter enabled fails (in some cases)
Reviewed: https://review.opendev.org/689861 Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3f9411071d4c1a04ab0b68fd635597bf6959c0ca Submitter: Zuul Branch: master

commit 3f9411071d4c1a04ab0b68fd635597bf6959c0ca
Author: Sean Mooney
Date: Mon Oct 21 16:17:17 2019 +

    Disable NUMATopologyFilter on rebuild

    This change leverages the new NUMA constraint checking added in
    I0322d872bdff68936033a6f5a54e8296a6fb3434 to allow the
    NUMATopologyFilter to be skipped on rebuild. As the new behavior of
    rebuild enforces that no changes to the NUMA constraints are allowed on
    rebuild, we no longer need to execute the NUMATopologyFilter.

    Previously the NUMATopologyFilter would process the rebuild request as
    if it was a request to spawn a new instance, as the
    numa_fit_instance_to_host function is not rebuild-aware. As such, prior
    to this change a rebuild would only succeed if a host had enough
    additional capacity for a second instance on the same host meeting the
    requirements of the new image and existing flavor.

    This behavior was incorrect on two counts, as a rebuild uses a noop
    claim. First, the resource usage cannot change, so it was incorrect to
    require the additional capacity to rebuild an instance. Secondly, it
    was incorrect not to assert that the resource usage remained the same.

    I0322d872bdff68936033a6f5a54e8296a6fb3434 addressed guarding the
    rebuild against altering the resource usage, and this change allows
    in-place rebuild. This change found a latent bug that will be addressed
    in a follow-up change, and updated the functional tests to note the
    incorrect behavior.

    Change-Id: I48bccc4b9adcac3c7a3e42769c11fdeb8f6fd132
    Closes-Bug: #1804502
    Implements: blueprint inplace-rebuild-of-numa-instances

** Changed in: nova Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1804502 Title: Rebuild server with NUMATopologyFilter enabled fails (in some cases) Status in OpenStack Compute (nova): Fix Released Bug description:

Description
===========
Server rebuild will fail in the nova scheduler on NUMATopologyFilter if the computes do not have enough capacity (even though clearly the running server is already accounted for in that calculation). To resolve the issue a fix is required so that NUMATopologyFilter skips its check in the case that the request is due to a rebuild. The result of such a case is that server rebuild fails with the error "no valid host found" (do not mix resize with rebuild functions...).

Steps to reproduce
==================
1. create a flavor containing metadata that will point to a specific compute (use a host aggregate with the same key:value metadata); make sure the flavor contains topology-related metadata: hw:cpu_cores='1', hw:cpu_policy='dedicated', hw:cpu_sockets='6', hw:cpu_thread_policy='prefer', hw:cpu_threads='1', hw:mem_page_size='large', location='area51'
2. create a server on that compute (preferably using a heat stack)
3. (try to) rebuild the server using stack update
4. issue reproduced

Expected result
===============
Server in an active running state (if the image was replaced in the rebuild command, then with a reference to the new image in the server details).

Actual result
=============
Server in error state with the error "no valid host found". Message: No valid host was found. There are not enough hosts available.
Code 500 Details File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 966, in rebuild_instance return_alternates=False) File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 723, in _schedule_instances return_alternates=return_alternates) File "/usr/lib/python2.7/site-packages/nova/scheduler/utils.py", line 907, in wrapped return func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 53, in select_destinations instance_uuids, return_objects, return_alternates) File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method return getattr(self.instance, __name)(*args, **kwargs) File "/usr/lib/python2.7/site-packages/nova/scheduler/client/query.py", line 42, in select_destinations instance_uuids, return_objects, return_alternates) File "/usr/lib/python2.7/site-packages/nova/scheduler/rpcapi.py", line 158, in select_destinations return cctxt.call(ctxt, 'select_destinations', **msg_args) File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 179, in call retry=self.retry) File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 133, in _send retry=retry) File
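A rough sketch of the idea in the commit message above, not the actual nova patch: the filter passes immediately for rebuild requests, since a rebuild uses a noop claim and resource usage cannot change. The `_nova_check_type` scheduler hint used here to detect a rebuild is an assumption for illustration, not a documented interface.

```python
# Hedged sketch: short-circuit the NUMA fit check for rebuild requests.
def host_passes(host_state, spec_obj, numa_fit_check):
    hints = getattr(spec_obj, 'scheduler_hints', None) or {}
    if hints.get('_nova_check_type') == ['rebuild']:
        # Rebuild is a noop claim: resource usage cannot change, so there
        # is nothing new for the NUMA filter to verify on the current host.
        return True
    # Otherwise fall back to the normal NUMA topology fit check.
    return numa_fit_check(host_state, spec_obj)
```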
[Yahoo-eng-team] [Bug 1855752] Re: Inappropriate HTTP error status from os-server-external-events
Reviewed: https://review.opendev.org/698037 Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e6f742544432d6066f1fba4666580919eb7859bd Submitter: Zuul Branch:master commit e6f742544432d6066f1fba4666580919eb7859bd Author: Eric Fried Date: Mon Dec 9 09:58:53 2019 -0600 Nix os-server-external-events 404 condition The POST /os-server-external-events API had the following confusing behavior: With multiple events in the payload, if *some* (but not all) were dropped, the HTTP response was 207, with per-event 4xx error codes in the payload. But if *all* of the events were dropped, the overall HTTP response was 404 with no payload. Thus, especially for consumers sending only one event at a time, it was impossible to distinguish e.g. "you tried to send an event for a nonexistent instance" from "the instance you specified hasn't landed on a host yet". This fix gets rid of that sweeping 404 condition, so if *any* subset of the events are dropped (including *all* of them), the HTTP response will always be 207, and the payload will always contain granular per-event error codes. This effectively means the API can no longer return 404, ever. Closes-Bug: #1855752 Change-Id: Ibad1b51e2cf50d00102295039b6e82bc00bec058 ** Changed in: nova Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1855752 Title: Inappropriate HTTP error status from os-server-external-events Status in OpenStack Compute (nova): Fix Released Bug description: The handling of os-server-external-events API [1] has a bug. It is designed to handle multiple events, with the following expected behavior: * If all events are successfully handled, it should return HTTP 200. * If no event is successfully handled, it should return HTTP 404. * If some are handled successfully but not all, it should return HTTP 207, with per-event status codes. However, when Cyborg sends a single event for a single instance, and that instance is not yet associated with a host [*], the 'else' clause in Line 137 [1] will set HTTP 207 as return code; but, since accepted_events is [] in Line 146, that will throw an exception and return 404. IOW, the expected return is 207 but the actual return is 404. This has been discussed in IRC [2]. A patch has been proposed [3] to address this. [*] This happens because Nova calls into Cyborg from the conductor to initiate binding of accelerator requests (ARQs), lets it proceed asynchronously, and waits for the binding notification event in the compute manager. The notification event could come before the compute manager has called self._rt.instance_claim(), which would associate the instance with a host and a node. That race condition triggers the behavior above. [1] https://github.com/openstack/nova/blob/62f6a0a1bc6c4b24621e1c2e927177f99501bef3/nova/api/openstack/compute/server_external_events.py [2] http://eavesdrop.openstack.org/irclogs/%23openstack-nova /%23openstack-nova.2019-12-09.log.html#t2019-12-09T15:45:18 [3] https://review.opendev.org/#/c/698037/ To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1855752/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
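A client-side sketch of how the fixed API is meant to be consumed (the endpoint URL, token and UUIDs are placeholders): with the change above, a dropped event always shows up as a per-event code inside a 207 response rather than a bare 404 with no body.

```python
# Hedged sketch of posting a single external event and reading the
# per-event result codes from a 207 (multi-status) response.
import requests

resp = requests.post(
    'http://nova-api:8774/v2.1/os-server-external-events',  # placeholder URL
    headers={'X-Auth-Token': 'TOKEN', 'Content-Type': 'application/json'},
    json={'events': [{'name': 'network-vif-plugged',
                      'server_uuid': 'SERVER_UUID',
                      'tag': 'PORT_UUID'}]})

if resp.status_code == 207:
    for event in resp.json()['events']:
        # e.g. 200 accepted, 404 instance not found,
        # 422 instance not yet assigned to a host
        print(event['name'], event.get('code'))
```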
[Yahoo-eng-team] [Bug 1855875] [NEW] When creating a new server instance this error occurred.
Public bug reported: 2019-12-10 17:17:31.679 31059 ERROR oslo.messaging._drivers.impl_rabbit [req-a5d14694-ebda-449d-9030-0998fc27eb3e c035f19ef5af4a108d5a3704f59362ba b52c7945d86f428e8cf16a4d886f1f9a - default default] Failed to publish message to topic 'nova': 'NoneType' object has no attribute '__getitem__' 2019-12-10 17:17:31.679 31059 ERROR oslo.messaging._drivers.impl_rabbit [req-a5d14694-ebda-449d-9030-0998fc27eb3e c035f19ef5af4a108d5a3704f59362ba b52c7945d86f428e8cf16a4d886f1f9a - default default] Unable to connect to AMQP server on 192.168.0.204:5672 after inf tries: 'NoneType' object has no attribute '__getitem__' 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi [req-a5d14694-ebda-449d-9030-0998fc27eb3e c035f19ef5af4a108d5a3704f59362ba b52c7945d86f428e8cf16a4d886f1f9a - default default] Unexpected exception in API method: MessageDeliveryFailure: Unable to connect to AMQP server on 192.168.0.204:5672 after inf tries: 'NoneType' object has no attribute '__getitem__' 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi Traceback (most recent call last): 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py", line 671, in wrapped 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return f(*args, **kwargs) 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return func(*args, **kwargs) 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return func(*args, **kwargs) 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return func(*args, **kwargs) 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return func(*args, **kwargs) 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return func(*args, **kwargs) 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return func(*args, **kwargs) 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return func(*args, **kwargs) 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return func(*args, **kwargs) 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return func(*args, **kwargs) 2019-12-10 17:17:31.680 31059 ERROR 
nova.api.openstack.wsgi File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return func(*args, **kwargs) 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, in wrapper 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return func(*args, **kwargs) 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/site-packages/nova/api/openstack/compute/servers.py", line 686, in create 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi **create_kwargs) 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/site-packages/nova/hooks.py", line 154, in inner 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi rv = f(*args, **kwargs) 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 1857, in create 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi supports_port_resource_request=supports_port_resource_request) 2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line
[Yahoo-eng-team] [Bug 1855869] [NEW] federation role mapping does not add users to groups
Public bug reported: I'm using AzureAD, and keystone OIDC mapping of remote users into local groups does not work as expected. I'm using the auto-generated domain for ephemeral cloud users, a remote attribute of OIDC_DEPARTMENT is used for mapping federated users to local groups, and the groups and projects have been created in the default domain; users should inherit the roles of their mapped group, in other words "group-based role-based access". My expectation when following the docs for oidc or openid or mapped is that users inherit the roles of their mapped groups.

How to reproduce:
1 - create idp
2 - create protocol
3 - create mapping
4 - create project
5 - create group
6 - assign group to project
7 - assign roles to group in project

WEB SSO is working and a certain amount of the mapping seems to be working; for example, if I grant group access to a project, the federated user will be granted access to the project in Horizon - but they won't inherit the roles of that group, i.e. they will not become group members. In Horizon >> Identity >> Users (select a federated user) >> Groups (no groups). In Horizon >> Identity >> Groups >> Members (no members). Is this intended? The federated user's domain id is the auto-generated federation domain, but I am mapping them into the Default domain / project / group.

Here is the mapping from OIDC group to OpenStack group:

{ "rules": [ { "local": [ { "group": { "domain": { "name": "Default" }, "name": "itdept" }, "user": { "name": "{0}", "email": "{1}" } } ], "remote": [ { "type": "HTTP_OIDC_EMAIL" }, { "type": "HTTP_OIDC_EMAIL" }, { "type": "HTTP_OIDC_DEPARTMENT", "any_one_of": [ "7050", "7051" ] } ] }

There is nothing in the mapping regarding projects, as I would not like to use such a mechanism for simple access to projects, but if I assign the local group to another project then I *can* switch to that project in Horizon - but I do not have the roles of the group, only the member role - I'm guessing because this is bestowed by default or by Horizon.

So in summary: configured a working SSO
- users are not being added to groups, which seems to be ephemeral
- users do inherit group projects, so project enrolment works as expected
- users do not inherit group roles on projects

** Affects: keystone Importance: Undecided Status: New
[Yahoo-eng-team] [Bug 1855854] [NEW] [RFE] Dynamic DHCP allocation pool
Public bug reported:

Neutron currently only supports configuring the DHCP agent with static/fixed-address allocations. The DHCP client id (the client MAC address in Neutron's case) is mapped to a specific IP address in the DHCP server configuration. No range of addresses is made available for clients without a pre-allocated fixed address.

When network booting on IPv6 this becomes an issue, because the DHCPv6 specification mandates the use of the DHCP Unique Identifier (DUID) and Identity Association Identifier (IAID) to identify a lease. When network booting, an instance will move through a minimum of two DHCP clients, and these rarely end up using identical DUIDs and IAIDs. The combination of static/fixed-address allocations in the DHCP server and the changing DUID/IAID of the clients causes the second DHCP client's request to get a ``no address available`` reply from the server, and thus the network boot process fails.

NOTE: In some cases just the UEFI PXE6 client ends up doing two cycles of DHCPv6 S.A.R.R (Solicit, Advertise, Request, Reply) with different IAIDs, because some UEFI firmware uses a non-RFC-compliant random generator for the IAID; see bug https://bugzilla.tianocore.org/show_bug.cgi?id=1518. While this is a bug in the UEFI firmware, the implementation is common enough among hardware vendors that it makes sense to work around the issue where possible.

This RFE is for adding the possibility to create a subnet with dynamic allocation pool(s). This would solve the network booting issue with changing IAIDs described above: a new lease with a new address will be offered during each step of network booting. For example, an instance deployment via the OpenStack Bare Metal service (ironic) would typically involve three DHCP clients during provisioning - the UEFI firmware, iPXE, and the ironic-python-agent ramdisk - so a total of 3 leases would be consumed to complete the provisioning.

If this RFE is implemented, the DHCP server (dnsmasq) would configure the dhcp-range for a dynamic subnet (or a dynamic allocation pool of a subnet) without the ``mode`` set to ``static``. To ensure that the DHCP server only provides dynamic allocation for the desired ports, the ``ignore`` option is used in a ``dhcp-host`` entry with a wildcard ``*`` host (``dhcp-host="*",ignore``). Ports that require dynamic addressing would get a ``dhcp-host`` entry with ``dhcp-host=`` (without the ``ignore``) so that these specific ports get addresses from the dynamic allocation pool.

** Affects: neutron
     Importance: Undecided
         Status: New

** Tags: rfe

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1855854

Title: [RFE] Dynamic DHCP allocation pool

Status in neutron: New
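For illustration, a dnsmasq configuration along the lines the RFE describes could look like the sketch below. The subnet tag, address range, lease time and MAC address are made-up example values; only the non-``static`` ``dhcp-range``, the wildcard ``dhcp-host="*",ignore`` entry and the per-port ``dhcp-host=`` entries come from the RFE text above.

```
# Illustrative dnsmasq options for a subnet with a dynamic allocation pool
# (example tag, addresses and MAC; not what neutron generates today).

# Dynamic pool: dhcp-range written without the 'static' mode keyword.
dhcp-range=set:subnet-1234,2001:db8:0:1::100,2001:db8:0:1::1ff,64,86400s

# Ignore any client that has no explicit dhcp-host entry, so only opted-in
# ports can obtain a dynamic lease.
dhcp-host="*",ignore

# A port that should use the dynamic pool gets a dhcp-host entry without
# 'ignore' and without a fixed address pinned to it.
dhcp-host=00:50:56:a5:1a:78,set:port-abcd
```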
[Yahoo-eng-team] [Bug 1832768] Re: Horizon: AngularJS pages do not display dates in system's timezone
** Changed in: starlingx
       Status: Triaged => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1832768

Title: Horizon: AngularJS pages do not display dates in system's timezone

Status in OpenStack Dashboard (Horizon): Fix Released
Status in StarlingX: Fix Released

Bug description:

Brief Description
-----------------
Horizon's AngularJS pages (for example Images) do not display timestamps in the system's timezone. The timezone used is the browser's.

Severity
--------
Minor

Steps to Reproduce
------------------
- If Timezone is not set under Settings (menu on the top right):
  Set the controller's system timezone to one different from UTC and the browser, for example "system modify --timezone America/Regina".
  The Image timestamps (for example 'Created At') are displayed in the browser's timezone and not the system's timezone.

or

- If Timezone is set under Settings (menu on the top right):
  Set Horizon's Settings Timezone to one that is different from UTC and the browser.
  The Image timestamps (for example 'Created At') are displayed in the browser's timezone and not the Settings' timezone.

Expected Behavior
-----------------
AngularJS pages should align with Django pages, which use the timezone from Horizon's cookie (set under the Settings menu on the top right) or, if that is not set, the controller's timezone.

Actual Behavior
---------------
The AngularJS pages use the browser's timezone.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Any configuration

Branch/Pull Time/Commit
-----------------------

Last Pass
---------
NA

Timestamp/Logs
--------------
NA

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1832768/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp