[Yahoo-eng-team] [Bug 1896603] Re: ovn-octavia-provider: Cannot create listener due to alowed_cidrs validation
Reviewed:  https://review.opendev.org/753302
Committed: https://git.openstack.org/cgit/openstack/ovn-octavia-provider/commit/?id=76b20882aa9fef3c693e45c2b504224a44e84ce8
Submitter: Zuul
Branch:    master

commit 76b20882aa9fef3c693e45c2b504224a44e84ce8
Author: Brian Haley
Date:   Tue Sep 22 08:34:29 2020 -0400

    Fix the check for allowed_cidrs in listeners

    The allowed_cidrs value could be an empty list if the request
    involves the sdk, so change the check to account for that.

    Change-Id: I2df7e5a944cbd40c60943ad105f6e09f7afa85a9
    Closes-bug: #1896603

** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1896603

Title:
  ovn-octavia-provider: Cannot create listener due to alowed_cidrs
  validation

Status in neutron:
  Fix Released

Bug description:
  Kuryr-Kubernetes tests running with ovn-octavia-provider started to
  fail with "Provider 'ovn' does not support a requested option: OVN
  provider does not support allowed_cidrs option" showing up in the
  o-api logs. We've tracked that down to check [1], which was recently
  introduced. Apparently it's broken and makes the request fail even if
  the property isn't set at all.

  Please take a look at the output from python-openstackclient [2],
  where the body I used is just '{"listener": {"loadbalancer_id":
  "faca9a1b-30dc-45cb-80ce-2ab1c26b5521", "protocol": "TCP",
  "protocol_port": 80, "admin_state_up": true}}'.

  This is also all over your gates; see the o-api log [3]. Somehow the
  ovn-octavia-provider tests skip 171 results there, which is why the
  job is green.
  [1] https://opendev.org/openstack/ovn-octavia-provider/src/branch/master/ovn_octavia_provider/driver.py#L142
  [2] http://paste.openstack.org/show/798197/
  [3] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_4ba/751085/7/gate/ovn-octavia-provider-v2-dsvm-scenario/4bac575/controller/logs/screen-o-api.txt

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1896603/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
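[Editorial sketch, not the actual ovn-octavia-provider code: the commit message above says the broken check rejected requests whenever `allowed_cidrs` was present, even as an empty list, and the fix was to account for that. A minimal illustration of the difference between a key-presence check and a truthiness check:]

```python
# Hypothetical reduction of the listener validation described in the
# commit message. The broken variant fires on allowed_cidrs=[] (which
# the SDK sends by default); the fixed variant only rejects genuinely
# populated values.

def validate_listener(request):
    """Raise ValueError if the request uses an unsupported option."""
    # Broken check (key presence), which this bug was about:
    #     if 'allowed_cidrs' in request:
    #         raise ValueError(...)
    # Fixed check (truthiness): absent, None and [] all pass.
    if request.get('allowed_cidrs'):
        raise ValueError(
            "OVN provider does not support allowed_cidrs option")
    return True
```

With this check, the request body quoted in the bug (no `allowed_cidrs` at all) validates cleanly, and so does an SDK request carrying an empty list.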
[Yahoo-eng-team] [Bug 1896617] Re: Creation of image (or live snapshot) from the existing VM fails if libvirt-image-backend is configured to qcow2 starting from Ussuri
As background, adding the libvirt-qemu user to the nova group was an
attempt to make the /var/lib/nova/* directories more restricted, but
that proved to be difficult with ownership changes between nova and
libvirt/qemu.

** Summary changed:

- Creation of image (or live snapshot) from the existing VM fails if libvirt-image-backend is configured to qcow2 starting from Ussuri
+ [SRU] Creation of image (or live snapshot) from the existing VM fails if libvirt-image-backend is configured to qcow2 starting from Ussuri

** Also affects: nova (Ubuntu Groovy)
   Importance: Critical
   Assignee: Corey Bryant (corey.bryant)
   Status: Triaged

** Also affects: nova (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Changed in: nova (Ubuntu Focal)
   Status: New => Triaged

** Changed in: nova (Ubuntu Focal)
   Importance: Undecided => Critical

** Changed in: nova (Ubuntu Focal)
   Assignee: (unassigned) => Corey Bryant (corey.bryant)

** Also affects: cloud-archive
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/ussuri
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/victoria
   Importance: Undecided
   Status: New

** Changed in: cloud-archive/ussuri
   Status: New => Triaged

** Changed in: cloud-archive/victoria
   Status: New => Triaged

** Changed in: cloud-archive/victoria
   Importance: Undecided => Critical

** Changed in: cloud-archive/ussuri
   Importance: Undecided => Critical

** Changed in: cloud-archive/victoria
   Assignee: (unassigned) => Corey Bryant (corey.bryant)

** Changed in: cloud-archive/ussuri
   Assignee: (unassigned) => Corey Bryant (corey.bryant)

** Description changed:

+ [Impact]
+
  tl;dr

  1) creating the image from the existing VM fails if the qcow2 image
  backend is used, but everything is fine if using the rbd image
  backend in nova-compute.
2) openstack server image create --name fails with some unrelated error: $ openstack server image create --wait 842fa12c-19ee-44cb-bb31-36d27ec9d8fc HTTP 404 Not Found: No image found with ID f4693860-cd8d-4088-91b9-56b2f173ffc7 == Details == Two Tempest tests ([1] and [2]) from the 2018.02 Refstack test lists [0] are failing with the following exception: 49701867-bedc-4d7d-aa71-7383d877d90c Traceback (most recent call last): File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py", line 369, in create_image_from_server waiters.wait_for_image_status(client, image_id, wait_until) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/common/waiters.py", line 161, in wait_for_image_status image = show_image(image_id) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/images_client.py", line 74, in show_image resp, body = self.get("images/%s" % image_id) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py", line 298, in get return self.request('GET', url, extra_headers, headers) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/base_compute_client.py", line 48, in request method, url, extra_headers, headers, body, chunked) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py", line 687, in request self._error_checker(resp, resp_body) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py", line 793, in _error_checker raise exceptions.NotFound(resp_body, resp=resp) tempest.lib.exceptions.NotFound: Object not found 
Details: {'code': 404, 'message': 'Image not found.'} During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/images/test_images_oneserver.py", line 69, in test_create_delete_image wait_until='ACTIVE') File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py", line 384, in create_image_from_server image_id=image_id) tempest.exceptions.SnapshotNotFoundException: Server snapshot image d82e95b0-9c62-492d-a08c-5bb118d3bf56 not found. So far I was able to identify the following: 1) https://github.com/openstack/tempest/blob/master/tempest/api/compute/images/test_images_oneserver.py#L69 invokes a "create image from server" 2) It fails with the following error message in the
[Yahoo-eng-team] [Bug 1896617] Re: Creation of image (or live snapshot) from the existing VM fails if libvirt-image-backend is configured to qcow2 starting from Ussuri
This is caused because the libvirt-qemu user is added to the nova group
as part of the nova-compute-libvirt package post-install script.
Following up on comment #17 above, the user/group of the delta file
changes from nova:nova to libvirt-qemu:kvm, whereas in comment #21
above, the user/group of the delta file changes to nova:kvm.

Dropping libvirt-qemu from nova in /etc/group fixes this as a
workaround. I'm building packages with a fix now and will get this
fixed for ussuri and victoria. Marking the upstream bug as invalid.

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896617

Title:
  [SRU] Creation of image (or live snapshot) from the existing VM fails
  if libvirt-image-backend is configured to qcow2 starting from Ussuri

Status in OpenStack nova-compute charm:
  Invalid
Status in Ubuntu Cloud Archive:
  Triaged
Status in Ubuntu Cloud Archive ussuri series:
  Triaged
Status in Ubuntu Cloud Archive victoria series:
  Triaged
Status in OpenStack Compute (nova):
  Invalid
Status in nova package in Ubuntu:
  Triaged
Status in nova source package in Focal:
  Triaged
Status in nova source package in Groovy:
  Triaged

Bug description:
  [Impact]

  tl;dr

  1) creating the image from the existing VM fails if the qcow2 image
  backend is used, but everything is fine if using the rbd image
  backend in nova-compute.
2) openstack server image create --name fails with some unrelated error: $ openstack server image create --wait 842fa12c-19ee-44cb-bb31-36d27ec9d8fc HTTP 404 Not Found: No image found with ID f4693860-cd8d-4088-91b9-56b2f173ffc7 == Details == Two Tempest tests ([1] and [2]) from the 2018.02 Refstack test lists [0] are failing with the following exception: 49701867-bedc-4d7d-aa71-7383d877d90c Traceback (most recent call last): File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py", line 369, in create_image_from_server waiters.wait_for_image_status(client, image_id, wait_until) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/common/waiters.py", line 161, in wait_for_image_status image = show_image(image_id) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/images_client.py", line 74, in show_image resp, body = self.get("images/%s" % image_id) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py", line 298, in get return self.request('GET', url, extra_headers, headers) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/base_compute_client.py", line 48, in request method, url, extra_headers, headers, body, chunked) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py", line 687, in request self._error_checker(resp, resp_body) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py", line 793, in _error_checker raise exceptions.NotFound(resp_body, resp=resp) tempest.lib.exceptions.NotFound: Object not found 
Details: {'code': 404, 'message': 'Image not found.'}

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/images/test_images_oneserver.py", line 69, in test_create_delete_image
      wait_until='ACTIVE')
    File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py", line 384, in create_image_from_server
      image_id=image_id)
  tempest.exceptions.SnapshotNotFoundException: Server snapshot image d82e95b0-9c62-492d-a08c-5bb118d3bf56 not found.

  So far I was able to identify the following:

  1) https://github.com/openstack/tempest/blob/master/tempest/api/compute/images/test_images_oneserver.py#L69
     invokes a "create image from server"

  2) It fails with the following error message in the nova-compute
     logs: https://pastebin.canonical.com/p/h6ZXdqjRRm/

  The same occurs if "openstack server image create --wait" is
  executed; however, according to
  https://docs.openstack.org/nova/ussuri/admin/migrate-instance-with-snapshot.html
  the VM has to be shut down before the image creation: "Shut down the
  source VM before you take the snapshot to ensure that all data is
  flushed to
[Yahoo-eng-team] [Bug 1886298] Re: Few of the lower constraints are not compatible with python3.8
** Changed in: masakari
   Status: In Progress => Fix Released

** Changed in: masakari
   Milestone: None => 10.0.0.0rc1

** Changed in: masakari
   Assignee: ZHOU LINHUI (zhoulinhui) => Radosław Piliszek (yoctozepto)

** Also affects: masakari/victoria
   Importance: Undecided
   Assignee: Radosław Piliszek (yoctozepto)
   Status: Fix Released

** Changed in: masakari/victoria
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1886298

Title:
  Few of the lower constraints are not compatible with python3.8

Status in castellan:
  In Progress
Status in ec2-api:
  In Progress
Status in futurist:
  Fix Released
Status in OpenStack Dashboard (Horizon):
  Fix Released
Status in kolla:
  Fix Released
Status in kolla-ansible:
  Fix Released
Status in OpenStack Shared File Systems Service (Manila):
  Fix Committed
Status in manila-ui:
  Fix Released
Status in masakari:
  Fix Released
Status in masakari victoria series:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released
Status in os-win:
  New
Status in oslo.messaging:
  In Progress
Status in oslo.policy:
  In Progress
Status in oslo.privsep:
  Fix Released
Status in oslo.reports:
  Fix Released
Status in oslo.vmware:
  Fix Released
Status in Glance Client:
  New
Status in python-keystoneclient:
  Fix Committed
Status in python-manilaclient:
  Fix Released
Status in python-novaclient:
  Fix Released
Status in python-senlinclient:
  New
Status in python-troveclient:
  New
Status in python-watcherclient:
  New
Status in Solum:
  New
Status in tacker:
  Fix Released
Status in taskflow:
  New
Status in tripleo-validations:
  New
Status in watcher:
  New

Bug description:
  Lower constraints were being tested with python3.6 until now and the
  jobs were running fine. With the migration of testing to Ubuntu
  Focal, where python3.8 is the default, the lower-constraints jobs
  started failing due to multiple issues.
  For example, Markupsafe 1.0 is not compatible with new setuptools:
  - https://github.com/pallets/markupsafe/issues/116

  paramiko 2.7.1 fixed the compatibility for python3.7 onwards:
  https://github.com/paramiko/paramiko/issues/1108

  greenlet 0.4.15 added wheels for python 3.8:
  https://github.com/python-greenlet/greenlet/issues/151

  numpy 1.19.1 added python 3.8 support and testing:
  https://github.com/numpy/numpy/pull/14775

  paramiko 2.7.1 fixed the compatibility for python3.7 onwards:
  https://github.com/paramiko/paramiko/commit/4753881223e0ff5e3b3be35bb687a18dfec4f672

  Similarly there are many dependencies which added python3.8 support
  in a later version, so we need to bump their lower constraints to a
  compatible version. The approach to identify the required bumps is to
  run the lower-constraints job on Focal and start bumping for the
  failed things. I started with the nova repos and found the version
  bumps below.

  For nova:
    Markupsafe==1.1.1
    cffi==1.14.0
    greenlet==0.4.15
    PyYAML==3.13
    lxml==4.5.0
    numpy==1.19.0
    psycopg2==2.8
    paramiko==2.7.1

  For python-novaclient:
    Markupsafe==1.1.1
    cffi==1.14.0
    greenlet==0.4.15
    PyYAML==3.13

  For os-vif:
    Markupsafe==1.1.1
    cffi==1.14.0

To manage notifications about this bug go to:
https://bugs.launchpad.net/castellan/+bug/1886298/+subscriptions
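[Editorial sketch: the approach described above, comparing each pinned lower constraint against the minimum version known to work on Python 3.8, can be mechanised. The minimums below are taken from the bug report; the parsing is deliberately simplified (real tooling would use pip/packaging):]

```python
# Minimum versions that work on Python 3.8, per the bug report.
PY38_MINIMUMS = {
    'Markupsafe': (1, 1, 1),
    'cffi': (1, 14, 0),
    'greenlet': (0, 4, 15),
    'paramiko': (2, 7, 1),
    'numpy': (1, 19, 0),
}

def parse(version):
    """Turn '1.14.0' into (1, 14, 0) for tuple comparison."""
    return tuple(int(p) for p in version.split('.'))

def needs_bump(constraints):
    """Return package names whose pinned version is below the 3.8 minimum."""
    return sorted(
        name for name, pin in constraints.items()
        if name in PY38_MINIMUMS and parse(pin) < PY38_MINIMUMS[name])
```

Running this over a repo's lower-constraints.txt pins would flag exactly the packages that need the kind of bumps listed for nova, python-novaclient and os-vif above.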
[Yahoo-eng-team] [Bug 1897118] Re: nova-compute does not start in devstack-platform-opensuse-15 job due to < 4.0.0 qemu version
Adding devstack too, as devstack needs to move to openSUSE 15.2 [1],
which has qemu 4.2.0 available; moving the jobs to run on openSUSE 15.2
will also fix this.

[1] https://github.com/openstack/devstack/blob/5aa38f51b3dd0660a0622aecd65937d3c56eedc2/stack.sh#L224

** Also affects: devstack
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1897118

Title:
  nova-compute does not start in devstack-platform-opensuse-15 job due
  to < 4.0.0 qemu version

Status in devstack:
  New
Status in OpenStack Compute (nova):
  New

Bug description:
  Nova bumped the minimum qemu version to 4.0.0 [1]. It seems that the
  devstack-platform-opensuse-15 non-voting job has an older qemu
  version than that; therefore the nova-compute service does not
  start [2].

  [1] https://review.opendev.org/#/c/746981
  [2] https://a5f2733c1907b1f26b90-5593d50c131879f6a486eeedbad80e3c.ssl.cf5.rackcdn.com/743800/14/check/devstack-platform-opensuse-15/91eeaf7/controller/logs/screen-n-cpu.txt

To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack/+bug/1897118/+subscriptions
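[Editorial sketch of the kind of minimum-version gate described above: parse a QEMU version banner and compare it against the 4.0.0 floor. The banner format here is an assumption for illustration, not nova's exact startup check:]

```python
import re

# Minimum QEMU version nova requires, per the bug report.
MIN_QEMU = (4, 0, 0)

def qemu_version(banner):
    """Extract (major, minor, micro) from a QEMU version banner string."""
    m = re.search(r'version (\d+)\.(\d+)\.(\d+)', banner)
    if not m:
        raise ValueError('cannot parse QEMU version: %r' % banner)
    return tuple(int(g) for g in m.groups())

def meets_minimum(banner, minimum=MIN_QEMU):
    """True if the reported QEMU version satisfies the minimum."""
    return qemu_version(banner) >= minimum
```

Under this check, the qemu 4.2.0 shipped by openSUSE 15.2 passes, while the older qemu on the opensuse-15 image fails, matching why nova-compute refuses to start there.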
[Yahoo-eng-team] [Bug 1897118] [NEW] nova-compute does not start in devstack-platform-opensuse-15 job due to < 4.0.0 qemu version
Public bug reported:

Nova bumped the minimum qemu version to 4.0.0 [1]. It seems that the
devstack-platform-opensuse-15 non-voting job has an older qemu version
than that; therefore the nova-compute service does not start [2].

[1] https://review.opendev.org/#/c/746981
[2] https://a5f2733c1907b1f26b90-5593d50c131879f6a486eeedbad80e3c.ssl.cf5.rackcdn.com/743800/14/check/devstack-platform-opensuse-15/91eeaf7/controller/logs/screen-n-cpu.txt

** Affects: devstack
   Importance: Undecided
   Status: New

** Affects: nova
   Importance: Undecided
   Status: New

** Tags: gate-failure

** Tags added: gate-failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1897118

Title:
  nova-compute does not start in devstack-platform-opensuse-15 job due
  to < 4.0.0 qemu version

Status in devstack:
  New
Status in OpenStack Compute (nova):
  New

Bug description:
  Nova bumped the minimum qemu version to 4.0.0 [1]. It seems that the
  devstack-platform-opensuse-15 non-voting job has an older qemu
  version than that; therefore the nova-compute service does not
  start [2].

  [1] https://review.opendev.org/#/c/746981
  [2] https://a5f2733c1907b1f26b90-5593d50c131879f6a486eeedbad80e3c.ssl.cf5.rackcdn.com/743800/14/check/devstack-platform-opensuse-15/91eeaf7/controller/logs/screen-n-cpu.txt

To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack/+bug/1897118/+subscriptions
[Yahoo-eng-team] [Bug 1896741] Re: Intel mediated device info doesn't provide a name attribute
** Changed in: nova/ussuri Status: New => Confirmed ** Also affects: nova/victoria Importance: Low Assignee: Sylvain Bauza (sylvain-bauza) Status: In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1896741 Title: Intel mediated device info doesn't provide a name attribute Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) train series: Confirmed Status in OpenStack Compute (nova) ussuri series: Confirmed Status in OpenStack Compute (nova) victoria series: In Progress Bug description: When testing some Xeon server for virtual GPU support, I saw that Nova provides an exception as the i915 driver doesn't provide a name for mdev types : Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager Traceback (most recent call last): Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 9824, in _update_available_resource_for_node Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager startup=startup) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 896, in update_available_resource Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self._update_available_resource(context, resources, startup=startup) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/oslo_concurrency/lockutils.py", line 360, in inner Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager return f(*args, **kwargs) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 981, 
in _update_available_resource Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self._update(context, cn, startup=startup) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1233, in _update Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self._update_to_placement(context, compute_node, startup) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 49, in wrapped_f Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager return Retrying(*dargs, **dkw).call(f, *args, **kw) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 206, in call Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager return attempt.get(self._wrap_exception) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 247, in get Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager six.reraise(self.value[0], self.value[1], self.value[2]) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/six.py", line 703, in reraise Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager raise value Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/usr/local/lib/python3.7/site-packages/retrying.py", line 200, in call Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager attempt = Attempt(fn(*args, **kwargs), attempt_number, False) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: 
ERROR nova.compute.manager File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1169, in _update_to_placement Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager self.driver.update_provider_tree(prov_tree, nodename) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 7857, in update_provider_tree Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager provider_tree, nodename, allocations=allocations) Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR nova.compute.manager File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 8250, in _update_provider_tree_for_vgpu Sep 23 06:00:19
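[Editorial sketch of the failure mode in bug 1896741: the i915 driver does not expose a `name` attribute for its mdev types, so code that assumes the attribute exists blows up. The sysfs paths follow the standard Linux mdev layout; the fallback-to-type-id behaviour is an illustrative assumption, not nova's exact fix:]

```python
import os

def mdev_type_name(sysfs_device_dir, type_id):
    """Return the human-readable name of an mdev type.

    The `name` attribute is optional in the mdev sysfs ABI; drivers
    like i915 omit it, so fall back to the type id itself rather than
    raising (which is what triggered the traceback above).
    """
    name_path = os.path.join(
        sysfs_device_dir, 'mdev_supported_types', type_id, 'name')
    try:
        with open(name_path) as f:
            return f.read().strip()
    except FileNotFoundError:  # e.g. the i915 driver provides no name
        return type_id
```

The point of the sketch is simply that consumers of mdev sysfs must treat `name` as optional; any code path that unconditionally reads it will fail on Intel GVT-g hosts exactly as the traceback shows.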
[Yahoo-eng-team] [Bug 1896617] Re: Creation of image (or live snapshot) from the existing VM fails if libvirt-image-backend is configured to qcow2 starting from Ussuri
I'm still really confused by this, but some thoughts on the nova
os.chmod() call mentioned in an earlier comment that would fix this.

If I chmod the tmp dir that gets created by nova (e.g.
/var/lib/nova/instances/snapshots/tmpkajuir8o) to 755 just before the
snapshot (after the nova chmod), the snapshot is successful.

As mentioned in
https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1896617/comments/18,
the upstream nova code sets permissions for the tmp dir with:

    os.chmod(tmpdir, 0o701)

That code has been that way since 2015, so it's not new in ussuri; see
git blame:

    824c3706a3e nova/virt/libvirt/driver.py (Nicolas Simonds 2015-07-23 12:47:24 -0500 2388)  # NOTE(xqueralt): libvirt needs o+x in the tempdir
    824c3706a3e nova/virt/libvirt/driver.py (Nicolas Simonds 2015-07-23 12:47:24 -0500 2389)  os.chmod(tmpdir, 0o701)

However, this seems like a heavy-handed chmod if the goal, as the
comment above it mentions, is to give libvirt o+x in the tempdir. I say
this because it overrides any default permissions that were set
previously by the operating system. It seems that this should really
be a lighter touch, such as the following (equivalent to chmod o+x
tmpdir):

    st = os.stat(tmpdir)
    os.chmod(tmpdir, st.st_mode | stat.S_IXOTH)

That would fix this bug for us, but still doesn't explain what changed
in Ubuntu to cause this to fail. We did make some permissions changes
in the nova package in focal, but comparing the file/directory
permissions (with ussuri-proposed) in comment #21 above, I'm seeing no
differences.

** Changed in: nova
   Status: Invalid => New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896617 Title: Creation of image (or live snapshot) from the existing VM fails if libvirt-image-backend is configured to qcow2 starting from Ussuri Status in OpenStack nova-compute charm: Invalid Status in OpenStack Compute (nova): New Status in nova package in Ubuntu: Triaged Bug description: tl;dr 1) creating the image from the existing VM fails if qcow2 image backend is used, but everything is fine if using rbd image backend in nova-compute. 2) openstack server image create --name fails with some unrelated error: $ openstack server image create --wait 842fa12c-19ee-44cb-bb31-36d27ec9d8fc HTTP 404 Not Found: No image found with ID f4693860-cd8d-4088-91b9-56b2f173ffc7 == Details == Two Tempest tests ([1] and [2]) from the 2018.02 Refstack test lists [0] are failing with the following exception: 49701867-bedc-4d7d-aa71-7383d877d90c Traceback (most recent call last): File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py", line 369, in create_image_from_server waiters.wait_for_image_status(client, image_id, wait_until) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/common/waiters.py", line 161, in wait_for_image_status image = show_image(image_id) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/images_client.py", line 74, in show_image resp, body = self.get("images/%s" % image_id) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py", line 298, in get return self.request('GET', url, extra_headers, headers) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/base_compute_client.py", line 48, in request method, url, extra_headers, headers, 
body, chunked) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py", line 687, in request self._error_checker(resp, resp_body) File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py", line 793, in _error_checker raise exceptions.NotFound(resp_body, resp=resp) tempest.lib.exceptions.NotFound: Object not found Details: {'code': 404, 'message': 'Image not found.'} During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/images/test_images_oneserver.py", line 69, in test_create_delete_image wait_until='ACTIVE') File "/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py", line 384, in
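[Editorial sketch contrasting the two chmod approaches discussed in the comment above: the absolute `os.chmod(tmpdir, 0o701)` clobbers whatever mode the OS chose, while OR-ing in `stat.S_IXOTH` only adds the o+x bit libvirt needs. The directory here is a throwaway temp dir, not nova's snapshot dir:]

```python
import os
import stat
import tempfile

def give_other_execute(path):
    """Add o+x without clobbering the directory's existing mode bits."""
    st = os.stat(path)
    os.chmod(path, st.st_mode | stat.S_IXOTH)

tmpdir = tempfile.mkdtemp()      # mkdtemp creates the dir mode 0o700
os.chmod(tmpdir, 0o755)          # pretend the OS default was 755
give_other_execute(tmpdir)       # stays 755: o+x was already set
mode = stat.S_IMODE(os.stat(tmpdir).st_mode)
# An absolute os.chmod(tmpdir, 0o701) here would instead have
# discarded the group/other read+execute bits.
```

The design point is exactly the one the comment makes: a bitwise-OR chmod is idempotent and preserves operating-system defaults, whereas the absolute `0o701` silently strips permissions that other components (here qemu running as libvirt-qemu) may rely on.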
[Yahoo-eng-team] [Bug 1896463] Re: evacuation failed: Port update failed : Unable to correlate PCI slot
Just adding the previously filed downstream Red Hat bug
https://bugzilla.redhat.com/show_bug.cgi?id=1852110 for context; this
can happen in queens, so when we root-cause the issue and fix it, it
should likely be backported to queens. There are other older bugs from
newton that look similar, related to unshelve, so it's possible that
the same issue is affecting multiple move operations.

** Bug watch added: Red Hat Bugzilla #1852110
   https://bugzilla.redhat.com/show_bug.cgi?id=1852110

** Also affects: nova/train
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/ussuri
   Importance: Undecided
   Status: New

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/victoria
   Importance: Low
   Assignee: Balazs Gibizer (balazs-gibizer)
   Status: Confirmed

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Changed in: nova/ussuri
   Importance: Undecided => Low

** Changed in: nova/ussuri
   Status: New => Triaged

** Changed in: nova/train
   Importance: Undecided => Low

** Changed in: nova/train
   Status: New => Triaged

** Changed in: nova/stein
   Importance: Undecided => Low

** Changed in: nova/stein
   Status: New => Triaged

** Changed in: nova/rocky
   Importance: Undecided => Low

** Changed in: nova/rocky
   Status: New => Triaged

** Changed in: nova/queens
   Importance: Undecided => Low

** Changed in: nova/queens
   Status: New => Triaged

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896463

Title:
  evacuation failed: Port update failed : Unable to correlate PCI slot

Status in OpenStack Compute (nova):
  Confirmed
Status in OpenStack Compute (nova) queens series:
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Triaged
Status in OpenStack Compute (nova) stein series:
  Triaged
Status in OpenStack Compute (nova) train series:
  Triaged
Status in OpenStack Compute (nova) ussuri series:
  Triaged
Status in OpenStack Compute (nova) victoria series:
  Confirmed

Bug description:
  Description
  ===========

  If the _update_available_resource() of the resource_tracker is called
  between _do_rebuild_instance_with_claim() and instance.save() when
  evacuating VM instances on the destination host,

  nova/compute/manager.py
  2931 def rebuild_instance(self, context, instance, orig_image_ref, image_ref,
  2932 +-- 84 lines: injected_files, new_pass, orig_sys_metadata,---
  3016         claim_ctxt = rebuild_claim(
  3017             context, instance, scheduled_node,
  3018             limits=limits, image_meta=image_meta,
  3019             migration=migration)
  3020         self._do_rebuild_instance_with_claim(
  3021 +-- 47 lines: claim_ctxt, context, instance, orig_image_ref,-
  3068         instance.apply_migration_context()
  3069         # NOTE (ndipanov): This save will now update the host and node
  3070         # attributes making sure that next RT pass is consistent since
  3071         # it will be based on the instance and not the migration DB
  3072         # entry.
  3073         instance.host = self.host
  3074         instance.node = scheduled_node
  3075         instance.save()
  3076         instance.drop_migration_context()

  then the instance is not handled as a managed instance of the
  destination host, because it is not updated in the DB yet:

  2020-09-19 07:27:36.321 8 WARNING nova.compute.resource_tracker [req-b35d5b9a-0786-4809-bd81-ad306cdda8d5 - - - - -] Instance 22f6ca0e-f964-4467-83a3-f2bf12bb05ae is not being actively managed by this compute host but has allocations referencing this compute host: {u'resources': {u'MEMORY_MB': 12288, u'VCPU': 2, u'DISK_GB': 10}}.
Skipping heal of allocation because we do not know what to do. And so the SR-IOV ports (PCI devices) were freed by clean_usage() even though the VM already had the VF port. 743 def _update_available_resource(self, context, resources): 744 +-- 45 lines: # initialize the compute node object, creating it-- 789 self.pci_tracker.clean_usage(instances, migrations, orphans) 790 dev_pools_obj = self.pci_tracker.stats.to_device_pools_obj() After that, when we evacuated this VM to another compute host again, we got the error below. Steps to reproduce == 1. create a VM on com1 with SRIOV VF ports. 2. stop and disable nova-compute service on com1 3. wait 60 sec (nova-compute reporting interval) 4. evacuate the VM to com2 5. wait the VM is
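The race described above can be illustrated with a simplified, hypothetical sketch (the classes and names below are stand-ins, not nova's actual resource tracker): a tracker pass that runs before instance.save() still sees the instance assigned to the source host, so the destination frees the VF that the rebuild claim just assigned.

```python
# Hypothetical illustration of the race described above; these classes
# are simplified stand-ins, not nova's actual code.

class Instance:
    def __init__(self, uuid, host, pci_device):
        self.uuid = uuid
        self.host = host          # still the source host until instance.save()
        self.pci_device = pci_device

class PciTracker:
    def __init__(self):
        self.used = {}            # device address -> owning instance uuid

    def claim(self, instance):
        self.used[instance.pci_device] = instance.uuid

    def clean_usage(self, managed_instances):
        # Frees any device not owned by an instance this host manages.
        managed = {i.uuid for i in managed_instances}
        for dev, owner in list(self.used.items()):
            if owner not in managed:
                del self.used[dev]

def update_available_resource(host, all_instances, pci_tracker):
    # Only instances whose DB record points at this host are "managed".
    managed = [i for i in all_instances if i.host == host]
    pci_tracker.clean_usage(managed)

# Evacuation claims a VF on the destination before the DB is updated.
inst = Instance("22f6ca0e", host="com1", pci_device="0000:81:00.2")
tracker = PciTracker()
tracker.claim(inst)                                  # rebuild claim on com2

update_available_resource("com2", [inst], tracker)   # periodic task fires
print(tracker.used)                                  # {} -- VF freed too early

inst.host = "com2"                                   # instance.save() is too late
```

The sketch shows why the fix has to either update the instance record before the tracker pass or teach the pass about in-progress migrations.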
[Yahoo-eng-team] [Bug 1897100] [NEW] Improve port listing command
Public bug reported: As reported in https://bugzilla.redhat.com/show_bug.cgi?id=1772106, between Queens and Train there was a performance degradation in the port listing operation. This could be caused by the new relationships added to the port DB object (portuplinkstatuspropagation) or by newly added extensions. In any case, improving the server performance could be an arduous task. This bug proposes to improve the OSC query instead, adding a filter to retrieve only the fields shown in the list command: ID, name, MAC address, fixed IPs and status. This bug and a possible solution are similar to https://bugs.launchpad.net/neutron/+bug/1865223. ** Affects: neutron Importance: Low Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez) Status: New ** Changed in: neutron Assignee: (unassigned) => Rodolfo Alonso (rodolfo-alonso-hernandez) ** Changed in: neutron Importance: Undecided => Low -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1897100 Title: Improve port listing command Status in neutron: New Bug description: As reported in https://bugzilla.redhat.com/show_bug.cgi?id=1772106, between Queens and Train there was a performance degradation in the port listing operation. This could be caused by the new relationships added to the port DB object (portuplinkstatuspropagation) or by newly added extensions. In any case, improving the server performance could be an arduous task. This bug proposes to improve the OSC query instead, adding a filter to retrieve only the fields shown in the list command: ID, name, MAC address, fixed IPs and status. This bug and a possible solution are similar to https://bugs.launchpad.net/neutron/+bug/1865223. 
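As a sketch of the kind of filtered query this proposes (hypothetical; the actual OSC change may differ), the client can pass repeated "fields" query parameters so the server only loads and serializes the columns the list command displays:

```python
# Hypothetical sketch: build a Neutron GET /v2.0/ports request that asks
# only for the fields "openstack port list" displays, instead of full
# port objects carrying every extension attribute.
from urllib.parse import urlencode

LIST_COLUMNS = ["id", "name", "mac_address", "fixed_ips", "status"]

def port_list_query(fields=LIST_COLUMNS):
    # Neutron accepts repeated "fields" query parameters.
    return "/v2.0/ports?" + urlencode([("fields", f) for f in fields])

print(port_list_query())
# /v2.0/ports?fields=id&fields=name&fields=mac_address&fields=fixed_ips&fields=status
```

Requesting fewer fields reduces both the DB joins the server has to perform and the payload it has to serialize, which is the same approach taken for bug 1865223.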
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1897100/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1897099] [NEW] create_swap do not fallback to dd when fallocate fails
Public bug reported: Name: cloud-init Version : 20.2-1 Code in question: cloudinit/config/cc_mounts.py try: create_swap(fname, size, "fallocate") except util.ProcessExecutionError as e: LOG.warning(errmsg, fname, size, "dd", e) LOG.warning("Will attempt with dd.") create_swap(fname, size, "dd") As there is a kernel bug in recent Linux versions, fallocate creates swap images with holes. The workaround is to remove fallocate (making the create_swap function fail) so that cloud-init falls back to dd. I used bootcmd (or cloud-boothook) to rename (move) the fallocate binary on my system, but according to the logs, it didn't fall back to dd as it should. Probably the error was not a ProcessExecutionError. Logs: /var/log/cloud-init-output.log:2020-09-24 09:13:16,470 - cc_mounts.py[WARNING]: Failed to create swapfile '/swapfile' of size 2048MB via fallocate: Unexpected error while running command. /var/log/cloud-init-output.log:Command: ['fallocate', '-l', '2048M', '/swapfile'] /var/log/cloud-init-output.log:Reason: [Errno 2] No such file or directory: b'fallocate' /var/log/cloud-init-output.log:2020-09-24 09:13:16,479 - cc_mounts.py[WARNING]: failed to setup swap: [Errno 2] No such file or directory: '/swapfile' /var/log/cloud-init-output.log:chmod: cannot access '/usr/bin/fallocate': No such file or directory /var/log/cloud-init.log:2020-09-24 09:13:16,460 - cc_mounts.py[DEBUG]: Attempting to determine the real name of swap /var/log/cloud-init.log:2020-09-24 09:13:16,460 - cc_mounts.py[DEBUG]: changed default device swap => None /var/log/cloud-init.log:2020-09-24 09:13:16,460 - cc_mounts.py[DEBUG]: Ignoring nonexistent default named mount swap /var/log/cloud-init.log:2020-09-24 09:13:16,461 - cc_mounts.py[DEBUG]: suggest 2048.0 MB swap for 1983.953125 MB memory with '9030.296875 MB' disk given max=2048.0 MB [max=2048.0 MB]' /var/log/cloud-init.log:2020-09-24 09:13:16,461 - cc_mounts.py[DEBUG]: Creating swapfile in '/swapfile' on fstype 'ext4' using 'fallocate' 
/var/log/cloud-init.log:2020-09-24 09:13:16,461 - util.py[DEBUG]: Running command ['fallocate', '-l', '2048M', '/swapfile'] with allowed return codes [0] (shell=False, capture=True) /var/log/cloud-init.log:2020-09-24 09:13:16,470 - cc_mounts.py[WARNING]: Failed to create swapfile '/swapfile' of size 2048MB via fallocate: Unexpected error while running command. /var/log/cloud-init.log:Command: ['fallocate', '-l', '2048M', '/swapfile'] /var/log/cloud-init.log:Reason: [Errno 2] No such file or directory: b'fallocate' /var/log/cloud-init.log:2020-09-24 09:13:16,479 - util.py[DEBUG]: Attempting to remove /swapfile /var/log/cloud-init.log:2020-09-24 09:13:16,479 - util.py[DEBUG]: Setting up swap file took 0.019 seconds /var/log/cloud-init.log:2020-09-24 09:13:16,479 - cc_mounts.py[WARNING]: failed to setup swap: [Errno 2] No such file or directory: '/swapfile' ** Affects: cloud-init Importance: Undecided Status: New ** Tags: fallocate swap -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1897099 Title: create_swap do not fallback to dd when fallocate fails Status in cloud-init: New Bug description: Name: cloud-init Version : 20.2-1 Code in question: cloudinit/config/cc_mounts.py try: create_swap(fname, size, "fallocate") except util.ProcessExecutionError as e: LOG.warning(errmsg, fname, size, "dd", e) LOG.warning("Will attempt with dd.") create_swap(fname, size, "dd") As there is a kernel bug in recent Linux versions, fallocate creates swap images with holes. The workaround is to remove fallocate (making the create_swap function fail) so that cloud-init falls back to dd. I used bootcmd (or cloud-boothook) to rename (move) the fallocate binary on my system, but according to the logs, it didn't fall back to dd as it should. 
Probably the error was not ProcessExecutionError Logs: /var/log/cloud-init-output.log:2020-09-24 09:13:16,470 - cc_mounts.py[WARNING]: Failed to create swapfile '/swapfile' of size 2048MB via fallocate: Unexpected error while running command. /var/log/cloud-init-output.log:Command: ['fallocate', '-l', '2048M', '/swapfile'] /var/log/cloud-init-output.log:Reason: [Errno 2] No such file or directory: b'fallocate' /var/log/cloud-init-output.log:2020-09-24 09:13:16,479 - cc_mounts.py[WARNING]: failed to setup swap: [Errno 2] No such file or directory: '/swapfile' /var/log/cloud-init-output.log:chmod: cannot access '/usr/bin/fallocate': No such file or directory /var/log/cloud-init.log:2020-09-24 09:13:16,460 - cc_mounts.py[DEBUG]: Attempting to determine the real name of swap /var/log/cloud-init.log:2020-09-24 09:13:16,460 - cc_mounts.py[DEBUG]: changed
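The failure mode can be reproduced in isolation with a small sketch (hypothetical stand-ins for cloud-init's helpers, not its actual code): the missing binary raises FileNotFoundError, which the except clause above does not catch, so broadening it to include OSError restores the dd fallback.

```python
# Hypothetical sketch, not cloud-init's actual code: the except clause in
# cc_mounts.py only catches ProcessExecutionError, so the FileNotFoundError
# raised when the fallocate binary is absent escapes and the dd fallback
# never runs. Catching OSError as well restores the fallback.

class ProcessExecutionError(Exception):
    """Stand-in for cloudinit.util.ProcessExecutionError."""

def create_swap(fname, size, method):
    if method == "fallocate":
        # Simulate the renamed/missing binary from the bug report.
        raise FileNotFoundError(2, "No such file or directory: 'fallocate'")
    return "swap-created-with-" + method

def setup_swapfile(fname, size):
    try:
        return create_swap(fname, size, "fallocate")
    except (ProcessExecutionError, OSError):  # OSError covers a missing binary
        return create_swap(fname, size, "dd")

print(setup_swapfile("/swapfile", "2048M"))  # swap-created-with-dd
```

FileNotFoundError is a subclass of OSError, so a single added exception class covers the "binary not found" case the reporter hit.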
[Yahoo-eng-team] [Bug 1897095] [NEW] [OVN] ARP/MAC handling for routers connected to external network is scaling poorly
Public bug reported: With the current router configuration set by neutron, the number of logical flows in lr_in_arp_resolve seems to scale as O(n^2), where n is the number of routers connected to the external network. For example, this is our test where we created 800 routers (I believe it was 800, and not 400 as stated in the linked discussion): --8<--8<--8<-- # cat lflows.txt |grep -v Datapath |cut -d'(' -f 2 | cut -d ')' -f1 |sort | uniq -c |sort -n | tail -10 3264 lr_in_learn_neighbor 3386 ls_out_port_sec_l2 4112 lr_in_admission 4202 ls_in_port_sec_l2 4898 lr_in_lookup_neighbor 4900 lr_in_ip_routing 9144 ls_in_l2_lkup 9160 ls_in_arp_rsp 22136 lr_in_ip_input 671656 lr_in_arp_resolve # --8<--8<--8<-- I've opened a review where we set `always_learn_from_arp_request=false` and `dynamic_neigh_routers=true` on all routers, which has a significant impact on the number of logical flows: --8<--8<--8<-- # cat lflows-new.txt |grep -v Datapath |cut -d'(' -f 2 | cut -d ')' -f1 |sort | uniq -c |sort -n | tail -10 2170 ls_out_port_sec_l2 2172 lr_in_learn_neighbor 2666 lr_in_admission 2690 ls_in_port_sec_l2 3190 lr_in_ip_routing 4276 lr_in_lookup_neighbor 4873 lr_in_arp_resolve 5864 ls_in_arp_rsp 5873 ls_in_l2_lkup 14343 lr_in_ip_input # ovn-sbctl --timeout=120 lflow-list > lflows-new.txt --8<--8<--8<-- There is, however, some performance penalty, which from my understanding affects east-west traffic between routers. I'm not quite sure how big the effect is, and it may be a good idea to make that change optional, as mentioned in the mailing list discussion. See https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html and http://lists.openstack.org/pipermail/openstack-discuss/2020-September/017370.html for related discussions. ** Affects: neutron Importance: Undecided Assignee: Krzysztof Klimonda (kklimonda) Status: In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. 
https://bugs.launchpad.net/bugs/1897095 Title: [OVN] ARP/MAC handling for routers connected to external network is scaling poorly Status in neutron: In Progress Bug description: With the current router configuration set by neutron, the number of logical flows in lr_in_arp_resolve seems to scale as O(n^2), where n is the number of routers connected to the external network. For example, this is our test where we created 800 routers (I believe it was 800, and not 400 as stated in the linked discussion): --8<--8<--8<-- # cat lflows.txt |grep -v Datapath |cut -d'(' -f 2 | cut -d ')' -f1 |sort | uniq -c |sort -n | tail -10 3264 lr_in_learn_neighbor 3386 ls_out_port_sec_l2 4112 lr_in_admission 4202 ls_in_port_sec_l2 4898 lr_in_lookup_neighbor 4900 lr_in_ip_routing 9144 ls_in_l2_lkup 9160 ls_in_arp_rsp 22136 lr_in_ip_input 671656 lr_in_arp_resolve # --8<--8<--8<-- I've opened a review where we set `always_learn_from_arp_request=false` and `dynamic_neigh_routers=true` on all routers, which has a significant impact on the number of logical flows: --8<--8<--8<-- # cat lflows-new.txt |grep -v Datapath |cut -d'(' -f 2 | cut -d ')' -f1 |sort | uniq -c |sort -n | tail -10 2170 ls_out_port_sec_l2 2172 lr_in_learn_neighbor 2666 lr_in_admission 2690 ls_in_port_sec_l2 3190 lr_in_ip_routing 4276 lr_in_lookup_neighbor 4873 lr_in_arp_resolve 5864 ls_in_arp_rsp 5873 ls_in_l2_lkup 14343 lr_in_ip_input # ovn-sbctl --timeout=120 lflow-list > lflows-new.txt --8<--8<--8<-- There is, however, some performance penalty, which from my understanding affects east-west traffic between routers. I'm not quite sure how big the effect is, and it may be a good idea to make that change optional, as mentioned in the mailing list discussion. See https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html and http://lists.openstack.org/pipermail/openstack-discuss/2020-September/017370.html for related discussions. 
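For reference, the two OVN options can be set per logical router with ovn-nbctl. This is a hedged sketch only: the router name is a placeholder, and the review applies the options through the neutron OVN driver rather than by hand.

```shell
# Example only: set the options on one logical router manually.
# "neutron-<router_uuid>" is a placeholder for the router's OVN name.
ovn-nbctl set Logical_Router neutron-<router_uuid> \
    options:always_learn_from_arp_request=false \
    options:dynamic_neigh_routers=true
```

With dynamic_neigh_routers=true, routers resolve each other's MACs via ARP on demand instead of having static lr_in_arp_resolve flows pre-installed for every peer, which is what removes the O(n^2) flow count at the cost of some extra east-west resolution traffic.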
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1897095/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1874273] Re: Horizon doesn't show all OS-EXT-SRV-ATTR attributes
Reviewed: https://review.opendev.org/721992 Committed: https://git.openstack.org/cgit/openstack/horizon/commit/?id=d403b31d70e06d784bc644c525e89a7e1b0b549d Submitter: Zuul Branch:master commit d403b31d70e06d784bc644c525e89a7e1b0b549d Author: Ivan Kolodyazhny Date: Wed Apr 22 17:41:13 2020 +0300 Show all os-extended-server-attributes Patch I0cfe9090e8263f983fa5f42f42616a26407be47a adds hypervisor hostname to the instance details view. This patch adds the rest of instance attributes allowed by 'os_compute_api:os-extended-server-attributes' policy. Change-Id: Id39ee14e3054422a96248f8cdd0a5bf07c27f2fc Closes-Bug: #1874273 ** Changed in: horizon Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Dashboard (Horizon). https://bugs.launchpad.net/bugs/1874273 Title: Horizon doesn't show all OS-EXT-SRV-ATTR attributes Status in OpenStack Dashboard (Horizon): Fix Released Bug description: If nova policy os_compute_api:os-extended-server-attributes allows showing extended server attributes horizon should show them too. To manage notifications about this bug go to: https://bugs.launchpad.net/horizon/+bug/1874273/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1896945] [NEW] dnsmasq >= 2.81 not responding to DHCP requests with current q-dhcp configs
Public bug reported: * High level description: I've been attempting to enable Fedora 32 support in devstack and encountered the following issue where dnsmasq as configured by q-dhcp isn't responding to DHCP requests from clients: https://review.opendev.org/#/c/750292/ Looking at tcpdump and strace it appears that dnsmasq can see the requests but doesn't reply suggesting a configuration issue either caused by q-dhcp *or* a regression in dnsmasq itself: $ openstack server reboot --hard test && sudo ip netns exec qdhcp-df64061e-0784-4bbe-909b-ae1c5f466981 tcpdump -i tapee679459-e1 -n port 67 or port 68 dropped privs to tcpdump tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on tapee679459-e1, link-type EN10MB (Ethernet), capture size 262144 bytes 18:40:24.070796 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:16:07:a0, length 300 18:41:24.118961 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:16:07:a0, length 300 18:42:24.192716 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:16:07:a0, length 300 $ openstack server reboot --hard test && sudo ip netns exec qdhcp-df64061e-0784-4bbe-909b-ae1c5f466981 strace -p 196856 strace: Process 196856 attached restart_syscall(<... 
resuming interrupted read ...>) = 1 recvmsg(4, {msg_name={sa_family=AF_INET, sin_port=htons(68), sin_addr=inet_addr("0.0.0.0")}, msg_namelen=16, msg_iov=[{iov_base="\1\1\6\0\0041\326S\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\372\26>\26"..., iov_len=548}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("tapee679459-e1"), ipi_spec_dst=inet_addr("10.0.0.2"), ipi_addr=inet_addr("255.255.255.255")}}], msg_controllen=32, msg_flags=0}, MSG_PEEK|MSG_TRUNC) = 300 recvmsg(4, {msg_name={sa_family=AF_INET, sin_port=htons(68), sin_addr=inet_addr("0.0.0.0")}, msg_namelen=16, msg_iov=[{iov_base="\1\1\6\0\0041\326S\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\372\26>\26"..., iov_len=548}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("tapee679459-e1"), ipi_spec_dst=inet_addr("10.0.0.2"), ipi_addr=inet_addr("255.255.255.255")}}], msg_controllen=32, msg_flags=0}, 0) = 300 ioctl(4, SIOCGIFNAME, {ifr_index=9, ifr_name="tapee679459-e1"}) = 0 ioctl(4, SIOCGIFFLAGS, {ifr_name="tapee679459-e1", ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_MULTICAST}) = 0 ioctl(4, SIOCGIFADDR, {ifr_name="tapee679459-e1", ifr_addr={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.0.0.2")}}) = 0 poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=14, events=POLLIN}], 8, -1 The current configs are listed below: http://paste.openstack.org/show/798334/ I was able to downgrade dnsmasq on f32 to 2.80 in order to workaround this: $ sudo dnf downgrade dnsmasq -y [..] 
$ rpm -qa | grep dnsmasq dnsmasq-2.80-14.fc32.x86_64 $ sudo killall dnsmasq && sudo systemctl restart devstack@q-* $ openstack server reboot --hard test && sudo ip netns exec qdhcp-df64061e-0784-4bbe-909b-ae1c5f466981 tcpdump -i tapee679459-e1 -n port 67 or port 68 dropped privs to tcpdump tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on tapee679459-e1, link-type EN10MB (Ethernet), capture size 262144 bytes 12:06:57.028953 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:16:07:a0, length 300 12:06:57.029994 IP 10.0.0.2.bootps > 10.0.0.49.bootpc: BOOTP/DHCP, Reply, length 328 12:06:57.042300 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:16:07:a0, length 300 12:06:57.047014 IP 10.0.0.2.bootps > 10.0.0.49.bootpc: BOOTP/DHCP, Reply, length 344 * Pre-conditions: F32 with dnsmasq >= 2.81 installed. * Step-by-step reproduction steps: Deploy F32 with dnsmasq >= 2.81 installed, attempt to spawn an instance attached to a subnet with dhcp enabled. * Expected output: dnsmasq responds to DHCP request from instance. * Actual output: dnsmasq doesn't respond to DHCP request from instance. * Version: ** OpenStack version (Specific stable branch, or git hash if from trunk); Neutron @ 0fdcc4b1b63dc90fbc9f46f5947f84626f8e5b41 ** Linux distro, kernel. For a distro, it’s also worth knowing specific versions of client and server; Fedora 32 with kernel 5.8.10-200.fc32.x86_64 ** DevStack or other _deployment_ mechanism? Devstack @ https://review.opendev.org/#/c/750292/ * Environment: what types of services are you running (core services like DB and AMQP broker, as well as Nova/hypervisor if it matters), and which type of deployment (clustered servers)? Multi-node or single node, etc. Single node devstack env. * Perceived severity: is this a blocker for you? High, assuming other distros will
[Yahoo-eng-team] [Bug 1882918] Re: Instances page crashes on repeated confirm/revert resize actions
Reviewed: https://review.opendev.org/734814 Committed: https://git.openstack.org/cgit/openstack/horizon/commit/?id=a4a549a1814a9ea9f142fd8ec5928fe9cfebc269 Submitter: Zuul Branch:master commit a4a549a1814a9ea9f142fd8ec5928fe9cfebc269 Author: pedh Date: Wed Jun 10 18:18:00 2020 +0800 Fix: Page crashes on instance confirm resize Add error handling to instance confirm/revert resize methods. Change-Id: I128049091f38e8db3c1524a5c4cb932f3e809714 Closes-Bug: #1882918 ** Changed in: horizon Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Dashboard (Horizon). https://bugs.launchpad.net/bugs/1882918 Title: Instances page crashes on repeated confirm/revert resize actions Status in OpenStack Dashboard (Horizon): Fix Released Bug description: Environment: OpenStack Rocky/Ussuri compute node of libvirt/kvm on HP Proliant server with 24*cpu and 256GB mem Reproduction steps: 1. Create an instance with flavor "m1.small"; 2. Resize the instance to flavor "m1.medium"; 3. After a while the instance enters the "VERIFY_RESIZE" state and the "Confirm Resize/Migration" button is enabled; we click the button, wait for it to be enabled again, and then click it repeatedly; 4. The page crashes after the resize is done. Error log: http://ix.io/2oOd Bug analysis: The bug reproduces easily if the instance stays in the VERIFY_RESIZE state long enough. After digging into the horizon source code, I found that the "ConfirmResize.single" and "RevertResize.single" methods in "openstack_dashboard/dashboards/project/instance/tables.py" lack exception handling and expose the exception to Django, causing the page to crash. 
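A minimal sketch of the kind of error handling the fix adds (hypothetical helper names, not Horizon's actual code): the table action wraps the nova API call and converts any exception into a user-facing message instead of letting it propagate to Django.

```python
# Hypothetical sketch of guarding a table action, not Horizon's real code.
def confirm_resize(nova_call, report_error):
    """Run the confirm-resize API call, turning failures into a message."""
    try:
        nova_call()
        return True
    except Exception as exc:
        # Stands in for horizon-style error handling: surface the failure
        # to the user instead of crashing the page render.
        report_error("Unable to confirm resize: %s" % exc)
        return False

messages = []

def failing_call():
    # Simulate the second click, after the resize was already confirmed.
    raise RuntimeError("instance not in VERIFY_RESIZE state")

ok = confirm_resize(failing_call, messages.append)
print(ok, messages)
```

A repeated click then produces an error banner rather than a 500 page, which matches the behavior the committed patch aims for.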
To manage notifications about this bug go to: https://bugs.launchpad.net/horizon/+bug/1882918/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1896933] [NEW] Exception when plugin creates a network without specifying the MTU
Public bug reported: This was found as a UT regression failure in the x/group-based-policy project, but I think the same issue applies to the auto-allocated- topology workflow (or any feature that creates a network at the plugin or DB layer instead of at the REST layer). The exception seen is this: File "/home/zuul/src/opendev.org/x/group-based-policy/.tox/py36/lib/python3.6/site-packages/neutron/plugins/ml2/plugin.py", line 1053, in create_network result, mech_context = self._create_network_db(context, network) File "/home/zuul/src/opendev.org/x/group-based-policy/gbpservice/neutron/plugins/ml2plus/plugin.py", line 333, in _create_network_db context, network) File "/home/zuul/src/opendev.org/x/group-based-policy/.tox/py36/lib/python3.6/site-packages/neutron/plugins/ml2/plugin.py", line 1013, in _create_network_db net_db.mtu = self._get_network_mtu(net_db) File "/home/zuul/src/opendev.org/x/group-based-policy/.tox/py36/lib/python3.6/site-packages/neutron/plugins/ml2/plugin.py", line 995, in _get_network_mtu raise exc.InvalidInput(error_message=msg) neutron_lib.exceptions.InvalidInput: Invalid input for operation: Requested MTU is too big, maximum is 1000. The UT limits the network MTU using configuration file settings: https://opendev.org/x/group-based-policy/src/branch/master/gbpservice/neutron/tests/unit/services/grouppolicy/test_aim_mapping_driver.py#L2951-L2952 The regression happens because a default MTU for the DB layer was introduced in this patch: https://review.opendev.org/#/c/679399/ The default value used is the DEFAULT_NETWORK_MTU constant from neutron- lib (1500). This is different than the default value installed by the REST layer (0). 
When the network MTU is constrained using configuration files, it gets to this code path: https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/plugin.py#L992-L1002 Since the DB layer has set the default to 1500 instead of 0, this exception gets triggered, even though the caller at the plugin layer didn't specify a value for the MTU. One possible fix is to have the DB layer use a value of 0 for the default instead of DEFAULT_NETWORK_MTU. ** Affects: neutron Importance: Undecided Assignee: Thomas Bachman (bachman) Status: In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1896933 Title: Exception when plugin creates a network without specifying the MTU Status in neutron: In Progress Bug description: This was found as a UT regression failure in the x/group-based-policy project, but I think the same issue applies to the auto-allocated- topology workflow (or any feature that creates a network at the plugin or DB layer instead of at the REST layer). The exception seen is this: File "/home/zuul/src/opendev.org/x/group-based-policy/.tox/py36/lib/python3.6/site-packages/neutron/plugins/ml2/plugin.py", line 1053, in create_network result, mech_context = self._create_network_db(context, network) File "/home/zuul/src/opendev.org/x/group-based-policy/gbpservice/neutron/plugins/ml2plus/plugin.py", line 333, in _create_network_db context, network) File "/home/zuul/src/opendev.org/x/group-based-policy/.tox/py36/lib/python3.6/site-packages/neutron/plugins/ml2/plugin.py", line 1013, in _create_network_db net_db.mtu = self._get_network_mtu(net_db) File "/home/zuul/src/opendev.org/x/group-based-policy/.tox/py36/lib/python3.6/site-packages/neutron/plugins/ml2/plugin.py", line 995, in _get_network_mtu raise exc.InvalidInput(error_message=msg) neutron_lib.exceptions.InvalidInput: Invalid input for operation: Requested MTU is too big, maximum is 1000. 
The UT limits the network MTU using configuration file settings: https://opendev.org/x/group-based-policy/src/branch/master/gbpservice/neutron/tests/unit/services/grouppolicy/test_aim_mapping_driver.py#L2951-L2952 The regression happens because a default MTU for the DB layer was introduced in this patch: https://review.opendev.org/#/c/679399/ The default value used is the DEFAULT_NETWORK_MTU constant from neutron-lib (1500). This is different than the default value installed by the REST layer (0). When the network MTU is constrained using configuration files, it gets to this code path: https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/plugin.py#L992-L1002 Since the DB layer has set the default to 1500 instead of 0, this exception gets triggered, even though the caller at the plugin layer didn't specify a value for the MTU. One possible fix is to have the DB layer use a value of 0 for the default instead of DEFAULT_NETWORK_MTU. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1896933/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to :
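The proposed fix can be sketched as follows (a hypothetical simplification of the ML2 check, not neutron's actual code), with 0 serving as the "unset" sentinel at the DB layer:

```python
# Hypothetical simplification of the ML2 MTU check, not neutron's code.
# A DB-layer default of 0 means "caller did not specify an MTU", so the
# check falls back to the backend maximum instead of rejecting the
# neutron-lib DEFAULT_NETWORK_MTU of 1500.

def get_network_mtu(requested_mtu, max_backend_mtu):
    if not requested_mtu:              # 0 / unset: use the backend maximum
        return max_backend_mtu
    if requested_mtu > max_backend_mtu:
        raise ValueError(
            "Requested MTU is too big, maximum is %d" % max_backend_mtu)
    return requested_mtu

print(get_network_mtu(0, 1000))      # 1000 -- unset default no longer explodes
print(get_network_mtu(900, 1000))    # 900
try:
    get_network_mtu(1500, 1000)      # the old DB default of 1500 fails
except ValueError as exc:
    print(exc)                       # Requested MTU is too big, maximum is 1000
```

With 0 as the sentinel, a plugin-layer caller that omits the MTU gets the configured maximum, while an explicit over-limit request is still rejected.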
[Yahoo-eng-team] [Bug 1896920] [NEW] Unnecessary error log when checking if a device is ready
Public bug reported: In the method "ensure_device_is_ready" [1], if the device does not exist or the MAC is still not assigned, the method returns False and also logs an error. This error log is distracting; instead, an info message could be logged. The code using this method, which receives True or False depending on the state of the interface, can decide whether to log a higher-severity message. [1]https://github.com/openstack/neutron/blob/856cae4cf8e33c05b308d880df78b7be02ae90ad/neutron/agent/linux/ip_lib.py#L955 ** Affects: neutron Importance: Wishlist Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez) Status: In Progress ** Changed in: neutron Importance: Undecided => Low ** Changed in: neutron Importance: Low => Wishlist -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1896920 Title: Unnecessary error log when checking if a device is ready Status in neutron: In Progress Bug description: In the method "ensure_device_is_ready" [1], if the device does not exist or the MAC is still not assigned, the method returns False and also logs an error. This error log is distracting; instead, an info message could be logged. The code using this method, which receives True or False depending on the state of the interface, can decide whether to log a higher-severity message. [1]https://github.com/openstack/neutron/blob/856cae4cf8e33c05b308d880df78b7be02ae90ad/neutron/agent/linux/ip_lib.py#L955 To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1896920/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
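The suggested logging split can be sketched like this (hypothetical names; the real method lives in neutron/agent/linux/ip_lib.py): the helper logs at info and returns a bool, and the caller chooses the final severity.

```python
# Hypothetical sketch of the proposed logging split, not neutron's code.
import logging

LOG = logging.getLogger(__name__)

def ensure_device_is_ready(device_exists, mac_assigned):
    """Return True when the device is usable; log only at info level."""
    if not device_exists or not mac_assigned:
        LOG.info("Device not ready yet (exists=%s, mac=%s)",
                 device_exists, mac_assigned)
        return False
    return True

# The caller decides whether an unready device is actually an error,
# e.g. only after its retries are exhausted.
if not ensure_device_is_ready(device_exists=True, mac_assigned=False):
    LOG.error("Device still not ready after retries; giving up")
```

This keeps transient "not ready yet" states out of the error log while preserving an error-level record when the condition actually matters to the caller.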
[Yahoo-eng-team] [Bug 1896574] Re: how to deal with hypervisor name changing
I think the previous title is misleading. Actually the hostname itself is still A; what changes is the FQDN seen by hostname --fqdn. ** Changed in: nova Status: Invalid => New ** Summary changed: - how to deal with hypervisor name changing + how to deal with hypervisor host fqdn name changing ** Description changed: - nova fails to correctly account for resources after hypervisor name - changes. For example, if previously the hypervisor name is A, and some - later it switches to A.B, then all of the instances which belong to A + nova fails to correctly account for resources after hypervisor hosntame + fqdn changes. For example, if previously the hypervisor hostname fqdn is + A, and some later it to A.B, then all of the instances which belong to A will not be included in the resource computation for A.B although effectively they are the same thing. + But under such circumstances, compute service's is still A. + Is there any way to deal with this situation? we are using openstack rocky. ** Description changed: nova fails to correctly account for resources after hypervisor hosntame fqdn changes. For example, if previously the hypervisor hostname fqdn is A, and some later it to A.B, then all of the instances which belong to A will not be included in the resource computation for A.B although effectively they are the same thing. - But under such circumstances, compute service's is still A. + But under such circumstances, compute service's is listed as A. Is there any way to deal with this situation? we are using openstack rocky. -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1896574 Title: how to deal with hypervisor host fqdn name changing Status in OpenStack Compute (nova): New Bug description: Nova fails to correctly account for resources after the hypervisor hostname FQDN changes. 
For example, if previously the hypervisor hostname FQDN is A, and it later changes to A.B, then all of the instances which belong to A will not be included in the resource computation for A.B, although effectively they are the same host. But under such circumstances, the compute service's host is still listed as A. Is there any way to deal with this situation? We are using OpenStack Rocky. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1896574/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp