[Yahoo-eng-team] [Bug 1961068] Re: nova-ceph-multistore job fails with mysqld got oom-killed
Reviewed:  https://review.opendev.org/c/openstack/nova/+/874664
Committed: https://opendev.org/openstack/nova/commit/84d1f25446731e4e51beb83a017cdf7bfda8c5d5
Submitter: "Zuul (22348)"
Branch:    master

commit 84d1f25446731e4e51beb83a017cdf7bfda8c5d5
Author: Dan Smith
Date:   Tue Feb 21 08:43:13 2023 -0800

    Use mysql memory reduction flags for ceph job

    This makes the ceph-multistore job use the MYSQL_REDUCE_MEMORY flag
    in devstack to try to address the frequent OOMs we see in that job.

    Change-Id: Ibc203bd10dcb530027c2c9f58eb840ccc088280d
    Closes-Bug: #1961068

** Changed in: nova
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1961068

Title:
  nova-ceph-multistore job fails with mysqld got oom-killed

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Searching through the job logs showed that the nova-ceph-multistore
  job fails from time to time with a DB crash caused by an out-of-memory
  error. The tempest logs contain the following message:

    tempest.lib.exceptions.ServerFault: Got server fault
    Details: Unexpected API Error. Please report this at
    http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.

  The mysqld error log (controller/logs/mysql/error_log.txt) shows the
  crash recovery:

    2022-02-15T19:26:40.245179Z 0 [System] [MY-010229] [Server] Starting XA crash recovery...
    2022-02-15T19:26:40.268204Z 0 [System] [MY-010232] [Server] XA crash recovery finished.

  Around the same time, the out-of-memory kill is visible in syslog
  (controller/logs/syslog.txt):

    Feb 15 19:26:35 ubuntu-focal-ovh-gra1-0028467853 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/mysql.service,task=mysqld,pid=67959,uid=116
    Feb 15 19:26:35 ubuntu-focal-ovh-gra1-0028467853 kernel: Out of memory: Killed process 67959 (mysqld) total-vm:5127600kB, anon-rss:756064kB, file-rss:0kB, shmem-rss:0kB, UID:116 pgtables:2388kB oom_score_adj:0
    Feb 15 19:26:35 ubuntu-focal-ovh-gra1-0028467853 kernel: oom_reaper: reaped process 67959 (mysqld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

  The error only occurs in the nova-ceph-multistore job (see recent
  occurrences via logsearch:
  https://paste.opendev.org/show/bQNKfoaMafUyNFCyQ0kN/). It mostly
  happens on the current master branch (yoga), but an example was also
  found on wallaby:
  https://zuul.opendev.org/t/openstack/build/d8a6a9c1496346dda6986db00c06a616

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1961068/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
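For reference, the MYSQL_REDUCE_MEMORY flag the fix enables is a devstack setting; in a local devstack the equivalent would be a local.conf entry along these lines (a sketch — the exact mysqld tuning devstack applies under this flag may vary by release):

```ini
[[local|localrc]]
# Enable devstack's reduced-memory mysqld configuration, the flag the
# fix above turns on for the nova-ceph-multistore job.
MYSQL_REDUCE_MEMORY=True
```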
[Yahoo-eng-team] [Bug 1998789] Re: PooledLDAPHandler.result3 does not release pool connection back when an exception is raised
Reviewed:  https://review.opendev.org/c/openstack/keystone/+/866723
Committed: https://opendev.org/openstack/keystone/commit/ff632a81fb09e6d9f3298e494d53eb6df50269cf
Submitter: "Zuul (22348)"
Branch:    master

commit ff632a81fb09e6d9f3298e494d53eb6df50269cf
Author: Mustafa Kemal Gilor
Date:   Mon Dec 5 17:33:47 2022 +0300

    [PooledLDAPHandler] Ensure result3() invokes message.clean()

    result3() does not invoke message.clean() when an exception is thrown
    by the `message.connection.result3()` call, causing the pool
    connection associated with the message to be marked active forever.
    This causes a denial of service on ldappool. The fix ensures
    message.clean() is invoked by wrapping the offending call in
    try-except-finally and putting message.clean() in the finally block.

    Closes-Bug: #1998789
    Change-Id: I59ebf0fa77391d49b2349e918fc55f96318c42a6
    Signed-off-by: Mustafa Kemal Gilor

** Changed in: keystone
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1998789

Title:
  PooledLDAPHandler.result3 does not release pool connection back when
  an exception is raised

Status in OpenStack Identity (keystone):
  Fix Released

Bug description:
  This is a follow-up issue for LP#1896125. The problem occurs when LDAP
  connection pooling is on (use_pool=True), page_size > 0, and
  pool_connection_timeout is less than the LDAP server's response time.

  The scenario is as follows:
  - A user tries to log in to a domain that is attached to an LDAP
    backend.
  - The LDAP server does not respond within `pool_connection_timeout`
    seconds, causing the LDAP connection to raise ldap.TIMEOUT().
  - From then on, all subsequent LDAP requests fail with
    ldappool.MaxConnectionReachedError.

  An in-depth analysis explains why this happens:
  - An LDAP query is initiated for the user login request via
    BaseLdap._ldap_get(), which grabs a connection with
    self.get_connection() and invokes conn.search_s().
  - conn.search_s() invokes conn._paged_search_s(), since page_size > 0.
  - conn._paged_search_s() calls conn.search_ext()
    (PooledLDAPHandler.search_ext).
  - conn.search_ext() initiates an asynchronous LDAP request and returns
    an AsynchronousMessage object representing it to _paged_search_s().
  - conn._paged_search_s() tries to obtain the asynchronous results by
    calling conn.result3() (PooledLDAPHandler.result3).
  - conn.result3() calls message.connection.result3().
  - The server cannot respond within pool_connection_timeout seconds,
    so message.connection.result3() raises ldap.TIMEOUT(), and the
    subsequent connection-release call, message.clean(), is never made.
  - The connection is kept active forever, and subsequent requests can
    no longer use it.

  Reproducer:
  - Deploy an LDAP server of your choice.
  - Fill it with enough data that a search takes more than
    `pool_connection_timeout` seconds.
  - Define a keystone domain with the LDAP driver and the following
    options:

      [ldap]
      use_pool = True
      page_size = 100
      pool_connection_timeout = 3
      pool_retry_max = 3
      pool_size = 10

  - Point the domain to the LDAP server.
  - Try to log in to the OpenStack dashboard, or do anything else that
    uses the LDAP user.
  - Observe /var/log/apache2/keystone_error.log; it should contain
    ldap.TIMEOUT() stack traces followed by
    ldappool.MaxConnectionReachedError stack traces.

  Known workarounds:
  - Disable LDAP pooling by setting use_pool=False.
  - Set page_size to 0.

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1998789/+subscriptions
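The shape of the fix can be sketched with stand-in objects (the class names below mirror the description; the real keystone code differs in detail): the `finally` block guarantees the pooled connection is released even when the underlying `result3()` call times out.

```python
class FakeLDAPConnection:
    """Stand-in for a pooled LDAP connection whose result3() times out."""

    def result3(self, timeout=None):
        raise TimeoutError("simulated ldap.TIMEOUT")


class AsynchronousMessage:
    """Stand-in for keystone's wrapper around a pooled connection."""

    def __init__(self, connection):
        self.connection = connection
        self.released = False

    def clean(self):
        # In real keystone this hands the connection back to ldappool.
        self.released = True


def result3(message, timeout=None):
    """Fetch async results, always releasing the pooled connection."""
    try:
        return message.connection.result3(timeout=timeout)
    finally:
        # The essence of the fix: clean() runs even on ldap.TIMEOUT, so
        # the connection is never left stuck in the "active" state.
        message.clean()


message = AsynchronousMessage(FakeLDAPConnection())
try:
    result3(message, timeout=3)
except TimeoutError:
    pass  # the timeout still propagates, but the connection was released
```

Without the `finally`, `message.released` would stay False after the timeout, which is exactly the leak that eventually exhausts the pool.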
[Yahoo-eng-team] [Bug 2008341] [NEW] Lock, migrate, and unshelve server actions don't enforce request body schema for certain microversions
Public bug reported:

Description
===========
Basically $summary. For lock, migrate, and unshelve, we have decorators
for validation schemas that _start_ at a certain microversion (the
exact microversion varies), meaning anything below that is not checked.
A client could send a request that is only valid in a higher
microversion, omit sending a microversion (probably by mistake), and be
surprised when the request is accepted but not honoured.

Steps to reproduce
==================
1. Send a request with random stuff in the body, e.g.:

   curl -g -i -X POST http://10.0.77.83/compute/v2.1/servers/a45ae810-89ef-44fb-b751-013a8740647b/action \
     -H "Accept: application/json" \
     -H "Content-Type: application/json" \
     -H "User-Agent: python-novaclient" \
     -H "X-Auth-Token: " \
     -H "X-OpenStack-Nova-API-Version: 2.1" \
     -d '{"lock": {"foo": "bar"}}'

   or -d '{"migrate": {"foo": "bar"}}' or -d '{"unshelve": {"foo": "bar"}}'

Expected result
===============
400 Bad Request (or similar)

Actual result
=============
HTTP/1.1 202 Accepted

Environment
===========
Reproduced on master with devstack+kvm. Originally reported on wallaby:
https://bugzilla.redhat.com/show_bug.cgi?id=2172851

Additional info
===============
I went through the code (manually, so there could be errors), and those
are the only 3 instances of this that I found. Every other API
controller method correctly validates its request body across the
entire range of microversions where it is supported.

** Affects: nova
   Importance: Undecided
   Status: New

** Summary changed:
- Lock, migrate, and shelve server actions don't enforce request body schema for certain microversions
+ Lock, migrate, and unshelve server actions don't enforce request body schema for certain microversions

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2008341

Title:
  Lock, migrate, and unshelve server actions don't enforce request body
  schema for certain microversions

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2008341/+subscriptions
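The gap can be illustrated with a toy validator (all names here are hypothetical, not nova's actual decorator machinery): when the only schema registered for an action starts at some minimum microversion, requests below it are accepted with any body; registering schemas over the full version range closes the hole.

```python
def validate_body(action, body, version, schemas):
    """Validate `body` for `action` against the schema covering `version`.

    `schemas` maps (min_version, max_version) pairs to the set of
    properties allowed in the action's body; extra properties raise.
    Versions are (major, minor) tuples so that 2.10 sorts above 2.9.
    """
    for (vmin, vmax), allowed in schemas.items():
        if vmin <= version <= vmax:
            extra = set(body.get(action) or {}) - allowed
            if extra:
                raise ValueError(f"additional properties not allowed: {extra}")
            return True
    # No schema covers this version: the body goes completely unchecked,
    # which is the gap described in the report.
    return True


# A schema registered only from some minimum microversion upward, like the
# decorators described above (2.73 here is purely illustrative):
schemas = {((2, 73), (99, 99)): {"locked_reason"}}
bogus = {"lock": {"foo": "bar"}}
accepted_unchecked = validate_body("lock", bogus, (2, 1), schemas)

# Covering the rest of the range makes the same request fail validation:
schemas[((2, 1), (2, 72))] = set()
try:
    validate_body("lock", bogus, (2, 1), schemas)
    rejected = False
except ValueError:
    rejected = True
```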
[Yahoo-eng-team] [Bug 2008276] [NEW] [sqlalchemy-20] The Session.begin.subtransactions flag is deprecated
Public bug reported:

Testing patch: https://review.opendev.org/c/openstack/neutron/+/874778
Logs: https://c06c4109be1832423601-1eb2471c773c922210e88856273ba212.ssl.cf5.rackcdn.com/874778/1/check/neutron-functional-with-uwsgi/4c704c1/job-output.txt

Error:
2023-02-22 20:14:32.646654 | controller | /home/zuul/src/opendev.org/openstack/neutron/neutron/db/migration/alembic_migrations/versions/mitaka/contract/8a6d8bdae39_migrate_neutron_resources_table.py:72: RemovedIn20Warning: The Session.begin.subtransactions flag is deprecated and will be removed in SQLAlchemy version 2.0. See the documentation at session_subtransactions for background on a compatible alternative pattern. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)

** Affects: neutron
   Importance: Undecided
   Status: In Progress

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2008276

Title:
  [sqlalchemy-20] The Session.begin.subtransactions flag is deprecated

Status in neutron:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2008276/+subscriptions
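The compatible pattern the warning points at can be sketched as follows (a sketch assuming SQLAlchemy 1.4+, not the actual neutron patch): instead of `session.begin(subtransactions=True)`, begin a transaction only when one is not already in progress.

```python
from contextlib import contextmanager

from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session


@contextmanager
def maybe_begin(session):
    """Replacement for session.begin(subtransactions=True): join the
    transaction already in progress, or start a new one."""
    if session.in_transaction():
        yield session
    else:
        with session.begin():
            yield session


session = Session(create_engine("sqlite://"))
with maybe_begin(session):       # outermost caller: starts the transaction
    with maybe_begin(session):   # nested caller: joins it instead of erroring
        session.execute(text("SELECT 1"))
```

The outer `with` commits on exit; the nested call simply participates, which is what the `subtransactions=True` flag used to provide implicitly.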
[Yahoo-eng-team] [Bug 2008270] [NEW] Neutron allows you to delete router_ha_interface ports, which can lead to issues
Public bug reported:

We ran into a problem with a customer when some external integration
tried to remove all ports using the neutron API, including router
ports. It seems that only router ports with the router_ha_interface
device owner are allowed to be deleted; all other router ports cannot
be deleted directly through the API.

Here is a simple example that demonstrates the doubling of ARP
responses if such a port is deleted:

[root@dev0 ~]# openstack router create r1 --ha --external-gateway public -c id
+-------+--------------------------------------+
| Field | Value                                |
+-------+--------------------------------------+
| id    | 5d9d6fee-6652-4843-9f7c-54c11899d721 |
+-------+--------------------------------------+
[root@dev0 ~]# neutron l3-agent-list-hosting-router r1
neutron CLI is deprecated and will be removed in the Z cycle. Use openstack CLI instead.
+--------------------------------------+------+----------------+-------+----------+
| id                                   | host | admin_state_up | alive | ha_state |
+--------------------------------------+------+----------------+-------+----------+
| 9dd0920a-cb0c-47f1-a976-3e208e3e2e6c | dev0 | True           | :-)   | active   |
| 6fa92056-ca25-42e0-aee4-c4e744008239 | dev2 | True           | :-)   | standby  |
| 8fbda128-dc9c-4b3b-be1b-bb3f11ad1447 | dev1 | True           | :-)   | standby  |
+--------------------------------------+------+----------------+-------+----------+
[root@dev0 ~]# openstack port list --device-id 5d9d6fee-6652-4843-9f7c-54c11899d721 -c id -c device_owner -c fixed_ips --long
+--------------------------------------+-----------------------------+--------------------------------------------------------------------------------+
| ID                                   | Device Owner                | Fixed IP Addresses                                                             |
+--------------------------------------+-----------------------------+--------------------------------------------------------------------------------+
| 555a9272-c9df-4a05-9f08-752c91c5a4c9 | network:router_ha_interface | ip_address='169.254.192.147', subnet_id='20c159f7-13f8-4093-9a4a-8380bdcfea60' |
| 6a196ff7-f3d4-4bee-aed0-b5d7ba727741 | network:router_ha_interface | ip_address='169.254.193.243', subnet_id='20c159f7-13f8-4093-9a4a-8380bdcfea60' |
| 7a849dcc-eac4-4d5b-a547-7ce3986ffb95 | network:router_ha_interface | ip_address='169.254.192.155', subnet_id='20c159f7-13f8-4093-9a4a-8380bdcfea60' |
| d77e624d-87a2-4135-9118-3d8e78539cee | network:router_gateway      | ip_address='10.136.17.172', subnet_id='ee15c548-e497-449e-b46d-50e9ccc0f70c'   |
+--------------------------------------+-----------------------------+--------------------------------------------------------------------------------+
[root@dev0 ~]#
[root@dev0 ~]# ip netns exec snat-5d9d6fee-6652-4843-9f7c-54c11899d721 ip a
...
25: ha-555a9272-c9: mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:7d:cf:a0 brd ff:ff:ff:ff:ff:ff
    inet 169.254.192.147/18 brd 169.254.255.255 scope global ha-555a9272-c9
       valid_lft forever preferred_lft forever
    inet 169.254.0.189/24 scope global ha-555a9272-c9
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe7d:cfa0/64 scope link
       valid_lft forever preferred_lft forever
28: qg-d77e624d-87: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:a8:54:29 brd ff:ff:ff:ff:ff:ff
    inet 10.136.17.172/20 scope global qg-d77e624d-87
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fea8:5429/64 scope link nodad
       valid_lft forever preferred_lft forever
[root@dev0 ~]#
[root@dev0 ~]# openstack port delete 555a9272-c9df-4a05-9f08-752c91c5a4c9
[root@dev0 ~]# neutron l3-agent-list-hosting-router r1
neutron CLI is deprecated and will be removed in the Z cycle. Use openstack CLI instead.
+--------------------------------------+------+----------------+-------+----------+
| id                                   | host | admin_state_up | alive | ha_state |
+--------------------------------------+------+----------------+-------+----------+
| 6fa92056-ca25-42e0-aee4-c4e744008239 | dev2 | True           | :-)   | active   |
| 8fbda128-dc9c-4b3b-be1b-bb3f11ad1447 | dev1 | True           | :-)   | standby  |
+--------------------------------------+------+----------------+-------+----------+
[root@dev0 ~]#
[root@dev0 ~]# ip netns exec snat-5d9d6fee-6652-4843-9f7c-54c11899d721 ip a s qg-d77e624d-87
28: qg-d77e624d-87: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:a8:54:29 brd ff:ff:ff:ff:ff:ff
    inet 10.136.17.172/20 scope global qg-d77e624d-87
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fea8:5429/64 scope link nodad
       valid_lft forever preferred_lft forever
[root@dev0 ~]# ssh dev2 ip netns exec
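On the external-integration side, a cleanup that deletes "all" ports should skip router-owned ports rather than rely on the API rejecting them. A minimal sketch of such a guard (the `device_owner` values are the standard neutron ones; the helper name and port dicts are ours for illustration):

```python
# Neutron marks router-owned ports via the device_owner field, e.g.
# network:router_interface, network:router_gateway,
# network:router_ha_interface, network:ha_router_replicated_interface.
ROUTER_OWNER_PREFIXES = ("network:router", "network:ha_router")


def is_safe_to_delete(port):
    """Return True only for ports not owned by a neutron router."""
    owner = port.get("device_owner") or ""
    return not owner.startswith(ROUTER_OWNER_PREFIXES)


ports = [
    {"id": "555a9272", "device_owner": "network:router_ha_interface"},
    {"id": "d77e624d", "device_owner": "network:router_gateway"},
    {"id": "ab12cd34", "device_owner": "compute:nova"},
]
deletable = [p["id"] for p in ports if is_safe_to_delete(p)]
```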
[Yahoo-eng-team] [Bug 2008257] [NEW] OVN agent heartbeat timestamp format changed unexpectedly
Public bug reported:

Following an upgrade of a Neutron OVN deployment from Wallaby to Yoga,
I discovered that openstack-exporter would fail to return metrics for
Neutron agent states with the following error:

time="2023-02-23T10:49:33Z" level=error msg="Failed to collect metric for exporter: neutron, error: failed to collect metric: agent_state, error: parsing time \"2023-02-23 10:48:55.729000+00:00\": extra text: \"+00:00\"" source="exporter.go:123"

I tracked it down to the following change, which has also been
backported to stable branches (so a recent Wallaby would also be
affected): https://review.opendev.org/c/openstack/neutron/+/844179

With this change, the heartbeat timestamp includes a timezone where
previously it didn't, causing clients such as the gophercloud library
(which openstack-exporter uses) to fail to parse the response.

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2008257

Title:
  OVN agent heartbeat timestamp format changed unexpectedly

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2008257/+subscriptions
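The breakage can be reproduced outside Go (a Python sketch; gophercloud's actual parsing differs): a fixed-layout parse that does not expect a UTC offset chokes on the new suffix, while an offset-aware parse accepts both forms.

```python
from datetime import datetime

old_style = "2023-02-23 10:48:55.729000"        # pre-change heartbeat
new_style = "2023-02-23 10:48:55.729000+00:00"  # post-change, with timezone

NAIVE_FMT = "%Y-%m-%d %H:%M:%S.%f"  # a layout expecting the old format

# A client pinned to the old layout fails on the new one with
# "unconverted data remains: +00:00" -- the "extra text" in the Go error.
try:
    datetime.strptime(new_style, NAIVE_FMT)
    broke = False
except ValueError:
    broke = True

# An offset-aware parse handles the new format (and the old one too):
parsed = datetime.fromisoformat(new_style)
offset_seconds = parsed.utcoffset().total_seconds()
```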
[Yahoo-eng-team] [Bug 2008116] Re: In train, current version is 2.1 instead of 2.10
** Changed in: glance
   Status: Invalid => Confirmed

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/2008116

Title:
  In train, current version is 2.1 instead of 2.10

Status in Glance:
  Confirmed

Bug description:
  In train, the current version is reported as 2.1 instead of 2.10:

  (undercloud) [stack@undercloud-0 ~]$ curl http://10.10.10.10:9292 | jq .
  {
    "versions": [
      {"id": "v2.1", "status": "CURRENT",   "links": [{"rel": "self", "href": "http://10.10.10.10:9292/v2/"}]},
      {"id": "v2.9", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://10.10.10.10:9292/v2/"}]},
      {"id": "v2.8", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://10.10.10.10:9292/v2/"}]},
      {"id": "v2.7", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://10.10.10.10:9292/v2/"}]},
      {"id": "v2.6", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://10.10.10.10:9292/v2/"}]},
      {"id": "v2.5", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://10.10.10.10:9292/v2/"}]},
      {"id": "v2.4", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://10.10.10.10:9292/v2/"}]},
      {"id": "v2.3", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://10.10.10.10:9292/v2/"}]},
      {"id": "v2.2", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://10.10.10.10:9292/v2/"}]},
      {"id": "v2.1", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://10.10.10.10:9292/v2/"}]},
      {"id": "v2.0", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://10.10.10.10:9292/v2/"}]}
    ]
  }

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/2008116/+subscriptions
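The symptom is consistent with version strings being compared lexicographically, where "2.9" sorts above "2.10" (a hypothesis for illustration, not a root cause confirmed by the report):

```python
versions = ["2.0", "2.1", "2.10", "2.9"]

# Lexicographic (string) comparison puts "2.9" above "2.10" ...
latest_as_string = max(versions)

# ... while comparing the numeric components picks the real latest.
def version_key(v):
    return tuple(int(part) for part in v.split("."))

latest_numeric = max(versions, key=version_key)
```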
[Yahoo-eng-team] [Bug 2008116] Re: In train, current version is 2.1 instead of 2.10
Stable Train does not have API v2.10. This is an RH downstream
product-related issue, not really a bug.

** Changed in: glance
   Importance: Undecided => Wishlist

** Changed in: glance
   Status: In Progress => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/2008116

Title:
  In train, current version is 2.1 instead of 2.10

Status in Glance:
  Invalid

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/2008116/+subscriptions
[Yahoo-eng-team] [Bug 2008238] [NEW] SRIOV port binding_profile attributes for OVS hardware offload are stripped on instance deletion or port detachment
Public bug reported:

Description
===========
This issue applies to systems using SRIOV with Mellanox ASAP2 SDN
offloads. An SRIOV port capable of ASAP2 SDN acceleration (OVS hardware
offloads) has 'capabilities=[switchdev]' added to its port
binding_profile.

After a VM has been created with the SRIOV port attached, the port can
no longer be used for subsequent VM builds. An attempt to reuse the
port results in an error of the form "Cannot set interface MAC/vlanid
to / for ifname ens1f0 vf 7: Operation not supported".

The underlying issue appears to be that when an SRIOV port is detached
from a VM, or the VM is destroyed, the capabilities=[switchdev]
property is removed from the port binding_profile. This converts the
port from ASAP2 to "legacy SRIOV" (in Mellanox-speak) and makes it
unusable. If the binding_profile property is restored, the port can be
successfully reused.

The property is preserved during live migration, instance resizes and
rebuilds. Only instance deletion and port detachment appear to remove
the binding_profile property.

Steps to reproduce
==================
1. Create an SRIOV port with ASAP2 capability:
   openstack port create --project --network --vnic-type=direct --binding-profile '{"capabilities": ["switchdev"]}' sriov-port-1
2. Check the port binding_profile property:
   openstack port show -c binding_profile sriov-port-1
3. Create an instance using the port:
   openstack server create --flavor --image --key-name --nic port-id=sriov-port-1 sriov-vm-1
4. Delete the instance:
   openstack server delete sriov-vm-1
5. Check the port binding_profile property:
   openstack port show -c binding_profile sriov-port-1

Expected result
===============
Nova sets properties in the binding_profile while the instance is in
use. Alongside those properties, the capabilities=['switchdev']
property should be preserved.

Actual result
=============
After the instance is deleted (or the port detached), the
binding_profile is empty.

Environment
===========
This has been observed with the following configuration:
- OpenStack Yoga
- OVN Neutron driver

Logs from nova-compute:

2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest Traceback (most recent call last):
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 165, in launch
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest     return self._domain.createWithFlags(flags)
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest   File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest     result = proxy_call(self._autowrap, f, *args, **kwargs)
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest   File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest     rv = execute(f, *args, **kwargs)
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest   File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest     six.reraise(c, e, tb)
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest   File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest     raise value
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest   File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest     rv = meth(*args, **kwargs)
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 1385, in createWithFlags
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest     raise libvirtError('virDomainCreateWithFlags() failed')
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest libvirt.libvirtError: Cannot set interface MAC/vlanid to fa:16:3e:43:1e:ce/2107 for ifname ens1f0 vf 7: Operation not supported
2023-01-24 19:55:32.270 7 ERROR nova.virt.libvirt.guest
2023-01-24 19:55:32.273 7 ERROR nova.virt.libvirt.driver [req-581cd9e8-11c8-44be-9ed2-a03a5f70d0f4 802a31d98b364da79be43fe6e9566d63 76f401abee7b4e80b7efd86f2f26e3ca - default default] [instance: d2091824-1f7a-4de1-8776-8f781956130a] Failed to start libvirt guest: libvirt.libvirtError: Cannot set interface MAC/vlanid to fa:16:3e:43:1e:ce/2107 for ifname ens1f0 vf 7: Operation not supported

** Affects: nova
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2008238

Title:
  SRIOV port binding_profile attributes for OVS hardware offload are
  stripped on instance deletion or port detachment

Status
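The expected unbind behaviour can be sketched as a merge that strips only the nova-managed keys while preserving user-supplied ones (a sketch, not nova's actual code; the pci_* key names are the usual nova-populated SRIOV entries, and the helper is hypothetical):

```python
# Keys nova itself writes into binding:profile for an SRIOV binding.
NOVA_MANAGED_KEYS = {"pci_vendor_info", "pci_slot", "physical_network"}


def scrub_binding_profile(profile):
    """On unbind, drop nova-managed keys but keep user-set ones such as
    capabilities=['switchdev'] (instead of emptying the whole profile)."""
    return {k: v for k, v in profile.items() if k not in NOVA_MANAGED_KEYS}


bound_profile = {
    "capabilities": ["switchdev"],   # set by the operator at port create
    "pci_vendor_info": "15b3:1018",  # added by nova during binding
    "pci_slot": "0000:3b:02.7",
    "physical_network": None,
}
after_unbind = scrub_binding_profile(bound_profile)
```

Emptying the whole dict, as described in "Actual result" above, is what strips the switchdev capability and turns the port into legacy SRIOV.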
[Yahoo-eng-team] [Bug 2008227] [NEW] [sqlalchemy-20] The Connection.connect() method is considered legacy as of the 1.x series of SQLAlchemy
Public bug reported:

Testing patch: https://review.opendev.org/c/openstack/neutron/+/874778
Logs: https://c06c4109be1832423601-1eb2471c773c922210e88856273ba212.ssl.cf5.rackcdn.com/874778/1/check/neutron-functional-with-uwsgi/4c704c1/job-output.txt

Error:
2023-02-22 20:05:33.641012 | controller | /home/zuul/src/opendev.org/openstack/neutron/neutron/db/migration/alembic_migrations/versions/newton/expand/030a959ceafa_uniq_routerports0port_id.py:64: RemovedIn20Warning: The Connection.connect() method is considered legacy as of the 1.x series of SQLAlchemy and will be removed in 2.0. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)

** Affects: neutron
   Importance: High
   Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
   Status: New

** Changed in: neutron
   Importance: Undecided => High

** Changed in: neutron
   Assignee: (unassigned) => Rodolfo Alonso (rodolfo-alonso-hernandez)

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2008227

Title:
  [sqlalchemy-20] The Connection.connect() method is considered legacy
  as of the 1.x series of SQLAlchemy

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2008227/+subscriptions
[Yahoo-eng-team] [Bug 2008226] [NEW] [sqlalchemy-20] Use the .begin() method of Engine
Public bug reported: Testing patch: https://review.opendev.org/c/openstack/neutron/+/874778 Logs: https://c06c4109be1832423601-1eb2471c773c922210e88856273ba212.ssl.cf5.rackcdn.com/874778/1/check/neutron- functional-with-uwsgi/4c704c1/job-output.txt Error: 2023-02-22 20:05:33.640982 | controller | /home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/db/test_migrations.py:422: RemovedIn20Warning: The current statement is being autocommitted using implicit autocommit, which will be removed in SQLAlchemy 2.0. Use the .begin() method of Engine or Connection in order to use an explicit transaction for DML and DDL statements. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9) ** Affects: neutron Importance: Undecided Status: In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/2008226 Title: [sqlalchemy-20] Use the .begin() method of Engine Status in neutron: In Progress Bug description: Testing patch: https://review.opendev.org/c/openstack/neutron/+/874778 Logs: https://c06c4109be1832423601-1eb2471c773c922210e88856273ba212.ssl.cf5.rackcdn.com/874778/1/check/neutron- functional-with-uwsgi/4c704c1/job-output.txt Error: 2023-02-22 20:05:33.640982 | controller | /home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/db/test_migrations.py:422: RemovedIn20Warning: The current statement is being autocommitted using implicit autocommit, which will be removed in SQLAlchemy 2.0. Use the .begin() method of Engine or Connection in order to use an explicit transaction for DML and DDL statements. 
(Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2008226/+subscriptions
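[Editor's note] The warning above names the fix: wrap DDL/DML in an explicit transaction via Engine.begin() instead of relying on implicit autocommit. A minimal sketch of the pattern (the in-memory SQLite engine and table name are illustrative, not taken from the neutron patch):

```python
import sqlalchemy as sa

# Illustrative engine and table; the real code lives in neutron's
# functional migration tests.
engine = sa.create_engine("sqlite://")
metadata = sa.MetaData()
example = sa.Table("example", metadata,
                   sa.Column("id", sa.Integer, primary_key=True))
metadata.create_all(engine)

# Legacy 1.x pattern (emits RemovedIn20Warning under SQLALCHEMY_WARN_20):
#     engine.execute(example.insert().values(id=1))
#
# 2.0-style: Engine.begin() yields a Connection inside an explicit
# transaction that commits on success and rolls back on exception.
with engine.begin() as conn:
    conn.execute(example.insert().values(id=1))

# Read back through a plain Connection to confirm the commit happened.
with engine.connect() as conn:
    count = conn.execute(
        sa.select(sa.func.count()).select_from(example)).scalar()
```

The same shape applies to Connection.begin() when a connection is already checked out.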
[Yahoo-eng-team] [Bug 2008223] [NEW] [sqlalchemy-20] The Engine.execute() method is considered legacy as of the 1.x series
Public bug reported:

Testing patch: https://review.opendev.org/c/openstack/neutron/+/874778
Logs: https://c06c4109be1832423601-1eb2471c773c922210e88856273ba212.ssl.cf5.rackcdn.com/874778/1/check/neutron-functional-with-uwsgi/4c704c1/job-output.txt
Error:

2023-02-22 20:05:21.282046 | controller | engine = engines.create_engine(
2023-02-22 20:05:21.282060 | controller | /home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional-gate/lib/python3.10/site-packages/neutron_lib/fixture.py:103: RemovedIn20Warning: The Engine.execute() method is considered legacy as of the 1.x series of SQLAlchemy and will be removed in 2.0. All statement execution in SQLAlchemy 2.0 is performed by the Connection.execute() method of Connection, or in the ORM by the Session.execute() method of Session. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
2023-02-22 20:05:21.282074 | controller | self.engine.execute("PRAGMA foreign_keys=ON")

** Affects: neutron
   Importance: High
   Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
   Status: In Progress

** Changed in: neutron
   Assignee: (unassigned) => Rodolfo Alonso (rodolfo-alonso-hernandez)

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2008223

Title:
  [sqlalchemy-20] The Engine.execute() method is considered legacy as of the 1.x series

Status in neutron:
  In Progress

Bug description:
  Testing patch: https://review.opendev.org/c/openstack/neutron/+/874778
  Logs: https://c06c4109be1832423601-1eb2471c773c922210e88856273ba212.ssl.cf5.rackcdn.com/874778/1/check/neutron-functional-with-uwsgi/4c704c1/job-output.txt
  Error:
  2023-02-22 20:05:21.282046 | controller | engine = engines.create_engine(
  2023-02-22 20:05:21.282060 | controller | /home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional-gate/lib/python3.10/site-packages/neutron_lib/fixture.py:103: RemovedIn20Warning: The Engine.execute() method is considered legacy as of the 1.x series of SQLAlchemy and will be removed in 2.0. All statement execution in SQLAlchemy 2.0 is performed by the Connection.execute() method of Connection, or in the ORM by the Session.execute() method of Session. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
  2023-02-22 20:05:21.282074 | controller | self.engine.execute("PRAGMA foreign_keys=ON")

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2008223/+subscriptions
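[Editor's note] For the fixture call quoted in the traceback, the 2.0-style replacement is to check out a Connection and run the raw SQL through Connection.execute() with a text() construct. A standalone sketch of the pattern (not the actual neutron_lib fixture code):

```python
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")

# Legacy 1.x pattern from the traceback:
#     self.engine.execute("PRAGMA foreign_keys=ON")
#
# 2.0-style: all statement execution goes through Connection.execute(),
# and raw SQL strings become text() constructs.
with engine.connect() as conn:
    conn.execute(sa.text("PRAGMA foreign_keys=ON"))
    # Read the pragma back on the same connection (SQLite reports 1
    # when foreign-key enforcement is enabled).
    result = conn.execute(sa.text("PRAGMA foreign_keys")).scalar()
```

Since this PRAGMA only affects the connection it runs on, production code would more typically install it from a pool "connect" event listener so every pooled connection gets it.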
[Yahoo-eng-team] [Bug 2006962] Re: Running instance not visible in project context
Was messing up some things, no bug. Sorry.

BR Remo

** Changed in: nova
   Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2006962

Title:
  Running instance not visible in project context

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  I'm running an OpenStack cluster (Zed) with 3 controllers and 30 worker nodes. The cluster was created on Ocata and has been upgraded to each newer version since. I have some old instances; they are running and doing their job.

  Doing: openstack server list --all-projects, the instance is listed:

  | ID                                   | Name   | Status | Networks                           | Image                    | Flavor              |
  | df9871d8-efff-486d-aa9f-67681be25513 | turtle | ACTIVE | internal=10.0.3.82, xxx.xx.xxx.xxx | N/A (booted from volume) | p1.project_turtle.1 |

  Doing: openstack server list --project 5c41bffc00d54776a01c8c5173e9764c --print-empty, the instance is not listed (an empty table is returned).

  Doing: openstack server show df9871d8-efff-486d-aa9f-67681be25513, you see that the instance is assigned to the project 5c41bffc00d54776a01c8c5173e9764c.
  | Field                               | Value                                                           |
  | OS-DCF:diskConfig                   | AUTO                                                            |
  | OS-EXT-AZ:availability_zone         | nova                                                            |
  | OS-EXT-SRV-ATTR:host                | apu-worker09.internal                                           |
  | OS-EXT-SRV-ATTR:hostname            | turtle                                                          |
  | OS-EXT-SRV-ATTR:hypervisor_hostname | apu-worker09.internal                                           |
  | OS-EXT-SRV-ATTR:instance_name       | instance-19e9                                                   |
  | OS-EXT-SRV-ATTR:kernel_id           |                                                                 |
  | OS-EXT-SRV-ATTR:launch_index        | 0                                                               |
  | OS-EXT-SRV-ATTR:ramdisk_id          |                                                                 |
  | OS-EXT-SRV-ATTR:reservation_id      | r-z6u1w4kw                                                      |
  | OS-EXT-SRV-ATTR:root_device_name    | /dev/vda                                                        |
  | OS-EXT-SRV-ATTR:user_data           | None                                                            |
  | OS-EXT-STS:power_state              | Running                                                         |
  | OS-EXT-STS:task_state               | None                                                            |
  | OS-EXT-STS:vm_state                 | active                                                          |
  | OS-SRV-USG:launched_at              | 2020-05-18T11:32:31.00                                          |
  | OS-SRV-USG:terminated_at            | None                                                            |
  | accessIPv4                          |                                                                 |
  | accessIPv6                          |                                                                 |
  | addresses                           | internal=10.0.3.82, xxx.xx.xxx.xxx                              |
  | config_drive                        |                                                                 |
  | created                             | 2020-05-18T11:32:13Z                                            |
  | description                         | None                                                            |
  | flavor                              | disk='200', ephemeral='0', original_name='p1.project_turtle.1', |
  |                                     | ram='16384', swap='0', vcpus='16'                               |
  | hostId                              | 94b099c90d9b40e93ab62568a78b6e9098ca7f78a1833ace100d127c        |
  | host_status                         | UP                                                              |
  | id                                  | df9871d8-efff-486d-aa9f-67681be25513                            |
  | image                               | N/A (booted from volume)                                        |
  | key_name                            | rema                                                            |
  | locked