[Yahoo-eng-team] [Bug 2023018] Re: scaling governors are optional for some OS platforms
Reviewed:  https://review.opendev.org/c/openstack/nova/+/885352
Committed: https://opendev.org/openstack/nova/commit/2c4421568ea62e66257b55c08092de3e0303fb0a
Submitter: "Zuul (22348)"
Branch:    master

commit 2c4421568ea62e66257b55c08092de3e0303fb0a
Author: Sylvain Bauza
Date:   Tue Jun 6 11:56:32 2023 +0200

    cpu: make governors to be optional

    Change-Id: Ifb7d001cfdb95b1b0aa29f45c0ef71c0673e1760
    Closes-Bug: #2023018

** Changed in: nova
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2023018

Title:
  scaling governors are optional for some OS platforms

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Some OS platforms don't use cpufreq, so operators should be able to
  just offline their CPUs. At the moment, even when the configured CPU
  management strategy is 'cpu_state', nova-compute raises an exception:

  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service Traceback (most recent call last):
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   File "/opt/stack/nova/nova/filesystem.py", line 37, in read_sys
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service     with open(os.path.join(SYS, path), mode='r') as data:
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service FileNotFoundError: [Errno 2] No such file or directory: '/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor'
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service The above exception was the direct cause of the following exception:
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service Traceback (most recent call last):
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   File "/usr/local/lib/python3.10/dist-packages/oslo_service/service.py", line 806, in run_service
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service     service.start()
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   File "/opt/stack/nova/nova/service.py", line 162, in start
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service     self.manager.init_host(self.service_ref)
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   File "/opt/stack/nova/nova/compute/manager.py", line 1608, in init_host
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service     self.driver.init_host(host=self.host)
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 825, in init_host
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service     libvirt_cpu.validate_all_dedicated_cpus()
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   File "/opt/stack/nova/nova/virt/libvirt/cpu/api.py", line 143, in validate_all_dedicated_cpus
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service     governors.add(pcpu.governor)
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   File "/opt/stack/nova/nova/virt/libvirt/cpu/api.py", line 63, in governor
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service     return core.get_governor(self.ident)
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   File "/opt/stack/nova/nova/virt/libvirt/cpu/core.py", line 69, in get_governor
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service     return filesystem.read_sys(
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service   File "/opt/stack/nova/nova/filesystem.py", line 40, in read_sys
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service     raise exception.FileNotFound(file_path=path) from exc
  Jun 05 14:47:45 sbauza-dev2 nova-compute[75181]: ERROR oslo_service.service nova.exception.FileNotFound: File /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor could not be found.

  Let's just support the CPU state strategy in that case.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2023018/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
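The fix amounts to tolerating platforms without a cpufreq interface instead of treating the missing sysfs knob as fatal. A minimal Python sketch of that idea (the function name and the `sys_root` parameter are illustrative, not nova's actual API):

```python
from pathlib import Path

def get_governor(core_id, sys_root="/sys/devices/system/cpu"):
    """Return the core's scaling governor, or None when the platform
    exposes no cpufreq interface for it (some OS platforms don't)."""
    knob = Path(sys_root) / f"cpu{core_id}" / "cpufreq" / "scaling_governor"
    try:
        return knob.read_text().strip()
    except FileNotFoundError:
        # No scaling_governor knob: governors are simply unavailable
        # here; callers should skip governor validation instead of
        # aborting service startup.
        return None
```

Callers can then treat a `None` governor as "not applicable" and fall back to offlining CPUs, rather than failing `init_host`.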
[Yahoo-eng-team] [Bug 2038840] [NEW] CPU state management fails if cpu0 is in dedicated set
Public bug reported:

If an operator configures cpu0 in the dedicated set and enables state
management, nova-compute will fail on startup with this obscure error:

  Oct 06 20:08:43.195137 np0035436890 nova-compute[104711]: ERROR oslo_service.service nova.exception.FileNotFound: File /sys/devices/system/cpu/cpu0/online could not be found.

The problem is that cpu0 is not hot-pluggable and thus has no "online"
knob. Nova should, at a minimum, log a better error message in this
case.

** Affects: nova
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2038840

Title:
  CPU state management fails if cpu0 is in dedicated set

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2038840/+subscriptions
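One way to surface a clearer error, sketched in Python (the function name and the `sys_root` parameter are hypothetical, not nova code): check for the `online` knob before writing, and say why it can be missing.

```python
from pathlib import Path

def set_online(core_id, online, sys_root="/sys/devices/system/cpu"):
    """Toggle a core's hotplug state via its sysfs 'online' knob,
    failing with an explicit message for cores that expose no knob
    (cpu0 is typically not hot-pluggable)."""
    knob = Path(sys_root) / f"cpu{core_id}" / "online"
    if not knob.exists():
        # A missing knob means the kernel won't let this core be
        # powered up/down at all; surface that directly instead of a
        # bare FileNotFound.
        raise ValueError(
            f"cpu{core_id} exposes no 'online' knob and cannot be "
            "power-managed; exclude it from the dedicated set"
        )
    knob.write_text("1" if online else "0")
```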
[Yahoo-eng-team] [Bug 2032770] Re: [OVN] port creation with --enable-uplink-status-propagation does not work with OVN mechanism driver
** No longer affects: neutron (Ubuntu Focal)

** No longer affects: cloud-archive/victoria

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2032770

Title:
  [OVN] port creation with --enable-uplink-status-propagation does not
  work with OVN mechanism driver

Status in Ubuntu Cloud Archive: New
Status in Ubuntu Cloud Archive antelope series: New
Status in Ubuntu Cloud Archive bobcat series: New
Status in Ubuntu Cloud Archive ussuri series: New
Status in Ubuntu Cloud Archive wallaby series: New
Status in Ubuntu Cloud Archive xena series: New
Status in Ubuntu Cloud Archive yoga series: New
Status in Ubuntu Cloud Archive zed series: New
Status in neutron: Fix Released
Status in neutron package in Ubuntu: New
Status in neutron source package in Jammy: New
Status in neutron source package in Lunar: New

Bug description:
  The port "uplink_status_propagation" feature does not work when OVN
  is used as the mechanism driver. The reproducer below works with
  openvswitch as the mechanism driver, but not with OVN:

    openstack port create --binding-profile trusted=true \
        --enable-uplink-status-propagation --net private \
        --vnic-type direct \
        test-sriov-bond-enable-uplink-status-propagation-vm-1-port-1

  The command fails with the following error when OVN is the mechanism
  driver:

    BadRequestException: 400: Client Error for url:
    https://10.5.3.81:9696/v2.0/ports, Unrecognized attribute(s)
    'propagate_uplink_status'

  With ML2/OVS, the same port creation command succeeds without any
  errors.

  As for ml2_conf, "uplink_status_propagation" is listed in the
  extension drivers:

    [ml2]
    extension_drivers = port_security,dns_domain_ports,uplink_status_propagation
    type_drivers = geneve,gre,vlan,flat,local
    tenant_network_types = geneve,gre,vlan,flat,local
    mechanism_drivers = ovn,sriovnicswitch
    /*...*/

  I also found the following document, which lists the feature gaps
  between ML2/OVS and OVN, but uplink_status_propagation is not among
  them, so perhaps that page should be updated as well:
  https://docs.openstack.org/neutron/latest/ovn/gaps.html#id9

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2032770/+subscriptions
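The "Unrecognized attribute(s)" error reflects how the API layer only accepts attributes registered by loaded extensions: under ML2/OVN the uplink-status extension was never wired up, so the attribute is rejected at validation time. A toy Python illustration of that pattern (deliberately simplified, not Neutron's actual code):

```python
# Toy model: base port attributes plus whatever loaded extension
# drivers register. An attribute from an unloaded extension is
# "unrecognized", which surfaces to the client as a 400.
BASE_PORT_ATTRS = {"name", "network_id", "binding:vnic_type"}

EXTENSION_ATTRS = {
    "uplink_status_propagation": {"propagate_uplink_status"},
}

def validate_port(attrs, loaded_extensions):
    allowed = set(BASE_PORT_ATTRS)
    for ext in loaded_extensions:
        allowed |= EXTENSION_ATTRS.get(ext, set())
    unknown = set(attrs) - allowed
    if unknown:
        raise ValueError(
            f"Unrecognized attribute(s) '{', '.join(sorted(unknown))}'"
        )
```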
[Yahoo-eng-team] [Bug 2032770] Re: [OVN] port creation with --enable-uplink-status-propagation does not work with OVN mechanism driver
** Also affects: neutron (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: neutron (Ubuntu)
   Assignee: (unassigned) => Mustafa Kemal Gilor (mustafakemalgilor)

** Also affects: neutron (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Also affects: neutron (Ubuntu Mantic)
   Importance: Undecided
   Assignee: Mustafa Kemal Gilor (mustafakemalgilor)
   Status: New

** Also affects: neutron (Ubuntu Lunar)
   Importance: Undecided
   Status: New

** Also affects: neutron (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Also affects: cloud-archive
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/yoga
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/victoria
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/zed
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/bobcat
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/antelope
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/ussuri
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/xena
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/wallaby
   Importance: Undecided
   Status: New

** No longer affects: neutron (Ubuntu Mantic)

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2032770

Title:
  [OVN] port creation with --enable-uplink-status-propagation does not
  work with OVN mechanism driver

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2032770/+subscriptions
[Yahoo-eng-team] [Bug 2025946] Re: Neutron 504 Gateway Timeout Openstack Kolla-Ansible : Ussuri
Hello Adelia:

The problem you have is that other processes on this host are using
this TCP port [1]. First list what processes are using the port, stop
them, and then restart the OVN controller.

I'm closing this bug because it doesn't seem to be a Neutron problem
but a system/backend issue.

Regards.

[1] https://mail.openvswitch.org/pipermail/ovs-discuss/2017-February/043597.html

** Changed in: neutron
   Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2025946

Title:
  Neutron 504 Gateway Timeout Openstack Kolla-Ansible : Ussuri

Status in neutron:
  Invalid

Bug description:
  I have 3 OpenStack controllers, but the network agents often hit 504
  Gateway Timeout. When I check neutron_server.log, these logs show up
  on one of my controllers:

  2023-05-24 10:00:23.314 687 ERROR neutron.api.v2.resource [req-a1f3e58a-00f7-4ed9-b8e5-6c538dc5d5a3 3fe50ccef00f49e3b1b0bbd58705a930 c7d2001e7a2c4c32b9f2a3657f29b6b0 - default default] index failed: No details.: ovsdbapp.exceptions.TimeoutException: Commands [] exceeded timeout 180 seconds
  2023-07-05 09:49:03.453 670 ERROR neutron.api.v2.resource Traceback (most recent call last):
  2023-07-05 09:49:03.453 670 ERROR neutron.api.v2.resource   File "/var/lib/kolla/venv/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 153, in queue_txn
  2023-07-05 09:49:03.453 670 ERROR neutron.api.v2.resource     self.txns.put(txn, timeout=self.timeout)
  2023-07-05 09:49:03.453 670 ERROR neutron.api.v2.resource   File "/var/lib/kolla/venv/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 51, in put
  2023-07-05 09:49:03.453 670 ERROR neutron.api.v2.resource     super(TransactionQueue, self).put(*args, **kwargs)
  2023-07-05 09:49:03.453 670 ERROR neutron.api.v2.resource   File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/queue.py", line 264, in put
  2023-07-05 09:49:03.453 670 ERROR neutron.api.v2.resource     result = waiter.wait()
  2023-07-05 09:49:03.453 670 ERROR neutron.api.v2.resource   File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/queue.py", line 141, in wait
  2023-07-05 09:49:03.453 670 ERROR neutron.api.v2.resource     return get_hub().switch()
  2023-07-05 09:49:03.453 670 ERROR neutron.api.v2.resource   File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
  2023-07-05 09:49:03.453 670 ERROR neutron.api.v2.resource     return self.greenlet.switch()
  2023-07-05 09:49:03.453 670 ERROR neutron.api.v2.resource queue.Full
  2023-07-05 09:49:03.453 670 ERROR neutron.api.v2.resource
  2023-07-05 09:49:03.453 670 ERROR neutron.api.v2.resource During handling of the above exception, another exception occurred:
  2023-07-05 09:49:03.453 670 ERROR neutron.api.v2.resource

  How do I solve this?

To manage notifications about this bug go to:
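The `queue.Full` in the traceback is the classic bounded-producer symptom: ovsdbapp queues transactions on a bounded queue for the OVSDB connection thread, and when that consumer stalls (here, because the southbound connection is broken), producer puts eventually fail. A stdlib-only sketch of the same failure mode (illustrative; ovsdbapp itself uses eventlet's queue):

```python
import queue

# A small bounded queue stands in for ovsdbapp's transaction queue.
txns = queue.Queue(maxsize=2)

def queue_txn(txn, timeout=0.1):
    # When the consumer never drains the queue, this raises queue.Full
    # after the timeout, just like the traceback above.
    txns.put(txn, timeout=timeout)
```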
[Yahoo-eng-team] [Bug 1998789] Fix included in openstack/keystone 22.0.1
This issue was fixed in the openstack/keystone 22.0.1 release.

** Changed in: cloud-archive/zed
   Status: New => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1998789

Title:
  [SRU] PooledLDAPHandler.result3 does not release pool connection back
  when an exception is raised

Status in Ubuntu Cloud Archive: New
Status in Ubuntu Cloud Archive ussuri series: New
Status in Ubuntu Cloud Archive victoria series: New
Status in Ubuntu Cloud Archive wallaby series: New
Status in Ubuntu Cloud Archive xena series: New
Status in Ubuntu Cloud Archive yoga series: Fix Released
Status in Ubuntu Cloud Archive zed series: Fix Released
Status in OpenStack Identity (keystone): Fix Released
Status in keystone package in Ubuntu: New
Status in keystone source package in Focal: New
Status in keystone source package in Jammy: New

Bug description:
  [Impact]

  This SRU is a backport of
  https://review.opendev.org/c/openstack/keystone/+/866723 to the
  respective Ubuntu and UCA releases. The patch is merged to all the
  respective upstream branches (master & stable/[u,v,w,x,y,z]).

  This SRU intends to fix a denial-of-service bug that happens when
  keystone uses pooled LDAP connections. In pooled mode, keystone
  borrows a connection from the pool, performs the LDAP operation, and
  releases the connection back to the pool. But if an exception or
  error occurs while the connection is still borrowed, keystone fails
  to release it back to the pool, hogging it forever. If this happens
  to all the pooled connections, the pool is exhausted and keystone can
  no longer perform LDAP operations.

  The fix corrects this behavior by releasing the connection back to
  the pool even if an exception/error happens during the LDAP
  operation.

  [Test Case]

  - Deploy an LDAP server of your choice
  - Populate it with enough data that a search takes more than
    `pool_connection_timeout` seconds
  - Define a keystone domain using the LDAP driver with the following
    options:

      [ldap]
      use_pool = True
      page_size = 100
      pool_connection_timeout = 3
      pool_retry_max = 3
      pool_size = 10

  - Point the domain to the LDAP server
  - Try to log in to the OpenStack dashboard, or do anything else that
    uses the LDAP user
  - Observe /var/log/apache2/keystone_error.log: it should contain
    ldap.TIMEOUT() stack traces followed by
    `ldappool.MaxConnectionReachedError` stack traces

  To confirm the fix, repeat the scenario and observe that
  /var/log/apache2/keystone_error.log no longer contains
  `ldappool.MaxConnectionReachedError` stack traces and that the LDAP
  operation in motion succeeds (e.g. OpenStack Dashboard login).

  [Regression Potential]

  The patch is quite trivial and should not affect any deployment in a
  negative way. The LDAP pool functionality can be disabled by setting
  "use_pool = False" in case of any regression.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1998789/+subscriptions
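The shape of the fix is the usual one for pooled resources: release in a `finally` block (or context manager) so that an exception mid-operation cannot leak the borrowed connection. A minimal, self-contained sketch of the pattern (class and function names are illustrative, not keystone's actual API):

```python
from contextlib import contextmanager

class Pool:
    """Tiny stand-in for an LDAP connection pool."""

    def __init__(self, size):
        self.free = list(range(size))  # fake connection handles

    @contextmanager
    def connection(self):
        conn = self.free.pop()
        try:
            yield conn
        finally:
            # Released even when the caller's operation raises, so
            # repeated failures cannot exhaust the pool.
            self.free.append(conn)

def search(pool, fail=False):
    with pool.connection():
        if fail:
            # Simulates ldap.TIMEOUT during a slow search.
            raise TimeoutError("ldap.TIMEOUT")
```

Without the `finally`, every failing search would permanently remove one connection from `free`, reproducing the `ldappool.MaxConnectionReachedError` exhaustion described above.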
[Yahoo-eng-team] [Bug 1998789] Fix included in openstack/keystone 21.0.1
This issue was fixed in the openstack/keystone 21.0.1 release.

** Changed in: cloud-archive/yoga
   Status: New => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1998789

Title:
  [SRU] PooledLDAPHandler.result3 does not release pool connection back
  when an exception is raised

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1998789/+subscriptions