[Yahoo-eng-team] [Bug 1845243] [NEW] Nested 'path' query param in console URL breaks serialproxy
Public bug reported:

Description
===========
Change I2ddf0f4d768b698e980594dd67206464a9cea37b changed all console URLs to have the token attached as a nested query parameter inside an outer "path" query parameter, e.g. "?path=?token=***". While this was necessary for NoVNC support, it appears to have broken Ironic serial consoles. These rely on the nova-serialproxy service (built on websockify), which is apparently not aware that it needs to parse the token in this nested form.

To test, I enabled debug mode and added some extra logging in nova-serialproxy to prove that "token" was empty in this function:
https://github.com/openstack/nova/blob/stable/rocky/nova/objects/console_auth_token.py#L143

Steps to reproduce
==================
1. Have Ironic set up to allow web/serial consoles (https://docs.openstack.org/ironic/pike/admin/console.html). I believe this also requires having nova-serialproxy deployed.
2. Launch an Ironic instance and attempt to access the console via Horizon.

Expected result
===============
The serial console loads in the web interface; "Status: Opened" is displayed at the bottom. The console is interactive, assuming the node has booted properly.

Actual result
=============
The serial console loads but is blank; "Status: Closed" is displayed at the bottom. nova-serialproxy logs indicate the token was expired or invalid. The console never becomes interactive, but Horizon does not indicate that there is an error (at least on my deployment).

Environment
===========
OpenStack Rocky release, deployed with Kolla-Ansible.

** Affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1845243
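To illustrate the two URL forms involved, here is a minimal sketch of the parsing a proxy would need in order to accept both; this is not the actual nova-serialproxy/websockify code, and extract_token is a hypothetical helper:

```python
# Illustrative only -- not the actual nova-serialproxy/websockify code.
# The token may arrive either as a top-level "token" query parameter or
# nested inside the "path" parameter, e.g.:
#   /?token=abc123               (old style)
#   /?path=%3Ftoken%3Dabc123     (new style: "?token=abc123" URL-encoded in "path")
from urllib.parse import parse_qs, urlparse

def extract_token(request_uri):
    query = parse_qs(urlparse(request_uri).query)
    if 'token' in query:
        return query['token'][0]
    if 'path' in query:
        # "path" itself carries an encoded query string; parse it a second time.
        nested = parse_qs(urlparse(query['path'][0]).query)
        if 'token' in nested:
            return nested['token'][0]
    return None

print(extract_token("/?path=%3Ftoken%3Dabc123"))  # -> abc123
```

A proxy that only does the first, top-level lookup would see an empty token for the new-style URLs, which matches the behaviour observed above.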
[Yahoo-eng-team] [Bug 1864122] [NEW] Instances (bare metal) queue for a long time when managing a large number of Ironic nodes
Public bug reported:

Description
===========
We have two deployments, one with ~150 bare metal nodes and another with ~300. Each is managed by a single nova-compute process running the Ironic driver. After upgrading from the Ocata release, we noticed that instance launches would be stuck in the spawning state for a long time, up to 30 minutes to an hour in some cases.

After investigation, the root cause appeared to be contention between the update_resources periodic task and the instance claim step. A single semaphore, "compute_resources", is used to control every access within the resource_tracker. In our case, the update_resources job, which runs every minute by default, was constantly queuing up accesses to this semaphore, because each hypervisor is updated independently, in series.

This meant that each Ironic node was being processed and was holding the semaphore during its update (which took about 2-5 seconds in practice). Multiply this by 150 and our update task was running constantly. Because an instance claim also needs to acquire this semaphore, instances got stuck in the "Build" state, after scheduling, for tens of minutes on average. There seemed to be some probabilistic effect here, which I hypothesize is related to the locking mechanism not using a "fair" (first-come, first-served) lock by default.

Steps to reproduce
==================
I suspect this is only visible on deployments of >100 Ironic nodes or so (and they have to be managed by one nova-compute-ironic service). Due to the non-deterministic nature of the lock, the behavior is sporadic, but launching an instance is enough to observe it.

Expected result
===============
Instance proceeds to the networking phase of creation after <60 seconds.

Actual result
=============
Instance is stuck in the BUILD state for 30-60 minutes before proceeding to the networking phase.

Environment
===========
1. Exact version of OpenStack you are running (see http://docs.openstack.org/releases/ for all releases): Nova 20.0.1
2. Which hypervisor did you use? Ironic
3. Which storage type did you use? N/A
4. Which networking type did you use? Neutron/OVS

Logs & Configs
==============
Links
=====
First report, on openstack-discuss:
http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006192.html

** Affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1864122
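To make the contention pattern concrete, here is a toy sketch using plain Python threading, not nova's actual resource_tracker code; node_count and the per-node delay are made-up stand-ins for the real per-node update work:

```python
# Toy illustration of the contention described above -- not nova code.
# One shared lock stands in for the "compute_resources" semaphore: the
# periodic update grabs it once per node, back to back, while an instance
# claim needs it only once but may be starved because the lock is not fair.
import threading
import time

compute_resources = threading.Lock()

def update_available_resources(node_count=150, per_node_seconds=0.05):
    # Periodic task: each node update takes and releases the lock separately.
    for _ in range(node_count):
        with compute_resources:
            time.sleep(per_node_seconds)  # stand-in for the per-node update work

def instance_claim():
    start = time.time()
    with compute_resources:  # the claim needs the same lock just once
        pass
    print("claim waited %.1fs for the lock" % (time.time() - start))

updater = threading.Thread(target=update_available_resources)
updater.start()
time.sleep(0.1)  # the claim arrives while an update loop is already running
instance_claim()
updater.join()
```

With a fair (FIFO) lock, as the reporter hypothesizes, the claim's wait would be bounded by roughly one per-node update rather than, in the worst case, the whole loop.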
[Yahoo-eng-team] [Bug 1878496] [NEW] RFE: Support for direct-mapping auto-provisioned project/role names
Public bug reported:

It is currently possible for an IdP to specify multiple values in an assertion (e.g. the groups a user is a member of) and have each of those values mapped to an individual entity. This allows a user to be mapped into multiple Keystone groups. However, this functionality does not yet exist for auto-provisioned Keystone projects.

This RFE is for extending that functionality so that multiple projects can be provisioned when they are mapped from a multi-valued assertion. Consider a user who is a member of several groups in the IdP, where you want to provision one Keystone project per group. That is currently not supported, though it is very similar to the group functionality.

This could be extended to project roles as well, though with one limitation: since roles themselves are not auto-provisioned, they must already exist when the assertion is mapped. If the roles did exist, the mapping would work fine.

** Affects: keystone
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1878496
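To make the request concrete, here is a sketch of the kind of mapping the RFE is asking for, assuming the IdP emits a multi-valued MEMBER_OF attribute and a "member" role that already exists; today this rule would not expand {1} into one auto-provisioned project per group value, which is exactly the gap described above:

```json
{
  "rules": [
    {
      "local": [
        {"user": {"name": "{0}"}},
        {
          "projects": [
            {"name": "{1}", "roles": [{"name": "member"}]}
          ]
        }
      ],
      "remote": [
        {"type": "REMOTE_USER"},
        {"type": "MEMBER_OF"}
      ]
    }
  ]
}
```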
[Yahoo-eng-team] [Bug 1880252] [NEW] RFE: allow regexes in blacklist and whitelist conditionals
Public bug reported:

Currently a regex can be used in the "any_one_of" and "not_any_of" conditionals, allowing operators to specify rules that are not bound to a static set of expected values. However, this is not supported for the "whitelist" or "blacklist" conditional types. Having regex support in these types would bring more flexibility when crafting mappings, for example to only map an IdP group to a Keystone group if it matches a pattern like "CloudUsers-.*".

** Affects: keystone
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1880252
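For illustration, a sketch of the kind of rule this would enable; the "regex" flag next to "whitelist" is the proposed behavior, not syntax keystone currently accepts, and the "Default" domain is just a placeholder:

```json
{
  "rules": [
    {
      "local": [
        {"groups": "{0}", "domain": {"name": "Default"}}
      ],
      "remote": [
        {
          "type": "MEMBER_OF",
          "whitelist": ["CloudUsers-.*"],
          "regex": true
        }
      ]
    }
  ]
}
```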
[Yahoo-eng-team] [Bug 1888029] [NEW] RFE: allow automatically deprovisioning mapped assignments
Public bug reported:

Currently there is support for auto-provisioning role assignments within projects that are created for the user when they authenticate. However, there is no way to offboard users from these projects, whether due to mapping changes or due to dynamic changes in their group/role memberships at the IdP. It would be nice to have opt-in support for automatically deprovisioning these assignments as well, for security/policy reasons.

** Affects: keystone
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1888029
[Yahoo-eng-team] [Bug 1942469] [NEW] Network delete notifications no longer contain segment info
Public bug reported:

Change I07c70db027f2ae03ffb5a95072e019e8a5fdc411 made it so PRECOMMIT_DELETE and AFTER_DELETE both receive the network dict fetched from the DB (decorated with any resource_extend hooks). However, this network representation does not include segment information such as segmentation_id or network_type.

The networking-generic-switch ML2 plugin assumes that such information is present in the delete postcommit hook and needs it to do its job:
https://opendev.org/openstack/networking-generic-switch/src/branch/master/networking_generic_switch/generic_switch_mech.py#L164-L166

As a result, networking-generic-switch cannot currently be deployed.

Example error:

```
2021-09-02 12:27:57.438 30 ERROR neutron.plugins.ml2.managers [req-99b0b44f-171d-41a3-b99d-1cccb27b3006 bcb7ef06be674b9199b36e8f18b546f3 570aad8999f7499db99eae22fe9b29bb - default default] Mechanism driver 'genericswitch' failed in delete_network_postcommit: KeyError: 'provider:network_type'
2021-09-02 12:27:57.438 30 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
2021-09-02 12:27:57.438 30 ERROR neutron.plugins.ml2.managers   File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/plugins/ml2/managers.py", line 479, in _call_on_drivers
2021-09-02 12:27:57.438 30 ERROR neutron.plugins.ml2.managers     getattr(driver.obj, method_name)(context)
2021-09-02 12:27:57.438 30 ERROR neutron.plugins.ml2.managers   File "/var/lib/kolla/venv/lib/python2.7/site-packages/networking_generic_switch/generic_switch_mech.py", line 315, in delete_network_postcommit
2021-09-02 12:27:57.438 30 ERROR neutron.plugins.ml2.managers     provider_type = network['provider:network_type']
2021-09-02 12:27:57.438 30 ERROR neutron.plugins.ml2.managers KeyError: 'provider:network_type'
2021-09-02 12:27:57.438 30 ERROR neutron.plugins.ml2.managers
2021-09-02 12:27:57.440 30 ERROR neutron.plugins.ml2.plugin [req-99b0b44f-171d-41a3-b99d-1cccb27b3006 bcb7ef06be674b9199b36e8f18b546f3 570aad8999f7499db99eae22fe9b29bb - default default] mechanism_manager.delete_network_postcommit failed: MechanismDriverError
```

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1942469
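For reference, a minimal sketch of the access pattern that fails, modeled on the traceback above but not taken from the actual networking-generic-switch source; the defensive .get() fallback and the _delete_network_on_switches helper are hypothetical, and skipping the cleanup is only a stopgap since without segment info the driver cannot know which VLAN to remove:

```python
# Hypothetical ML2 mechanism driver hook, modeled on the traceback above
# (not the real networking-generic-switch code). context.current is the
# network dict that neutron's ML2 plugin hands to the driver.

class ExampleSwitchMechanismDriver(object):

    def delete_network_postcommit(self, context):
        network = context.current

        # Pattern that now breaks: the provider attributes are assumed present.
        #   provider_type = network['provider:network_type']   # raises KeyError
        # Defensive variant: tolerate a network dict without segment info.
        provider_type = network.get('provider:network_type')
        segmentation_id = network.get('provider:segmentation_id')
        if provider_type != 'vlan' or segmentation_id is None:
            # Without segment info the driver cannot tell which VLAN to remove,
            # so it can only skip the cleanup -- which is why the missing keys
            # are a real problem rather than just a driver bug.
            return
        self._delete_network_on_switches(segmentation_id)  # hypothetical helper
```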