[Yahoo-eng-team] [Bug 1845243] [NEW] Nested 'path' query param in console URL breaks serialproxy

2019-09-24 Thread Jason Anderson
Public bug reported:

Description
===

Change I2ddf0f4d768b698e980594dd67206464a9cea37b changed all console
URLs to have the token attached as a nested query parameter inside an
outer "path" query parameter, e.g. "?path=?token=***".

While this was necessary for noVNC support, it appears to have broken
Ironic serial consoles. These use the nova-serialproxy service, which is
built on websockify and is apparently not aware that it needs to extract
the token from the nested parameter.

To test, I enabled debug mode and added some extra logging to nova-
serialproxy to confirm that "token" was empty in this function:
https://github.com/openstack/nova/blob/stable/rocky/nova/objects/console_auth_token.py#L143
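
For context, the nested form means "token" is no longer a top-level query
parameter, so a proxy that only looks at the outer query string sees nothing.
A standalone sketch of the difference (plain Python, not nova or websockify
code; the URL and port are made up):

```python
# Standalone illustration -- not nova or websockify code; the URL is made up.
from urllib.parse import parse_qs, urlparse

url = "ws://serialproxy.example.org:6083/?path=%3Ftoken%3Dabc123"

outer = parse_qs(urlparse(url).query)

# The old, flat form exposed the token directly, so this used to work:
token = outer.get("token", [""])[0]           # now "" -- token is not top-level

# With the nested form, the outer "path" parameter has to be unwrapped first:
inner = parse_qs(outer.get("path", [""])[0].lstrip("?"))
token = inner.get("token", [""])[0]           # "abc123"
```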

Steps to reproduce
==

1. Have Ironic set up to allow web/serial consoles (https://docs.openstack.org/ironic/pike/admin/console.html). I believe this also requires having nova-serialproxy deployed.
2. Launch an Ironic instance and attempt to access the console via Horizon.


Expected result
===

The serial console loads in the web interface; "Status: Opened" is
displayed at the bottom. The console is interactive, assuming the node
has booted properly.


Actual result
=

The serial console loads, but is blank; "Status: Closed" is displayed at
the bottom. nova-serialproxy logs indicate the token was expired or
invalid. The console never becomes interactive, but Horizon does not
indicate that there is an error (at least on my deployment).

Environment
===

OpenStack Rocky release, deployed with Kolla-Ansible.

** Affects: nova
 Importance: Undecided
 Status: New

https://bugs.launchpad.net/bugs/1845243


[Yahoo-eng-team] [Bug 1864122] [NEW] Instances (bare metal) queue for long time when managing a large amount of Ironic nodes

2020-02-20 Thread Jason Anderson
Public bug reported:

Description
===
We have two deployments, one with ~150 bare metal nodes and another with ~300.
These are each managed by one nova-compute process running the Ironic driver.
After upgrading from the Ocata release, we noticed that instance launches would
be stuck in the spawning state for a long time, up to 30 minutes to an hour in
some cases.

After investigation, the root cause appeared to be contention between
the update_resources periodic task and the instance claim step. There is
a single semaphore, "compute_resources", that guards every access within
the resource_tracker. In our case, the update_resources job, which runs
every minute by default, was constantly queuing up accesses to this
semaphore, because each hypervisor is updated independently, in series.
For us that meant each Ironic node was processed in turn, holding the
semaphore for its entire update (about 2-5 seconds in practice).
Multiply this by 150 and our update task was running constantly. Because
an instance claim also needs this semaphore, instances were getting
stuck in the "Build" state, after scheduling, for tens of minutes on
average. There seemed to be some probabilistic effect here, which I
hypothesize is related to the locking mechanism not using a "fair"
(first-come, first-served) lock by default.
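
For context, oslo.concurrency's lockutils (the locking library used across
OpenStack) accepts an optional fair=True flag that grants a named lock in
first-come, first-served order. A minimal standalone sketch of the idea
follows; this is not nova's actual resource tracker code, and the function
names are illustrative:

```python
# Illustrative only -- not nova's resource tracker code. Assumes a recent
# oslo.concurrency release (older releases do not accept the fair flag).
from oslo_concurrency import lockutils

COMPUTE_RESOURCE_SEMAPHORE = "compute_resources"

# Every consumer of the semaphore (the periodic update, the instance claim,
# etc.) is decorated against the same name. With the default unfair lock,
# waiters are woken in no particular order, so the once-a-minute periodic
# task, which re-acquires the lock once per node, can keep winning and
# starve a waiting claim for a long time.
#
# Passing fair=True on *all* users of the name grants the lock in
# first-come, first-served order, so a claim waits behind at most the
# acquisitions that were already queued ahead of it.
@lockutils.synchronized(COMPUTE_RESOURCE_SEMAPHORE, fair=True)
def instance_claim(instance):
    ...  # claim resources for the instance

@lockutils.synchronized(COMPUTE_RESOURCE_SEMAPHORE, fair=True)
def update_node_resources(nodename):
    ...  # roughly 2-5 seconds of work per Ironic node while holding the lock
```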

Steps to reproduce
==
I suspect this is only visible on deployments of >100 Ironic nodes or so (and
they have to be managed by one nova-compute-ironic service). Due to the
non-deterministic nature of the lock, the behavior is sporadic, but launching
an instance is enough to observe it.

Expected result
===
Instance proceeds to networking phase of creation after <60 seconds.

Actual result
=
Instance stuck in BUILD state for 30-60 minutes before proceeding to
networking phase.

Environment
===
1. Exact version of OpenStack you are running. See the following
   list for all releases: http://docs.openstack.org/releases/
   Nova 20.0.1

2. Which hypervisor did you use?
   Ironic

3. Which storage type did you use?
   N/A

4. Which networking type did you use?
   Neutron/OVS

Logs & Configs
==

Links
=
First report, on openstack-discuss: 
http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006192.html

** Affects: nova
 Importance: Undecided
 Status: New

https://bugs.launchpad.net/bugs/1864122

[Yahoo-eng-team] [Bug 1878496] [NEW] RFE: Support for direct-mapping auto-provisioned project/role names

2020-05-13 Thread Jason Anderson
Public bug reported:

It is currently possible for an IdP to specify multiple values in an
assertion (e.g., the groups a user is a member of) and have each of
those values mapped to an individual entity. This makes it possible to
map a user into multiple Keystone groups. However, the same
functionality does not yet exist for auto-provisioned Keystone projects.
This RFE is for extending it so that multiple projects can be
provisioned when they are mapped from a multi-valued assertion.

Consider that a user is a member of several groups in the IdP, and you
want to provision one Keystone project per group. That is currently not
supported, though it is very similar to the group functionality.

This can be extended to project roles as well, though there will be a
limitation: since the roles themselves are not auto-provisioned, they
must already exist when the assertion is mapped. If the roles did exist,
though, the mapping would work fine.
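
To make the request concrete, a mapping rule for this might look roughly like
the following (written as a Python dict; the remote attribute names and the
exact placeholder-expansion syntax are a sketch, not verified against the
mapping schema):

```python
# Illustrative mapping rule, written as a Python dict. Attribute names and
# the {0}/{1} placeholder expansion are illustrative.
rule = {
    "remote": [
        {"type": "OIDC-groups"},               # multi-valued: one entry per IdP group
        {"type": "OIDC-preferred_username"},
    ],
    "local": [
        {"user": {"name": "{1}"}},
        # Works today: each value of the multi-valued {0} assertion maps the
        # user into the existing Keystone group of the same name.
        {"groups": "{0}", "domain": {"name": "Default"}},
        # Asked for by this RFE: expand {0} the same way here, so that one
        # project is auto-provisioned per IdP group the user belongs to.
        {"projects": [{"name": "{0}", "roles": [{"name": "member"}]}]},
    ],
}
```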

** Affects: keystone
 Importance: Undecided
 Status: New

https://bugs.launchpad.net/bugs/1878496


[Yahoo-eng-team] [Bug 1880252] [NEW] RFE: allow regexes in blacklist and whitelist conditionals

2020-05-22 Thread Jason Anderson
Public bug reported:

Currently a regex can be used in the "any_one_of" and "not_any_of"
conditionals, allowing operators to specify rules that are not bound to
a static set of expected values. However, this is not supported for the
"whitelist" or "blacklist" conditional types.

Having regex support in these types would bring more flexibility when
crafting mappings, for example to only map an IdP group to a Keystone
group if it matches a pattern like "CloudUsers-.*".
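
To illustrate the gap, compare a remote rule that works today with the kind of
rule this RFE asks for (written as Python dicts; the attribute name is
illustrative and the second form is hypothetical):

```python
# Two remote-rule snippets, written as Python dicts; the attribute name is
# illustrative. The first form works today, the second is hypothetical.

# Supported today: regex matching via any_one_of / not_any_of.
supported = {
    "type": "OIDC-groups",
    "any_one_of": ["CloudUsers-.*"],
    "regex": True,
}

# Proposed: honour the same regex flag for whitelist / blacklist, so that
# only group values matching the pattern survive the filter.
proposed = {
    "type": "OIDC-groups",
    "whitelist": ["CloudUsers-.*"],
    "regex": True,   # hypothetical -- not currently honoured for whitelist
}
```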

** Affects: keystone
 Importance: Undecided
 Status: New

https://bugs.launchpad.net/bugs/1880252


[Yahoo-eng-team] [Bug 1888029] [NEW] RFE: allow automatically deprovisioning mapped assignments

2020-07-17 Thread Jason Anderson
Public bug reported:

Currently there is support for auto-provisioning role assignments within
projects that are created for the user when they authenticate. However,
there is no way to offboard users from these projects, either due to
mapping changes or due to dynamic changes in their group/role
memberships from the IdP.

It would be nice to have opt-in support for automatically deprovisioning
them as well for security/policy reasons.
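
Until keystone supports this natively, offboarding can only be approximated
out-of-band. A rough sketch of what such a cleanup might do with
python-keystoneclient (illustrative; assumes admin credentials and that the
operator already knows which user/project pairs came from the mapping):

```python
# Rough out-of-band sketch, not keystone functionality. Endpoint and
# credentials are placeholders.
from keystoneauth1 import session
from keystoneauth1.identity import v3
from keystoneclient.v3 import client

auth = v3.Password(auth_url="https://keystone.example.org/v3",  # illustrative
                   username="admin", password="secret",
                   project_name="admin",
                   user_domain_id="default", project_domain_id="default")
keystone = client.Client(session=session.Session(auth=auth))

def deprovision(user_id, project_id):
    """Revoke every role the user still holds on a mapped project."""
    assignments = keystone.role_assignments.list(user=user_id,
                                                 project=project_id)
    for assignment in assignments:
        keystone.roles.revoke(assignment.role["id"],
                              user=user_id, project=project_id)
```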

** Affects: keystone
 Importance: Undecided
 Status: New

https://bugs.launchpad.net/bugs/1888029


[Yahoo-eng-team] [Bug 1942469] [NEW] Network delete notifications no longer contain segment info

2021-09-02 Thread Jason Anderson
Public bug reported:

Change I07c70db027f2ae03ffb5a95072e019e8a5fdc411 made it so
PRECOMMIT_DELETE and AFTER_DELETE both receive the network dict fetched
from the DB (decorated with any resource_extend hooks). However, this
network representation does not include segment information like
segmentation_id or network_type.

The networking-generic-switch ML2 plugin assumes that such information
is present on the delete postcommit hook and needs it to do its job:
https://opendev.org/openstack/networking-generic-switch/src/branch/master/networking_generic_switch/generic_switch_mech.py#L164-L166

As a result networking-generic-switch cannot currently be deployed.

Example error:

```
2021-09-02 12:27:57.438 30 ERROR neutron.plugins.ml2.managers [req-99b0b44f-171d-41a3-b99d-1cccb27b3006 bcb7ef06be674b9199b36e8f18b546f3 570aad8999f7499db99eae22fe9b29bb - default default] Mechanism driver 'genericswitch' failed in delete_network_postcommit: KeyError: 'provider:network_type'
2021-09-02 12:27:57.438 30 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
2021-09-02 12:27:57.438 30 ERROR neutron.plugins.ml2.managers   File "/var/lib/kolla/venv/lib/python2.7/site-packages/neutron/plugins/ml2/managers.py", line 479, in _call_on_drivers
2021-09-02 12:27:57.438 30 ERROR neutron.plugins.ml2.managers     getattr(driver.obj, method_name)(context)
2021-09-02 12:27:57.438 30 ERROR neutron.plugins.ml2.managers   File "/var/lib/kolla/venv/lib/python2.7/site-packages/networking_generic_switch/generic_switch_mech.py", line 315, in delete_network_postcommit
2021-09-02 12:27:57.438 30 ERROR neutron.plugins.ml2.managers     provider_type = network['provider:network_type']
2021-09-02 12:27:57.438 30 ERROR neutron.plugins.ml2.managers KeyError: 'provider:network_type'
2021-09-02 12:27:57.438 30 ERROR neutron.plugins.ml2.managers
2021-09-02 12:27:57.440 30 ERROR neutron.plugins.ml2.plugin [req-99b0b44f-171d-41a3-b99d-1cccb27b3006 bcb7ef06be674b9199b36e8f18b546f3 570aad8999f7499db99eae22fe9b29bb - default default] mechanism_manager.delete_network_postcommit failed: MechanismDriverError
```
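
To make the failure concrete, here is roughly the shape of the network dict
the mech driver expects in delete_network_postcommit versus what it now
receives (values are illustrative, not captured from a deployment):

```python
# Illustrative payloads only -- not captured from a real deployment.

# What the driver expects "network" to look like in delete_network_postcommit
# (provider attributes present):
expected = {
    "id": "3f8c5a2e-15c1-4f3b-9a47-1c6b9b2d0b11",   # made-up UUID
    "name": "ironic-provisioning",
    "provider:network_type": "vlan",
    "provider:physical_network": "physnet1",
    "provider:segmentation_id": 200,
}

# What arrives after change I07c70db027f2ae03ffb5a95072e019e8a5fdc411: the
# dict fetched from the DB no longer carries the provider:* keys, so the
# driver's direct lookup raises the KeyError shown in the log above.
received = {
    "id": "3f8c5a2e-15c1-4f3b-9a47-1c6b9b2d0b11",
    "name": "ironic-provisioning",
}

received.get("provider:network_type")   # -> None; direct indexing raises KeyError
```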

** Affects: neutron
 Importance: Undecided
 Status: New

https://bugs.launchpad.net/bugs/1942469