[Yahoo-eng-team] [Bug 1855902] Re: Inefficient Security Group listing

2019-12-10 Thread Brian Haley
*** This bug is a duplicate of bug 1830679 ***
https://bugs.launchpad.net/bugs/1830679

This was fixed in https://review.opendev.org/#/c/665566/ and backported
to stable/stein (15.0.0); it was not backported further.

Closing as a duplicate of
https://bugs.launchpad.net/neutron/+bug/1830679

** This bug has been marked a duplicate of bug 1830679
   Security groups RBAC cause a major performance degradation

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1855902

Title:
  Inefficient Security Group listing

Status in neutron:
  New

Bug description:
  Issue:

  Fetching a large Security Group list takes relatively long as several
  database queries are made for each Security Group.

  
  Context:

  Listing SG's takes around 9 seconds with ~500 existing SG's, 16
  seconds with ~1000 SG's and around 30 seconds with ~1500 existing
  SG's, so this time seems to grow at least linearly with the number of
  SG's.

  We've looked at flamegraphs of the neutron controller which show that
  the stack frame `/usr/lib/python2.7/site-
  packages/neutron/db/securitygroups_db.py:get_security_groups:166`
  splits into two long running functions, each taking about half of the
  time (one at line 112 and the other at 115).

  ```python
  103  @classmethod
  104  def get_objects(cls, context, _pager=None, validate_filters=True,
  105                  **kwargs):
  106      # We want to get the policy regardless of its tenant id. We'll make
  107      # sure the tenant has permission to access the policy later on.
  108      admin_context = context.elevated()
  109      with cls.db_context_reader(admin_context):
  110          objs = super(RbacNeutronDbObjectMixin,
  111                       cls).get_objects(admin_context, _pager,
  112                                        validate_filters, **kwargs)
  113          result = []
  114          for obj in objs:
  115              if not cls.is_accessible(context, obj):
  116                  continue
  117              result.append(obj)
  118          return result
  ```

  We've also seen that the number of database queries seems to grow
  linearly:

  * Listing ~500 SG's performs ~2100 queries
  * Listing ~1000 SG's performs ~3500 queries
  * Listing ~1500 SG's performs ~5200 queries

  This does not scale well; we would expect a negligible increase in
  listing time.

  
  Reproduction:

  * Create 1000 SG's
  * Execute `time openstack security group list`
  * Create 500 more SG's
  * Execute `time openstack security group list`

  
  Version:

  We're using neutron 14.0.2-1 on CentOS 7.7.1908.

  
  Perceived Severity:

  MEDIUM

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1855902/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1855945] [NEW] Network Config Version 2 Device Configuration ID used as interface name when set-name is not specified

2019-12-10 Thread Gregory May
Public bug reported:

With Cloud-Init Network Config Version 2, the name of the Device
Configuration ID object overwrites the ethernet interface device name
when set-name is not specified.

Example - Version 2 metadata:

instance-id: "management-cluster-controlplane-0"
network:
  version: 2
  ethernets:
    id0:
      match:
        macaddress: "00:50:56:a5:1a:78"

When 'set-name' is not defined within the version 2 config, cloud-init's
network_state.py falls back to the key of the ethernets dict. In the
above case that is "id0":


network_state.py code 
[https://github.com/canonical/cloud-init/blob/ec6924ea1d321cc87e7414bee7734074590045b8/cloudinit/net/network_state.py#L645]

for eth, cfg in command.items():
    phy_cmd = {
        'type': 'physical',
        'name': cfg.get('set-name', eth),
    }
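
For illustration, a minimal standalone sketch of that fallback (the
helper and sample dict below are hypothetical, not cloud-init's API):

```python
# Illustrative sketch of the v2 -> v1 name selection above; 'v1_name'
# and the sample config are hypothetical, not cloud-init code.
def v1_name(config_id, cfg):
    # With no 'set-name', the device configuration ID (e.g. 'id0') is
    # used, and later ends up as DEVICE= in the sysconfig rendering.
    return cfg.get('set-name', config_id)

ethernets = {'id0': {'match': {'macaddress': '00:50:56:a5:1a:78'}}}
for eth, cfg in ethernets.items():
    print(v1_name(eth, cfg))  # prints 'id0', not a kernel name like eth0
```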


See debug output where set-name is not specified and the ethernet device config 
id = id0:

2019-12-05 01:50:14,692 - network_state.py[DEBUG]: v2(ethernets) -> 
v1(physical):
{'subnets': [{'dns_nameservers': ['10.10.10.10'], 'type': 'static', 'gateway': 
'10.7.7.254', 'address': '10.7.5.102/21'}], 'name': 'id0', 'mac_address': 
'00:50:56:a5:75:b3', 'type': 'physical', 'wakeonlan': True, 'match': 
{'macaddress': '00:50:56:a5:75:b3'}}


On CentOS, the sysconfig renderer then later uses this Device Config ID for
the /etc/sysconfig/network-scripts/ifcfg- file name and the DEVICE= parameter.

sysconfig.py code [https://github.com/canonical/cloud-
init/blob/ec6924ea1d321cc87e7414bee7734074590045b8/cloudinit/net/sysconfig.py#L701-L702]


cloud-init.log - without set-name configured:


2019-12-05 21:11:38,913 - stages.py[DEBUG]: Using distro class 
2019-12-05 21:11:38,914 - __init__.py[DEBUG]: no interfaces to rename
2019-12-05 21:11:38,914 - __init__.py[DEBUG]: Datasource 
DataSourceVMwareGuestInfo not updated for events: System boot
2019-12-05 21:11:38,914 - stages.py[DEBUG]: No network config applied. Neither 
a new instance nor datasource network update on 'System boot' event
2019-12-05 21:11:38,914 - handlers.py[DEBUG]: start: 
init-network/setup-datasource: setting up datasource
2019-12-05 21:11:38,914 - DataSourceVMwareGuestInfo.py[INFO]: got host-info: 
{'network': {'interfaces': {'by-mac': OrderedDict([('00:50:56:a5:b2:b2', 
{'ipv6': [{'addr': 'fe80::250:56ff:fea5:b2b2%eth0', 'netmask': 
':::::/64'}]})]), 'by-ipv4': OrderedDict(), 'by-ipv6': 
OrderedDict([('fe80::250:56ff:fea5:b2b2%eth0', {'netmask': 
':::::/64', 'mac': '00:50:56:a5:b2:b2'})])}}, 'hostname': 
'localhost', 'local-hostname': 'localhost'}


cloud-init.log - with set-name: eth0 configured:

2019-12-05 21:57:19,179 - util.py[DEBUG]: Running command ['ip', '-6', 'addr', 
'show', 'permanent', 'scope', 'global'] with allowed return codes [0] 
(shell=False, capture=True)
2019-12-05 21:57:19,190 - util.py[DEBUG]: Running command ['ip', '-4', 'addr', 
'show'] with allowed return codes [0] (shell=False, capture=True)
2019-12-05 21:57:19,198 - __init__.py[DEBUG]: no work necessary for renaming of 
[['00:50:56:a5:b2:b2', 'eth0', 'vmxnet3', '0x07b0']]
2019-12-05 21:57:19,198 - stages.py[INFO]: Applying network configuration from 
system_cfg bringup=False: {'version': 2, 'ethernets': {'id0': {'match': 
{'macaddress': '00:50:56:a5:b2:b2'}, 'wakeonlan': True, 'set-name': 'eth0', 
'dhcp4': False, 'dhcp6': False, 'addresses': ['10.7.5.101/21'], 'gateway4': 
'10.7.7.254', 'nameservers': {'addresses': ['10.10.10.10']
2019-12-05 21:57:19,199 - __init__.py[WARNING]: apply_network_config is not 
currently implemented for distribution ''.  Attempting to use apply_network
2019-12-05 21:57:19,199 - network_state.py[DEBUG]: v2(ethernets) -> 
v1(physical):
{'type': 'physical', 'name': 'eth0', 'mac_address': '00:50:56:a5:b2:b2', 
'match': {'macaddress': '00:50:56:a5:b2:b2'}, 'wakeonlan': True, 'subnets': 
[{'type': 'static', 'address': '10.7.5.101/21', 'gateway': '10.7.7.254', 
'dns_nameservers': ['10.10.10.10']}]}
2019-12-05 21:57:19,206 - network_state.py[DEBUG]: v2_common: handling config:
{'id0': {'match': {'macaddress': '00:50:56:a5:b2:b2'}, 'wakeonlan': True, 
'set-name': 'eth0', 'dhcp4': False, 'dhcp6': False, 'addresses': 
['10.7.5.101/21'], 'gateway4': '10.7.7.254', 'nameservers': {'addresses': 
['10.10.10.10']}}}
2019-12-05 21:57:19,207 - photon.py[DEBUG]: Translated ubuntu style network 
settings # Converted from network_config for distro 
 Implementation of _write_network_config is needed.
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
hwaddress 00:50:56:a5:b2:b2
address 10.7.5.101/21
dns-nameservers 10.10.10.10
gateway 10.7.7.254
 into {'lo': {'ipv6': {}, 'auto': True}, 'eth0': {'ipv6': {}, 'bootproto': 
'static', 'address': '10.7.5.101', 'gateway': '10.7.7.254', 'netmask': 
'255.255.248.0', 'broadcast': '10.7.7.255', 'dns-nameservers': ['10.10.10.10'], 
'auto': True}}
2019-12-05 21:57:19,208 - util.py[DEBUG]: Writing to 

[Yahoo-eng-team] [Bug 1855934] [NEW] new versions of flake8 parse typing comments

2019-12-10 Thread sean mooney
Public bug reported:

While playing with pre-commit I noticed that new versions of flake8 parse
type annotation comments. If you have not imported the relevant typing
names, it fails with F821 undefined name:

nova/virt/hardware.py:1396:5: F821 undefined name 'Optional'
nova/virt/hardware.py:1396:5: F821 undefined name 'List'
nova/virt/hardware.py:1396:5: F821 undefined name 'Set'
nova/virt/hardware.py:1426:5: F821 undefined name 'Optional'
nova/virt/hardware.py:1426:5: F821 undefined name 'List'
nova/virt/hardware.py:1426:5: F821 undefined name 'Set'
nova/virt/hardware.py:1456:5: F821 undefined name 'Optional'
nova/virt/hardware.py:1483:5: F821 undefined name 'Optional'
nova/virt/hardware.py:1525:5: F821 undefined name 'Optional'
nova/virt/hardware.py:1624:5: F821 undefined name 'Tuple'
nova/virt/hardware.py:1646:5: F821 undefined name 'Optional'
nova/virt/hardware.py:1658:5: F821 undefined name 'Optional'
nova/virt/hardware.py:1674:5: F821 undefined name 'List'
nova/virt/hardware.py:1696:5: F821 undefined name 'Optional'
nova/virt/hardware.py:1920:29: F821 undefined name 'List'
nova/virt/hardware.py:1939:31: F821 undefined name 'Set'

While this is not an issue today, because we pin to an old version of
flake8, we should still fix this as a code hygiene issue. Given this has
no impact on the running code, I'm going to triage it as Low and push a
trivial patch.
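
For illustration, a minimal file (hypothetical, not from nova) that
trips F821 under a newer flake8, together with the trivial fix:

```python
# Newer flake8 (via pyflakes) resolves names inside PEP 484 type
# comments, so they must be imported like any other name.
from typing import List, Optional, Set  # adding these imports is the fix


def pick_cpus(available, count):
    # type: (Set[int], int) -> Optional[List[int]]
    # Without the import above, flake8 flags Set/Optional/List as F821.
    if len(available) < count:
        return None
    return sorted(available)[:count]
```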

** Affects: nova
 Importance: Low
 Assignee: sean mooney (sean-k-mooney)
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1855934

Title:
  new versions of flake8 parse typing comments

Status in OpenStack Compute (nova):
  New

Bug description:
  While playing with pre-commit I noticed that new versions of flake8
  parse type annotation comments. If you have not imported the relevant
  typing names, it fails with F821 undefined name:

  nova/virt/hardware.py:1396:5: F821 undefined name 'Optional'
  nova/virt/hardware.py:1396:5: F821 undefined name 'List'
  nova/virt/hardware.py:1396:5: F821 undefined name 'Set'
  nova/virt/hardware.py:1426:5: F821 undefined name 'Optional'
  nova/virt/hardware.py:1426:5: F821 undefined name 'List'
  nova/virt/hardware.py:1426:5: F821 undefined name 'Set'
  nova/virt/hardware.py:1456:5: F821 undefined name 'Optional'
  nova/virt/hardware.py:1483:5: F821 undefined name 'Optional'
  nova/virt/hardware.py:1525:5: F821 undefined name 'Optional'
  nova/virt/hardware.py:1624:5: F821 undefined name 'Tuple'
  nova/virt/hardware.py:1646:5: F821 undefined name 'Optional'
  nova/virt/hardware.py:1658:5: F821 undefined name 'Optional'
  nova/virt/hardware.py:1674:5: F821 undefined name 'List'
  nova/virt/hardware.py:1696:5: F821 undefined name 'Optional'
  nova/virt/hardware.py:1920:29: F821 undefined name 'List'
  nova/virt/hardware.py:1939:31: F821 undefined name 'Set'

  While this is not an issue today, because we pin to an old version of
  flake8, we should still fix this as a code hygiene issue. Given this
  has no impact on the running code, I'm going to triage it as Low and
  push a trivial patch.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1855934/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1855927] [NEW] _poll_unconfirmed_resizes may not retry later if confirm_resize fails in API

2019-12-10 Thread Matt Riedemann
Public bug reported:

This is based on code inspection but let's say I have configured my
computes to set resize_confirm_window=3600 to automatically confirm a
resized server after 1 hour. Within that hour, let's say the source
compute service is down.

The periodic task gets the unconfirmed migrations with status='finished'
whose updated_at timestamp is older than the configurable window:

https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/compute/manager.py#L8793

https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/db/sqlalchemy/api.py#L4342

The periodic task then calls the compute API code to confirm the resize:

https://github.com/openstack/nova/blob/c295e395d/nova/compute/manager.py#L7160

which changes the migration status to 'confirming':

https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/compute/api.py#L3684

And casts off to the source compute:

https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/compute/rpcapi.py#L600

Now if the source compute is down and that fails, the compute manager
task code will handle it and say it will retry later:

https://github.com/openstack/nova/blob/c295e395d/nova/compute/manager.py#L7163

However, because the migration status was changed from 'finished' to
'confirming' the task will not retry because it won't find the migration
given the DB query. And trying to confirm the resize via the API will
fail as well because we'll get MigrationNotFoundByStatus since the
migration status is no longer 'finished':

https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/compute/api.py#L3681

The compute manager code should probably mark the migration status as
'finished' again if it's really going to try later, or mark the
migration status as 'error'. Note that the confirm_resize method in the
compute manager doesn't mark the migration status as 'error' if
something fails there either:

https://github.com/openstack/nova/blob/c295e395d/nova/compute/manager.py#L3807
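
A hedged sketch of one possible fix in the periodic-task path
(hypothetical code, not an actual nova patch): reset the migration
status if confirmation fails so the next poll can find it again.

```python
# Hypothetical sketch only; it mirrors the code paths linked above but
# is not the actual nova change.
def _confirm_or_requeue(compute_api, context, instance, migration):
    try:
        compute_api.confirm_resize(context, instance, migration=migration)
    except Exception:
        # The API already moved the migration 'finished' -> 'confirming'.
        # Unless the status is reset (or set to 'error'), the DB query in
        # _poll_unconfirmed_resizes will never return it again.
        migration.status = 'finished'
        migration.save()
```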

** Affects: nova
 Importance: Low
 Status: New


** Tags: error-handling migrate resize

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1855927

Title:
  _poll_unconfirmed_resizes may not retry later if confirm_resize fails
  in API

Status in OpenStack Compute (nova):
  New

Bug description:
  This is based on code inspection but let's say I have configured my
  computes to set resize_confirm_window=3600 to automatically confirm a
  resized server after 1 hour. Within that hour, let's say the source
  compute service is down.

  The periodic task gets the unconfirmed migrations with
  status='finished' whose updated_at timestamp is older than the
  configurable window:

  
https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/compute/manager.py#L8793

  
https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/db/sqlalchemy/api.py#L4342

  The periodic task then calls the compute API code to confirm the
  resize:

  https://github.com/openstack/nova/blob/c295e395d/nova/compute/manager.py#L7160

  which changes the migration status to 'confirming':

  
https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/compute/api.py#L3684

  And casts off to the source compute:

  
https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/compute/rpcapi.py#L600

  Now if the source compute is down and that fails, the compute manager
  task code will handle it and say it will retry later:

  https://github.com/openstack/nova/blob/c295e395d/nova/compute/manager.py#L7163

  However, because the migration status was changed from 'finished' to
  'confirming' the task will not retry because it won't find the
  migration given the DB query. And trying to confirm the resize via the
  API will fail as well because we'll get MigrationNotFoundByStatus
  since the migration status is no longer 'finished':

  
https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/compute/api.py#L3681

  The compute manager code should probably mark the migration status as
  'finished' again if it's really going to try later, or mark the
  migration status as 'error'. Note that the confirm_resize method in
  the compute manager doesn't mark the migration status as 'error' if
  something fails there either:

  https://github.com/openstack/nova/blob/c295e395d/nova/compute/manager.py#L3807

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1855927/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

[Yahoo-eng-team] [Bug 1855919] [NEW] Broken pipe errors cause neutron metadata agent to fail

2019-12-10 Thread Albert Braden
Public bug reported:

After we increased the number of computes to 200, we started seeing
"broken pipe" errors in neutron-metadata-agent.log on the controllers.
After a neutron restart the errors subside, then they increase until the
log is mostly errors; the neutron metadata service fails and VMs cannot
boot. Another symptom is that unacked RMQ messages build up in the
q-plugin queue. This is the first error we see; it occurs as the server
is starting:


2019-12-10 10:56:01.942 1838536 INFO eventlet.wsgi.server [-] (1838536) wsgi 
starting up on http:/var/lib/neutron/metadata_proxy
2019-12-10 10:56:01.943 1838538 INFO eventlet.wsgi.server [-] (1838538) wsgi 
starting up on http:/var/lib/neutron/metadata_proxy
2019-12-10 10:56:01.945 1838539 INFO eventlet.wsgi.server [-] (1838539) wsgi 
starting up on http:/var/lib/neutron/metadata_proxy
2019-12-10 10:56:21.138 1838538 INFO eventlet.wsgi.server [-] Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 521, in handle_one_response
    write(b''.join(towrite))
  File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 462, in write
    wfile.flush()
  File "/usr/lib/python2.7/socket.py", line 307, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 390, in sendall
    tail = self.send(data, flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 384, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 371, in _send_loop
    return send_method(data, *args)
error: [Errno 32] Broken pipe

2019-12-10 10:56:21.138 1838538 INFO eventlet.wsgi.server [-] 
10.195.74.25, "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  
len: 0 time: 19.0296111
2019-12-10 10:56:25.059 1838516 INFO eventlet.wsgi.server [-] 
10.195.74.28, "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  
len: 146 time: 0.2840948
2019-12-10 10:56:25.181 1838529 INFO eventlet.wsgi.server [-] 
10.195.74.68, "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  
len: 146 time: 0.2695429
2019-12-10 10:56:25.259 1838518 INFO eventlet.wsgi.server [-] 
10.195.74.28, "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  
len: 146 time: 0.1980510

Then we see some "call queues" warnings and the threshold increases to
40:

2019-12-10 10:56:31.414 1838515 WARNING
oslo_messaging._drivers.amqpdriver [-] Number of call queues is 11,
greater than warning threshold: 10. There could be a leak. Increasing
threshold to: 20

Next we see RPC timeout errors:

2019-12-10 10:57:02.043 1838520 WARNING oslo_messaging._drivers.amqpdriver [-] 
Number of call queues is 11, greater than warning threshold: 10. There could be 
a leak. Increasing threshold to: 20
2019-12-10 10:57:02.059 1838534 ERROR neutron.common.rpc [-] Timeout in RPC 
method get_ports. Waiting for 37 seconds before next attempt. If the server is 
not down, consider increasing the rpc_response_timeout option as Neutron 
server(s) may be overloaded and unable to respond quickly enough.: 
MessagingTimeout: Timed out waiting for a reply to message ID 
1ed3e021607e466f8b9b84cd3b05b188
2019-12-10 10:57:02.059 1838534 WARNING neutron.common.rpc [-] Increasing 
timeout for get_ports calls to 120 seconds. Restart the agent to restore it to 
the default value.: MessagingTimeout: Timed out waiting for a reply to message 
ID 1ed3e021607e466f8b9b84cd3b05b188
2019-12-10 10:57:02.285 1838521 INFO eventlet.wsgi.server [-] 
10.195.74.27, "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  
len: 146 time: 0.7959940

2019-12-10 10:57:16.215 1838531 WARNING
oslo_messaging._drivers.amqpdriver [-] Number of call queues is 21,
greater than warning threshold: 20. There could be a leak. Increasing
threshold to: 40

2019-12-10 10:57:17.339 1838539 WARNING
oslo_messaging._drivers.amqpdriver [-] Number of call queues is 11,
greater than warning threshold: 10. There could be a leak. Increasing
threshold to: 20

2019-12-10 10:57:24.838 1838524 INFO eventlet.wsgi.server [-] 
10.195.73.242, "GET /latest/meta-data/instance-id HTTP/1.0" status: 200  
len: 146 time: 0.6842020
2019-12-10 10:57:24.882 1838524 ERROR neutron.common.rpc [-] Timeout in RPC 
method get_ports. Waiting for 3 seconds before next attempt. If the server is 
not down, consider increasing the rpc_response_timeout option as Neutron 
server(s) may be overloaded and unable to respond quickly enough.: 
MessagingTimeout: Timed out waiting for a reply to message ID 
2bb5faa3ec8d4f5b9d3bd3e2fe095f9e
2019-12-10 10:57:24.883 1838524 WARNING neutron.common.rpc [-] Increasing 
timeout for get_ports calls to 120 seconds. Restart the agent to restore it to 
the default value.: MessagingTimeout: Timed out waiting for a reply to message 
ID 2bb5faa3ec8d4f5b9d3bd3e2fe095f9e
2019-12-10 10:57:24.887 1838525 INFO eventlet.wsgi.server [-] 
10.195.74.26, "GET 

[Yahoo-eng-team] [Bug 1855912] [NEW] MariaDB 10.1 fails during alembic migration

2019-12-10 Thread Rodolfo Alonso
Public bug reported:

A new CI job running with MariaDB [1] fails during the alembic migration.
According to [2] the problem seems to be solved in v10.2.2.

LOG:
https://b12f79f00ace923cb903-227be9d6f8442281010ef49b8394f34d.ssl.cf5.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-tempest-mariadb-full/18fecee/job-output.txt

SNIPPET: http://paste.openstack.org/show/787390/

[1] https://review.opendev.org/#/c/681202/
[2] 
https://laracasts.com/discuss/channels/general-discussion/specified-key-was-too-long-max-key-length-is-767-bytes-1
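
For context, a hedged illustration of the class of failure behind [2]
(assumed schema, not the actual neutron migration): indexing a utf8mb4
VARCHAR(255) column needs up to 255 * 4 = 1020 bytes of index prefix,
which exceeds the 767-byte InnoDB limit enforced by default before
MariaDB 10.2.2.

```python
# Assumed illustration, not the actual neutron alembic migration.
import sqlalchemy as sa
from alembic import op


def upgrade():
    op.create_table(
        'example',
        sa.Column('name', sa.String(255)),
        mysql_charset='utf8mb4',
    )
    # 255 chars * 4 bytes/char under utf8mb4 = 1020 bytes of index
    # prefix, so on MariaDB 10.1 this fails with "Specified key was
    # too long; max key length is 767 bytes".
    op.create_index('ix_example_name', 'example', ['name'])
```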

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1855912

Title:
  MariaDB 10.1 fails during alembic migration

Status in neutron:
  New

Bug description:
  A new CI job running with MariaDB [1] fails during the alembic
  migration. According to [2] the problem seems to be solved in
  v10.2.2.

  LOG:
  
https://b12f79f00ace923cb903-227be9d6f8442281010ef49b8394f34d.ssl.cf5.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-tempest-mariadb-full/18fecee/job-output.txt

  SNIPPET: http://paste.openstack.org/show/787390/

  [1] https://review.opendev.org/#/c/681202/
  [2] 
https://laracasts.com/discuss/channels/general-discussion/specified-key-was-too-long-max-key-length-is-767-bytes-1

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1855912/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1839009] Re: os-server-external-events does not behave correctly for failed single events

2019-12-10 Thread Eric Fried
*** This bug is a duplicate of bug 1855752 ***
https://bugs.launchpad.net/bugs/1855752

Sorry, I didn't know about this bug when we opened 1855752. The issue
has been fixed under that bug.

** This bug has been marked a duplicate of bug 1855752
   Inappropriate HTTP error status from os-server-external-events

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839009

Title:
  os-server-external-events does not behave correctly for failed single
  events

Status in OpenStack Compute (nova):
  New

Bug description:
  The "os-server-external-events" API does not behave correctly when the
  request body contains a list of one event and if that event ends up in
  a non-200 state, i.e if the event ends up in either 400 or 404 or 422
  states, the function executes all the way to L147
  
(https://github.com/openstack/nova/blob/433b1662e48db57aaa42e11756fa4a6d8722b386/nova/api/openstack/compute/server_external_events.py#L147)
  and overall returns a 404 HTTP response without any body. This is
  wrong since as per the documentation it should return the respective
  codes (422/404/400) to the client.

  In fact, if at least one of the provided events doesn't make it into
  the "accepted_events" list, the rest are discarded without returning
  the correct response for each event.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839009/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1855902] [NEW] Inefficient Security Group listing

2019-12-10 Thread Joris Hartog
Public bug reported:

Issue:

Fetching a large Security Group list takes relatively long as several
database queries are made for each Security Group.


Context:

Listing SG's takes around 9 seconds with ~500 existing SG's, 16 seconds
with ~1000 SG's and around 30 seconds with ~1500 existing SG's, so this
time seems to grow at least linearly with the number of SG's.

We've looked at flamegraphs of the neutron controller which show that
the stack frame `/usr/lib/python2.7/site-
packages/neutron/db/securitygroups_db.py:get_security_groups:166` splits
into two long running functions, each taking about half of the time (one
at line 112 and the other at 115).

```python
103  @classmethod
104  def get_objects(cls, context, _pager=None, validate_filters=True,
105                  **kwargs):
106      # We want to get the policy regardless of its tenant id. We'll make
107      # sure the tenant has permission to access the policy later on.
108      admin_context = context.elevated()
109      with cls.db_context_reader(admin_context):
110          objs = super(RbacNeutronDbObjectMixin,
111                       cls).get_objects(admin_context, _pager,
112                                        validate_filters, **kwargs)
113          result = []
114          for obj in objs:
115              if not cls.is_accessible(context, obj):
116                  continue
117              result.append(obj)
118          return result
```

We've also seen that the number of database queries seems to grow
linearly:

* Listing ~500 SG's performs ~2100 queries
* Listing ~1000 SG's performs ~3500 queries
* Listing ~1500 SG's performs ~5200 queries

This does not scale well; we would expect a negligible increase in
listing time.
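
The linear growth follows from the per-object is_accessible() call
shown above, which can issue its own queries. A hedged sketch of the
batching idea (hypothetical helper names, not the actual neutron fix):

```python
# Hypothetical sketch: resolve RBAC accessibility for every object with
# one query instead of one is_accessible() lookup per security group.
def get_accessible(cls, context, objs):
    # get_shared_ids() is an assumed helper returning, in a single
    # query, the ids of objects shared with this tenant via RBAC rows.
    shared_ids = set(cls.get_shared_ids(context))
    return [obj for obj in objs
            if obj.project_id == context.project_id
            or obj.id in shared_ids]
```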


Reproduction:

* Create 1000 SG's
* Execute `time openstack security group list`
* Create 500 more SG's
* Execute `time openstack security group list`


Version:

We're using neutron 14.0.2-1 on CentOS 7.7.1908.


Perceived Severity:

MEDIUM

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: group list security time

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1855902

Title:
  Inefficient Security Group listing

Status in neutron:
  New

Bug description:
  Issue:

  Fetching a large Security Group list takes relatively long as several
  database queries are made for each Security Group.

  
  Context:

  Listing SG's takes around 9 seconds with ~500 existing SG's, 16
  seconds with ~1000 SG's and around 30 seconds with ~1500 existing
  SG's, so this time seems to grow at least linearly with the number of
  SG's.

  We've looked at flamegraphs of the neutron controller which show that
  the stack frame `/usr/lib/python2.7/site-
  packages/neutron/db/securitygroups_db.py:get_security_groups:166`
  splits into two long running functions, each taking about half of the
  time (one at line 112 and the other at 115).

  ```python
  103  @classmethod
  104  def get_objects(cls, context, _pager=None, validate_filters=True,
  105                  **kwargs):
  106      # We want to get the policy regardless of its tenant id. We'll make
  107      # sure the tenant has permission to access the policy later on.
  108      admin_context = context.elevated()
  109      with cls.db_context_reader(admin_context):
  110          objs = super(RbacNeutronDbObjectMixin,
  111                       cls).get_objects(admin_context, _pager,
  112                                        validate_filters, **kwargs)
  113          result = []
  114          for obj in objs:
  115              if not cls.is_accessible(context, obj):
  116                  continue
  117              result.append(obj)
  118          return result
  ```

  We've also seen that the number of database queries seems to grow
  linearly:

  * Listing ~500 SG's performs ~2100 queries
  * Listing ~1000 SG's performs ~3500 queries
  * Listing ~1500 SG's performs ~5200 queries

  This does not scale well; we would expect a negligible increase in
  listing time.

  
  Reproduction:

  * Create 1000 SG's
  * Execute `time openstack security group list`
  * Create 500 more SG's
  * Execute `time openstack security group list`

  
  Version:

  We're using neutron 14.0.2-1 on CentOS 7.7.1908.

  
  Perceived Severity:

  MEDIUM

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1855902/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1855875] Re: When creating a new server instance this error occurred.

2019-12-10 Thread Matt Riedemann
This looks like a configuration issue. What is the value of your
transport_url config option in both the nova config and cell_mappings
table?

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1855875

Title:
  When creating a new server instance this error occurred.

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  2019-12-10 17:17:31.679 31059 ERROR oslo.messaging._drivers.impl_rabbit 
[req-a5d14694-ebda-449d-9030-0998fc27eb3e c035f19ef5af4a108d5a3704f59362ba 
b52c7945d86f428e8cf16a4d886f1f9a - default default] Failed to publish message 
to topic 'nova': 'NoneType' object has no attribute '__getitem__'
  2019-12-10 17:17:31.679 31059 ERROR oslo.messaging._drivers.impl_rabbit 
[req-a5d14694-ebda-449d-9030-0998fc27eb3e c035f19ef5af4a108d5a3704f59362ba 
b52c7945d86f428e8cf16a4d886f1f9a - default default] Unable to connect to AMQP 
server on 192.168.0.204:5672 after inf tries: 'NoneType' object has no 
attribute '__getitem__'
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi 
[req-a5d14694-ebda-449d-9030-0998fc27eb3e c035f19ef5af4a108d5a3704f59362ba 
b52c7945d86f428e8cf16a4d886f1f9a - default default] Unexpected exception in API 
method: MessageDeliveryFailure: Unable to connect to AMQP server on 
192.168.0.204:5672 after inf tries: 'NoneType' object has no attribute 
'__getitem__'
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi Traceback (most 
recent call last):
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py", line 671, in 
wrapped
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
f(*args, **kwargs)
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/compute/servers.py", line 
686, in create
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi 
**create_kwargs)
  2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 

[Yahoo-eng-team] [Bug 1855888] [NEW] ovs-offload with vxlan is broken due to adding skb mark

2019-12-10 Thread Moshe Levi
Public bug reported:

The following patch [1] adds use of egress_pkt_mark, which is not
supported with OVS hardware offload. This causes a regression in
OpenStack when using OVS hardware offload with VXLAN.


[1] - https://review.opendev.org/#/c/675054/

** Affects: neutron
 Importance: High
 Assignee: Moshe Levi (moshele)
 Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1855888

Title:
  ovs-offload with vxlan is broken due to adding skb mark

Status in neutron:
  In Progress

Bug description:
  The following patch [1] adds use of egress_pkt_mark, which is not
  supported with OVS hardware offload. This causes a regression in
  OpenStack when using OVS hardware offload with VXLAN.

  
  [1] - https://review.opendev.org/#/c/675054/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1855888/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1855883] [NEW] cannot migrate server on aarch64

2019-12-10 Thread Eric Xie
Public bug reported:

Description
===
We set up an OpenStack env on aarch64 KylinOS.
Live migration of an instance failed because of 'This operating system
kernel does not support vITS migration'.

Steps to reproduce
==
1. Setup OpenStack on aarch64 servers with openstack-helm
2. Live migrate instance from compute02 to compute03

Expected result
===
Success, instance is located on compute03

Actual result
=
Failed, instance remains on compute02


Environment
===
1. Exact version of OpenStack you are running. See the following
stable/rocky
# apt list --installed |egrep "libvirt|qemu"
ipxe-qemu/now 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2~cloud0 all 
[installed,local]
ipxe-qemu-256k-compat-efi-roms/now 1.0.0+git-20150424.a25a16d-0ubuntu2~cloud0 
all [installed,local]
libvirt-bin/now 4.0.0-1ubuntu8.11~cloud0 arm64 [installed,local]
libvirt-clients/now 4.0.0-1ubuntu8.11~cloud0 arm64 [installed,local]
libvirt-daemon/now 4.0.0-1ubuntu8.11~cloud0 arm64 [installed,local]
libvirt-daemon-system/now 4.0.0-1ubuntu8.11~cloud0 arm64 [installed,local]
libvirt0/now 4.0.0-1ubuntu8.11~cloud0 arm64 [installed,local]
qemu/now 1:2.11+dfsg-1ubuntu7.15~cloud1 arm64 [installed,local]

# uname -a
Linux compute03 4.4.131-20190726.kylin.server-generic #kylin SMP Tue Jul 30 
16:44:09 CST 2019 aarch64 aarch64 aarch64 GNU/Linux

2. Which hypervisor did you use?
libvirt+kvm

Logs & Configs
==
nova-compute:
File "/var/lib/openstack/local/lib/python2.7/site-packages/libvirt.py", line 
1745, in migrateToURI3
if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', 
dom=self)
libvirtError: internal error: unable to execute QEMU command 'migrate': This 
operating system kernel does not support vITS migration

libvirt:
2019-12-07 05:34:34.820+: 57546: error : qemuMonitorJSONCheckError:392 : 
internal error: unable to execute QEMU command 'migrate': This operating system 
kernel does not support vITS migration
2019-12-07 05:34:35.226+: 57546: error : 
virNetClientProgramDispatchError:177 : internal error: qemu unexpectedly closed 
the monitor: 2019-12-07T05:34:29.355638Z qemu-system-aarch64: Not a migration 
stream
2019-12-07T05:34:29.355781Z qemu-system-aarch64: load of migration failed: 
Invalid argument

** Affects: nova
 Importance: Undecided
 Assignee: Eric Xie (eric-xie)
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1855883

Title:
  cannot migrate server on aarch64

Status in OpenStack Compute (nova):
  New

Bug description:
  Description
  ===
  We set up an OpenStack env on aarch64 KylinOS.
  Live migration of an instance failed because of 'This operating system
  kernel does not support vITS migration'.

  Steps to reproduce
  ==
  1. Setup OpenStack on aarch64 servers with openstack-helm
  2. Live migrate instance from compute02 to compute03

  Expected result
  ===
  Success, instance is located on compute03

  Actual result
  =
  Failed, instance remains on compute02

  
  Environment
  ===
  1. Exact version of OpenStack you are running. See the following
  stable/rocky
  # apt list --installed |egrep "libvirt|qemu"
  ipxe-qemu/now 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2~cloud0 all 
[installed,local]
  ipxe-qemu-256k-compat-efi-roms/now 1.0.0+git-20150424.a25a16d-0ubuntu2~cloud0 
all [installed,local]
  libvirt-bin/now 4.0.0-1ubuntu8.11~cloud0 arm64 [installed,local]
  libvirt-clients/now 4.0.0-1ubuntu8.11~cloud0 arm64 [installed,local]
  libvirt-daemon/now 4.0.0-1ubuntu8.11~cloud0 arm64 [installed,local]
  libvirt-daemon-system/now 4.0.0-1ubuntu8.11~cloud0 arm64 [installed,local]
  libvirt0/now 4.0.0-1ubuntu8.11~cloud0 arm64 [installed,local]
  qemu/now 1:2.11+dfsg-1ubuntu7.15~cloud1 arm64 [installed,local]

  # uname -a
  Linux compute03 4.4.131-20190726.kylin.server-generic #kylin SMP Tue Jul 30 
16:44:09 CST 2019 aarch64 aarch64 aarch64 GNU/Linux

  2. Which hypervisor did you use?
  libvirt+kvm

  Logs & Configs
  ==
  nova-compute:
  File "/var/lib/openstack/local/lib/python2.7/site-packages/libvirt.py", line 
1745, in migrateToURI3
  if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', 
dom=self)
  libvirtError: internal error: unable to execute QEMU command 'migrate': This 
operating system kernel does not support vITS migration

  libvirt:
  2019-12-07 05:34:34.820+: 57546: error : qemuMonitorJSONCheckError:392 : 
internal error: unable to execute QEMU command 'migrate': This operating system 
kernel does not support vITS migration
  2019-12-07 05:34:35.226+: 57546: error : 
virNetClientProgramDispatchError:177 : internal error: qemu unexpectedly closed 
the monitor: 2019-12-07T05:34:29.355638Z qemu-system-aarch64: Not a migration 
stream
  2019-12-07T05:34:29.355781Z qemu-system-aarch64: load of migration failed: 

[Yahoo-eng-team] [Bug 1804502] Re: Rebuild server with NUMATopologyFilter enabled fails (in some cases)

2019-12-10 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/689861
Committed: 
https://git.openstack.org/cgit/openstack/nova/commit/?id=3f9411071d4c1a04ab0b68fd635597bf6959c0ca
Submitter: Zuul
Branch:master

commit 3f9411071d4c1a04ab0b68fd635597bf6959c0ca
Author: Sean Mooney 
Date:   Mon Oct 21 16:17:17 2019 +

Disable NUMATopologyFilter on rebuild

This change leverages the new NUMA constraint checking added in
in I0322d872bdff68936033a6f5a54e8296a6fb3434 to allow the
NUMATopologyFilter to be skipped on rebuild.

As the new behavior of rebuild enforces that no changes
to the NUMA constraints are allowed on rebuild, we no longer
need to execute the NUMATopologyFilter. Previously
the NUMATopologyFilter would process the rebuild request
as if it were a request to spawn a new instance, as the
numa_fit_instance_to_host function is not rebuild aware.

As such, prior to this change a rebuild would only succeed
if a host had enough additional capacity for a second instance
on the same host meeting the requirements of the new image and
existing flavor. This behavior was incorrect on two counts, as
a rebuild uses a noop claim. First, the resource usage cannot
change, so it was incorrect to require the additional capacity
to rebuild an instance. Secondly, it was incorrect not to assert
that the resource usage remained the same.

I0322d872bdff68936033a6f5a54e8296a6fb3434 addressed guarding the
rebuild against altering the resource usage, and this change
allows in-place rebuild.

This change found a latent bug that will be addressed in a
follow-up change, and updated the functional tests to note the
incorrect behavior.

Change-Id: I48bccc4b9adcac3c7a3e42769c11fdeb8f6fd132
Closes-Bug: #1804502
Implements: blueprint inplace-rebuild-of-numa-instances


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1804502

Title:
  Rebuild server with NUMATopologyFilter enabled fails (in some cases)

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===
  A server rebuild will fail in the nova scheduler on NUMATopologyFilter
  if the compute does not have enough capacity for a second copy of the
  server (even though the running server is clearly already accounted
  for in that calculation).

  To resolve the issue, a fix is required so that the NUMATopologyFilter
  does not re-fit the instance when the scheduling request is due to a
  rebuild.

  The result of such a case is that the server rebuild fails with a
  "No valid host was found" error.

  (Do not confuse resize with rebuild...)
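
  A schematic of the fix direction (condensed and hypothetical, not the
  literal nova filter code; the helper names are assumptions): skip
  NUMA fitting when the scheduling request is a rebuild, since a
  rebuild cannot change the NUMA constraints.

  ```python
  # Condensed, hypothetical sketch; not the literal nova implementation.
  class NUMATopologyFilter(object):
      def host_passes(self, host_state, spec_obj):
          if self._request_is_rebuild(spec_obj):
              # A rebuild keeps the same flavor and host, so re-fitting
              # the guest as if it were a second instance would wrongly
              # demand double the NUMA capacity.
              return True
          return self._fits_numa_topology(host_state, spec_obj)
  ```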

  Steps to reproduce
  ==

  1. Create a flavor containing metadata that points to a specific
  compute (use a host aggregate with the same key:value metadata).
  Make sure the flavor contains topology-related metadata:
  hw:cpu_cores='1', hw:cpu_policy='dedicated', hw:cpu_sockets='6',
  hw:cpu_thread_policy='prefer', hw:cpu_threads='1',
  hw:mem_page_size='large', location='area51'

  2. Create a server on that compute (preferably using a heat stack)
  3. (Try to) rebuild the server using a stack update
  4. Issue reproduced

  Expected result
  ===
  Server in an active running state (if the image was replaced in the
  rebuild command, then with a reference to the new image in the server
  details).

  Actual result
  =
  Server in error state with a "No valid host was found" error.

  Message
  No valid host was found. There are not enough hosts available.
  Code
  500
  Details
  File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 966, 
in rebuild_instance return_alternates=False) File 
"/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 723, in 
_schedule_instances return_alternates=return_alternates) File 
"/usr/lib/python2.7/site-packages/nova/scheduler/utils.py", line 907, in 
wrapped return func(*args, **kwargs) File 
"/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 53, 
in select_destinations instance_uuids, return_objects, return_alternates) File 
"/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, 
in __run_method return getattr(self.instance, __name)(*args, **kwargs) File 
"/usr/lib/python2.7/site-packages/nova/scheduler/client/query.py", line 42, in 
select_destinations instance_uuids, return_objects, return_alternates) File 
"/usr/lib/python2.7/site-packages/nova/scheduler/rpcapi.py", line 158, in 
select_destinations return cctxt.call(ctxt, 'select_destinations', **msg_args) 
File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 179, 
in call retry=self.retry) File 
"/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 133, in 
_send retry=retry) File 

[Yahoo-eng-team] [Bug 1855752] Re: Inappropriate HTTP error status from os-server-external-events

2019-12-10 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/698037
Committed: 
https://git.openstack.org/cgit/openstack/nova/commit/?id=e6f742544432d6066f1fba4666580919eb7859bd
Submitter: Zuul
Branch:master

commit e6f742544432d6066f1fba4666580919eb7859bd
Author: Eric Fried 
Date:   Mon Dec 9 09:58:53 2019 -0600

Nix os-server-external-events 404 condition

The POST /os-server-external-events API had the following confusing
behavior:

With multiple events in the payload, if *some* (but not all) were
dropped, the HTTP response was 207, with per-event 4xx error codes in
the payload. But if *all* of the events were dropped, the overall HTTP
response was 404 with no payload. Thus, especially for consumers sending
only one event at a time, it was impossible to distinguish e.g. "you
tried to send an event for a nonexistent instance" from "the instance
you specified hasn't landed on a host yet".

This fix gets rid of that sweeping 404 condition, so if *any* subset of
the events are dropped (including *all* of them), the HTTP response will
always be 207, and the payload will always contain granular per-event
error codes.

This effectively means the API can no longer return 404, ever.

Closes-Bug: #1855752
Change-Id: Ibad1b51e2cf50d00102295039b6e82bc00bec058


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1855752

Title:
  Inappropriate HTTP error status from os-server-external-events

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  The handling of os-server-external-events API [1] has a bug. It is designed 
to handle multiple events, with the following expected behavior:
  * If all events are successfully handled, it should return HTTP 200.
  * If no event is successfully handled, it should return HTTP 404.
  * If some are handled successfully but not all, it should return HTTP 207, 
with per-event status codes.

  However, when Cyborg sends a single event for a single instance, and
  that instance is not yet associated with a host [*], the 'else' clause
  in Line 137 [1] will set HTTP 207 as return code; but, since
  accepted_events is [] in Line 146, that will throw an exception and
  return 404. IOW, the expected return is 207 but the actual return is
  404.
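
  A condensed sketch of the corrected status selection (illustrative
  only, not the literal nova code):

  ```python
  # Illustrative only: overall HTTP status selection for
  # POST /os-server-external-events after the fix.
  def overall_status(accepted_events, dropped_events):
      if not dropped_events:
          return 200  # every event was handled
      # Some or all events dropped: always 207, with per-event codes
      # (400/404/422) carried in the response body. The old code raised
      # a blanket 404 when accepted_events was empty, hiding those codes.
      return 207
  ```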

  This has been discussed in IRC [2]. A patch has been proposed [3] to
  address this.

  [*] This happens because Nova calls into Cyborg from the conductor to
  initiate binding of accelerator requests (ARQs), lets it proceed
  asynchronously, and waits for the binding notification event in the
  compute manager. The notification event could come before the compute
  manager has called self._rt.instance_claim(), which would associate
  the instance with a host and a node. That race condition triggers the
  behavior above.

  [1]
  
https://github.com/openstack/nova/blob/62f6a0a1bc6c4b24621e1c2e927177f99501bef3/nova/api/openstack/compute/server_external_events.py

  [2] http://eavesdrop.openstack.org/irclogs/%23openstack-nova
  /%23openstack-nova.2019-12-09.log.html#t2019-12-09T15:45:18

  [3] https://review.opendev.org/#/c/698037/

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1855752/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1855875] [NEW] When creating a new server instance this error occurred.

2019-12-10 Thread sameera madushan
Public bug reported:

2019-12-10 17:17:31.679 31059 ERROR oslo.messaging._drivers.impl_rabbit 
[req-a5d14694-ebda-449d-9030-0998fc27eb3e c035f19ef5af4a108d5a3704f59362ba 
b52c7945d86f428e8cf16a4d886f1f9a - default default] Failed to publish message 
to topic 'nova': 'NoneType' object has no attribute '__getitem__'
2019-12-10 17:17:31.679 31059 ERROR oslo.messaging._drivers.impl_rabbit 
[req-a5d14694-ebda-449d-9030-0998fc27eb3e c035f19ef5af4a108d5a3704f59362ba 
b52c7945d86f428e8cf16a4d886f1f9a - default default] Unable to connect to AMQP 
server on 192.168.0.204:5672 after inf tries: 'NoneType' object has no 
attribute '__getitem__'
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi 
[req-a5d14694-ebda-449d-9030-0998fc27eb3e c035f19ef5af4a108d5a3704f59362ba 
b52c7945d86f428e8cf16a4d886f1f9a - default default] Unexpected exception in API 
method: MessageDeliveryFailure: Unable to connect to AMQP server on 
192.168.0.204:5672 after inf tries: 'NoneType' object has no attribute 
'__getitem__'
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi Traceback (most 
recent call last):
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py", line 671, in 
wrapped
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return f(*args, 
**kwargs)
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 110, 
in wrapper
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi return 
func(*args, **kwargs)
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/compute/servers.py", line 
686, in create
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi **create_kwargs)
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/hooks.py", line 154, in inner
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi rv = f(*args, 
**kwargs)
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/compute/api.py", line 1857, in create
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi 
supports_port_resource_request=supports_port_resource_request)
2019-12-10 17:17:31.680 31059 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/compute/api.py", line 

[Yahoo-eng-team] [Bug 1855869] [NEW] federation role mapping does not add users to groups

2019-12-10 Thread Robert Duncan
Public bug reported:

I'm using Azure AD and keystone OIDC.
Mapping remote users into local groups does not work as expected.
I'm using the auto-generated domain for ephemeral cloud users. A remote
attribute, OIDC_DEPARTMENT, is used for mapping federated users to local
groups; the groups and projects have been created in the default domain.
Users should inherit the roles of their mapped group, in other words
"group-based role-based access".

My expectation, when following the docs for oidc, openid or mapped, is
that users inherit the roles of their mapped groups.
How to reproduce:

1 - create idp
2 - create protocol
3 - create mapping
4 - create project
5 - create group
6 - assign group to project
7 - assign roles to group in project

Web SSO is working, and a certain amount of the mapping seems to be
working: for example, if I grant group access to a project, the federated
user will be granted access to the project in horizon - but they won't
inherit the roles of that group, i.e. they will not become group members

in Horizon >> Identity >> Users (Select a federated User) >> Groups (no groups)
In Horizon >> Identity >> Groups >> Members (no members)

Is this intended? The federated user's domain id is the auto-generated
federation domain, but I am mapping them into the Default domain /
project / group.

here is the mapping from oidc group to openstack group

{
  "rules": [
{
  "local": [
{
  "group": {
"domain": {
  "name": "Default"
},
"name": "itdept"
  },
  "user": {
"name": "{0}",
"email": "{1}"
  }
}
  ],
  "remote": [
{
  "type": "HTTP_OIDC_EMAIL"
},
{
  "type": "HTTP_OIDC_EMAIL"
},
{
  "type": "HTTP_OIDC_DEPARTMENT",
  "any_one_of": [
"7050",
"7051"
  ]
}
  ]
}

There is nothing in the mapping regarding projects, as I would not like
to use such a mechanism for simple access to projects. But if I assign
the local group to another project, then I *can* switch to that project
in horizon - yet I do not have the roles of the group, only the member
role - I'm guessing because this is bestowed by default or by horizon.

So in summary:
Configured a working SSO.
- Users are not being added to groups; membership seems to be ephemeral
- Users do inherit the groups' projects, so project enrolment works as expected
- Users do not inherit the groups' roles on projects

** Affects: keystone
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1855869

Title:
  federation role mapping does not add users to groups

Status in OpenStack Identity (keystone):
  New

Bug description:
  I'm using AzureAD with keystone OIDC, and mapping remote users into
  local groups does not work as expected.
  I'm using the auto-generated domain for ephemeral cloud users. A remote
  attribute, OIDC_DEPARTMENT, is used to map federated users into local
  groups; the groups and projects have been created in the Default
  domain, and users should inherit the roles of their mapped group, in
  other words "group-based role-based access".

  My expectation, following the docs for oidc, openid or mapped, is that
  users inherit the roles of their mapped groups.

  How to reproduce:

  1 - create idp
  2 - create protocol
  3 - create mapping
  4 - create project
  5 - create group
  6 - assign group to project
  7 - assign roles to group in project

  WEB SSO is working and a certain amount of the mapping seems to be
  working. For example, if I grant group access to a project, the
  federated user will be granted access to the project in Horizon - but
  they won't inherit the roles of that group, i.e. they will not become
  group members.

  In Horizon >> Identity >> Users (select a federated user) >> Groups (no groups)
  In Horizon >> Identity >> Groups >> Members (no members)

  Is this intended? The federated user's domain id is the auto-generated
  federation domain, but I am mapping them into the Default domain /
  project / group.

  here is the mapping from oidc group to openstack group

  {
"rules": [
  {
"local": [
  {
"group": {
  "domain": {
"name": "Default"
  },
  "name": "itdept"
},
"user": {
  "name": "{0}",
  "email": "{1}"
}
  }
],
"remote": [
  {
"type": "HTTP_OIDC_EMAIL"
  },
  {
"type": "HTTP_OIDC_EMAIL"
  },
  {
"type": "HTTP_OIDC_DEPARTMENT",
"any_one_of": [
  "7050",
  "7051"
]
  }
]
  }
  ]
}

  There is nothing in the mapping 

[Yahoo-eng-team] [Bug 1855854] [NEW] [RFE] Dynamic DHCP allocation pool

2019-12-10 Thread Harald Jensås
Public bug reported:

Neutron currently only supports configuring the DHCP agent with static
/fixed-address allocations. The DHCP client id (the client MAC address
in neutron's case) is mapped to a specific IP address in the dhcp server
configuration. No range of addresses is made available for clients
without a pre-allocated fixed-address.

When network booting on IPv6 this becomes an issue: the DHCPv6
specification mandates use of the DHCP Unique Identifier (DUID)
and Identity Association Identifier (IAID) to identify a lease. When
network booting, an instance will move through a minimum of two DHCP
clients, and these rarely end up using identical DUIDs and IAIDs.

The combination of static/fixed-address allocations in the DHCP server
and the changing DUID/IAID of the clients causes the second DHCP client
request to get a ``no address available`` reply from the server, and
thus the network boot process errors out.

NOTE:
  In some cases just the UEFI PXE6 client ends up doing two cycles
  of DHCPv6 S.A.R.R (Solicit, Advertise, Request, Reply)
  with different IAIDs, because some UEFI firmware uses a
  non-RFC-compliant random generator for the IAID; see bug:
  https://bugzilla.tianocore.org/show_bug.cgi?id=1518.

  While this is a bug in UEFI firmware, it is a common enough
  implementation, used by various hardware vendors, that it
  makes sense to work around the issue where possible.


This RFE is for adding the possibility to create a subnet with dynamic
allocation pool(s).

This would solve the network-booting issue with changing IAIDs
described above: a new lease with a new address will be offered during
each step of network booting.

For example, an instance deployment via the OpenStack Bare Metal service
(ironic) would typically involve three DHCP clients during provisioning:
UEFI firmware, iPXE, and the ironic-python-agent ramdisk. So a total of
3 leases would be consumed to complete the provisioning.

If this RFE is implemented, the dhcp server (dnsmasq) would configure
the dhcp-range for a dynamic subnet (or a dynamic allocation pool of a
subnet) without the ``mode`` set to ``static``. To ensure that the dhcp
server only provides a dynamic allocation for the desired ports, the
``ignore`` option is used in a ``dhcp-host`` entry with a wildcard ``*``
host (``dhcp-host="*",ignore``). Ports that require dynamic addressing
would get a ``dhcp-host=<mac address>`` entry (without the ``ignore``)
so that these specific ports get addresses from the dynamic allocation
pool. A sketch of the resulting configuration follows.

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: rfe

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1855854

Title:
  [RFE] Dynamic DHCP allocation pool

Status in neutron:
  New

Bug description:
  Neutron currently only supports configuring the DHCP agent with static
  /fixed-address allocations. The DHCP client id (the client MAC address
  in neutron's case) is mapped to a specific IP address in the dhcp
  server configuration. No range of addresses is made available for
  clients without a pre-allocated fixed-address.

  When network booting on IPv6 this becomes an issue: the DHCPv6
  specification mandates use of the DHCP Unique Identifier (DUID)
  and Identity Association Identifier (IAID) to identify a lease. When
  network booting, an instance will move through a minimum of two DHCP
  clients, and these rarely end up using identical DUIDs and IAIDs.

  The combination of static/fixed-address allocations in the DHCP server
  and the changing DUID/IAID of the clients causes the second DHCP
  client request to get a ``no address available`` reply from the
  server, and thus the network boot process errors out.

  NOTE:
    In some cases just the UEFI PXE6 client ends up doing two cycles
    of DHCPv6 S.A.R.R (Solicit, Advertise, Request, Reply)
    with different IAIDs, because some UEFI firmware uses a
    non-RFC-compliant random generator for the IAID; see bug:
    https://bugzilla.tianocore.org/show_bug.cgi?id=1518.

    While this is a bug in UEFI firmware, it is a common enough
    implementation, used by various hardware vendors, that it
    makes sense to work around the issue where possible.


  This RFE is for adding the possibility to create a subnet with dynamic
  allocation pool(s).

  This would solve the network-booting issue with changing IAIDs
  described above: a new lease with a new address will be offered during
  each step of network booting.

  For example, an instance deployment via the OpenStack Bare Metal
  service (ironic) would typically involve three DHCP clients during
  provisioning: UEFI firmware, iPXE, and the ironic-python-agent
  ramdisk. So a total of 3 leases would be consumed to complete the
  provisioning.

  If this RFE is implemented, the dhcp server (dnsmasq) would configure
  the dhcp-range for a dynamic subnet (or dynamic allocation pool of a
  subnet) without the 

[Yahoo-eng-team] [Bug 1832768] Re: Horizon: AngularJS pages do not display dates in system's timezone

2019-12-10 Thread hutianhao27
** Changed in: starlingx
   Status: Triaged => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1832768

Title:
  Horizon: AngularJS pages do not display dates in system's timezone

Status in OpenStack Dashboard (Horizon):
  Fix Released
Status in StarlingX:
  Fix Released

Bug description:
  Brief Description
  -----------------
  Horizon's AngularJS pages (for example Images) do not display
  timestamps in the system's timezone. The timezone used is the
  browser's.

  Severity
  --------
  Minor

  Steps to Reproduce
  ------------------
  - If Timezone is not set under Settings (menu on the top right):
  Set the controller's system timezone to one different from UTC and the
  browser's, for example "system modify --timezone America/Regina".
  The Image timestamps (for example 'Created At') are displayed in the
  browser's timezone and not the system's timezone.

  or

  - If Timezone is set under Settings (menu on the top right):
  Set Horizon's Settings Timezone to one that is different from UTC and
  the browser's.
  The Image timestamps (for example 'Created At') are displayed in the
  browser's timezone and not the Settings' timezone.

  Expected Behavior
  -----------------
  AngularJS pages should align with Django pages, which use the timezone
  from Horizon's cookie (set under the Settings menu on the top right)
  or, if that is not set, the controller's timezone. A sketch of this
  behaviour follows.
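  A hedged AngularJS sketch of the expected behaviour, assuming the
  timezone is exposed through a "django_timezone" cookie and
  moment-timezone is loaded; the module and filter names are
  illustrative, not Horizon's actual code:

```javascript
// Hypothetical filter: format a timestamp in the timezone taken from
// the cookie, falling back to UTC when no cookie is set.
// Requires ngCookies and the global moment from moment-timezone.
angular.module('example.filters', ['ngCookies'])
  .filter('settingsDate', ['$cookies', function($cookies) {
    return function(input, format) {
      var tz = $cookies.get('django_timezone') || 'UTC';
      return moment(input).tz(tz).format(format || 'YYYY-MM-DD HH:mm:ss');
    };
  }]);
```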

  Actual Behavior
  ---------------
  The AngularJS pages use the browser's timezone.

  Reproducibility
  ---------------
  Reproducible

  System Configuration
  --------------------
  Any configuration

  Branch/Pull Time/Commit
  -----------------------

  Last Pass
  ---------
  NA

  Timestamp/Logs
  --------------
  NA

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1832768/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp