[Yahoo-eng-team] [Bug 1878719] Re: DHCP Agent's iptables CHECKSUM rule causes skb_warn_bad_offload kernel

2020-07-13 Thread Launchpad Bug Tracker
[Expired for neutron because there has been no activity for 60 days.]

** Changed in: neutron
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1878719

Title:
  DHCP Agent's iptables CHECKSUM rule causes skb_warn_bad_offload
  kernel

Status in neutron:
  Expired

Bug description:
  We are hitting this kernel issue due to a DHCP agent CHECKSUM rule
  that is probably obsolete/no longer needed:
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840619

  Upgrading the kernel is one workaround, but a more disruptive one,
  especially since we are still using CentOS 7 and the kernel fix only
  made it into 4.19. We should just remove this rule altogether. As per
  the kernel issue:

  "The changes are limited only to users which have CHECKSUM rules
  enabled in their iptables configs. Openstack commonly configures such
  rules on deployment, even though they are not necessary, as almost all
  packets have their checksum calculated by NICs these days, and
  CHECKSUM is only around to service old dhcp clients which would
  discard UDP packets with empty checksums.

  This commit was selected for upstream -stable 4.18.13, and has made
  its way into bionic 4.15.0-58.64 by LP #1836426. There have been no
  reported problems and those kernels would have had sufficient testing
  with Openstack and its configured iptables rules.

  If any users are affected by regression, then they can simply delete
  any CHECKSUM entries in their iptables configs."

  
  I can see the metadata agent's CHECKSUM rule was already removed last year:
  https://github.com/openstack/neutron/commit/04e995be9898ceaa009344509dc16ca7f589d814

  Is there any reason the DHCP agent's rule was not? Is it safe to
  remove this function, and the places it is invoked from, altogether?

  
https://github.com/openstack/neutron/blob/master/neutron/agent/linux/dhcp.py#L1739
  
https://github.com/openstack/neutron/blob/cb55643a0695ebc5b41f50f6edb1546bcc676b71/neutron/agent/linux/dhcp.py#L1691
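
  For context, the rule in question looks roughly like the following in
  the mangle table of the qdhcp namespace (a hedged reconstruction from
  the linked code, not a verbatim dump; the exact match options may
  differ by release):

    -A POSTROUTING -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill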

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1878719/+subscriptions



[Yahoo-eng-team] [Bug 1886537] Re: Missing parameters in python-glanceclient image-import command documentation

2020-07-13 Thread Erno Kuvaja
Thanks for pointing this out. I targeted it to python-glanceclient
instead of the service.

** Project changed: glance => python-glanceclient

** Changed in: python-glanceclient
   Status: New => Triaged

** Changed in: python-glanceclient
   Importance: Undecided => Medium

** Tags added: low-hanging-fruit

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1886537

Title:
  Missing parameters in python-glanceclient image-import command
  documentation

Status in Glance Client:
  Triaged

Bug description:
  Description
  

  python-glanceclient documentation [0] for image-import command shows
  just an optional parameter:

  ```
  usage: glance image-import [--import-method <METHOD>]
                             <IMAGE_ID>
  ```

  But in the actual CLI there are more parameters available:
  ```
  $ glance help image-import
  usage: glance image-import [--import-method <METHOD>] [--uri <IMAGE_URL>]
                             [--store <STORE>] [--stores <STORES>]
                             [--all-stores [True|False]]
                             [--allow-failure [True|False]]
                             <IMAGE_ID>
  ```
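
  For illustration, two ways these options combine (the image ID and URL
  are placeholders; `web-download` and `copy-image` are among the
  supported import methods):
  ```
  $ glance image-import <IMAGE_ID> --import-method web-download --uri <IMAGE_URL>
  $ glance image-import <IMAGE_ID> --import-method copy-image --all-stores True
  ```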

  How to reproduce
  

  1. Open python-glanceclient image-import command documentation [0] 
  2. Check the params

  Expected behavior
  --

  To have all the parameters documented

  Actual behavior
  --

  Parameters are missing, compared to an actual `glance help image-
  import` command output.

  [0] https://docs.openstack.org/python-
  glanceclient/latest/cli/details.html#glance-image-import

To manage notifications about this bug go to:
https://bugs.launchpad.net/python-glanceclient/+bug/1886537/+subscriptions



[Yahoo-eng-team] [Bug 1887405] [NEW] Race condition while processing security_groups_member_updated events (ipset)

2020-07-13 Thread Charles Farquhar
Public bug reported:

# Summary

Race condition while processing security_groups_member_updated events
(ipset)

# Overview

We have a customer that uses heat templates to deploy large environments
(e.g. 21 instances) with a significant number of security groups (e.g.
60) that use bi-directional remote group references for both ingress and
egress filtering.  These heat stacks are deployed using a CI pipeline
and intermittently suffer from application layer failures due to broken
network connectivity.  We found that this was caused by the ipsets used
to implement remote_group memberships missing IPs from their member
lists.  Troubleshooting suggests this is caused by a race condition,
which I've attempted to describe in detail below.

Version: `54e1a6b1bc378c0745afc03987d0fea241b826ae` (HEAD of
stable/rocky as of Jan 26, 2020), though I suspect this issue persists
through master.

I'm working on getting some multi-node environments deployed (I don't
think it's possible to reproduce this with a single hypervisor) and hope
to provide reproduction steps on Rocky and master soon.  I wanted to get
this report submitted as-is with the hopes that an experienced Neutron
dev might be able to spot possible solutions or provide diagnostic
insight that I am not yet able to produce.

I suspect this report may be easier to read with some markdown, so
please feel free to read it in a gist:
https://gist.github.com/cfarquhar/20fddf2000a83216021bd15b512f772b

Also, this diagram is probably critical to following along:
https://user-images.githubusercontent.com/1253665/87317744-0a75b180-c4ed-11ea-9bad-085019c0f954.png

# Race condition symptoms

Given the following security groups/rules:

```
| secgroup name | secgroup id                          | direction | remote group                         | dest port |
|---------------|--------------------------------------|-----------|--------------------------------------|-----------|
| server        | fcd6cf12-2ac9-4704-9208-7c6cb83d1a71 | ingress   | b52c8c54-b97a-477d-8b68-f4075e7595d9 | 9092      |
| client        | b52c8c54-b97a-477d-8b68-f4075e7595d9 | egress    | fcd6cf12-2ac9-4704-9208-7c6cb83d1a71 | 9092      |
```

And the following instances:

```
| instance name | hypervisor | ip          | secgroup assignment |
|---------------|------------|-------------|---------------------|
| server01      | compute01  | 192.168.0.1 | server              |
| server02      | compute02  | 192.168.0.2 | server              |
| server03      | compute03  | 192.168.0.3 | server              |
| client01      | compute04  | 192.168.0.4 | client              |
```

We would expect to find the following ipset representing the `server`
security group members on `compute04`:

```
# ipset list NIPv4fcd6cf12-2ac9-4704-9208-
Name: NIPv4fcd6cf12-2ac9-4704-9208-
Type: hash:net
Revision: 6
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 536
References: 4
Number of entries: 3
Members:
192.168.0.1
192.168.0.2
192.168.0.3
```

What we actually get when the race condition is triggered is an
incomplete list of members in the ipset.  The member list could contain
anywhere between zero and two of the expected IPs.

# Triggering the race condition

The problem occurs when `security_group_member_updated` events arrive
between `port_update` steps 12 and 22 (see diagram and process details
below).
  - `port_update` step 12 retrieves the remote security groups' member
    lists, which are not necessarily complete yet.
  - `port_update` step 22 adds the port to `IptablesFirewallDriver.ports()`.

This results in `security_group_member_updated` step 3 looking for the
port to apply the updated member list to (in
`IptablesFirewallDriver.ports()`) BEFORE it has been added by
`port_update`'s step 22. This causes the membership update event to
effectively be discarded.  We are then left with whatever the remote
security group's member list was when the `port_update` process
retrieved it at step 12.  This state persists until something triggers
the port being re-added to the `updated_ports` list (e.g. agent restart,
another remote group membership change, local security group
addition/removal, etc).
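
A minimal sketch of the interaction (illustrative Python, not the
actual agent code; the names stand in for the structures described
above):

```python
firewall_ports = {}  # stands in for IptablesFirewallDriver.ports()

def security_group_member_updated(port_id, new_members):
    # Step 3: only ports already known to the firewall are refreshed;
    # for an unknown port the event is effectively discarded.
    if port_id in firewall_ports:
        firewall_ports[port_id]["members"] = list(new_members)

def port_update(port_id, get_remote_members):
    # Step 12: snapshot of the remote groups' member lists (possibly
    # incomplete at this point).
    members = get_remote_members()
    # Steps 13-21: a member update arriving now is lost, because the
    # port is not yet registered with the firewall driver.
    security_group_member_updated(
        port_id, ["192.168.0.1", "192.168.0.2", "192.168.0.3"])
    # Step 22: the port becomes visible, carrying the stale snapshot.
    firewall_ports[port_id] = {"members": members}

port_update("tap-client01", lambda: ["192.168.0.1"])
print(firewall_ports)  # stale: only 192.168.0.1 made it into the "ipset"
```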


# Race condition details

The race condition occurs in the linuxbridge agent between the two
following operations:
  1) Processing a `port_update` event when an instance is first created
  2) Processing `security_group_member_updated` events for the
     instance's remote security groups.

Either of these operations can result in creating or mutating an ipset
via `IpsetManager.set_members()`.  The relevant control flow sequence
for each operation is listed below.  I've left out any branches that did
not seem to be relevant to the race condition.

## Processing a `port_update` event:
  1) We receive an RPC port_update event via 
`LinuxBridgeRpcCallbacks.port_update()`, which adds the tap device to the 
`LinuxBridgeRpcCallbacks.updated_devices` list
  2) Sleep until the next `CommonAgentLoop.daemon_loop()` 

[Yahoo-eng-team] [Bug 1837882] Re: while creating external network subnet range through Horizon UI helper message says give subnet range as comma separated but accept hyphen("-")

2020-07-13 Thread varun kumar yadav
** Project changed: horizon-cisco-ui => horizon

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1837882

Title:
  while creating external network subnet range through Horizon UI helper
  message says give subnet range as comma separated but accept
  hyphen("-")

Status in OpenStack Dashboard (Horizon):
  New

Bug description:
  Description: When a user creates an external network and supplies the
  subnet's allocation range with a hyphen "-" delimiter
  (20.x.x.10-20.x.x.100), the input goes through successfully without
  an error message, even though the helper text clearly tells the user
  to enter the range comma-separated as "start_ip_range,end_ip_range".
  Please see the UI attachment for more info.

  Also, after entering a hyphenated range for the external subnet, the
  create-network window fails with the unexpected message "Specify
  additional attributes for the subnet", without giving any proper
  error message.

  Pre-condition:
  Create a tenant router r1 with its gateway set to the external network.

  Step 1 -> Create an external network as the admin user from Horizon.
  Step 2 -> On the create network window: name -> external;
            project -> admin;
            provider network type -> external;
            leave the physical network blank (but make sure the user
            has a tier-0 gateway set).
  Step 3 -> Click next -> subnet name -> subnet1;
            network address -> give an external network range matching
            that of the gateway IP;
            Gateway IP -> give the gateway IP of the subnet.
  Step 4 -> Under subnet details, uncheck "Enable DHCP";
            allocation pool -> give the subnet range as
            (20.x.x.10-20.x.x.100).
  Step 5 -> Once the user clicks "next", the window comes back to
            Step 2, and clicking next again leads to the message
            "Specify additional attributes for the subnet".

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1837882/+subscriptions



[Yahoo-eng-team] [Bug 1837882] [NEW] while creating external network subnet range through Horizon UI helper message says give subnet range as comma separated but accept hyphen("-")

2020-07-13 Thread Launchpad Bug Tracker
You have been subscribed to a public bug:

Description: When a user creates an external network and supplies the
subnet's allocation range with a hyphen "-" delimiter
(20.x.x.10-20.x.x.100), the input goes through successfully without an
error message, even though the helper text clearly tells the user to
enter the range comma-separated as "start_ip_range,end_ip_range".
Please see the UI attachment for more info.

Also, after entering a hyphenated range for the external subnet, the
create-network window fails with the unexpected message "Specify
additional attributes for the subnet", without giving any proper error
message.

Pre-condition:
Create a tenant router r1 with its gateway set to the external network.

Step 1 -> Create an external network as the admin user from Horizon.
Step 2 -> On the create network window: name -> external;
          project -> admin;
          provider network type -> external;
          leave the physical network blank (but make sure the user has
          a tier-0 gateway set).
Step 3 -> Click next -> subnet name -> subnet1;
          network address -> give an external network range matching
          that of the gateway IP;
          Gateway IP -> give the gateway IP of the subnet.
Step 4 -> Under subnet details, uncheck "Enable DHCP";
          allocation pool -> give the subnet range as
          (20.x.x.10-20.x.x.100).
Step 5 -> Once the user clicks "next", the window comes back to
Step 2, and clicking next again leads to the message "Specify
additional attributes for the subnet".

** Affects: horizon
 Importance: Undecided
 Status: New

-- 
while creating external network subnet range through Horizon UI helper message 
says give subnet range as comma separated but accept hyphen("-")
https://bugs.launchpad.net/bugs/1837882
You received this bug notification because you are a member of Yahoo! 
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).



[Yahoo-eng-team] [Bug 1688673] Re: cpu_realtime_mask handling is not intuitive

2020-07-13 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/461456
Committed: 
https://git.openstack.org/cgit/openstack/nova/commit/?id=9fc63c764429c10f9041e6b53659e0cbd595bf6b
Submitter: Zuul
Branch: master

commit 9fc63c764429c10f9041e6b53659e0cbd595bf6b
Author: Chris Friesen 
Date:   Mon May 1 11:24:06 2017 -0600

hardware: Tweak the 'cpu_realtime_mask' handling slightly

If the end-user specifies a cpu_realtime_mask that does not begin
    with a caret (i.e. it is not a purely-exclusion mask) it's likely
that they're expecting us to use the exact mask that they have
specified, not realizing that we default to all-vCPUs-are-RT.

Let's make nova's behaviour a bit more friendly by correctly
handling this scenario.

Note that the end-user impact of this is minimal/non-existent. As
discussed in bug #1884231, the only way a user could have used this
before would be if they'd configured an emulator thread and purposefully
set an invalid 'hw:cpu_realtime_mask' set. In fact, they wouldn't have
been able to use this value at all if they used API microversion 2.86
(extra spec validation).

Part of blueprint use-pcpu-and-vcpu-in-one-instance

Change-Id: Id81859186de6fb6b728ad566a532244008fe77d0
Closes-Bug: #1688673


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1688673

Title:
  cpu_realtime_mask handling is not intuitive

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  The nova code implicitly assumes that all vCPUs are realtime in
  nova.virt.hardware.vcpus_realtime_topology(), and then it appends the
  user-specified mask.

  This only makes sense if the user-specified cpu_realtime_mask is an
  exclusion mask, but this isn't documented anywhere.

  It would make more sense to simply use the mask as passed-in from the
  end-user.

  In order to preserve the current behaviour we should probably special-
  case the scenario where the passed-in cpu_realtime_mask starts with a
  "^" (indicating an exclusion).

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1688673/+subscriptions



[Yahoo-eng-team] [Bug 1887385] [NEW] String to byte conversion should provide the encoding type

2020-07-13 Thread Rodolfo Alonso
Public bug reported:

In [1], if self.port is a string, the encoding should be provided when
converting it to bytes.

[1] https://github.com/openstack/neutron/blob/73557abefcba1c6ce0cef709d1082674c0217485/neutron/tests/functional/test_server.py#L231

** Affects: neutron
 Importance: Undecided
 Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
 Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1887385

Title:
  String to byte conversion should provide the encoding type

Status in neutron:
  In Progress

Bug description:
  In [1], if self.port is a string, the encoding should be provided
  when converting it to bytes.

  [1] https://github.com/openstack/neutron/blob/73557abefcba1c6ce0cef709d1082674c0217485/neutron/tests/functional/test_server.py#L231
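
  A minimal sketch of the suggested fix (helper name illustrative;
  assumes self.port may be either str or bytes):

    def port_to_bytes(port):
        # Pass the encoding explicitly instead of relying on the default.
        if isinstance(port, str):
            return port.encode("utf-8")
        return port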

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1887385/+subscriptions



[Yahoo-eng-team] [Bug 1887380] [NEW] Attaching virtual GPU devices to guests in nova

2020-07-13 Thread ryan
Public bug reported:


This bug tracker is for errors with the documentation, use the following
as a template and remove or add fields as you see fit. Convert [ ] into
[x] to check boxes:

- [X] This is a doc addition request.

Hi, a problem came up when we were using nova (Queens) configured with
the vGPU feature to create several instances. It seems multiple
instances preempt the same vGPU resource; in our case, a vGPU that had
already been acquired by another instance. Here is the error reported
in the log:

"libvirt.libvirtError: Requested operation is not valid: mediated device
/sys/bus/mdev/devices/xxx is in use by driver QEMU, domain xxx"

Apparently, nova is trying to allocate a vGPU resource that is already
being used by another instance. We also ruled out the possibility that
there were not enough vGPU resources on the host: in our case, 25% of
the instances went into an error state during creation even though the
instances we were creating needed only 50% of all vGPU resources. From
our perspective, the problem is with the nova-scheduler. Any idea how
to work this out?

Thanks

Ruien Zhang
zhangru...@bytedance.com

---
Release: 21.1.0.dev214 on 2020-04-28 20:09:00
SHA: d19f1ac47b0a5fe1dd80b7187087e5810501f16c
Source: https://opendev.org/openstack/nova/src/doc/source/admin/virtual-gpu.rst
URL: https://docs.openstack.org/nova/latest/admin/virtual-gpu.html

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: doc

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1887380

Title:
  Attaching virtual GPU devices to guests in nova

Status in OpenStack Compute (nova):
  New

Bug description:

  This bug tracker is for errors with the documentation, use the
  following as a template and remove or add fields as you see fit.
  Convert [ ] into [x] to check boxes:

  - [X] This is a doc addition request.

  Hi, a problem came up when we were using nova (Queens) configured
  with the vGPU feature to create several instances. It seems multiple
  instances preempt the same vGPU resource; in our case, a vGPU that
  had already been acquired by another instance. Here is the error
  reported in the log:

  "libvirt.libvirtError: Requested operation is not valid: mediated
  device /sys/bus/mdev/devices/xxx is in use by driver QEMU, domain xxx"

  Apparently, nova is trying to allocate a vGPU resource that is
  already being used by another instance. We also ruled out the
  possibility that there were not enough vGPU resources on the host: in
  our case, 25% of the instances went into an error state during
  creation even though the instances we were creating needed only 50%
  of all vGPU resources. From our perspective, the problem is with the
  nova-scheduler. Any idea how to work this out?

  Thanks

  Ruien Zhang
  zhangru...@bytedance.com

  ---
  Release: 21.1.0.dev214 on 2020-04-28 20:09:00
  SHA: d19f1ac47b0a5fe1dd80b7187087e5810501f16c
  Source: 
https://opendev.org/openstack/nova/src/doc/source/admin/virtual-gpu.rst
  URL: https://docs.openstack.org/nova/latest/admin/virtual-gpu.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1887380/+subscriptions



[Yahoo-eng-team] [Bug 1872671] Re: [Focal] Policy files are missing

2020-07-13 Thread Corey Bryant
This bug was fixed in the package horizon - 
3:18.4.2~git2020070209.392bc2482-0ubuntu1~cloud0
---

 horizon (3:18.4.2~git2020070209.392bc2482-0ubuntu1~cloud0) focal-victoria; 
urgency=medium
 .
   * New upstream release for the Ubuntu Cloud Archive.
 .
 horizon (3:18.4.2~git2020070209.392bc2482-0ubuntu1) groovy; urgency=medium
 .
   * New upstream snapshot for OpenStack Victoria.
   * d/control: Align (Build-)Depends with upstream.
   * d/p/fix-skipped-config-files.patch: Dropped. Fixed upstream.
   * d/control: Update Standards-Version to 4.5.0.
 .
 horizon (3:18.3.2-0ubuntu2) groovy; urgency=medium
 .
   * d/p/fix-skipped-config-files.patch: Ensure that config files
 are included in the package (LP: #1872671).
 .
 horizon (3:18.3.2-0ubuntu1) groovy; urgency=medium
 .
   * New upstream release for OpenStack Ussuri (LP: #1877642).


** Changed in: cloud-archive
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1872671

Title:
  [Focal] Policy files are missing

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Triaged
Status in OpenStack Dashboard (Horizon):
  Fix Released
Status in horizon package in Ubuntu:
  Fix Released
Status in horizon source package in Focal:
  Triaged

Bug description:
  [Impact]
  [Test Case]
  python3-django-horizon:
    Installed: 3:18.2.1~git2020032709.2c4470272-0ubuntu1

  After a fresh install of openstack dashboard on focal, apache2
  error.log contains hundreds of error message about missing policy
  files:

  [Tue Apr 14 09:12:34.558183 2020] [wsgi:error] [pid 3062:tid 140253993006848] 
[remote 10.64.255.1:50364] WARNING openstack_auth.policy No policy rules for 
service 'identity' in 
/usr/lib/python3/dist-packages/openstack_dashboard/conf/keystone_policy.json
  [Tue Apr 14 09:12:34.559486 2020] [wsgi:error] [pid 3062:tid 140253993006848] 
[remote 10.64.255.1:50364] WARNING openstack_auth.policy No policy rules for 
service 'compute' in 
/usr/lib/python3/dist-packages/openstack_dashboard/conf/nova_policy.json and 
files under 
['/usr/lib/python3/dist-packages/openstack_dashboard/conf/nova_policy.d']
  [Tue Apr 14 09:12:34.560622 2020] [wsgi:error] [pid 3062:tid 140253993006848] 
[remote 10.64.255.1:50364] WARNING openstack_auth.policy No policy rules for 
service 'volume' in 
/usr/lib/python3/dist-packages/openstack_dashboard/conf/cinder_policy.json and 
files under 
['/usr/lib/python3/dist-packages/openstack_dashboard/conf/cinder_policy.d']
  [Tue Apr 14 09:12:34.561703 2020] [wsgi:error] [pid 3062:tid 140253993006848] 
[remote 10.64.255.1:50364] WARNING openstack_auth.policy No policy rules for 
service 'image' in 
/usr/lib/python3/dist-packages/openstack_dashboard/conf/glance_policy.json
  [Tue Apr 14 09:12:34.562703 2020] [wsgi:error] [pid 3062:tid 140253993006848] 
[remote 10.64.255.1:50364] WARNING openstack_auth.policy No policy rules for 
service 'network' in 
/usr/lib/python3/dist-packages/openstack_dashboard/conf/neutron_policy.json

  The policy files are indeed missing from the package:

  dpkg -L python3-django-horizon | grep json$
  /usr/lib/python3/dist-packages/horizon/xstatic/pkg/angular/data/errors.json
  /usr/lib/python3/dist-packages/horizon/xstatic/pkg/angular/data/version.json
  /usr/lib/python3/dist-packages/horizon-18.2.1.dev1.egg-info/pbr.json

  Logging in with a normal user account (without admin role) still shows
  the admin panel and buttons a normal user cannot use, like
  identity/users "create user". Trying to use these either doesn't work
  or throws errors like "Unable to retrieve xxx".

  Copying the policy files from the source package solves the problem.
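
  A hedged sketch of that manual workaround (the source-tree path is
  illustrative and must match the installed horizon version):

    # from an unpacked horizon source tree matching the package version
    cp openstack_dashboard/conf/*_policy.json \
      /usr/lib/python3/dist-packages/openstack_dashboard/conf/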

  [Regression Potential]
  Very low; this patch just adds the necessary policy files back into
  the package, and they already existed in prior releases.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1872671/+subscriptions



[Yahoo-eng-team] [Bug 1887377] [NEW] nova does not load-balance assignment of resources on a host based on availability of PCI devices, hugepages or pCPUs

2020-07-13 Thread sean mooney
Public bug reported:

Nova has supported hugepages, CPU pinning and PCI NUMA affinity for a
very long time. Since their introduction the advice has always been to
create flavors that mimic your typical hardware topology, i.e. if all
your compute hosts have 2 NUMA nodes then you should create flavors
that request 2 NUMA nodes. For a long time operators have ignored this
advice and continued to create single-NUMA-node flavors, citing that
after 5+ years of hardware vendors working with VNF vendors to make
their products NUMA-aware, VNFs often still do not optimize properly
for a multi-NUMA environment.

As a result many operators still deploy single-NUMA VMs, although that
is becoming less common over time. When you deploy a VM with a single
NUMA node today, we more or less iterate over the host NUMA nodes in
order and assign the VM to the first NUMA node where it fits. On a host
without any PCI devices whitelisted for openstack management, this
behaviour results in NUMA nodes being filled linearly from NUMA 0 to
NUMA n. That means if a host had 100G of hugepages on both NUMA nodes 0
and 1 and you scheduled 101 1G single-NUMA VMs to the host, 100 VMs
would spawn on NUMA 0 and 1 VM would spawn on NUMA node 1.

That means that the first 100 VMs would all contend for CPU resources
on the first NUMA node while the last VM had all of the second NUMA
node to itself.

The correct behaviour would be for nova to assign the VMs round-robin,
attempting to keep the resource availability balanced. This will
maximise performance for individual VMs while pessimising the
scheduling of large VMs on a host.

To this end a new NUMA balancing config option (unset, pack or spread)
should be added, and we should sort NUMA nodes in descending (spread)
or ascending (pack) order based on pMEM, pCPUs, mempages and PCI
devices, in that sequence.
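
A minimal sketch of the proposed sort (illustrative Python, not nova
code; the cell fields are hypothetical stand-ins for the host's
per-NUMA-node inventory):

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class HostNumaCell:
        id: int
        free_pmem: int = 0      # free persistent memory
        free_pcpus: int = 0     # unpinned dedicated CPUs
        free_mempages: int = 0  # free hugepages of the requested size
        free_pci_devs: int = 0  # unclaimed whitelisted PCI devices

    def sort_cells(cells: List[HostNumaCell],
                   policy: Optional[str]) -> List[HostNumaCell]:
        """unset -> keep current order; spread -> most-free first;
        pack -> least-free first."""
        if policy is None:
            return list(cells)
        key = lambda c: (c.free_pmem, c.free_pcpus,
                         c.free_mempages, c.free_pci_devs)
        return sorted(cells, key=key, reverse=(policy == "spread"))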

In a future release, when NUMA is in placement, this sorting will need
to be done in a weigher that sorts the allocation candidates based on
the same pack/spread criteria.

I am filing this as a bug, not a feature, as this will have a
significant impact for existing deployments that either expected
https://specs.openstack.org/openstack/nova-specs/specs/pike/implemented/reserve-numa-with-pci.html
to implement this logic already, or do not follow our existing guidance
on creating flavors that align to the host topology.

** Affects: nova
 Importance: Undecided
 Assignee: sean mooney (sean-k-mooney)
 Status: New


** Tags: numa

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1887377

Title:
  nova does not load-balance assignment of resources on a host based on
  availability of PCI devices, hugepages or pCPUs

Status in OpenStack Compute (nova):
  New

Bug description:
  Nova has supported hugepages, CPU pinning and PCI NUMA affinity for a
  very long time. Since their introduction the advice has always been
  to create flavors that mimic your typical hardware topology, i.e. if
  all your compute hosts have 2 NUMA nodes then you should create
  flavors that request 2 NUMA nodes. For a long time operators have
  ignored this advice and continued to create single-NUMA-node flavors,
  citing that after 5+ years of hardware vendors working with VNF
  vendors to make their products NUMA-aware, VNFs often still do not
  optimize properly for a multi-NUMA environment.

  As a result many operators still deploy single-NUMA VMs, although
  that is becoming less common over time. When you deploy a VM with a
  single NUMA node today, we more or less iterate over the host NUMA
  nodes in order and assign the VM to the first NUMA node where it
  fits. On a host without any PCI devices whitelisted for openstack
  management, this behaviour results in NUMA nodes being filled
  linearly from NUMA 0 to NUMA n. That means if a host had 100G of
  hugepages on both NUMA nodes 0 and 1 and you scheduled 101 1G
  single-NUMA VMs to the host, 100 VMs would spawn on NUMA 0 and 1 VM
  would spawn on NUMA node 1.

  That means that the first 100 VMs would all contend for CPU resources
  on the first NUMA node while the last VM had all of the second NUMA
  node to itself.

  The correct behaviour would be for nova to assign the VMs
  round-robin, attempting to keep the resource availability balanced.
  This will maximise performance for individual VMs while pessimising
  the scheduling of large VMs on a host.

  To this end a new NUMA balancing config option (unset, pack or
  spread) should be added, and we should sort NUMA nodes in descending
  (spread) or ascending (pack) order based on pMEM, pCPUs, mempages and
  PCI devices, in that sequence.

  In a future release, when NUMA is in placement, this sorting will
  need to be done in a weigher that sorts the allocation candidates
  based on the same pack/spread criteria.

  I am filing this as a bug, not a feature, as this will have a
  significant impact for existing deployments that either expected
  https://specs.openstack.org/openstack/nova-specs/specs/pike/implemented/reserve-numa-with-pci.html
  to implement this logic already, or do not follow our existing
  guidance on creating flavors that align to the host topology.
[Yahoo-eng-team] [Bug 1887363] [NEW] [ovn-octavia-provider] Functional tests job fails

2020-07-13 Thread Maciej Jozefczyk
Public bug reported:

Functional tests job fails on:

2020-07-13 08:22:50.145117 | controller | + 
/home/zuul/src/opendev.org/openstack/neutron/tools/configure_for_func_testing.sh:_install_base_deps:113
 :   source 
/home/zuul/src/opendev.org/openstack/ovn-octavia-provider/devstack/lib/ovs
2020-07-13 08:22:50.145252 | controller | 
/home/zuul/src/opendev.org/openstack/neutron/tools/configure_for_func_testing.sh:
 line 113: 
/home/zuul/src/opendev.org/openstack/ovn-octavia-provider/devstack/lib/ovs: No 
such file or directory

https://9ce43a75e3387ceb8909-2b4f2fa211fea8445ec0f4a568f6056b.ssl.cf2.rackcdn.com/740625/1/check/ovn-octavia-provider-functional/714ba02/job-output.txt

** Affects: neutron
 Importance: Undecided
 Assignee: Maciej Jozefczyk (maciej.jozefczyk)
 Status: New


** Tags: ovn-octavia-provider

** Changed in: neutron
 Assignee: (unassigned) => Maciej Jozefczyk (maciej.jozefczyk)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1887363

Title:
  [ovn-octavia-provider] Functional tests job fails

Status in neutron:
  New

Bug description:
  Functional tests job fails on:

  2020-07-13 08:22:50.145117 | controller | + 
/home/zuul/src/opendev.org/openstack/neutron/tools/configure_for_func_testing.sh:_install_base_deps:113
 :   source 
/home/zuul/src/opendev.org/openstack/ovn-octavia-provider/devstack/lib/ovs
  2020-07-13 08:22:50.145252 | controller | 
/home/zuul/src/opendev.org/openstack/neutron/tools/configure_for_func_testing.sh:
 line 113: 
/home/zuul/src/opendev.org/openstack/ovn-octavia-provider/devstack/lib/ovs: No 
such file or directory

  
  https://9ce43a75e3387ceb8909-2b4f2fa211fea8445ec0f4a568f6056b.ssl.cf2.rackcdn.com/740625/1/check/ovn-octavia-provider-functional/714ba02/job-output.txt

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1887363/+subscriptions
