[Yahoo-eng-team] [Bug 1896603] Re: ovn-octavia-provider: Cannot create listener due to alowed_cidrs validation

2020-09-24 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/753302
Committed: 
https://git.openstack.org/cgit/openstack/ovn-octavia-provider/commit/?id=76b20882aa9fef3c693e45c2b504224a44e84ce8
Submitter: Zuul
Branch: master

commit 76b20882aa9fef3c693e45c2b504224a44e84ce8
Author: Brian Haley 
Date:   Tue Sep 22 08:34:29 2020 -0400

Fix the check for allowed_cidrs in listeners

The allowed_cidrs value could be an empty list if the
request involves the sdk, so change the check to
account for that.

Change-Id: I2df7e5a944cbd40c60943ad105f6e09f7afa85a9
Closes-bug: #1896603
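The fixed check can be sketched like this (a minimal illustration with a made-up function name, not the actual ovn-octavia-provider code): an SDK-built request may carry allowed_cidrs as an empty list, which must not be treated as a request for the unsupported option.

```python
# Hypothetical sketch of the kind of check the fix describes.
def check_allowed_cidrs(listener):
    allowed_cidrs = listener.get("allowed_cidrs")
    # Broken form: testing "allowed_cidrs is not None" (or key presence)
    # also trips on the empty list the SDK sends.
    # Fixed form: only a non-empty list is an unsupported option.
    if allowed_cidrs:
        raise ValueError(
            "OVN provider does not support allowed_cidrs option")

# An SDK-style body with an empty list is now accepted:
check_allowed_cidrs({"protocol": "TCP", "allowed_cidrs": []})
```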


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1896603

Title:
  ovn-octavia-provider: Cannot create listener due to alowed_cidrs
  validation

Status in neutron:
  Fix Released

Bug description:
  Kuryr-Kubernetes tests running with ovn-octavia-provider started to
  fail with "Provider 'ovn' does not support a requested option: OVN
  provider does not support allowed_cidrs option" showing up in the
  o-api logs.

  We've tracked that down to check [1] getting introduced. Apparently it's
  broken and makes the request explode even if the property isn't set at
  all. Please take a look at the output from python-openstackclient [2],
  where the body I used is just '{"listener": {"loadbalancer_id": "faca9a1b-
  30dc-45cb-80ce-2ab1c26b5521", "protocol": "TCP", "protocol_port": 80,
  "admin_state_up": true}}'.

  This is all over your gates as well; see the o-api log [3]. Somehow the
  ovn-octavia-provider tests skip 171 results there, which is why the job
  is green.

  [1] 
https://opendev.org/openstack/ovn-octavia-provider/src/branch/master/ovn_octavia_provider/driver.py#L142
  [2] http://paste.openstack.org/show/798197/
  [3] 
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_4ba/751085/7/gate/ovn-octavia-provider-v2-dsvm-scenario/4bac575/controller/logs/screen-o-api.txt

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1896603/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1896617] Re: Creation of image (or live snapshot) from the existing VM fails if libvirt-image-backend is configured to qcow2 starting from Ussuri

2020-09-24 Thread Corey Bryant
As background, adding the libvirt-qemu user to the nova group was an
attempt to make the /var/lib/nova/* directories more restricted, but that
proved to be difficult with ownership changes between nova and
libvirt/qemu.

** Summary changed:

- Creation of image (or live snapshot) from the existing VM fails if 
libvirt-image-backend is configured to qcow2 starting from Ussuri
+ [SRU] Creation of image (or live snapshot) from the existing VM fails if 
libvirt-image-backend is configured to qcow2 starting from Ussuri

** Also affects: nova (Ubuntu Groovy)
   Importance: Critical
 Assignee: Corey Bryant (corey.bryant)
   Status: Triaged

** Also affects: nova (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Changed in: nova (Ubuntu Focal)
   Status: New => Triaged

** Changed in: nova (Ubuntu Focal)
   Importance: Undecided => Critical

** Changed in: nova (Ubuntu Focal)
 Assignee: (unassigned) => Corey Bryant (corey.bryant)

** Also affects: cloud-archive
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/ussuri
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/victoria
   Importance: Undecided
   Status: New

** Changed in: cloud-archive/ussuri
   Status: New => Triaged

** Changed in: cloud-archive/victoria
   Status: New => Triaged

** Changed in: cloud-archive/victoria
   Importance: Undecided => Critical

** Changed in: cloud-archive/ussuri
   Importance: Undecided => Critical

** Changed in: cloud-archive/victoria
 Assignee: (unassigned) => Corey Bryant (corey.bryant)

** Changed in: cloud-archive/ussuri
 Assignee: (unassigned) => Corey Bryant (corey.bryant)

** Description changed:

+ [Impact]
+ 
  tl;dr
  
  1) creating the image from the existing VM fails if qcow2 image backend is 
used, but everything is fine if using rbd image backend in nova-compute.
  2) openstack server image create --name   fails with some unrelated error:
  
  $ openstack server image create --wait 842fa12c-19ee-44cb-bb31-36d27ec9d8fc
  HTTP 404 Not Found: No image found with ID 
f4693860-cd8d-4088-91b9-56b2f173ffc7
  
  == Details ==
  
  Two Tempest tests ([1] and [2]) from the 2018.02 Refstack test lists [0]
  are failing with the following exception:
  
  49701867-bedc-4d7d-aa71-7383d877d90c
  Traceback (most recent call last):
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py",
 line 369, in create_image_from_server
  waiters.wait_for_image_status(client, image_id, wait_until)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/common/waiters.py",
 line 161, in wait_for_image_status
  image = show_image(image_id)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/images_client.py",
 line 74, in show_image
  resp, body = self.get("images/%s" % image_id)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py",
 line 298, in get
  return self.request('GET', url, extra_headers, headers)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/base_compute_client.py",
 line 48, in request
  method, url, extra_headers, headers, body, chunked)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py",
 line 687, in request
  self._error_checker(resp, resp_body)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py",
 line 793, in _error_checker
  raise exceptions.NotFound(resp_body, resp=resp)
  tempest.lib.exceptions.NotFound: Object not found
  Details: {'code': 404, 'message': 'Image not found.'}
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/images/test_images_oneserver.py",
 line 69, in test_create_delete_image
  wait_until='ACTIVE')
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py",
 line 384, in create_image_from_server
  image_id=image_id)
  tempest.exceptions.SnapshotNotFoundException: Server snapshot image 
d82e95b0-9c62-492d-a08c-5bb118d3bf56 not found.
  
  So far I was able to identify the following:
  
  1) 
https://github.com/openstack/tempest/blob/master/tempest/api/compute/images/test_images_oneserver.py#L69
 invokes a "create image from server"
  2) It fails with the following error message in the 

[Yahoo-eng-team] [Bug 1896617] Re: Creation of image (or live snapshot) from the existing VM fails if libvirt-image-backend is configured to qcow2 starting from Ussuri

2020-09-24 Thread Corey Bryant
This is caused because the libvirt-qemu user is added to the nova group
as part of the nova-compute-libvirt package post-install script.

Following up on comment #17 above, the user/group of the delta file
changes from nova:nova to libvirt-qemu:kvm, whereas in comment #21
above, the user/group of the delta file changes to nova:kvm.

Dropping libvirt-qemu from nova in /etc/group fixes this as a work-
around. I'm building packages with a fix now and will get this fixed for
ussuri and victoria.

Marking the upstream bug as invalid.


** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896617

Title:
  [SRU] Creation of image (or live snapshot) from the existing VM fails
  if libvirt-image-backend is configured to qcow2 starting from Ussuri

Status in OpenStack nova-compute charm:
  Invalid
Status in Ubuntu Cloud Archive:
  Triaged
Status in Ubuntu Cloud Archive ussuri series:
  Triaged
Status in Ubuntu Cloud Archive victoria series:
  Triaged
Status in OpenStack Compute (nova):
  Invalid
Status in nova package in Ubuntu:
  Triaged
Status in nova source package in Focal:
  Triaged
Status in nova source package in Groovy:
  Triaged

Bug description:
  [Impact]

  tl;dr

  1) creating the image from the existing VM fails if qcow2 image backend is 
used, but everything is fine if using rbd image backend in nova-compute.
  2) openstack server image create --name   fails with some unrelated error:

  $ openstack server image create --wait 842fa12c-19ee-44cb-bb31-36d27ec9d8fc
  HTTP 404 Not Found: No image found with ID 
f4693860-cd8d-4088-91b9-56b2f173ffc7

  == Details ==

  Two Tempest tests ([1] and [2]) from the 2018.02 Refstack test lists
  [0] are failing with the following exception:

  49701867-bedc-4d7d-aa71-7383d877d90c
  Traceback (most recent call last):
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py",
 line 369, in create_image_from_server
  waiters.wait_for_image_status(client, image_id, wait_until)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/common/waiters.py",
 line 161, in wait_for_image_status
  image = show_image(image_id)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/images_client.py",
 line 74, in show_image
  resp, body = self.get("images/%s" % image_id)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py",
 line 298, in get
  return self.request('GET', url, extra_headers, headers)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/base_compute_client.py",
 line 48, in request
  method, url, extra_headers, headers, body, chunked)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py",
 line 687, in request
  self._error_checker(resp, resp_body)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py",
 line 793, in _error_checker
  raise exceptions.NotFound(resp_body, resp=resp)
  tempest.lib.exceptions.NotFound: Object not found
  Details: {'code': 404, 'message': 'Image not found.'}

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/images/test_images_oneserver.py",
 line 69, in test_create_delete_image
  wait_until='ACTIVE')
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py",
 line 384, in create_image_from_server
  image_id=image_id)
  tempest.exceptions.SnapshotNotFoundException: Server snapshot image 
d82e95b0-9c62-492d-a08c-5bb118d3bf56 not found.

  So far I was able to identify the following:

  1) 
https://github.com/openstack/tempest/blob/master/tempest/api/compute/images/test_images_oneserver.py#L69
 invokes a "create image from server"
  2) It fails with the following error message in the nova-compute logs: 
https://pastebin.canonical.com/p/h6ZXdqjRRm/

  The same occurs if the "openstack server image create --wait" will be
  executed; however, according to
  https://docs.openstack.org/nova/ussuri/admin/migrate-instance-with-
  snapshot.html the VM has to be shut down before the image creation:

  "Shut down the source VM before you take the snapshot to ensure that
  all data is flushed to 

[Yahoo-eng-team] [Bug 1886298] Re: Few of the lower constraints are not compatible with python3.8

2020-09-24 Thread Radosław Piliszek
** Changed in: masakari
   Status: In Progress => Fix Released

** Changed in: masakari
Milestone: None => 10.0.0.0rc1

** Changed in: masakari
 Assignee: ZHOU LINHUI (zhoulinhui) => Radosław Piliszek (yoctozepto)

** Also affects: masakari/victoria
   Importance: Undecided
 Assignee: Radosław Piliszek (yoctozepto)
   Status: Fix Released

** Changed in: masakari/victoria
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1886298

Title:
  Few of the lower constraints are not compatible with python3.8

Status in castellan:
  In Progress
Status in ec2-api:
  In Progress
Status in futurist:
  Fix Released
Status in OpenStack Dashboard (Horizon):
  Fix Released
Status in kolla:
  Fix Released
Status in kolla-ansible:
  Fix Released
Status in OpenStack Shared File Systems Service (Manila):
  Fix Committed
Status in manila-ui:
  Fix Released
Status in masakari:
  Fix Released
Status in masakari victoria series:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released
Status in os-win:
  New
Status in oslo.messaging:
  In Progress
Status in oslo.policy:
  In Progress
Status in oslo.privsep:
  Fix Released
Status in oslo.reports:
  Fix Released
Status in oslo.vmware:
  Fix Released
Status in Glance Client:
  New
Status in python-keystoneclient:
  Fix Committed
Status in python-manilaclient:
  Fix Released
Status in python-novaclient:
  Fix Released
Status in python-senlinclient:
  New
Status in python-troveclient:
  New
Status in python-watcherclient:
  New
Status in Solum:
  New
Status in tacker:
  Fix Released
Status in taskflow:
  New
Status in tripleo-validations:
  New
Status in watcher:
  New

Bug description:
  Lower constraints were being tested with python3.6 until now and the
  jobs were running fine. With the migration of testing to Ubuntu Focal,
  where python3.8 is the default, the lower-constraints jobs started
  failing due to multiple issues.

  For example,

  Markupsafe 1.0 not compatible with new setuptools:
  - https://github.com/pallets/markupsafe/issues/116

  paramiko 2.7.1 fixed the compatibility for python3.7 onwards:
  https://github.com/paramiko/paramiko/issues/1108

  greenlet 0.4.15 added wheels for python 3.8:
  https://github.com/python-greenlet/greenlet/issues/151

  numpy 1.19.1 added python 3.8 support and testing:
  https://github.com/numpy/numpy/pull/14775

  paramiko 2.7.1 fixed the compatibility for python3.7 onwards:
  
https://github.com/paramiko/paramiko/commit/4753881223e0ff5e3b3be35bb687a18dfec4f672

  Similarly, there are many dependencies which added python3.8 support in
  a later version, so we need to bump their lower constraints to
  compatible versions.

  The approach to identify the required bumps is to run the lower-constraints
  job on Focal and start bumping the failed things. I started with the nova
  repos and found the version bumps below:

  For Nova:
  Markupsafe==1.1.1
  cffi==1.14.0
  greenlet==0.4.15
  PyYAML==3.13
  lxml==4.5.0
  numpy==1.19.0
  psycopg2==2.8
  paramiko==2.7.1

  For python-novaclient:
  Markupsafe==1.1.1
  cffi==1.14.0
  greenlet==0.4.15
  PyYAML==3.13

  For os-vif:
  Markupsafe==1.1.1
  cffi==1.14.0
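The bump-detection idea above can be sketched as follows (helper names are made up; this is not the OpenStack requirements tooling):

```python
# Compare a pinned lower constraint against the first release known to
# support python3.8, using plain version tuples.
def parse(version):
    return tuple(int(p) for p in version.split("."))

def needs_bump(pinned, first_py38_ok):
    """True if the pinned lower bound predates python3.8 support."""
    return parse(pinned) < parse(first_py38_ok)

# greenlet's pin of 0.4.15 is already the first py3.8-capable release,
# while an older paramiko pin would need bumping to 2.7.1.
assert not needs_bump("0.4.15", "0.4.15")
assert needs_bump("2.4.0", "2.7.1")
```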

To manage notifications about this bug go to:
https://bugs.launchpad.net/castellan/+bug/1886298/+subscriptions



[Yahoo-eng-team] [Bug 1897118] Re: nova-compute does not start in devstack-platform-opensuse-15 job due to < 4.0.0 qemu version

2020-09-24 Thread Ghanshyam Mann
Adding devstack as well, since devstack needs to move to openSUSE 15.2 [1],
which has qemu 4.2.0 available.

Moving the jobs to run on openSUSE 15.2 will also fix this.

[1]https://github.com/openstack/devstack/blob/5aa38f51b3dd0660a0622aecd65937d3c56eedc2/stack.sh#L224


** Also affects: devstack
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1897118

Title:
  nova-compute does not start in devstack-platform-opensuse-15 job due
  to < 4.0.0 qemu version

Status in devstack:
  New
Status in OpenStack Compute (nova):
  New

Bug description:
  Nova bumped the minimum qemu version to 4.0.0 [1]. It seems that the
  devstack-platform-opensuse-15 non-voting job has an older qemu version
  than that. Therefore the nova-compute service does not start [2].

  [1] https://review.opendev.org/#/c/746981
  [2] 
https://a5f2733c1907b1f26b90-5593d50c131879f6a486eeedbad80e3c.ssl.cf5.rackcdn.com/743800/14/check/devstack-platform-opensuse-15/91eeaf7/controller/logs/screen-n-cpu.txt

To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack/+bug/1897118/+subscriptions



[Yahoo-eng-team] [Bug 1897118] [NEW] nova-compute does not start in devstack-platform-opensuse-15 job due to < 4.0.0 qemu version

2020-09-24 Thread Balazs Gibizer
Public bug reported:

Nova bumped the minimum qemu version to 4.0.0 [1]. It seems that the
devstack-platform-opensuse-15 non-voting job has an older qemu version than
that. Therefore the nova-compute service does not start [2].

[1] https://review.opendev.org/#/c/746981
[2] 
https://a5f2733c1907b1f26b90-5593d50c131879f6a486eeedbad80e3c.ssl.cf5.rackcdn.com/743800/14/check/devstack-platform-opensuse-15/91eeaf7/controller/logs/screen-n-cpu.txt
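As a rough illustration of the kind of minimum-version gate involved (names are hypothetical; nova's actual check compares the version reported through libvirt):

```python
# Hedged sketch of a startup minimum-version gate.
MIN_QEMU_VERSION = (4, 0, 0)

def qemu_version_ok(version_str):
    # Compare dotted version strings as integer tuples.
    return tuple(int(p) for p in version_str.split(".")) >= MIN_QEMU_VERSION

assert not qemu_version_ok("3.1.0")  # older qemu: nova-compute refuses to start
assert qemu_version_ok("4.2.0")      # openSUSE 15.2 ships qemu 4.2.0
```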

** Affects: devstack
 Importance: Undecided
 Status: New

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: gate-failure

** Tags added: gate-failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1897118

Title:
  nova-compute does not start in devstack-platform-opensuse-15 job due
  to < 4.0.0 qemu version

Status in devstack:
  New
Status in OpenStack Compute (nova):
  New

Bug description:
  Nova bumped the minimum qemu version to 4.0.0 [1]. It seems that the
  devstack-platform-opensuse-15 non-voting job has an older qemu version
  than that. Therefore the nova-compute service does not start [2].

  [1] https://review.opendev.org/#/c/746981
  [2] 
https://a5f2733c1907b1f26b90-5593d50c131879f6a486eeedbad80e3c.ssl.cf5.rackcdn.com/743800/14/check/devstack-platform-opensuse-15/91eeaf7/controller/logs/screen-n-cpu.txt

To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack/+bug/1897118/+subscriptions



[Yahoo-eng-team] [Bug 1896741] Re: Intel mediated device info doesn't provide a name attribute

2020-09-24 Thread Sylvain Bauza
** Changed in: nova/ussuri
   Status: New => Confirmed

** Also affects: nova/victoria
   Importance: Low
 Assignee: Sylvain Bauza (sylvain-bauza)
   Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896741

Title:
  Intel mediated device info doesn't provide a name attribute

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) train series:
  Confirmed
Status in OpenStack Compute (nova) ussuri series:
  Confirmed
Status in OpenStack Compute (nova) victoria series:
  In Progress

Bug description:
  When testing some Xeon server for virtual GPU support, I saw that Nova
  provides an exception as the i915 driver doesn't provide a name for
  mdev types :

  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager Traceback (most recent call last):
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/manager.py", line 
9824, in _update_available_resource_for_node
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager startup=startup)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", 
line 896, in update_available_resource
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager self._update_available_resource(context, resources, 
startup=startup)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/oslo_concurrency/lockutils.py", line 
360, in inner
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager return f(*args, **kwargs)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", 
line 981, in _update_available_resource
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager self._update(context, cn, startup=startup)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", 
line 1233, in _update
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager self._update_to_placement(context, compute_node, 
startup)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/retrying.py", line 49, in wrapped_f
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager return Retrying(*dargs, **dkw).call(f, *args, **kw)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/retrying.py", line 206, in call
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager return attempt.get(self._wrap_exception)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/retrying.py", line 247, in get
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager six.reraise(self.value[0], self.value[1], 
self.value[2])
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/usr/local/lib/python3.7/site-packages/six.py", 
line 703, in reraise
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager raise value
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File 
"/usr/local/lib/python3.7/site-packages/retrying.py", line 200, in call
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager attempt = Attempt(fn(*args, **kwargs), attempt_number, 
False)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/compute/resource_tracker.py", 
line 1169, in _update_to_placement
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager self.driver.update_provider_tree(prov_tree, nodename)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 
7857, in update_provider_tree
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager provider_tree, nodename, allocations=allocations)
  Sep 23 06:00:19 mymachine.redhat.com nova-compute[195458]: ERROR 
nova.compute.manager   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 
8250, in _update_provider_tree_for_vgpu
  Sep 23 06:00:19 

[Yahoo-eng-team] [Bug 1896617] Re: Creation of image (or live snapshot) from the existing VM fails if libvirt-image-backend is configured to qcow2 starting from Ussuri

2020-09-24 Thread Corey Bryant
I'm still really confused by this, but here are some thoughts on the nova
os.chmod() call mentioned in an earlier comment that would fix this.

If I chmod the tmp dir that gets created by nova (e.g.
/var/lib/nova/instances/snapshots/tmpkajuir8o) to 755 just before the
snapshot (after the nova chmod), the snapshot is successful.

As mentioned in
https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1896617/comments/18,
the upstream nova code sets permissions for the tmp dir with:

os.chmod(tmpdir, 0o701)

That code has been that way since 2015, so it's not new in ussuri, see
git blame:

824c3706a3e nova/virt/libvirt/driver.py (Nicolas Simonds   
2015-07-23 12:47:24 -0500  2388) # NOTE(xqueralt): 
libvirt needs o+x in the tempdir
824c3706a3e nova/virt/libvirt/driver.py (Nicolas Simonds   
2015-07-23 12:47:24 -0500  2389) os.chmod(tmpdir, 0o701)

However, this seems like a heavy-handed chmod if the goal, as the comment
above it mentions, is to give libvirt o+x in the tempdir. I say this
because it overrides any default permissions that were previously set by
the operating system.

It seems that this should really be a lighter touch such as the
following (equivalent to chmod o+x tmpdir):

st = os.stat(tmpdir)
os.chmod(tmpdir, st.st_mode | stat.S_IXOTH)

That would fix this bug for us, but it still doesn't explain what changed
in Ubuntu to cause this to fail. We did make some permissions changes in
the nova package in Focal, but comparing the file/directory permissions
above in comment #21 (with ussuri-proposed), I'm seeing no differences.
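To illustrate the difference between the two chmod approaches (a standalone sketch on a throwaway directory, not nova code):

```python
import os
import stat
import tempfile

tmpdir = tempfile.mkdtemp()

# Heavy-handed form (what nova does today): clobbers whatever mode the
# directory had, forcing it to exactly 0o701.
os.chmod(tmpdir, 0o755)
os.chmod(tmpdir, 0o701)
assert stat.S_IMODE(os.stat(tmpdir).st_mode) == 0o701

# Lighter touch suggested above: OR in only the o+x bit, preserving the
# existing permissions.
os.chmod(tmpdir, 0o755)
st = os.stat(tmpdir)
os.chmod(tmpdir, st.st_mode | stat.S_IXOTH)
assert stat.S_IMODE(os.stat(tmpdir).st_mode) == 0o755

os.rmdir(tmpdir)
```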

** Changed in: nova
   Status: Invalid => New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896617

Title:
  Creation of image (or live snapshot) from the existing VM fails if
  libvirt-image-backend is configured to qcow2 starting from Ussuri

Status in OpenStack nova-compute charm:
  Invalid
Status in OpenStack Compute (nova):
  New
Status in nova package in Ubuntu:
  Triaged

Bug description:
  tl;dr

  1) creating the image from the existing VM fails if qcow2 image backend is 
used, but everything is fine if using rbd image backend in nova-compute.
  2) openstack server image create --name   fails with some unrelated error:

  $ openstack server image create --wait 842fa12c-19ee-44cb-bb31-36d27ec9d8fc
  HTTP 404 Not Found: No image found with ID 
f4693860-cd8d-4088-91b9-56b2f173ffc7

  == Details ==

  Two Tempest tests ([1] and [2]) from the 2018.02 Refstack test lists
  [0] are failing with the following exception:

  49701867-bedc-4d7d-aa71-7383d877d90c
  Traceback (most recent call last):
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py",
 line 369, in create_image_from_server
  waiters.wait_for_image_status(client, image_id, wait_until)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/common/waiters.py",
 line 161, in wait_for_image_status
  image = show_image(image_id)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/images_client.py",
 line 74, in show_image
  resp, body = self.get("images/%s" % image_id)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py",
 line 298, in get
  return self.request('GET', url, extra_headers, headers)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/services/compute/base_compute_client.py",
 line 48, in request
  method, url, extra_headers, headers, body, chunked)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py",
 line 687, in request
  self._error_checker(resp, resp_body)
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/lib/common/rest_client.py",
 line 793, in _error_checker
  raise exceptions.NotFound(resp_body, resp=resp)
  tempest.lib.exceptions.NotFound: Object not found
  Details: {'code': 404, 'message': 'Image not found.'}

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/images/test_images_oneserver.py",
 line 69, in test_create_delete_image
  wait_until='ACTIVE')
    File 
"/home/ubuntu/snap/fcbtest/14/.rally/verification/verifier-2d9cbf4d-fcbb-491d-848d-5137a9bde99e/repo/tempest/api/compute/base.py",
 line 384, in 

[Yahoo-eng-team] [Bug 1896463] Re: evacuation failed: Port update failed : Unable to correlate PCI slot

2020-09-24 Thread sean mooney
Just adding the previously filed downstream Red Hat bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1852110

For context, this can happen in queens, so when we root-cause the issue
and fix it, the fix should likely be backported to queens. There are other,
older bugs from newton that look similar, related to unshelve, so it's
possible that the same issue is affecting multiple move operations.

** Bug watch added: Red Hat Bugzilla #1852110
   https://bugzilla.redhat.com/show_bug.cgi?id=1852110

** Also affects: nova/train
   Importance: Undecided
   Status: New

** Also affects: nova/stein
   Importance: Undecided
   Status: New

** Also affects: nova/ussuri
   Importance: Undecided
   Status: New

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/victoria
   Importance: Low
 Assignee: Balazs Gibizer (balazs-gibizer)
   Status: Confirmed

** Also affects: nova/rocky
   Importance: Undecided
   Status: New

** Changed in: nova/ussuri
   Importance: Undecided => Low

** Changed in: nova/ussuri
   Status: New => Triaged

** Changed in: nova/train
   Importance: Undecided => Low

** Changed in: nova/train
   Status: New => Triaged

** Changed in: nova/stein
   Importance: Undecided => Low

** Changed in: nova/stein
   Status: New => Triaged

** Changed in: nova/rocky
   Importance: Undecided => Low

** Changed in: nova/rocky
   Status: New => Triaged

** Changed in: nova/queens
   Importance: Undecided => Low

** Changed in: nova/queens
   Status: New => Triaged

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896463

Title:
  evacuation failed: Port update failed : Unable to correlate PCI slot

Status in OpenStack Compute (nova):
  Confirmed
Status in OpenStack Compute (nova) queens series:
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Triaged
Status in OpenStack Compute (nova) stein series:
  Triaged
Status in OpenStack Compute (nova) train series:
  Triaged
Status in OpenStack Compute (nova) ussuri series:
  Triaged
Status in OpenStack Compute (nova) victoria series:
  Confirmed

Bug description:
  Description
  ===
  If the resource tracker's _update_available_resource() is called between
_do_rebuild_instance_with_claim() and instance.save() when evacuating VM
instances on the destination host,

  nova/compute/manager.py

  2931 def rebuild_instance(self, context, instance, orig_image_ref, 
image_ref,
  2932 +-- 84 lines: injected_files, new_pass, 
orig_sys_metadata,---
  3016 claim_ctxt = rebuild_claim(
  3017 context, instance, scheduled_node,
  3018 limits=limits, image_meta=image_meta,
  3019 migration=migration)
  3020 self._do_rebuild_instance_with_claim(
  3021 +-- 47 lines: claim_ctxt, context, instance, 
orig_image_ref,-
  3068 instance.apply_migration_context()
  3069 # NOTE (ndipanov): This save will now update the host 
and node
  3070 # attributes making sure that next RT pass is consistent 
since
  3071 # it will be based on the instance and not the migration 
DB
  3072 # entry.
  3073 instance.host = self.host
  3074 instance.node = scheduled_node
  3075 instance.save()
  3076 instance.drop_migration_context()

  the instance is not handled as a managed instance of the destination
  host, because it has not yet been updated in the DB.

  2020-09-19 07:27:36.321 8 WARNING nova.compute.resource_tracker [req-
  b35d5b9a-0786-4809-bd81-ad306cdda8d5 - - - - -] Instance
  22f6ca0e-f964-4467-83a3-f2bf12bb05ae is not being actively managed by
  this compute host but has allocations referencing this compute host:
  {u'resources': {u'MEMORY_MB': 12288, u'VCPU': 2, u'DISK_GB': 10}}.
  Skipping heal of allocation because we do not know what to do.

  And so the SR-IOV ports (PCI devices) were freed by clean_usage(),
  even though the VM already holds the VF port.
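The skip happens because the tracker only accounts for instances whose host
field already points at the local node. A minimal sketch of that filtering
(hypothetical objects and names, not nova's actual code):

```python
# During the race window the evacuated instance's DB record still points
# at the source host, so the destination host treats it as unmanaged and
# clean_usage() reclaims its PCI devices.

class Instance:
    def __init__(self, uuid, host):
        self.uuid = uuid
        self.host = host

def managed_instances(instances, local_host):
    # The resource tracker only treats instances whose saved host matches
    # the local host as managed by this compute node.
    return [i for i in instances if i.host == local_host]

instances = [Instance("22f6ca0e", "com1"),   # mid-evacuation, not saved yet
             Instance("aaaa0000", "com2")]
print([i.uuid for i in managed_instances(instances, "com2")])
# -> ['aaaa0000']  (the evacuated instance is invisible to com2's tracker)
```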

   743 def _update_available_resource(self, context, resources):
   744 +-- 45 lines: # initialize the compute node object, creating 
it--
   789 self.pci_tracker.clean_usage(instances, migrations, orphans)
   790 dev_pools_obj = self.pci_tracker.stats.to_device_pools_obj()

  After that, when we evacuated this VM to another compute host again, we
  got the error below.


  Steps to reproduce
  ==
  1. create a VM on com1 with SR-IOV VF ports.
  2. stop and disable nova-compute service on com1
  3. wait 60 sec (nova-compute reporting interval)
  4. evacuate the VM to com2
  5. wait the VM is 

[Yahoo-eng-team] [Bug 1897100] [NEW] Improve port listing command

2020-09-24 Thread Rodolfo Alonso
Public bug reported:

As reported in https://bugzilla.redhat.com/show_bug.cgi?id=1772106,
between Queens and Train there was a performance degradation in the port
listing operation.

This could be caused by the port DB object's new relationships
(portuplinkstatuspropagation) or by newly added extensions.

In any case, improving the server performance could be an arduous task.
This bug proposes to improve the OSC query, adding a filter for those
parameters shown in the list command: ID, name, MAC address, fixed IPs
and status.

This bug and a possible solution are similar to
https://bugs.launchpad.net/neutron/+bug/1865223.
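As a sketch of the proposed filtering, the Neutron API's standard "fields"
query parameter can restrict the response to just the columns the list
command displays. The endpoint URL below is hypothetical:

```python
# Build a port-list request that asks only for the columns shown by
# "openstack port list", using Neutron's "fields" query parameter.
from urllib.parse import urlencode

NEUTRON = "http://controller:9696/v2.0"  # hypothetical endpoint
fields = ["id", "name", "mac_address", "fixed_ips", "status"]
query = urlencode([("fields", f) for f in fields])
url = "%s/ports?%s" % (NEUTRON, query)
print(url)
```

Requesting only these fields avoids loading the heavier relationships and
extension data server-side, which is the performance improvement the report
proposes for the OSC query.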

** Affects: neutron
 Importance: Low
 Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
 Status: New

** Changed in: neutron
 Assignee: (unassigned) => Rodolfo Alonso (rodolfo-alonso-hernandez)

** Changed in: neutron
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1897100

Title:
  Improve port listing command

Status in neutron:
  New

Bug description:
  As reported in https://bugzilla.redhat.com/show_bug.cgi?id=1772106,
  between Queens and Train there was a performance degradation in the
  port listing operation.

  This could be caused by the port DB object's new relationships
  (portuplinkstatuspropagation) or by newly added extensions.

  In any case, improving the server performance could be an arduous
  task. This bug proposes to improve the OSC query, adding a filter for
  those parameters shown in the list command: ID, name, MAC address,
  fixed IPs and status.

  This bug and a possible solution are similar to
  https://bugs.launchpad.net/neutron/+bug/1865223.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1897100/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1897099] [NEW] create_swap does not fall back to dd when fallocate fails

2020-09-24 Thread Evaggelos Balaskas
Public bug reported:

Name: cloud-init
Version : 20.2-1

Code in question: cloudinit/config/cc_mounts.py

try:
create_swap(fname, size, "fallocate")
except util.ProcessExecutionError as e:
LOG.warning(errmsg, fname, size, "dd", e)
LOG.warning("Will attempt with dd.")
create_swap(fname, size, "dd")


As there is a kernel bug in the latest Linux versions, fallocate creates swap 
images with holes.
The workaround is to move the fallocate binary away (making the create_swap 
function fail) so that cloud-init falls back to dd.


I used bootcmd (or cloud-boothook) to rename (move) the fallocate binary on my 
system, but according to the logs it didn't fall back to dd as it should. 
Probably the error raised was not ProcessExecutionError.
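The logs below bear this out: a missing binary raises "[Errno 2] No such file
or directory", i.e. a FileNotFoundError, which the except clause does not
catch. A sketch of a broader fallback (an assumption about the fix, not
cloud-init's actual code):

```python
# Sketch: catching OSError in addition to ProcessExecutionError covers the
# missing-binary case, since FileNotFoundError is a subclass of OSError.

class ProcessExecutionError(Exception):
    """Stand-in for cloudinit.util.ProcessExecutionError."""

def create_swap(fname, size, method):
    if method == "fallocate":
        # Simulates the fallocate binary having been moved away.
        raise FileNotFoundError(2, "No such file or directory", "fallocate")
    return "created-with-%s" % method

def setup_swap(fname, size):
    try:
        return create_swap(fname, size, "fallocate")
    except (ProcessExecutionError, OSError):
        # The missing binary now triggers the dd fallback as well.
        return create_swap(fname, size, "dd")

print(setup_swap("/swapfile", "2048M"))  # -> created-with-dd
```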


Logs:

/var/log/cloud-init-output.log:2020-09-24 09:13:16,470 - cc_mounts.py[WARNING]: 
Failed to create swapfile '/swapfile' of size 2048MB via fallocate: Unexpected
 error while running command.
/var/log/cloud-init-output.log:Command: ['fallocate', '-l', '2048M', 
'/swapfile']
/var/log/cloud-init-output.log:Reason: [Errno 2] No such file or directory: 
b'fallocate'
/var/log/cloud-init-output.log:2020-09-24 09:13:16,479 - cc_mounts.py[WARNING]: 
failed to setup swap: [Errno 2] No such file or directory: '/swapfile'
/var/log/cloud-init-output.log:chmod: cannot access '/usr/bin/fallocate': No 
such file or directory
/var/log/cloud-init.log:2020-09-24 09:13:16,460 - cc_mounts.py[DEBUG]: 
Attempting to determine the real name of swap
/var/log/cloud-init.log:2020-09-24 09:13:16,460 - cc_mounts.py[DEBUG]: changed 
default device swap => None
/var/log/cloud-init.log:2020-09-24 09:13:16,460 - cc_mounts.py[DEBUG]: Ignoring 
nonexistent default named mount swap
/var/log/cloud-init.log:2020-09-24 09:13:16,461 - cc_mounts.py[DEBUG]: suggest 
2048.0 MB swap for 1983.953125 MB memory with '9030.296875 MB' disk given 
max=2048.0 MB [max=2048.0 MB]'
/var/log/cloud-init.log:2020-09-24 09:13:16,461 - cc_mounts.py[DEBUG]: Creating 
swapfile in '/swapfile' on fstype 'ext4' using 'fallocate'
/var/log/cloud-init.log:2020-09-24 09:13:16,461 - util.py[DEBUG]: Running 
command ['fallocate', '-l', '2048M', '/swapfile'] with allowed return codes [0] 
(she
ll=False, capture=True)
/var/log/cloud-init.log:2020-09-24 09:13:16,470 - cc_mounts.py[WARNING]: Failed 
to create swapfile '/swapfile' of size 2048MB via fallocate: Unexpected error 
while running command.
/var/log/cloud-init.log:Command: ['fallocate', '-l', '2048M', '/swapfile']
/var/log/cloud-init.log:Reason: [Errno 2] No such file or directory: 
b'fallocate'
/var/log/cloud-init.log:2020-09-24 09:13:16,479 - util.py[DEBUG]: Attempting to 
remove /swapfile
/var/log/cloud-init.log:2020-09-24 09:13:16,479 - util.py[DEBUG]: Setting up 
swap file took 0.019 seconds
/var/log/cloud-init.log:2020-09-24 09:13:16,479 - cc_mounts.py[WARNING]: failed 
to setup swap: [Errno 2] No such file or directory: '/swapfile'

** Affects: cloud-init
 Importance: Undecided
 Status: New


** Tags: fallocate swap

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1897099

Title:
  create_swap does not fall back to dd when fallocate fails

Status in cloud-init:
  New

Bug description:
  Name: cloud-init
  Version : 20.2-1

  Code in question: cloudinit/config/cc_mounts.py

  try:
  create_swap(fname, size, "fallocate")
  except util.ProcessExecutionError as e:
  LOG.warning(errmsg, fname, size, "dd", e)
  LOG.warning("Will attempt with dd.")
  create_swap(fname, size, "dd")


  As there is a kernel bug in the latest Linux versions, fallocate creates swap 
images with holes.
  The workaround is to move the fallocate binary away (making the create_swap 
function fail) so that cloud-init falls back to dd.

  
  I used bootcmd (or cloud-boothook) to rename (move) the fallocate binary on 
my system, but according to the logs it didn't fall back to dd as it should. 
Probably the error raised was not ProcessExecutionError.

  
  Logs:

  /var/log/cloud-init-output.log:2020-09-24 09:13:16,470 - 
cc_mounts.py[WARNING]: Failed to create swapfile '/swapfile' of size 2048MB via 
fallocate: Unexpected
   error while running command.
  /var/log/cloud-init-output.log:Command: ['fallocate', '-l', '2048M', 
'/swapfile']
  /var/log/cloud-init-output.log:Reason: [Errno 2] No such file or directory: 
b'fallocate'
  /var/log/cloud-init-output.log:2020-09-24 09:13:16,479 - 
cc_mounts.py[WARNING]: failed to setup swap: [Errno 2] No such file or 
directory: '/swapfile'
  /var/log/cloud-init-output.log:chmod: cannot access '/usr/bin/fallocate': No 
such file or directory
  /var/log/cloud-init.log:2020-09-24 09:13:16,460 - cc_mounts.py[DEBUG]: 
Attempting to determine the real name of swap
  /var/log/cloud-init.log:2020-09-24 09:13:16,460 - cc_mounts.py[DEBUG]: 
changed 

[Yahoo-eng-team] [Bug 1897095] [NEW] [OVN] ARP/MAC handling for routers connected to external network is scaling poorly

2020-09-24 Thread Krzysztof Klimonda
Public bug reported:

With the current router configuration set by neutron, the number of logical
flows in lr_in_arp_resolve seems to scale as O(n^2), where n is the
number of routers connected to the external network. For example, this is
our test where we created 800 routers (I believe it was 800, and not 400
as stated in the linked discussion):

--8<--8<--8<--
# cat lflows.txt |grep -v Datapath |cut -d'(' -f 2 | cut -d ')' -f1 |sort | 
uniq -c |sort -n | tail -10
   3264 lr_in_learn_neighbor
   3386 ls_out_port_sec_l2
   4112 lr_in_admission
   4202 ls_in_port_sec_l2
   4898 lr_in_lookup_neighbor
   4900 lr_in_ip_routing
   9144 ls_in_l2_lkup
   9160 ls_in_arp_rsp
  22136 lr_in_ip_input
 671656 lr_in_arp_resolve
#
--8<--8<--8<--

I've opened a review where we set `always_learn_from_arp_request=false`
and `dynamic_neigh_routers=true` on all routers, which significantly
reduces the number of logical flows:

--8<--8<--8<--
# cat lflows-new.txt |grep -v Datapath |cut -d'(' -f 2 | cut -d ')' -f1 |sort | 
uniq -c |sort -n | tail -10
   2170 ls_out_port_sec_l2
   2172 lr_in_learn_neighbor
   2666 lr_in_admission
   2690 ls_in_port_sec_l2
   3190 lr_in_ip_routing
   4276 lr_in_lookup_neighbor
   4873 lr_in_arp_resolve
   5864 ls_in_arp_rsp
   5873 ls_in_l2_lkup
  14343 lr_in_ip_input
# ovn-sbctl --timeout=120 lflow-list > lflows-new.txt
--8<--8<--8<--

There is, however, some performance penalty, which from my understanding 
affects east-west traffic between routers. I'm not quite sure how large the 
effect is, and it may be a good idea to make that change optional, as 
mentioned in the mailing list discussion.
 

See https://mail.openvswitch.org/pipermail/ovs-
discuss/2020-May/049994.html and http://lists.openstack.org/pipermail
/openstack-discuss/2020-September/017370.html for related discussions.
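The shell pipeline above can also be expressed in Python for quick analysis
of an lflow-list dump; a minimal sketch:

```python
# Count logical flows per pipeline stage in an "ovn-sbctl lflow-list" dump,
# equivalent to: grep -v Datapath | cut -d'(' -f2 | cut -d')' -f1 | sort | uniq -c
import re
from collections import Counter

def count_stages(lflow_text):
    # Stage names appear in parentheses on each flow line, e.g.
    # "  table=12(lr_in_arp_resolve), priority=100, ..."
    counts = Counter()
    for line in lflow_text.splitlines():
        if "Datapath" in line:
            continue
        m = re.search(r"\(([a-z0-9_]+)\)", line)
        if m:
            counts[m.group(1)] += 1
    return counts

sample = (
    "Datapath: ...\n"
    "  table=12(lr_in_arp_resolve), priority=100, ...\n"
    "  table=12(lr_in_arp_resolve), priority=0, ...\n"
    "  table=10(lr_in_ip_routing), priority=50, ...\n"
)
print(count_stages(sample).most_common())
# -> [('lr_in_arp_resolve', 2), ('lr_in_ip_routing', 1)]
```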

** Affects: neutron
 Importance: Undecided
 Assignee: Krzysztof Klimonda (kklimonda)
 Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1897095

Title:
  [OVN] ARP/MAC handling for routers connected to external network is
  scaling poorly

Status in neutron:
  In Progress

Bug description:
  With the current router configuration set by neutron, the number of
  logical flows in lr_in_arp_resolve seems to scale as O(n^2), where n is
  the number of routers connected to the external network. For example,
  this is our test where we created 800 routers (I believe it was 800,
  and not 400 as stated in the linked discussion):

  --8<--8<--8<--
  # cat lflows.txt |grep -v Datapath |cut -d'(' -f 2 | cut -d ')' -f1 |sort | 
uniq -c |sort -n | tail -10
 3264 lr_in_learn_neighbor
 3386 ls_out_port_sec_l2
 4112 lr_in_admission
 4202 ls_in_port_sec_l2
 4898 lr_in_lookup_neighbor
 4900 lr_in_ip_routing
 9144 ls_in_l2_lkup
 9160 ls_in_arp_rsp
22136 lr_in_ip_input
   671656 lr_in_arp_resolve
  #
  --8<--8<--8<--

  I've opened a review where we set
  `always_learn_from_arp_request=false` and `dynamic_neigh_routers=true`
  on all routers, which significantly reduces the number of logical
  flows:

  --8<--8<--8<--
  # cat lflows-new.txt |grep -v Datapath |cut -d'(' -f 2 | cut -d ')' -f1 |sort 
| uniq -c |sort -n | tail -10
 2170 ls_out_port_sec_l2
 2172 lr_in_learn_neighbor
 2666 lr_in_admission
 2690 ls_in_port_sec_l2
 3190 lr_in_ip_routing
 4276 lr_in_lookup_neighbor
 4873 lr_in_arp_resolve
 5864 ls_in_arp_rsp
 5873 ls_in_l2_lkup
14343 lr_in_ip_input
  # ovn-sbctl --timeout=120 lflow-list > lflows-new.txt
  --8<--8<--8<--

  There is, however, some performance penalty, which from my understanding 
affects east-west traffic between routers. I'm not quite sure how large the 
effect is, and it may be a good idea to make that change optional, as 
mentioned in the mailing list discussion.
   

  See https://mail.openvswitch.org/pipermail/ovs-
  discuss/2020-May/049994.html and http://lists.openstack.org/pipermail
  /openstack-discuss/2020-September/017370.html for related discussions.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1897095/+subscriptions



[Yahoo-eng-team] [Bug 1874273] Re: Horizon doesn't show all OS-EXT-SRV-ATTR attributes

2020-09-24 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/721992
Committed: 
https://git.openstack.org/cgit/openstack/horizon/commit/?id=d403b31d70e06d784bc644c525e89a7e1b0b549d
Submitter: Zuul
Branch:master

commit d403b31d70e06d784bc644c525e89a7e1b0b549d
Author: Ivan Kolodyazhny 
Date:   Wed Apr 22 17:41:13 2020 +0300

Show all os-extended-server-attributes

Patch I0cfe9090e8263f983fa5f42f42616a26407be47a adds hypervisor hostname
to the instance details view. This patch adds the rest of instance
attributes allowed by 'os_compute_api:os-extended-server-attributes'
policy.

Change-Id: Id39ee14e3054422a96248f8cdd0a5bf07c27f2fc
Closes-Bug: #1874273


** Changed in: horizon
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1874273

Title:
  Horizon doesn't show all OS-EXT-SRV-ATTR attributes

Status in OpenStack Dashboard (Horizon):
  Fix Released

Bug description:
  If nova policy os_compute_api:os-extended-server-attributes allows
  showing extended server attributes horizon should show them too.

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1874273/+subscriptions



[Yahoo-eng-team] [Bug 1896945] [NEW] dnsmasq >= 2.81 not responding to DHCP requests with current q-dhcp configs

2020-09-24 Thread Lee Yarwood
Public bug reported:

* High level description:

I've been attempting to enable Fedora 32 support in devstack and
encountered the following issue where dnsmasq as configured by q-dhcp
isn't responding to DHCP requests from clients:

https://review.opendev.org/#/c/750292/

Looking at tcpdump and strace it appears that dnsmasq can see the
requests but doesn't reply suggesting a configuration issue either
caused by q-dhcp *or* a regression in dnsmasq itself:

$ openstack server reboot --hard test && sudo ip netns exec 
qdhcp-df64061e-0784-4bbe-909b-ae1c5f466981 tcpdump -i tapee679459-e1 -n port 67 
or port 68
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tapee679459-e1, link-type EN10MB (Ethernet), capture size 262144 
bytes
18:40:24.070796 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request 
from fa:16:3e:16:07:a0, length 300
18:41:24.118961 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request 
from fa:16:3e:16:07:a0, length 300
18:42:24.192716 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request 
from fa:16:3e:16:07:a0, length 300

$ openstack server reboot --hard test && sudo ip netns exec 
qdhcp-df64061e-0784-4bbe-909b-ae1c5f466981 strace -p 196856
strace: Process 196856 attached
restart_syscall(<... resuming interrupted read ...>) = 1
recvmsg(4, {msg_name={sa_family=AF_INET, sin_port=htons(68), 
sin_addr=inet_addr("0.0.0.0")}, msg_namelen=16, 
msg_iov=[{iov_base="\1\1\6\0\0041\326S\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\372\26>\26"...,
 iov_len=548}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, 
cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("tapee679459-e1"), 
ipi_spec_dst=inet_addr("10.0.0.2"), ipi_addr=inet_addr("255.255.255.255")}}], 
msg_controllen=32, msg_flags=0}, MSG_PEEK|MSG_TRUNC) = 300
recvmsg(4, {msg_name={sa_family=AF_INET, sin_port=htons(68), 
sin_addr=inet_addr("0.0.0.0")}, msg_namelen=16, 
msg_iov=[{iov_base="\1\1\6\0\0041\326S\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\372\26>\26"...,
 iov_len=548}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, 
cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("tapee679459-e1"), 
ipi_spec_dst=inet_addr("10.0.0.2"), ipi_addr=inet_addr("255.255.255.255")}}], 
msg_controllen=32, msg_flags=0}, 0) = 300
ioctl(4, SIOCGIFNAME, {ifr_index=9, ifr_name="tapee679459-e1"}) = 0
ioctl(4, SIOCGIFFLAGS, {ifr_name="tapee679459-e1", 
ifr_flags=IFF_UP|IFF_BROADCAST|IFF_RUNNING|IFF_MULTICAST}) = 0
ioctl(4, SIOCGIFADDR, {ifr_name="tapee679459-e1", ifr_addr={sa_family=AF_INET, 
sin_port=htons(0), sin_addr=inet_addr("10.0.0.2")}}) = 0
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, 
{fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, 
events=POLLIN}, {fd=14, events=POLLIN}], 8, -1

The current configs are listed below:

http://paste.openstack.org/show/798334/

I was able to downgrade dnsmasq on f32 to 2.80 in order to workaround
this:

$ sudo dnf downgrade dnsmasq -y
[..]
$ rpm -qa | grep dnsmasq
dnsmasq-2.80-14.fc32.x86_64
$ sudo killall dnsmasq && sudo systemctl restart devstack@q-*
$ openstack server reboot --hard test && sudo ip netns exec 
qdhcp-df64061e-0784-4bbe-909b-ae1c5f466981 tcpdump -i tapee679459-e1 -n port 67 
or port 68
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tapee679459-e1, link-type EN10MB (Ethernet), capture size 262144 
bytes
12:06:57.028953 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request 
from fa:16:3e:16:07:a0, length 300
12:06:57.029994 IP 10.0.0.2.bootps > 10.0.0.49.bootpc: BOOTP/DHCP, Reply, 
length 328
12:06:57.042300 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request 
from fa:16:3e:16:07:a0, length 300
12:06:57.047014 IP 10.0.0.2.bootps > 10.0.0.49.bootpc: BOOTP/DHCP, Reply, 
length 344

* Pre-conditions:

F32 with dnsmasq >= 2.81 installed.

* Step-by-step reproduction steps:

Deploy F32 with dnsmasq >= 2.81 installed, attempt to spawn an instance
attached to a subnet with dhcp enabled.

* Expected output:

dnsmasq responds to DHCP request from instance.

* Actual output:

dnsmasq doesn't respond to DHCP request from instance.

* Version:
  ** OpenStack version (Specific stable branch, or git hash if from trunk);

  Neutron @ 0fdcc4b1b63dc90fbc9f46f5947f84626f8e5b41

  ** Linux distro, kernel. For a distro, it’s also worth knowing
specific versions of client and server;

  Fedora 32 with kernel 5.8.10-200.fc32.x86_64

  ** DevStack or other _deployment_ mechanism?

  Devstack @ https://review.opendev.org/#/c/750292/

* Environment: what types of services are you running (core services
like DB and AMQP broker, as well as Nova/hypervisor if it matters), and
which type of deployment (clustered servers)? Multi-node or single node,
etc.

  Single node devstack env.

* Perceived severity: is this a blocker for you?

  High, assuming other distros will 

[Yahoo-eng-team] [Bug 1882918] Re: Instances page crashes on repeated confirm/revert resize actions

2020-09-24 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/734814
Committed: 
https://git.openstack.org/cgit/openstack/horizon/commit/?id=a4a549a1814a9ea9f142fd8ec5928fe9cfebc269
Submitter: Zuul
Branch:master

commit a4a549a1814a9ea9f142fd8ec5928fe9cfebc269
Author: pedh 
Date:   Wed Jun 10 18:18:00 2020 +0800

Fix: Page crashes on instance confirm resize

Add error handling to instance confirm/revert resize methods.

Change-Id: I128049091f38e8db3c1524a5c4cb932f3e809714
Closes-Bug: #1882918


** Changed in: horizon
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1882918

Title:
  Instances page crashes on repeated confirm/revert resize actions

Status in OpenStack Dashboard (Horizon):
  Fix Released

Bug description:
  Environment:
  OpenStack Rocky/Ussuri
  compute node of libvirt/kvm on HP Proliant server with 24*cpu and 256GB mem

  
  Reproduction steps:
  1. Create an instance with flavor "m1.small";
  2. Resize the instance to flavor "m1.medium";
  3. In a while the instance enters "VERIFY_RESIZE" state, and the
 "Confirm Resize/Migration" button is enabled, we click the button and
 wait for the button to be enabled again, and then click the
 button repeatedly;
  4. The page crashes after the resize is done.

  
  Error log:
  http://ix.io/2oOd

  
  Bug analysis:
  The bug reproduces easily if the instance stays in the VERIFY_RESIZE state
  long enough. After digging into the horizon source code, I found that the
  "ConfirmResize.single" and "RevertResize.single" methods in
  "openstack_dashboard/dashboards/project/instance/tables.py" lack exception
  handling and expose the exception to Django, causing the page to crash.
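A sketch of the fix direction (not horizon's actual code): wrap the
confirm/revert API call so a failure is reported to the user instead of
propagating up to Django:

```python
# Hypothetical illustration of adding error handling around a confirm
# resize action; horizon itself would call exceptions.handle() where the
# error_handler callback is used here.

def confirm_resize(server_id, api_call, error_handler):
    try:
        api_call(server_id)
    except Exception as exc:
        # Swallow the error and surface a message rather than crashing.
        error_handler("Unable to confirm resize: %s" % exc)

def flaky_api(server_id):
    # Simulates nova rejecting a repeated confirm on a finished resize.
    raise RuntimeError("instance not in VERIFY_RESIZE state")

messages = []
confirm_resize("uuid-1", flaky_api, messages.append)
print(messages[0])
```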

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1882918/+subscriptions



[Yahoo-eng-team] [Bug 1896933] [NEW] Exception when plugin creates a network without specifying the MTU

2020-09-24 Thread Thomas Bachman
Public bug reported:

This was found as a UT regression failure in the x/group-based-policy
project, but I think the same issue applies to the auto-allocated-
topology workflow (or any feature that creates a network at the plugin
or DB layer instead of at the REST layer). The exception seen is this:

  File 
"/home/zuul/src/opendev.org/x/group-based-policy/.tox/py36/lib/python3.6/site-packages/neutron/plugins/ml2/plugin.py",
 line 1053, in create_network
result, mech_context = self._create_network_db(context, network)
  File 
"/home/zuul/src/opendev.org/x/group-based-policy/gbpservice/neutron/plugins/ml2plus/plugin.py",
 line 333, in _create_network_db
context, network)
  File 
"/home/zuul/src/opendev.org/x/group-based-policy/.tox/py36/lib/python3.6/site-packages/neutron/plugins/ml2/plugin.py",
 line 1013, in _create_network_db
net_db.mtu = self._get_network_mtu(net_db)
  File 
"/home/zuul/src/opendev.org/x/group-based-policy/.tox/py36/lib/python3.6/site-packages/neutron/plugins/ml2/plugin.py",
 line 995, in _get_network_mtu
raise exc.InvalidInput(error_message=msg)
neutron_lib.exceptions.InvalidInput: Invalid input for operation: Requested MTU 
is too big, maximum is 1000.

The UT limits the network MTU using configuration file settings:
https://opendev.org/x/group-based-policy/src/branch/master/gbpservice/neutron/tests/unit/services/grouppolicy/test_aim_mapping_driver.py#L2951-L2952

The regression happens because a default MTU for the DB layer was
introduced in this patch: https://review.opendev.org/#/c/679399/

The default value used is the DEFAULT_NETWORK_MTU constant from neutron-
lib (1500). This is different than the default value installed by the
REST layer (0). When the network MTU is constrained using configuration
files, it gets to this code path:

https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/plugin.py#L992-L1002

Since the DB layer has set the default to 1500 instead of 0, this
exception gets triggered, even though the caller at the plugin layer
didn't specify a value for the MTU.

One possible fix is to have the DB layer use a value of 0 for the
default instead of DEFAULT_NETWORK_MTU.
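A minimal sketch of the check described above, with 0 treated as "unset"
(hypothetical names; the real logic lives in the ml2 plugin's
_get_network_mtu):

```python
# With a default of 0, only an explicitly requested MTU can exceed the
# deployment maximum; a DB-layer default of 1500 trips the check even
# when the caller never asked for an MTU.

def get_network_mtu(requested_mtu, max_allowed):
    if requested_mtu and requested_mtu > max_allowed:
        raise ValueError("Requested MTU is too big, maximum is %d" % max_allowed)
    # 0 / unset falls back to the deployment maximum.
    return requested_mtu or max_allowed

print(get_network_mtu(0, 1000))        # unset default -> 1000
try:
    get_network_mtu(1500, 1000)        # a 1500 DB default would land here
except ValueError as exc:
    print(exc)
```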

** Affects: neutron
 Importance: Undecided
 Assignee: Thomas Bachman (bachman)
 Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1896933

Title:
  Exception when plugin creates a network without specifying the MTU

Status in neutron:
  In Progress

Bug description:
  This was found as a UT regression failure in the x/group-based-policy
  project, but I think the same issue applies to the auto-allocated-
  topology workflow (or any feature that creates a network at the plugin
  or DB layer instead of at the REST layer). The exception seen is this:

File 
"/home/zuul/src/opendev.org/x/group-based-policy/.tox/py36/lib/python3.6/site-packages/neutron/plugins/ml2/plugin.py",
 line 1053, in create_network
  result, mech_context = self._create_network_db(context, network)
File 
"/home/zuul/src/opendev.org/x/group-based-policy/gbpservice/neutron/plugins/ml2plus/plugin.py",
 line 333, in _create_network_db
  context, network)
File 
"/home/zuul/src/opendev.org/x/group-based-policy/.tox/py36/lib/python3.6/site-packages/neutron/plugins/ml2/plugin.py",
 line 1013, in _create_network_db
  net_db.mtu = self._get_network_mtu(net_db)
File 
"/home/zuul/src/opendev.org/x/group-based-policy/.tox/py36/lib/python3.6/site-packages/neutron/plugins/ml2/plugin.py",
 line 995, in _get_network_mtu
  raise exc.InvalidInput(error_message=msg)
  neutron_lib.exceptions.InvalidInput: Invalid input for operation: Requested 
MTU is too big, maximum is 1000.

  The UT limits the network MTU using configuration file settings:
  
https://opendev.org/x/group-based-policy/src/branch/master/gbpservice/neutron/tests/unit/services/grouppolicy/test_aim_mapping_driver.py#L2951-L2952

  The regression happens because a default MTU for the DB layer was
  introduced in this patch: https://review.opendev.org/#/c/679399/

  The default value used is the DEFAULT_NETWORK_MTU constant from
  neutron-lib (1500). This is different than the default value installed
  by the REST layer (0). When the network MTU is constrained using
  configuration files, it gets to this code path:

  
https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/plugin.py#L992-L1002

  Since the DB layer has set the default to 1500 instead of 0, this
  exception gets triggered, even though the caller at the plugin layer
  didn't specify a value for the MTU.

  One possible fix is to have the DB layer use a value of 0 for the
  default instead of DEFAULT_NETWORK_MTU.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1896933/+subscriptions


[Yahoo-eng-team] [Bug 1896920] [NEW] Unnecessary error log when checking if a device is ready

2020-09-24 Thread Rodolfo Alonso
Public bug reported:

In method "ensure_device_is_ready" [1], if the device does not exist or
the MAC is still not assigned, the method returns False and also logs an
error. This error log is distracting; instead of this, an info message
could be logged.

The code using this method, which returns True or False depending on the
state of the interface, can decide to log a higher-severity message.

[1]https://github.com/openstack/neutron/blob/856cae4cf8e33c05b308d880df78b7be02ae90ad/neutron/agent/linux/ip_lib.py#L955
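A sketch of the suggested logging split (hypothetical helper, not neutron's
actual ip_lib code): the probe logs at info, and the caller decides whether
the failure deserves an error:

```python
# The readiness check reports the condition at info level and returns a
# boolean; callers that actually require a ready device escalate to error.
import logging

LOG = logging.getLogger("sketch")

def ensure_device_is_ready(device_exists, mac_assigned):
    if not device_exists or not mac_assigned:
        LOG.info("Device not ready yet")  # previously logged as an error
        return False
    return True

# Caller escalates only when readiness is actually required:
if not ensure_device_is_ready(device_exists=False, mac_assigned=False):
    LOG.error("Device was expected to be ready but is not")
```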

** Affects: neutron
 Importance: Wishlist
 Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
 Status: In Progress

** Changed in: neutron
   Importance: Undecided => Low

** Changed in: neutron
   Importance: Low => Wishlist

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1896920

Title:
  Unnecessary error log when checking if a device is ready

Status in neutron:
  In Progress

Bug description:
  In method "ensure_device_is_ready" [1], if the device does not exist
  or the MAC is still not assigned, the method returns False and also
  logs an error. This error log is distracting; instead of this, an info
  message could be logged.

  The code using this method, which returns True or False depending on
  the state of the interface, can decide to log a higher-severity message.

  
[1]https://github.com/openstack/neutron/blob/856cae4cf8e33c05b308d880df78b7be02ae90ad/neutron/agent/linux/ip_lib.py#L955

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1896920/+subscriptions



[Yahoo-eng-team] [Bug 1896574] Re: how to deal with hypervisor name changing

2020-09-24 Thread norman shen
I think the previous title is misleading. Actually the hostname itself is
still A; what changes is the FQDN as seen by "hostname --fqdn".

** Changed in: nova
   Status: Invalid => New

** Summary changed:

- how to deal with hypervisor name changing
+ how to deal with hypervisor host fqdn name changing

** Description changed:

- nova fails to correctly account for resources after hypervisor name
- changes. For example, if previously the hypervisor name is A, and some
- later it switches to A.B, then all of the instances which belong to A
+ nova fails to correctly account for resources after hypervisor hosntame
+ fqdn changes. For example, if previously the hypervisor hostname fqdn is
+ A, and some later it to A.B, then all of the instances which belong to A
  will not be included in the resource computation for A.B although
  effectively they are the same thing.
  
+ But under such circumstances, compute service's is still A.
+ 
  Is there any way to deal with this situation? we are using openstack
  rocky.

** Description changed:

  nova fails to correctly account for resources after hypervisor hosntame
  fqdn changes. For example, if previously the hypervisor hostname fqdn is
  A, and some later it to A.B, then all of the instances which belong to A
  will not be included in the resource computation for A.B although
  effectively they are the same thing.
  
- But under such circumstances, compute service's is still A.
+ But under such circumstances, compute service's is listed as A.
  
  Is there any way to deal with this situation? we are using openstack
  rocky.

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896574

Title:
  how to deal with hypervisor host fqdn name changing

Status in OpenStack Compute (nova):
  New

Bug description:
  nova fails to correctly account for resources after the hypervisor
  hostname FQDN changes. For example, if previously the hypervisor
  hostname FQDN is A, and some time later it changes to A.B, then none of
  the instances which belong to A will be included in the resource
  computation for A.B, although effectively they are the same thing.

  But under such circumstances, the compute service's host is still
  listed as A.

  Is there any way to deal with this situation? We are using OpenStack
  Rocky.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1896574/+subscriptions
