[Yahoo-eng-team] [Bug 2056537] [NEW] [ovn-octavia-provider] gateway chassis not filled on LogicalRouterPort event

2024-03-08 Thread Fernando Royo
Public bug reported:

The gateway neutron-ovn-invalid-chassis previously used for the CR-LRP
gateway_chassis was removed in [1]. As a result, the logical router port
event received at creation time is treated as a new port attaching the
router to a tenant network, so the LB is added to that LS, which makes
the functional tests fail.

In a real environment this situation may not occur, except in the
scenario where the gateway_chassis for the LRP arrives in a later update
event rather than in the initial creation event.

[1] https://review.opendev.org/c/openstack/neutron/+/909305
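
A minimal sketch of the kind of guard the event handler could apply
(names are illustrative, not the provider's actual code): only LRPs
without a gateway_chassis should be treated as attachments to a tenant
LS, and the decision should be re-evaluated if the gateway_chassis shows
up in a later update event.

    def _is_gateway_lrp(row):
        # A CR-LRP/gateway port carries at least one gateway_chassis entry;
        # a plain LRP attaching the router to a tenant LS does not.
        return bool(getattr(row, 'gateway_chassis', []))

    def handle_lrp_event(row, driver):
        if _is_gateway_lrp(row):
            # Gateway port: do not associate the router's LBs with a tenant LS.
            return
        # Hypothetical helper on the provider driver.
        driver.add_lbs_to_logical_switch(row)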

** Affects: neutron
 Importance: Undecided
 Assignee: Fernando Royo (froyoredhat)
 Status: In Progress


** Tags: ovn-octavia-provider

** Changed in: neutron
 Assignee: (unassigned) => Fernando Royo (froyoredhat)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2056537

Title:
  [ovn-octavia-provider] gateway chassis not filled on LogicalRouterPort
  event

Status in neutron:
  In Progress

Bug description:
  The gateway neutron-ovn-invalid-chassis previously used for the CR-LRP
  gateway_chassis was removed in [1]. As a result, the logical router
  port event received at creation time is treated as a new port
  attaching the router to a tenant network, so the LB is added to that
  LS, which makes the functional tests fail.

  In a real environment this situation may not occur, except in the
  scenario where the gateway_chassis for the LRP arrives in a later
  update event rather than in the initial creation event.

  [1] https://review.opendev.org/c/openstack/neutron/+/909305

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2056537/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2056544] [NEW] Attaching a pre-existing port with port_security disabled on a network with port_security enabled fails

2024-03-08 Thread Gaudenz Steinlin
Public bug reported:

Description
===

Attaching a pre-existing port with port_security disabled on a network
with port_security enabled which does not have any subnets fails. The
port_security setting on the network should not be relevant in this
case: it is only a default value for newly created ports. For
pre-existing ports the port_security setting on the port should be
considered instead.

This fails because there is code that prohibits attaching a port to a
network that has port_security enabled but no subnet, since in that case
it is not possible to apply security groups to the port. That check is
correct when the port is actually created by Nova and the network's
port_security default is applied to the new port, but it is wrong for
already existing ports, where the port's own port_security setting
should be considered instead.
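
A rough sketch of the intended decision, with illustrative names only
(this is not Nova's or Neutron's actual code):

    def attach_allowed(port, network):
        if port is not None:
            # Pre-existing port: its own port_security_enabled is authoritative.
            needs_security_groups = port.get('port_security_enabled', False)
        else:
            # Port will be created by Nova: the network-level default applies.
            needs_security_groups = network.get('port_security_enabled', False)
        # Security groups can only be applied when the network has a subnet.
        return not needs_security_groups or bool(network.get('subnets'))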

Steps to reproduce
==

* Create an instance
* Create a network with port security enabled
* Create a port on this network with port security disabled
* Try to attach the port to the instance

Note: No subnet was created on the network.

Expected result
===

The port is attached to the instance.

Actual result
=

The port fails to attach to the instance with this message:

Network requires port_security_enabled and subnet associated in order to
apply security groups. (HTTP 400) (Request-ID:
req-3ce456bb-c016-4737-82f8-4b332b923ab6)

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2056544

Title:
  Attaching a pre-existing port with port_security disabled on a network
  with port_security enabled fails

Status in OpenStack Compute (nova):
  New

Bug description:
  Description
  ===

  Attaching a pre-existing port with port_security disabled on a network
  with port_security enabled which does not have any subnets fails. The
  port_security setting on the network should not be relevant in this
  case: it is only a default value for newly created ports. For pre-
  existing ports the port_security setting on the port should be
  considered instead.

  This fails because there is code that prohibits attaching a port to a
  network that has port_security enabled but no subnet, since in that
  case it is not possible to apply security groups to the port. That
  check is correct when the port is actually created by Nova and the
  network's port_security default is applied to the new port, but it is
  wrong for already existing ports, where the port's own port_security
  setting should be considered instead.

  Steps to reproduce
  ==

  * Create an instance
  * Create a network with port security enabled
  * Create a port on this network with port security disabled
  * Try to attach the port to the instance

  Note: No subnet was created on the network.

  Expected result
  ===

  The port is attached to the instance.

  Actual result
  =

  The port fails to attach to the instance with this message:

  Network requires port_security_enabled and subnet associated in order
  to apply security groups. (HTTP 400) (Request-ID:
  req-3ce456bb-c016-4737-82f8-4b332b923ab6)

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2056544/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2056558] [NEW] ``OVNL3RouterPlugin._port_update`` can be called before the LRP is created in the OVN DB

2024-03-08 Thread Rodolfo Alonso
Public bug reported:

``OVNL3RouterPlugin._port_update`` [1] is called on the AFTER_UPDATE
event of the router port (for example, when a subnet is attached to a
router). This callback is guaranteed to run after the Neutron DB has the
resource (port) in the database. However, as the comment in the code
highlights, it can run before the OVN NB database has the LRP resource
created. The called method, ``update_router_port`` -->
``_update_router_port``, guarantees that the LRP update is executed only
when the LRP exists, but the LRP read [2] does not apply the same check.

This callback should be replaced by an OVN DB event that checks the same
conditions as in [1] and guarantees that the LRP resource has already
been created in the OVN DB.
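
A minimal sketch of such an event, assuming the usual ovsdbapp RowEvent
pattern (the class, the matching conditions and the driver helper below
are illustrative only, not the actual fix):

    from ovsdbapp.backend.ovs_idl import event as row_event

    class LogicalRouterPortCreateEvent(row_event.RowEvent):
        def __init__(self, driver):
            self.driver = driver
            super().__init__((self.ROW_CREATE,), 'Logical_Router_Port', None)

        def run(self, event, row, old):
            # At this point the LRP row exists in the OVN NB DB, so the
            # logic of ``_update_router_port`` can be applied safely.
            self.driver.update_router_port_from_lrp(row)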

Example of this failure:
https://zuul.opendev.org/t/openstack/build/3f7935d7ed53473898bbf213e85dfb61/log/controller/logs/dsvm-
functional-
logs/ovn_octavia_provider.tests.functional.test_driver.TestOvnOctaviaProviderDriver.test_create_lb_custom_network/testrun.txt

[1]https://opendev.org/openstack/neutron/src/commit/e8468a6dd647fd62eac429417c7f382e8859b574/neutron/services/ovn_l3/plugin.py#L372-L381
[2]https://opendev.org/openstack/neutron/src/commit/e8468a6dd647fd62eac429417c7f382e8859b574/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1809-L1811

** Affects: neutron
 Importance: Medium
 Status: New

** Description changed:

  ``OVNL3RouterPlugin._port_update`` [1] is called AFTER_UPDATE the router
  port is created (for example, when a subnet is attached to a router).
  This event is guaranteed to be called after the Neutron DB has the
  resource (port) in the database. However, as the code highlights in the
  comment, this event can be called before the OVN NB database has the LRP
  resource created. The called method, ``update_router_port`` -->
  ``_update_router_port``, guarantees that the LRP update is executed only
  when the LRP exists but the LRP read [2] does not have this
  consideration.
  
  This event should be replaced by an OVN DB event, checking the same
  conditions as in [1] and guaranteeing that the LRP resource is already
  created in the DB.
  
+ Example of this failure:
+ 
https://zuul.opendev.org/t/openstack/build/3f7935d7ed53473898bbf213e85dfb61/log/controller/logs/dsvm-
+ functional-
+ 
logs/ovn_octavia_provider.tests.functional.test_driver.TestOvnOctaviaProviderDriver.test_create_lb_custom_network/testrun.txt
+ 
  
[1]https://opendev.org/openstack/neutron/src/commit/e8468a6dd647fd62eac429417c7f382e8859b574/neutron/services/ovn_l3/plugin.py#L372-L381
  
[2]https://opendev.org/openstack/neutron/src/commit/e8468a6dd647fd62eac429417c7f382e8859b574/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1809-L1811

** Changed in: neutron
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2056558

Title:
  ``OVNL3RouterPlugin._port_update`` can be called before the LRP is
  created in the OVN DB

Status in neutron:
  New

Bug description:
  ``OVNL3RouterPlugin._port_update`` [1] is called on the AFTER_UPDATE
  event of the router port (for example, when a subnet is attached to a
  router). This callback is guaranteed to run after the Neutron DB has
  the resource (port) in the database. However, as the comment in the
  code highlights, it can run before the OVN NB database has the LRP
  resource created. The called method, ``update_router_port`` -->
  ``_update_router_port``, guarantees that the LRP update is executed
  only when the LRP exists, but the LRP read [2] does not apply the same
  check.

  This callback should be replaced by an OVN DB event that checks the
  same conditions as in [1] and guarantees that the LRP resource has
  already been created in the OVN DB.

  Example of this failure:
  
https://zuul.opendev.org/t/openstack/build/3f7935d7ed53473898bbf213e85dfb61/log/controller/logs/dsvm-
  functional-
  
logs/ovn_octavia_provider.tests.functional.test_driver.TestOvnOctaviaProviderDriver.test_create_lb_custom_network/testrun.txt

  
[1]https://opendev.org/openstack/neutron/src/commit/e8468a6dd647fd62eac429417c7f382e8859b574/neutron/services/ovn_l3/plugin.py#L372-L381
  
[2]https://opendev.org/openstack/neutron/src/commit/e8468a6dd647fd62eac429417c7f382e8859b574/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1809-L1811

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2056558/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2052916] Re: HTTP get on s3tokens and ec2tokens endpoint gives 500 internal error

2024-03-08 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/keystone/+/908760
Committed: 
https://opendev.org/openstack/keystone/commit/6096457d7400c280f9ee07a9c5b9760e74ecee4b
Submitter: "Zuul (22348)"
Branch:master

commit 6096457d7400c280f9ee07a9c5b9760e74ecee4b
Author: Tobias Urdin 
Date:   Mon Feb 12 08:36:53 2024 +

Dont enforce when HTTP GET on s3tokens and ec2tokens

When calling the s3tokens or ec2tokens API with a
HTTP GET we should get a 405 Method Not Allowed but
we get a 500 Internal Server Error because we enforce
that method.

Closes-Bug: #2052916
Change-Id: I5f60d10dc25551175cc73ca8f3f28b0b95ec9f99
Signed-off-by: Tobias Urdin 


** Changed in: keystone
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/2052916

Title:
  HTTP get on s3tokens and ec2tokens endpoint gives 500 internal error

Status in OpenStack Identity (keystone):
  Fix Released

Bug description:
  When doing an HTTP GET against the s3tokens or ec2tokens endpoint we
  should get a 405 Method Not Allowed, but because the GET method is
  being enforced we get a 500 Internal Server Error instead.

  AssertionError: PROGRAMMING ERROR: enforcement
  (`keystone.common.rbac_enforcer.enforcer.RBACEnforcer.enforce_call()`)
  has not been called; API is unenforced.
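
  A quick way to reproduce this from outside Keystone (the endpoint URL
  and token below are placeholders, not real values):

    import requests

    KEYSTONE = 'http://keystone.example.com/identity/v3'   # placeholder
    TOKEN = '...'                                           # placeholder

    resp = requests.get(KEYSTONE + '/ec2tokens',
                        headers={'X-Auth-Token': TOKEN})
    # Before the fix this returned 500; 405 Method Not Allowed is expected.
    print(resp.status_code)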

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/2052916/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1764738] Re: routed provider networks limit to one host

2024-03-08 Thread Brian Haley
From all the changes that have merged this seems to be complete, will
close.

** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1764738

Title:
  routed provider networks limit to one host

Status in neutron:
  Fix Released

Bug description:
  There seems to be a limitation that a compute node can only have an
  interface on one segment of a multi-segment network. This feels wrong
  and limits the compute resources, since they can only be part of one
  segment.

  The purpose of multi-segment networks is to group multiple segments
  under one network name, i.e. operators should be able to expand the IP
  pool without having to create multiple networks for it like internet1,
  internet2, etc.

  The way it should work is that a compute node can belong to one or
  more segments. It should be up to the operator to decide how they want
  to segment the compute resources or not. It should not be enforced by
  the simple need to add IP ranges to a network.

  Way to reproduce (an openstacksdk sketch follows the error log below):
  1. Configure compute nodes to have bridges configured on 2 segments
  2. Create a network with 2 segments
  3. Create the segments
  2018-04-17 15:17:59.545 25 ERROR oslo_messaging.rpc.server
  2018-04-17 15:18:18.836 25 ERROR oslo_messaging.rpc.server 
[req-4fdf6ee1-2be3-49c5-b3cb-62a2194465ab - - - - -] Exception during message 
handling: HostConnectedToMultipleSegments: Host eselde03u02s04 is connected to 
multiple segments on routed provider network 
'5c1f4dd4-baff-4c59-ba56-bd9cc2c59fa4'.  It should be connected to one.
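
  For reference, a sketch of the reproduction with openstacksdk (the
  cloud name, physnets, VLAN IDs and CIDR are assumptions):

    import openstack

    conn = openstack.connect(cloud='devstack-admin')   # assumed clouds.yaml entry

    # The first segment is created implicitly from the provider attributes.
    net = conn.network.create_network(name='multisegment-net',
                                      provider_network_type='vlan',
                                      provider_physical_network='physnet1',
                                      provider_segmentation_id=100)
    # Add a second segment on another physnet.
    seg2 = conn.network.create_segment(network_id=net.id, network_type='vlan',
                                       physical_network='physnet2',
                                       segmentation_id=200)
    # One subnet per segment; the first segment's id can be listed with
    # conn.network.segments(network_id=net.id).
    conn.network.create_subnet(network_id=net.id, segment_id=seg2.id,
                               ip_version=4, cidr='203.0.113.0/24')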

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1764738/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1666779] Re: Expose neutron API via a WSGI script

2024-03-08 Thread Brian Haley
Seems this fix is released, will close.

** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1666779

Title:
  Expose neutron API via a WSGI script

Status in neutron:
  Fix Released

Bug description:
  As per Pike goal [1], we should expose neutron API via a WSGI script,
  and make devstack installation use a web server for default
  deployment. This bug is a RFE/tracker for the feature.

  [1] https://governance.openstack.org/tc/goals/pike/deploy-api-in-
  wsgi.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1666779/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1833674] Re: [RFE] Improve profiling of port binding and vif plugging

2024-03-08 Thread Brian Haley
This seems to be complete, will close bug. Please re-open if I'm wrong.

** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1833674

Title:
  [RFE] Improve profiling of port binding and vif plugging

Status in neutron:
  Fix Released

Bug description:
  As discussed at the May 2019 PTG in Denver, we want to measure and
  then improve the performance of Neutron's most important operation,
  which is port binding.

  While working with OSProfiler reports we are realizing the reports are
  incomplete. We could turn on tracing in other components and
  subcomponents by further propagating trace information.

  We heavily build on some previous work:

  * https://bugs.launchpad.net/neutron/+bug/1335640 [RFE] Neutron support for 
OSprofiler
  * https://review.opendev.org/615350 Integrate rally with osprofiler

  A few patches were already merged before opening this RFE:

  * https://review.opendev.org/662804 Run nova's VM boot rally scenario in the 
neutron gate
  * https://review.opendev.org/665614 Allow VM booting rally scenarios to time 
out

  We already see the need for a few changes:

  * New rally scenario to measure port binding
  * Profiling coverage for vif plugging

  This work is also driven by the discoveries made while interpreting
  profiler reports so I expect further changes here and there.
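
  As an illustration of the kind of coverage being added, an OSProfiler
  trace point typically looks like this (the function and trace name are
  made up for the example):

    from osprofiler import profiler

    @profiler.trace("ml2-bind-port", hide_args=True)
    def bind_port(context):
        ...   # existing binding logic would go here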

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1833674/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1815827] Re: [RFE] neutron-lib: rehome neutron.object.base along with rbac db/objects

2024-03-08 Thread Brian Haley
I am going to close this as it's been a number of years and the original
patch was abandoned. If someone wants to pick it up please re-open.

** Changed in: neutron
   Status: New => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1815827

Title:
  [RFE] neutron-lib: rehome neutron.object.base along with rbac
  db/objects

Status in neutron:
  Won't Fix

Bug description:
  This isn't a request for a new feature per se, but rather a
  placeholder for the neutron drivers team to take a look at [1].

  Specifically I'm hoping for drivers team agreement that the 
modules/functionality being rehomed in [1] makes sense; no actual (deep) code 
review of [1] is necessary at this point.
   
  Assuming we can agree that the logic in [1] makes sense to rehome, I can 
proceed by chunking it up into smaller patches that will make the 
rehome/consume process easier.

  This work is part of [2] that's described in [3][4]. However as
  commented in [1], it's also necessary to rehome the rbac db/objects
  modules and their dependencies that weren't discussed previously.

  
  [1] https://review.openstack.org/#/c/621000
  [2] https://blueprints.launchpad.net/neutron/+spec/neutron-lib-decouple-db
  [3] 
https://specs.openstack.org/openstack/neutron-specs/specs/rocky/neutronlib-decouple-db-apiutils.html
  [4] 
https://specs.openstack.org/openstack/neutron-specs/specs/rocky/neutronlib-decouple-models.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1815827/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1694165] Re: Improve Neutron documentation for simpler deployments

2024-03-08 Thread Brian Haley
The documents have been updated many times over the past 6+ years, I'm
going to close this as they are much better now. If there is something
specific please open a new bug.

** Changed in: neutron
   Status: Triaged => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1694165

Title:
  Improve Neutron documentation for simpler deployments

Status in neutron:
  Won't Fix

Bug description:
  During the Boston Summit session, an issue was raised that the Neutron
  documentation for simpler deployments should be improved/simplified.

  Couple of observations were noted:

  1) For non-Neutron-savvy users, it is not very intuitive to
  specify/configure networking requirements.
  2) The basic default configuration (as documented) is very OVS-centric.
  It should discuss other, non-OVS deployments as well.

  Here is the etherpad with the details of the discussion -
  https://etherpad.openstack.org/p/pike-neutron-making-it-easy

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1694165/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1797663] Re: refactor def _get_dvr_sync_data from neutron/db/l3_dvr_db.py

2024-03-08 Thread Brian Haley
As this has never been worked on I am going to close it. If anyone wants
to pick it up please re-open.

** Changed in: neutron
   Status: Confirmed => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1797663

Title:
  refactor def _get_dvr_sync_data from neutron/db/l3_dvr_db.py

Status in neutron:
  Won't Fix

Bug description:
  The function _get_dvr_sync_data in neutron/db/l3_dvr_db.py fetches and
  processes router data, and since it is called for each DVR/HA router
  type on update, it becomes very hard to pinpoint issues in such a
  massive method, so I propose breaking it into two methods.

  Splitting it into _get_dvr_sync_data and _process_dvr_sync_data will
  make future debugging easier.
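
  Roughly, the proposed split would look like this (signatures and
  helper names are illustrative only):

    def _get_dvr_sync_data(self, context, host, agent, router_ids=None):
        # Fetch only: collect the routers for the given host/agent.
        routers = self._get_router_info_list(context, host, agent, router_ids)
        return self._process_dvr_sync_data(context, host, routers)

    def _process_dvr_sync_data(self, context, host, routers):
        # Process only: augment each router with its DVR/HA details.
        for router in routers:
            self._add_dvr_details(context, host, router)   # hypothetical helper
        return routers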

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1797663/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1786226] Re: Use sqlalchemy baked query

2024-03-08 Thread Brian Haley
From the comment in the change that was linked above:

"BakedQuery is a legacy extension that no longer does too much beyond
what SQLAlchemy 1.4 does in most cases automatically. new development w/
BakedQuery is a non-starter, this is a legacy module we would eventually
remove."

For that reason I'm going to close this bug.

** Changed in: neutron
   Status: Confirmed => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1786226

Title:
  Use sqlalchemy baked query

Status in neutron:
  Won't Fix

Bug description:
  I am running the rally scenario test create_and_list_ports on a
  3-controller setup (each controller has 8 CPUs, i.e. 4 cores * 2 HTs),
  with function call tracing enabled on the neutron server processes, at
  a concurrency of 8 for 400 iterations.

  The average time taken to create a port is 7.207 seconds (when 400
  ports are created). The function call trace for this run is at
  http://paste.openstack.org/show/727718/ and the rally results are:
  
  Response Times (sec)

  +------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
  | Action                 | Min (sec) | Median (sec) | 90%ile (sec) | 95%ile (sec) | Max (sec) | Avg (sec) | Success | Count |
  +------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
  | neutron.create_network | 2.085     | 2.491        | 3.01         | 3.29         | 7.558     | 2.611     | 100.0%  | 400   |
  | neutron.create_port    | 5.69      | 6.878        | 7.755        | 9.394        | 17.0      | 7.207     | 100.0%  | 400   |
  | neutron.list_ports     | 0.72      | 5.552        | 9.123        | 9.599        | 11.165    | 5.559     | 100.0%  | 400   |
  | total                  | 10.085    | 15.263       | 18.789       | 19.734       | 28.712    | 15.377    | 100.0%  | 400   |
  |  -> duration           | 10.085    | 15.263       | 18.789       | 19.734       | 28.712    | 15.377    | 100.0%  | 400   |
  |  -> idle_duration      | 0.0       | 0.0          | 0.0          | 0.0          | 0.0       | 0.0       | 100.0%  | 400   |
  +------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+


  Michael Bayer (zzzeek) has analysed this call graph and had some
  suggestions. One suggestion is to use baked queries, i.e.
  https://review.openstack.org/#/c/430973/2

  This is his analysis - "But looking at the profile I see here, it is
  clear that the vast majority of time is spent doing lots and lots of
  small queries, and all of the mechanics involved with turning them
  into SQL strings and invoking them.   SQLAlchemy has a very effective
  optimization for this but it must be coded into Neutron.

  Here is the total time spent for Query to convert its state into SQL:

  148029/356073   15.232    0.000 4583.820    0.013
  /usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py:3372(Query._compile_context)

  that's 4583 seconds spent in Query compilation, which if Neutron were
  modified  to use baked queries, would be vastly reduced.  I
  demonstrated the beginning of this work in 2017 here:
  https://review.openstack.org/#/c/430973/1  , which illustrates how to
  first start to create a base query method in neutron that other
  functions can begin to make use of.  As more queries start using the
  baked form, this 4500 seconds number will begin to drop."
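
  For context, the baked query pattern referenced above looks roughly
  like this (``Port`` is an illustrative mapped class, and the extension
  is considered legacy as of SQLAlchemy 1.4):

    from sqlalchemy import bindparam
    from sqlalchemy.ext import baked

    bakery = baked.bakery()

    def ports_on_network(session, network_id):
        # The lambdas are compiled to SQL only once and then cached,
        # avoiding the repeated Query compilation seen in the profile.
        bq = bakery(lambda s: s.query(Port))
        bq += lambda q: q.filter(Port.network_id == bindparam('network_id'))
        return bq(session).params(network_id=network_id).all()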

  
  I have restored his patch https://review.openstack.org/#/c/430973/2.
  With this, the average time taken to create a port is 5.196 seconds
  (when 400 ports are created); the function call trace for this run is
  at http://paste.openstack.org/show/727719/ and the total time spent on
  Query compilation (Query._compile_context) is only 1675 seconds.

  83696/169062   7.308    0.000 1675.140    0.010
  /usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py:3372(Query._compile_context)
   
  Rally results for this run are

  
  Response Times (sec)

  +------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
  | Action                 | Min (sec) | Median (sec) | 90%ile (sec) | 95%ile (sec) | Max (sec) | Avg (sec) | Success | Count |
  +------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
  | 

[Yahoo-eng-team] [Bug 2056613] [NEW] libvirt CPU power management does not support live migration

2024-03-08 Thread Artom Lifshitz
Public bug reported:

Description
===
libvirt CPU power management does not support live migration

Steps to reproduce
==
1. Turn on libvirt CPU power management
2. Boot an instance with hw:cpu_policy=dedicated
3. Live migrate the instance

Expected result
===
Live migration succeeds.

Actual result
=
Live migration fails with the following libvirt error in the source 
nova-compute logs:

[instance: afdd5e62-2a97-4b58-a7e7-bb92152f4165] Migration operation thread 
notification {{(pid=103809) thread_finished 
/opt/stack/nova/nova/virt/libvirt/driver.py:10668}}
Feb 21 19:21:15.045216 np0036828692 nova-compute[103809]: Traceback (most 
recent call last):
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File 
"/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/hubs/hub.py", line 
471, in fire_timers
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]: timer()
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File 
"/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/hubs/timer.py", 
line 59, in __call__
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]: cb(*args, **kw)
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File 
"/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/event.py", line 
173, in _do_send
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]: 
waiter.switch(result)
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File 
"/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/greenthread.py", 
line 264, in main
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]: result = 
function(*args, **kwargs)
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File 
"/opt/stack/nova/nova/utils.py", line 664, in context_wrapper
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]: return 
func(*args, **kwargs)
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File 
"/opt/stack/nova/nova/virt/libvirt/driver.py", line 10322, in 
_live_migration_operation
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]: with 
excutils.save_and_reraise_exception():
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File 
"/opt/stack/data/venv/lib/python3.10/site-packages/oslo_utils/excutils.py", 
line 227, in __exit__
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]: 
self.force_reraise()
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File 
"/opt/stack/data/venv/lib/python3.10/site-packages/oslo_utils/excutils.py", 
line 200, in force_reraise
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]: raise self.value
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File 
"/opt/stack/nova/nova/virt/libvirt/driver.py", line 10311, in 
_live_migration_operation
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]: 
guest.migrate(self._live_migration_uri(dest),
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File 
"/opt/stack/nova/nova/virt/libvirt/guest.py", line 648, in migrate
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]: 
self._domain.migrateToURI3(
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File 
"/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 
186, in doit
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]: result = 
proxy_call(self._autowrap, f, *args, **kwargs)
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File 
"/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 
144, in proxy_call
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]: rv = execute(f, 
*args, **kwargs)
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File 
"/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 
125, in execute
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]: raise 
e.with_traceback(tb)
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File 
"/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 82, 
in tworker
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]: rv = meth(*args, 
**kwargs)
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File 
"/usr/lib/python3/dist-packages/libvirt.py", line 2126, in migrateToURI3
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]: raise 
libvirtError('virDomainMigrateToURI3() failed')
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]: libvirt.libvirtError: 
cannot set CPU affinity on process 48279: Invalid argument

Environment
===
This was originally noticed in a whitebox CI job [1] on devstack master.

Additional info
===
Regardless of whether NUMA live migration has changed the underlying CPU 
pinnings, it's necessary to make sure the cores are powered up on the 
destination, otherwise libvirt attempts to pin the instance to an offline core. 
Nova doesn't handle that. With some refactoring to the 
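
A minimal illustration of the underlying mechanism (this uses the
standard sysfs CPU hotplug interface and is not Nova's actual code):
before libvirt can pin a vCPU to a dedicated core on the destination
host, that core has to be online.

    def set_core_online(core_id, online=True):
        # Requires root; core 0 usually cannot be taken offline.
        path = '/sys/devices/system/cpu/cpu%d/online' % core_id
        with open(path, 'w') as f:
            f.write('1' if online else '0')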

[Yahoo-eng-team] [Bug 2056612] [NEW] libvirt CPU power management does not handle `isolate` emulator thread policy

2024-03-08 Thread Artom Lifshitz
Public bug reported:

Description
===
libvirt CPU power management does not handle `isolate` emulator thread policy.

Steps to reproduce
==
1. Turn on libvirt CPU power management
2. Boot an instance with hw:cpu_policy=dedicated and 
hw:emulator_threads_policy=isolate

Expected result
===
The instance boots successfully.

Actual result
=
The instance doesn't start, with the following libvirt error in the 
nova-compute log:

Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: : 
libvirt.libvirtError: cannot set CPU affinity on process 47343: Invalid argument
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR 
nova.virt.libvirt.guest Traceback (most recent call last):
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR 
nova.virt.libvirt.guest   File "/opt/stack/nova/nova/virt/libvirt/guest.py", 
line 165, in launch
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR 
nova.virt.libvirt.guest return self._domain.createWithFlags(flags)
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR 
nova.virt.libvirt.guest   File 
"/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 
186, in doit
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR 
nova.virt.libvirt.guest result = proxy_call(self._autowrap, f, *args, 
**kwargs)
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR 
nova.virt.libvirt.guest   File 
"/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 
144, in proxy_call
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR 
nova.virt.libvirt.guest rv = execute(f, *args, **kwargs)
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR 
nova.virt.libvirt.guest   File 
"/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 
125, in execute
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR 
nova.virt.libvirt.guest raise e.with_traceback(tb)
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR 
nova.virt.libvirt.guest   File 
"/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 82, 
in tworker
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR 
nova.virt.libvirt.guest rv = meth(*args, **kwargs)
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR 
nova.virt.libvirt.guest   File "/usr/lib/python3/dist-packages/libvirt.py", 
line 1385, in createWithFlags
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR 
nova.virt.libvirt.guest raise libvirtError('virDomainCreateWithFlags() 
failed')
Feb 21 19:15:31.316773 np0036828693 nova-compute[42254]: ERROR 
nova.virt.libvirt.guest libvirt.libvirtError: cannot set CPU affinity on 
process 47343: Invalid argument
Feb 21 19:15:31.316773 np0036828693 nova-compute[42254]: ERROR 
nova.virt.libvirt.guest
Feb 21 19:15:31.316773 np0036828693 nova-compute[42254]: ERROR 
nova.virt.libvirt.driver [None req-ec45061f-e9a4-4b02-9354-0cb390bd28cf 
tempest-EmulatorThreadTest-1184416592 
tempest-EmulatorThreadTest-1184416592-project-member] [instance: 
f697a24e-6599-4ec0-9e3b-87eba1a81a0b] Failed to start libvirt guest: 
libvirt.libvirtError: cannot set CPU affinity on process 47343: Invalid argument

Environment
===
This was originally noticed in a whitebox CI job [1] on devstack master.

Additional info
===
When powering up an instance's CPUs, Nova doesn't take into account that
with the `isolate` emulator thread policy there is an extra CPU consumed
by the emulator thread. In a real deployment this results in libvirt
trying to pin the instance to an offline CPU. In functional tests it's
relatively easy to observe that CPU not being powered on.

[1]
https://zuul.opendev.org/t/openstack/build/532b30767df54147a01508e7616930f5/logs
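
Illustrative only (the helper and its arguments are made up): the set of
cores powered up for the instance has to include the pCPU reserved for
the emulator thread, not just the pinned vCPUs.

    def cores_to_power_up(cpu_pinning, emulator_pcpu=None):
        # cpu_pinning: mapping of guest vCPU -> host pCPU for the instance.
        cores = set(cpu_pinning.values())
        if emulator_pcpu is not None:
            # hw:emulator_threads_policy=isolate consumes one extra pCPU.
            cores.add(emulator_pcpu)
        return cores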

** Affects: nova
 Importance: Undecided
 Status: In Progress

** Description changed:

  Description
  ===
  libvirt CPU power management does not handle `isolate` emulator thread policy.
  
  Steps to reproduce
  ==
- 1. Boot an instance with hw:cpu_policy=dedicated and 
hw:emulator_threads_policy=isolate
- 
+ 1. Turn on libvirt CPU power management
+ 2. Boot an instance with hw:cpu_policy=dedicated and 
hw:emulator_threads_policy=isolate
  
  Expected result
  ===
  After the execution of the steps above, what should have
  happened if the issue wasn't present?
  
  Actual result
  =
  The instance doesn't start, with the following libvirt error in the 
nova-compute log:
  
  Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: : 
libvirt.libvirtError: cannot set CPU affinity on process 47343: Invalid argument
  Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR 
nova.virt.libvirt.guest Traceback (most recent call last):
  Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR