[Yahoo-eng-team] [Bug 2056537] [NEW] [ovn-octavia-provider] gateway chassis not filled on LogicalRouterPort event
Public bug reported:

The gateway neutron-ovn-invalid-chassis, previously used for the CR-LRP
gateway_chassis, has been removed in [1]. As a consequence, the logical
router port event received at creation time is treated as a new port
attaching the router to a tenant network, so the LB is added to that LS,
which makes the functional tests fail. In a real environment this
situation may not occur, except in the scenario where the
gateway_chassis for the LRP arrives in a second event rather than in the
initial creation event.

[1] https://review.opendev.org/c/openstack/neutron/+/909305

** Affects: neutron
   Importance: Undecided
   Assignee: Fernando Royo (froyoredhat)
   Status: In Progress

** Tags: ovn-octavia-provider

** Changed in: neutron
   Assignee: (unassigned) => Fernando Royo (froyoredhat)

https://bugs.launchpad.net/bugs/2056537
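A minimal sketch of the distinction the event handler needs to make. The
helper name is hypothetical; the only assumption is the OVN NB schema
fact that Logical_Router_Port rows carry a gateway_chassis column:

    def is_gateway_lrp(lrp_row):
        # A CR-LRP/gateway port carries gateway_chassis; a plain LRP
        # attaching a tenant network does not. If gateway_chassis is only
        # filled in by a later update event (this bug), a create-time
        # check like this misclassifies the gateway port as a
        # tenant-network attachment and the LB is wrongly added to that
        # logical switch.
        return bool(lrp_row.gateway_chassis)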
[Yahoo-eng-team] [Bug 2056544] [NEW] Attaching a pre-existing port with port_security disabled on a network with port_security enabled fails
Public bug reported:

Description
===========

Attaching a pre-existing port with port_security disabled to a network
with port_security enabled which does not have any subnets fails. The
port_security setting on the network should not be relevant in this
case: it is only a default value for newly created ports. For
pre-existing ports, the port_security setting on the port should be
considered instead.

This fails because there is code to prohibit attaching to a network with
port_security enabled which does not have a subnet, since it is then not
possible to attach security groups to the port. This is correct when the
port is actually created by Nova and the network's port_security setting
is applied to the created port, but it is wrong for already existing
ports. The port_security setting on the port should be considered
instead.

Steps to reproduce
==================

* Create an instance
* Create a network with port security enabled
* Create a port on this network with port security disabled
* Try to attach the port to the instance

Note: No subnet was created on the network.

Expected result
===============

The port is attached to the instance.

Actual result
=============

The port fails to attach to the instance with this message:

  Network requires port_security_enabled and subnet associated in order
  to apply security groups. (HTTP 400) (Request-ID:
  req-3ce456bb-c016-4737-82f8-4b332b923ab6)

** Affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/2056544
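The reproduction steps above in code form, as a hedged sketch using
openstacksdk; the cloud name, resource names, and the pre-existing
server are illustrative assumptions, not from the report:

    import openstack

    conn = openstack.connect(cloud='devstack')  # assumes a configured clouds.yaml

    server = conn.compute.find_server('vm1')  # the pre-existing instance

    # Network with port security enabled (the Neutron default) and,
    # crucially, no subnet.
    net = conn.network.create_network(name='ps-net')

    # Pre-existing port that explicitly disables port security.
    port = conn.network.create_port(network_id=net.id,
                                    is_port_security_enabled=False)

    # Expected: succeeds, honoring the port-level setting.
    # Actual (this bug): HTTP 400 "Network requires port_security_enabled
    # and subnet associated in order to apply security groups."
    conn.compute.create_server_interface(server, port_id=port.id)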
[Yahoo-eng-team] [Bug 2056558] [NEW] ``OVNL3RouterPlugin._port_update`` can be called before the LRP is created in the OVN DB
Public bug reported:

``OVNL3RouterPlugin._port_update`` [1] is called at AFTER_UPDATE, once
the router port has been created (for example, when a subnet is attached
to a router). This event is guaranteed to fire after the Neutron DB has
the resource (the port) in the database. However, as the code comment
highlights, it can fire before the OVN NB database has the LRP resource
created.

The called method chain, ``update_router_port`` -->
``_update_router_port``, guarantees that the LRP update is executed only
when the LRP exists, but the LRP read [2] does not have this safeguard.

This event should be replaced by an OVN DB event that checks the same
conditions as in [1] while guaranteeing that the LRP resource is already
created in the DB.

Example of this failure:
https://zuul.opendev.org/t/openstack/build/3f7935d7ed53473898bbf213e85dfb61/log/controller/logs/dsvm-functional-logs/ovn_octavia_provider.tests.functional.test_driver.TestOvnOctaviaProviderDriver.test_create_lb_custom_network/testrun.txt

[1] https://opendev.org/openstack/neutron/src/commit/e8468a6dd647fd62eac429417c7f382e8859b574/neutron/services/ovn_l3/plugin.py#L372-L381
[2] https://opendev.org/openstack/neutron/src/commit/e8468a6dd647fd62eac429417c7f382e8859b574/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L1809-L1811

** Affects: neutron
   Importance: Medium
   Status: New

https://bugs.launchpad.net/bugs/2056558
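A sketch of the proposed direction: react to the LRP actually appearing
in the OVN NB DB instead of Neutron's AFTER_UPDATE port callback. The
class name and the driver hook are illustrative; only ovsdbapp's
RowEvent interface is assumed:

    from ovsdbapp.backend.ovs_idl import event as row_event


    class LogicalRouterPortCreateEvent(row_event.RowEvent):
        """Fires once the LRP row exists in the OVN NB database."""

        def __init__(self, driver):
            self.driver = driver
            super().__init__((self.ROW_CREATE,), 'Logical_Router_Port',
                             None)

        def run(self, event, row, old):
            # The row is guaranteed to exist here, unlike in _port_update
            # where the Neutron DB has the port but OVN NB may not yet
            # have the LRP.
            self.driver.update_router_port_from_lrp(row)  # hypothetical hook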
[Yahoo-eng-team] [Bug 2052916] Re: HTTP get on s3tokens and ec2tokens endpoint gives 500 internal error
Reviewed:  https://review.opendev.org/c/openstack/keystone/+/908760
Committed: https://opendev.org/openstack/keystone/commit/6096457d7400c280f9ee07a9c5b9760e74ecee4b
Submitter: "Zuul (22348)"
Branch:    master

commit 6096457d7400c280f9ee07a9c5b9760e74ecee4b
Author: Tobias Urdin
Date: Mon Feb 12 08:36:53 2024 +

    Dont enforce when HTTP GET on s3tokens and ec2tokens

    When calling the s3tokens or ec2tokens API with a HTTP GET we should
    get a 405 Method Not Allowed but we get a 500 Internal Server Error
    because we enforce that method.

    Closes-Bug: #2052916
    Change-Id: I5f60d10dc25551175cc73ca8f3f28b0b95ec9f99
    Signed-off-by: Tobias Urdin

** Changed in: keystone
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/2052916

Bug description:

  When doing an HTTP GET against the s3tokens and ec2tokens endpoints we
  should get a 405 Method Not Allowed, but because the GET method is
  enforced we get a 500 Internal Server Error instead.

  AssertionError: PROGRAMMING ERROR: enforcement
  (`keystone.common.rbac_enforcer.enforcer.RBACEnforcer.enforce_call()`)
  has not been called; API is unenforced.
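For context, this is the generic mechanism at work, shown with plain
Flask rather than Keystone's actual wiring (route path and handler are
illustrative): when a route only allows POST, the framework answers GET
with 405 on its own, before any policy enforcement runs.

    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route('/v3/ec2tokens', methods=['POST'])  # GET is not allowed
    def ec2tokens():
        # Real credential validation would happen here. Enforcement only
        # ever runs for the allowed method, so a GET can no longer trip
        # the "API is unenforced" assertion -- Flask rejects it with a
        # 405 Method Not Allowed first.
        return jsonify({})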
[Yahoo-eng-team] [Bug 1764738] Re: routed provider networks limit to one host
From all the changes that have merged this seems to be complete, will
close.

** Changed in: neutron
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1764738

Bug description:

  There seems to be a limitation that a compute node can only have an
  interface on one segment of a multisegment network. This feels wrong
  and limits the compute resources, since they can only be part of one
  segment.

  The purpose of multisegment networks is to group multiple segments
  under one network name, i.e. operators should be able to expand the IP
  pool without having to create multiple networks for it like internet1,
  internet2, etc.

  The way it should work is that a compute node can belong to one or more
  segments. It should be up to the operator to decide how they want to
  segment the compute resources, if at all. It should not be enforced by
  the simple need to add IP ranges to a network.

  Way to reproduce:

  1. Configure compute nodes to have bridges configured on 2 segments
  2. Create a network with 2 segments
  3. Create the segments

  2018-04-17 15:17:59.545 25 ERROR oslo_messaging.rpc.server
  2018-04-17 15:18:18.836 25 ERROR oslo_messaging.rpc.server [req-4fdf6ee1-2be3-49c5-b3cb-62a2194465ab - - - - -] Exception during message handling: HostConnectedToMultipleSegments: Host eselde03u02s04 is connected to multiple segments on routed provider network '5c1f4dd4-baff-4c59-ba56-bd9cc2c59fa4'. It should be connected to one.
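Steps 2-3 above, sketched with openstacksdk; the network name, physnets,
and VLAN IDs are made-up values, and `create_segment` is the SDK call
for Neutron's routed-network segments:

    import openstack

    conn = openstack.connect(cloud='devstack')  # assumes a configured clouds.yaml

    # One network, multiple segments -- the grouping the report asks for.
    net = conn.network.create_network(name='internet')

    for physnet, vlan in (('physnet1', 100), ('physnet2', 200)):
        conn.network.create_segment(network_id=net.id,
                                    network_type='vlan',
                                    physical_network=physnet,
                                    segmentation_id=vlan)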
[Yahoo-eng-team] [Bug 1666779] Re: Expose neutron API via a WSGI script
Seems this fix is released, will close.

** Changed in: neutron
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1666779

Bug description:

  As per the Pike goal [1], we should expose the neutron API via a WSGI
  script, and make devstack installations use a web server for the
  default deployment. This bug is an RFE/tracker for the feature.

  [1] https://governance.openstack.org/tc/goals/pike/deploy-api-in-wsgi.html
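For readers unfamiliar with the goal: the contract a "WSGI script"
satisfies is tiny. This is the generic shape (not Neutron's actual entry
point) that a web server such as Apache mod_wsgi or uWSGI imports and
serves:

    # A minimal WSGI application: the server calls `application` once
    # per request with the request environment and a response callback.
    def application(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'OK']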
[Yahoo-eng-team] [Bug 1833674] Re: [RFE] Improve profiling of port binding and vif plugging
This seems to be complete, will close bug. Please re-open if I'm wrong.

** Changed in: neutron
   Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1833674

Bug description:

  As discussed at the 2019-May PTG in Denver, we want to measure and
  then improve the performance of Neutron's most important operation,
  which is port binding. As we're working with OSProfiler reports we are
  realizing the report is incomplete. We could turn on tracing in other
  components and subcomponents by further propagating trace information.

  We heavily build on some previous work:

  * https://bugs.launchpad.net/neutron/+bug/1335640 [RFE] Neutron support for OSprofiler
  * https://review.opendev.org/615350 Integrate rally with osprofiler

  A few patches were already merged before opening this RFE:

  * https://review.opendev.org/662804 Run nova's VM boot rally scenario in the neutron gate
  * https://review.opendev.org/665614 Allow VM booting rally scenarios to time out

  We already see the need for a few changes:

  * New rally scenario to measure port binding
  * Profiling coverage for vif plugging

  This work is also driven by the discoveries made while interpreting
  profiler reports, so I expect further changes here and there.
[Yahoo-eng-team] [Bug 1815827] Re: [RFE] neutron-lib: rehome neutron.object.base along with rbac db/objects
I am going to close this as it's been a number of years and the original
patch was abandoned. If someone wants to pick it up please re-open.

** Changed in: neutron
   Status: New => Won't Fix

https://bugs.launchpad.net/bugs/1815827

Bug description:

  This isn't a request for a new feature per se, but rather a placeholder
  for the neutron drivers team to take a look at [1]. Specifically I'm
  hoping for drivers team agreement that the modules/functionality being
  rehomed in [1] make sense; no actual (deep) code review of [1] is
  necessary at this point.

  Assuming we can agree that the logic in [1] makes sense to rehome, I
  can proceed by chunking it up into smaller patches that will make the
  rehome/consume process easier.

  This work is part of [2], described in [3][4]. However, as commented in
  [1], it's also necessary to rehome the rbac db/objects modules and
  their dependencies, which weren't discussed previously.

  [1] https://review.openstack.org/#/c/621000
  [2] https://blueprints.launchpad.net/neutron/+spec/neutron-lib-decouple-db
  [3] https://specs.openstack.org/openstack/neutron-specs/specs/rocky/neutronlib-decouple-db-apiutils.html
  [4] https://specs.openstack.org/openstack/neutron-specs/specs/rocky/neutronlib-decouple-models.html
[Yahoo-eng-team] [Bug 1694165] Re: Improve Neutron documentation for simpler deployments
The documents have been updated many times over the past 6+ years; I'm
going to close this as they are much better now. If there is something
specific please open a new bug.

** Changed in: neutron
   Status: Triaged => Won't Fix

https://bugs.launchpad.net/bugs/1694165

Bug description:

  During the Boston Summit session, an issue was raised that Neutron
  documentation for simpler deployments should be improved/simplified. A
  couple of observations were noted:

  1) For non-Neutron-savvy users, it is not very intuitive to
     specify/configure networking requirements.
  2) The basic default configuration (as documented) is very OVS centric.
     It should discuss other non-OVS deployments as well.

  Here is the etherpad with the details of the discussion:
  https://etherpad.openstack.org/p/pike-neutron-making-it-easy
[Yahoo-eng-team] [Bug 1797663] Re: refactor def _get_dvr_sync_data from neutron/db/l3_dvr_db.py
As this has never been worked on I am going to close. If anyone wants to
pick it up please re-open.

** Changed in: neutron
   Status: Confirmed => Won't Fix

https://bugs.launchpad.net/bugs/1797663

Bug description:

  The function _get_dvr_sync_data in neutron/db/l3_dvr_db.py fetches and
  processes router data, and since it is called for each DVR HA router
  on update, it becomes very hard to pinpoint issues in such a massive
  method. I propose breaking it into two methods, _get_dvr_sync_data and
  _process_dvr_sync_data, which will make debugging easier in the
  future.
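A hedged sketch of the proposed split. The class context, signatures,
and the fetch helper are illustrative; only the two target method names
come from the report:

    class L3DvrDbMixinSketch:
        # Stand-in for the mixin in neutron/db/l3_dvr_db.py.

        def _get_dvr_sync_data(self, context, host, agent,
                               router_ids=None, active=None):
            # Step 1: fetch only. Keeping this thin makes it obvious
            # whether a bug is in retrieval or in post-processing.
            routers = self._fetch_routers(context, router_ids, active)  # hypothetical helper
            return self._process_dvr_sync_data(context, host, agent,
                                               routers)

        def _process_dvr_sync_data(self, context, host, agent, routers):
            # Step 2: enrich routers (floating IPs, interfaces, gateway
            # ports); failures here can now be pinpointed separately
            # from the fetch.
            ...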
[Yahoo-eng-team] [Bug 1786226] Re: Use sqlalchemy baked query
From a comment in the change that was linked above: "BakedQuery is a
legacy extension that no longer does too much beyond what SQLAlchemy 1.4
does in most cases automatically. new development w/ BakedQuery is a
non-starter, this is a legacy module we would eventually remove."

For that reason I'm going to close this bug.

** Changed in: neutron
   Status: Confirmed => Won't Fix

https://bugs.launchpad.net/bugs/1786226

Bug description:

  I am running the rally scenario test create_and_list_ports on a
  3-controller setup (each controller has 8 CPUs, i.e. 4 cores * 2 HTs),
  with a function call trace enabled on the neutron server processes, at
  a concurrency of 8 for 400 iterations.

  The average time taken for create port is 7.207 seconds (when 400
  ports are created), the function call trace for this run is at
  http://paste.openstack.org/show/727718/ and the rally results are:

  +------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
  |                                                  Response Times (sec)                                                      |
  +------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
  | Action                 | Min (sec) | Median (sec) | 90%ile (sec) | 95%ile (sec) | Max (sec) | Avg (sec) | Success | Count |
  +------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
  | neutron.create_network | 2.085     | 2.491        | 3.01         | 3.29         | 7.558     | 2.611     | 100.0%  | 400   |
  | neutron.create_port    | 5.69      | 6.878        | 7.755        | 9.394        | 17.0      | 7.207     | 100.0%  | 400   |
  | neutron.list_ports     | 0.72      | 5.552        | 9.123        | 9.599        | 11.165    | 5.559     | 100.0%  | 400   |
  | total                  | 10.085    | 15.263       | 18.789       | 19.734       | 28.712    | 15.377    | 100.0%  | 400   |
  |  -> duration           | 10.085    | 15.263       | 18.789       | 19.734       | 28.712    | 15.377    | 100.0%  | 400   |
  |  -> idle_duration      | 0.0       | 0.0          | 0.0          | 0.0          | 0.0       | 0.0       | 100.0%  | 400   |
  +------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+

  Michael Bayer (zzzeek) has analysed this callgraph and had some
  suggestions. One suggestion is to use baked queries, i.e.
  https://review.openstack.org/#/c/430973/2

  This is his analysis: "But looking at the profile I see here, it is
  clear that the vast majority of time is spent doing lots and lots of
  small queries, and all of the mechanics involved with turning them
  into SQL strings and invoking them. SQLAlchemy has a very effective
  optimization for this but it must be coded into Neutron. Here is the
  total time spent for Query to convert its state into SQL:

  148029/356073   15.232    0.000 4583.820    0.013 /usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py:3372(Query._compile_context)

  that's 4583 seconds spent in Query compilation, which if Neutron were
  modified to use baked queries, would be vastly reduced. I demonstrated
  the beginning of this work in 2017 here:
  https://review.openstack.org/#/c/430973/1 , which illustrates how to
  first start to create a base query method in neutron that other
  functions can begin to make use of. As more queries start using the
  baked form, this 4500 seconds number will begin to drop."

  I have restored his patch https://review.openstack.org/#/c/430973/2 ;
  with this, the average time taken to create a port is 5.196 seconds
  (when 400 ports are created), the function call trace for this run is
  at http://paste.openstack.org/show/727719/ , and the total time spent
  on Query compilation (Query._compile_context) is only 1675 seconds:

  83696/169062     7.308    0.000 1675.140    0.010 /usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py:3372(Query._compile_context)

  Rally results for this run are:

  +------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
  |                                                  Response Times (sec)                                                      |
  +------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
  | Action                 | Min (sec) | Median (sec) | 90%ile (sec) | 95%ile (sec) | Max (sec) | Avg (sec) | Success | Count |
  +------------------------+-----------+--------------+--------------+--------------+-----------+-----------+---------+-------+
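For reference, the BakedQuery pattern the analysis refers to looks like
this: a minimal sketch against a stand-in Port model (not Neutron's),
using SQLAlchemy's documented `sqlalchemy.ext.baked` interface. The
lambdas act as cache keys, so the query is compiled to SQL once and
later calls only rebind parameters, skipping the per-call
Query._compile_context cost shown in the profile.

    from sqlalchemy import Column, Integer, String, bindparam
    from sqlalchemy.ext import baked
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class Port(Base):  # stand-in model, not Neutron's actual Port
        __tablename__ = 'ports'
        id = Column(Integer, primary_key=True)
        network_id = Column(String)

    bakery = baked.bakery()  # caches compiled queries keyed by the lambdas

    def list_ports_by_network(session, network_id):
        # Built and compiled once; subsequent calls reuse the cached SQL
        # string and only the bound parameter changes.
        q = bakery(lambda s: s.query(Port))
        q += lambda q: q.filter(Port.network_id == bindparam('network_id'))
        return q(session).params(network_id=network_id).all()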
[Yahoo-eng-team] [Bug 2056613] [NEW] libvirt CPU power management does not support live migration
Public bug reported:

Description
===========

libvirt CPU power management does not support live migration.

Steps to reproduce
==================

1. Turn on libvirt CPU power management
2. Boot an instance with hw:cpu_policy=dedicated
3. Live migrate the instance

Expected result
===============

Live migration succeeds.

Actual result
=============

Live migration fails with the following libvirt error in the source
nova-compute logs:

[instance: afdd5e62-2a97-4b58-a7e7-bb92152f4165] Migration operation thread notification {{(pid=103809) thread_finished /opt/stack/nova/nova/virt/libvirt/driver.py:10668}}
Feb 21 19:21:15.045216 np0036828692 nova-compute[103809]: Traceback (most recent call last):
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/hubs/hub.py", line 471, in fire_timers
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     timer()
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/hubs/timer.py", line 59, in __call__
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     cb(*args, **kw)
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/event.py", line 173, in _do_send
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     waiter.switch(result)
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/greenthread.py", line 264, in main
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     result = function(*args, **kwargs)
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/nova/nova/utils.py", line 664, in context_wrapper
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     return func(*args, **kwargs)
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 10322, in _live_migration_operation
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     with excutils.save_and_reraise_exception():
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/oslo_utils/excutils.py", line 227, in __exit__
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     self.force_reraise()
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     raise self.value
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 10311, in _live_migration_operation
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     guest.migrate(self._live_migration_uri(dest),
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 648, in migrate
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     self._domain.migrateToURI3(
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 186, in doit
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     result = proxy_call(self._autowrap, f, *args, **kwargs)
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 144, in proxy_call
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     rv = execute(f, *args, **kwargs)
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 125, in execute
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     raise e.with_traceback(tb)
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 82, in tworker
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     rv = meth(*args, **kwargs)
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/usr/lib/python3/dist-packages/libvirt.py", line 2126, in migrateToURI3
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     raise libvirtError('virDomainMigrateToURI3() failed')
Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]: libvirt.libvirtError: cannot set CPU affinity on process 48279: Invalid argument

Environment
===========

This was originally noticed in a whitebox CI job [1] on devstack master.

Additional info
===============

Regardless of whether NUMA live migration has changed the underlying CPU
pinnings, it's necessary to make sure the cores are powered up on the
destination, otherwise libvirt attempts to pin the instance to an
offline core. Nova doesn't handle that. With some refactoring to the
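For background on what "powered up" means here: Linux exposes a per-core
online toggle via sysfs, which is the knob dedicated-CPU power
management drives. A minimal illustration, not Nova's implementation;
root privileges and a hot-pluggable core are assumed:

    def set_cpu_online(core_id: int, online: bool) -> None:
        # Writing 0/1 to this file offlines/onlines the core. Pinning a
        # vCPU to an offlined core is what produces libvirt's "cannot
        # set CPU affinity ... Invalid argument" error seen above.
        with open(f'/sys/devices/system/cpu/cpu{core_id}/online', 'w') as f:
            f.write('1' if online else '0')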
[Yahoo-eng-team] [Bug 2056612] [NEW] libvirt CPU power management does not handle `isolate` emulator thread policy
Public bug reported:

Description
===========

libvirt CPU power management does not handle the `isolate` emulator
thread policy.

Steps to reproduce
==================

1. Turn on libvirt CPU power management
2. Boot an instance with hw:cpu_policy=dedicated and hw:emulator_threads_policy=isolate

Expected result
===============

The instance boots successfully.

Actual result
=============

The instance doesn't start, with the following libvirt error in the
nova-compute log:

Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: : libvirt.libvirtError: cannot set CPU affinity on process 47343: Invalid argument
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR nova.virt.libvirt.guest Traceback (most recent call last):
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR nova.virt.libvirt.guest   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 165, in launch
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR nova.virt.libvirt.guest     return self._domain.createWithFlags(flags)
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR nova.virt.libvirt.guest   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 186, in doit
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR nova.virt.libvirt.guest     result = proxy_call(self._autowrap, f, *args, **kwargs)
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR nova.virt.libvirt.guest   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 144, in proxy_call
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR nova.virt.libvirt.guest     rv = execute(f, *args, **kwargs)
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR nova.virt.libvirt.guest   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 125, in execute
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR nova.virt.libvirt.guest     raise e.with_traceback(tb)
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR nova.virt.libvirt.guest   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 82, in tworker
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR nova.virt.libvirt.guest     rv = meth(*args, **kwargs)
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR nova.virt.libvirt.guest   File "/usr/lib/python3/dist-packages/libvirt.py", line 1385, in createWithFlags
Feb 21 19:15:31.301470 np0036828693 nova-compute[42254]: ERROR nova.virt.libvirt.guest     raise libvirtError('virDomainCreateWithFlags() failed')
Feb 21 19:15:31.316773 np0036828693 nova-compute[42254]: ERROR nova.virt.libvirt.guest libvirt.libvirtError: cannot set CPU affinity on process 47343: Invalid argument
Feb 21 19:15:31.316773 np0036828693 nova-compute[42254]: ERROR nova.virt.libvirt.guest
Feb 21 19:15:31.316773 np0036828693 nova-compute[42254]: ERROR nova.virt.libvirt.driver [None req-ec45061f-e9a4-4b02-9354-0cb390bd28cf tempest-EmulatorThreadTest-1184416592 tempest-EmulatorThreadTest-1184416592-project-member] [instance: f697a24e-6599-4ec0-9e3b-87eba1a81a0b] Failed to start libvirt guest: libvirt.libvirtError: cannot set CPU affinity on process 47343: Invalid argument

Environment
===========

This was originally noticed in a whitebox CI job [1] on devstack master.

Additional info
===============

When powering up an instance's CPUs, Nova doesn't take into account that
with the `isolate` emulator thread policy there is an extra CPU consumed
by the emulator thread.

In a real deployment, this results in libvirt trying to pin the instance
to an offline CPU. In functional tests, it's relatively easy to observe
that CPU not being powered on.

[1] https://zuul.opendev.org/t/openstack/build/532b30767df54147a01508e7616930f5/logs

** Affects: nova
   Importance: Undecided
   Status: In Progress
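Illustrative accounting of the point made in "Additional info" above
(plain Python, not Nova code; the pin values are made up): with
hw:emulator_threads_policy=isolate, the guest consumes one extra
dedicated host core for its emulator thread, and every core in the union
below must be powered on before libvirt pins to it.

    # Guest vCPU -> host core mapping (example values).
    vcpu_pins = {0: 4, 1: 5}

    # Extra host core claimed by the isolated emulator thread.
    emulator_pins = {6}

    # Powering on only vcpu_pins.values() -- forgetting emulator_pins --
    # reproduces the "cannot set CPU affinity" failure above.
    cores_to_power_on = set(vcpu_pins.values()) | emulator_pins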