Reviewed:  https://review.opendev.org/c/openstack/nova/+/959604
Committed: https://opendev.org/openstack/nova/commit/567dbe1867602d544945b3584c3885ac146b6535
Submitter: "Zuul (22348)"
Branch:    master

commit 567dbe1867602d544945b3584c3885ac146b6535
Author: Sean Mooney <[email protected]>
Date:   Thu Sep 4 21:42:04 2025 +0100

    hypervisors: Optimize uptime retrieval for better performance

    The /os-hypervisors/detail API endpoint was experiencing significant
    performance issues in environments with many compute nodes when using
    microversion 2.88 or higher, as it made sequential RPC calls to gather
    uptime information from each compute node.

    This change optimizes uptime retrieval by:

    * Adding uptime to periodic resource updates sent by nova-compute to
      the database, eliminating synchronous RPC calls during API requests
    * Restricting RPC-based uptime retrieval to hypervisor types that
      support it (libvirt and z/VM), avoiding unnecessary calls that would
      always fail
    * Preferring cached database uptime data over RPC calls when available

    Closes-Bug: #2122036
    Assisted-By: Claude <[email protected]>
    Change-Id: I5723320f578192f7e0beead7d5df5d7e47d54d2b
    Co-Authored-By: Sylvain Bauza <[email protected]>
    Signed-off-by: Sean Mooney <[email protected]>

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2122036

Title:
  /os-hypervisors/detail takes too long to complete for 2.88
  microversion

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  To Reproduce
  Steps to reproduce the behavior:
  In an Antelope environment with a huge number of compute nodes, run
  the "openstack hypervisor list" command. It can take more than 40
  seconds to complete and produce output.

  Expected behavior
  The command completes quickly by default; extra delays are expected
  only when the operator explicitly asks for extra data.

  Bug impact
  May prevent the command from completing with default timeouts (it
  will fail earlier because HAProxy will return 504). Also, we probably
  shouldn't activate time-consuming options by default.

  Known workaround
  Specify an earlier API version (2.68, for example).

  ---

  There is another, independent case that can cause slowness. The uptime
  RPC is only called on computes that are considered up. However, if a
  compute is down but this has not yet been detected by the conductor
  because of the missing heartbeat, the RPC is sent but never answered,
  causing unnecessary delay in the API response.

  ---

  The slowness occurs because the 2.88 /os-hypervisors/detail response
  includes the compute uptime, and nova gathers that by making RPC calls
  down to each compute sequentially.

  An older microversion, where uptime is not part of that API, should be
  used as a workaround.

  As a future mitigation we should implement a periodic task in
  nova-compute that periodically reports the uptime to the
  compute_nodes.stats JSON blob in the cell DB, in a new service
  version, and change the API to make the RPC call down to the compute
  only if the service version is old. If the service version is new
  enough, the API can use the data directly from the DB.

  If we don't introduce a service version but instead use the existence
  of the field in the JSON blob as the condition, then we can probably
  make the feature backportable.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2122036/+subscriptions
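
The lookup order described in the commit message and the mitigation comment (use the uptime cached in the compute node's stats blob when the field exists, otherwise fall back to RPC only for hypervisor types that can answer) could be sketched roughly as below. This is a minimal illustration, not nova's actual code: the function name pick_uptime, the RPC_UPTIME_TYPES set and its string values, the dict-shaped node record and the rpc_get_uptime callable are all assumptions made for the example.

```python
from typing import Callable, Mapping, Optional

# Assumption: hypervisor types that can answer an uptime RPC, per the
# commit message (libvirt and z/VM). These strings are illustrative,
# not nova's real type constants.
RPC_UPTIME_TYPES = frozenset({"libvirt", "zvm"})


def pick_uptime(
    node: Mapping,
    rpc_get_uptime: Callable[[str], str],
) -> Optional[str]:
    """Return the uptime for one compute node, or None if unavailable.

    `node` is a hypothetical dict with "host", "hypervisor_type" and an
    optional "stats" dict; `rpc_get_uptime` stands in for the RPC call.
    """
    # 1. Prefer the value the compute service already wrote to the DB.
    #    The mere existence of the field is the condition, so no
    #    service version check is needed.
    stats = node.get("stats") or {}
    cached = stats.get("uptime")
    if cached is not None:
        return cached

    # 2. Skip the RPC entirely for drivers that can never answer it.
    if node.get("hypervisor_type") not in RPC_UPTIME_TYPES:
        return None

    # 3. Fall back to RPC for older computes that do not report uptime
    #    yet. Any failure is treated as "uptime unknown" in this sketch.
    try:
        return rpc_get_uptime(node["host"])
    except Exception:
        return None
```

With this ordering, a deployment where every compute already reports uptime in its stats never issues an RPC during the API request, which is the behaviour the fix aims for.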

