This is not a bug. I'm closing it.
You can find more information about large deployments in the Large Scale
SIG.
https://docs.openstack.org/large-scale/journey/index.html
** Changed in: nova
Status: New => Invalid
Importance: Undecided
Assignee: Belmiro Moreira (moreira-belmiro-email-lists)
Status: New
** Changed in: nova
Assignee: (unassigned) => Belmiro Moreira (moreira-belmiro-email-lists)
Public bug reported:
Instance "evacuation" is a great feature and we are trying to take advantage of
it.
But it has some limitations, depending on how "broken" the node is.
Let me give some context...
In the scenario where the compute node loses connectivity (broken switch
port, loose network
Public bug reported:
"""
While synchronizing instance power states, found 447 instances in the database
and 8712 instances on the hypervisor.
"""
This is the warning message that we get when using conductor groups
during a power sync.
Conductor groups make it possible to have dedicated nova-compute
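For illustration, a minimal sketch of where the mismatched counts come from, with assumed names modelled on nova's periodic power-state sync task:
```
import logging

LOG = logging.getLogger(__name__)

def check_power_sync_counts(db_instances, driver):
    # Sketch with assumed names: the number of instances mapped to this
    # host in the DB is compared with what the virt driver reports. With
    # conductor groups, the driver side can cover far more nodes than
    # this nova-compute service actually manages.
    db_count = len(db_instances)
    driver_count = driver.get_num_instances()
    if db_count != driver_count:
        LOG.warning('While synchronizing instance power states, found '
                    '%d instances in the database and %d instances on '
                    'the hypervisor.', db_count, driver_count)
```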
Public bug reported:
```
2021-05-07 13:55:12.570 3142 WARNING nova.virt.ironic.driver
[req-bcca8fbe-3293-4d85-a3a3-a07328d91c17 - - - - -] This compute service
(XXX) is the only service present in the [ironic]/peer_list option. Are you
sure this should not include more hosts?
```
The
Public bug reported:
Doing a DB cleanup I noticed that we have several images in the "killed" state.
But using the CLI I wasn't able to list them.
However, when the image_id is known the details can be shown and they can be
deleted.
If a user can't list "killed" images, they don't know that those
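A hedged illustration with python-glanceclient v2; the `glance` client object and `image_id` are assumed (an authenticated client and a known "killed" image):
```
def inspect_killed_image(glance, image_id):
    # Listing by status does not surface the broken records...
    killed = list(glance.images.list(filters={'status': 'killed'}))
    print(len(killed))            # empty in the scenario described above
    # ...but a direct lookup and delete by id still work.
    image = glance.images.get(image_id)
    print(image['status'])        # 'killed'
    glance.images.delete(image_id)
```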
Public bug reported:
Recently we live migrated an entire cell to new hardware and we hit the
following problem several times...
During a live migration Nova monitors the state of the migration, querying
libvirt every 0.5s
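A minimal sketch of that monitoring loop, with assumed names modelled on nova's libvirt driver:
```
import time

def monitor_live_migration(guest, libvirt_mod, poll_interval=0.5):
    # Assumed names: the migration job is polled twice per second until
    # libvirt reports that no job is active anymore.
    while True:
        info = guest.get_job_info()        # wraps virDomainGetJobStats
        if info.type == libvirt_mod.VIR_DOMAIN_JOB_NONE:
            break                          # finished, failed or cancelled
        time.sleep(poll_interval)
```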
Public bug reported:
I'm facing a similar issue to "https://bugs.launchpad.net/nova/+bug/1918419"
but somewhat different, which makes me open a new bug.
I'm giving some context to this bug to better explain how this affects
operations. Here's the story...
When a compute node needs a hardware
Public bug reported:
Because of the Spectre/Meltdown vulnerabilities (2018) we needed to disable
SMT in all public-facing compute nodes. As a result the number of
available cores was reduced by half.
We had flavors available with 32vCPUs that couldn't be used anymore
because placement max_unit for
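Roughly, with hypothetical numbers: placement rejects any single allocation larger than the inventory's max_unit, regardless of the allocation ratio, and max_unit follows the (now halved) core count.
```
# Hypothetical VCPU inventory after disabling SMT on a host:
vcpu_inventory = {
    'total': 16,             # physical cores only, threads disabled
    'allocation_ratio': 4.0, # overall capacity is still 64 vCPUs
    'max_unit': 16,          # but one instance can request at most 16
}

flavor_vcpus = 32            # previously schedulable flavor
print(flavor_vcpus <= vcpu_inventory['max_unit'])  # False -> no valid host
```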
Public bug reported:
We use independent RabbitMQ clusters for each OpenStack project, Nova
Cells and also for notifications. Recently, I noticed in our test
infrastructure that if the RabbitMQ cluster for notifications has an
outage, Nova can't create new instances. Possibly other operations will
some confusion when operators are debugging
issues.
In my opinion Nova should log the real migration time.
** Affects: nova
Importance: Undecided
Assignee: Belmiro Moreira (moreira-belmiro-email-lists)
Status: New
** Changed in: nova
Assignee: (unassigned) => Belmiro Moreira (moreira-belmiro-email-lists)
Other
===
I'm now opening target bugs for the generic issue reported in
https://bugs.launchpad.net/nova/+bug/1863728
** Affects: nova
Importance: Undecided
Assignee: Belmiro Moreira (moreira-belmiro-email-lists)
Status: New
** Changed in: nova
Assignee: (unassigned) => Belmiro Moreira (moreira-belmiro-email-lists)
Other
===
I'm now opening target bugs for this issue.
It was first reported as a generic bug in
https://bugs.launchpad.net/nova/+bug/1863728
** Affects: nova
Importance: Undecided
Assignee: Belmiro Moreira (moreira-belmiro-email-lists)
Status: New
** Description changed:
Public bug reported:
"""
It would be great if Nova supported instances with a different architecture
than the host.
A use case would be running aarch64 guests on an x86_64 compute node.
"""
The issue is that nova always uses the architecture from the host when defining
the instance domain and not
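A hedged sketch of that behaviour, with assumed attribute names (`caps` for the host capabilities, `image_meta` for the image metadata):
```
def pick_guest_arch(caps, image_meta):
    # Current behaviour as described above: the guest arch always comes
    # from the host capabilities.
    host_arch = caps.host.cpu.arch
    # What the report asks for, roughly: honour an architecture set on
    # the image first, and only fall back to the host arch.
    return image_meta.properties.get('hw_architecture') or host_arch
```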
Public bug reported:
This is more of a feature wish than a bug, but considering the use cases I'm
surprised that it's not supported by nova.
*Support for creating instances with a different architecture than the host
architecture*
My use case: Running ARM instances in x86_64 compute nodes.
This is not
Public bug reported:
Trying to create an instance (booting from volume when specifying an image)
fails.
Running Stein (19.0.1)
###
When using:
###
nova boot --flavor FLAVOR_ID --block-device
source=image,id=IMAGE_ID,dest=volume,size=10,shutdown=preserve,bootindex=0
INSTANCE_NAME
###
Public bug reported:
Because of OSSN-0075 the Cloud Operator may choose to never purge the "images"
table.
But regulations/policy may require that deleted data is not kept.
For this case the deleted image records need to be obfuscated (except
the image id).
** Affects: glance
Importance:
Public bug reported:
nova instance-action fails if project_id=NULL
Starting in api version 2.62 "an obfuscated hashed host id is returned"
To generate the host_id it uses utils.generate_hostid(), which uses (in this
case) the project_id and the host of the action.
However, we can have actions
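A hedged reconstruction of that helper, to show where a NULL project_id ends up:
```
import hashlib

def generate_hostid(host, project_id):
    # Hedged reconstruction of nova's utils.generate_hostid(): the
    # obfuscated id is a SHA224 over the (project_id, host) pair, which
    # is the path an action recorded with project_id=NULL goes through.
    if host:
        data = (project_id, host)
        return hashlib.sha224(str(data).encode('utf-8')).hexdigest()
    return ''
```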
Public bug reported:
The problem is in Rocky.
The resource tracker builds the resource provider tree, and it is updated twice
in "_update_available_resource":
once via "_init_compute_node" and once in "_update_available_resource" itself.
The problem is that the RP tree will contain all the ironic
Public bug reported:
The Ironic flavor migration to use resource classes happened in
Pike/Queens.
The flavors and the instances needed to be upgraded with the correct resource
class.
This was done by an online data migration.
Looking into Rocky code: ironic.driver._pike_flavor_migration
There
the correct support page.
** Affects: nova
Importance: Undecided
Assignee: Belmiro Moreira (moreira-belmiro-email-lists)
Status: In Progress
** Changed in: nova
Assignee: (unassigned) => Belmiro Moreira (moreira-belmiro-email-lists)
** Description changed:
- The
structures that store log files for analytics, they use
significant storage space without bringing reasonable value.
** Affects: nova
Importance: Undecided
Assignee: Belmiro Moreira (moreira-belmiro-email-lists)
Status: In Progress
** Changed in: nova
Assignee: (unassigned) => Belmiro Moreira (moreira-belmiro-email-lists)
Public bug reported:
Weights are applied by the scheduler.
This means that if "max_placement_results" is used with a number below the
number of existing resources,
the weight policy will only be applied to the subset of allocation candidates
retrieved by placement.
As a consequence we lose the policy to
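A minimal sketch of that interaction, with assumed names (`allocation_candidates`, `weigh`):
```
def pick_host(allocation_candidates, weigh, max_placement_results):
    # Placement truncates the candidate list before the scheduler
    # applies its weighers, so the globally best host may never even be
    # weighed when the limit is smaller than the number of candidates.
    subset = allocation_candidates[:max_placement_results]
    return max(subset, key=weigh)
```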
Public bug reported:
Placement doesn't know if a resource provider (in this particular case a
compute node) is disabled. This is only filtered by the scheduler using
the "ComputeFilter".
However, when using the option "max_placement_results" to restrict the
amount of placement results there is
Public bug reported:
Getting the list of AVZs can take several seconds (~30 secs in our case).
This is noticeable in Horizon when creating a new instance because the user
can't select an AVZ until this completes.
workflow:
- get all services from all cells (~1 for us)
- fetch all aggregates
Public bug reported:
Description
===
Baremetal nodes report CPU, RAM and DISK inventory.
The issue is that allocations for baremetal nodes are only done considering the
custom_resource_class. This happens because baremetal flavors are set to not
consume these resources.
See:
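For illustration, baremetal flavor extra specs of this kind typically look like the following (hedged example; CUSTOM_BAREMETAL_X is a made-up resource class):
```
# Standard resources are zeroed out, so only the custom resource class
# is allocated in placement for the baremetal node.
extra_specs = {
    'resources:VCPU': '0',
    'resources:MEMORY_MB': '0',
    'resources:DISK_GB': '0',
    'resources:CUSTOM_BAREMETAL_X': '1',
}
```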
Public bug reported:
Quota utilisation calculation connects to all cell DBs to get all consumed
resources for a project.
When having several cells this can be inefficient and can fail if one of the
cell DBs is not available.
To calculate the quota utilization of a project it should be enough to
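A hedged sketch of the idea, with assumed structures: the instance_mappings table in the API DB already records the cell of every instance of the project, so only those cell DBs need to be queried for usage.
```
def cells_to_query(ctx, project_id, instance_mapping_list):
    # 'instance_mapping_list' is an assumed object modelled on nova's
    # InstanceMappingList; the result is the set of cells that actually
    # contain instances of the project.
    mappings = instance_mapping_list.get_by_project_id(ctx, project_id)
    return {m.cell_mapping.uuid for m in mappings}
```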
Public bug reported:
When using the request_filter functionality, aggregates are mapped to
placement_aggregates.
placement_provider_aggregates contains the resource providers mapped in
aggregate_hosts.
The problem happens when a nova-compute for ironic fails and hosts are
automatically moved
Public bug reported:
Can't get AVZ for old instances:
curl http://169.254.169.254/latest/meta-data/placement/availability-zone
None#
This is because the upcall to the nova_api DB was removed in the commit: 9f7bac2
and old instances may not have the AVZ defined.
Previously, the AVZ in the
Public bug reported:
In Queens the provider-tree refresh happens every 5 min (also in master).
ASSOCIATION_REFRESH = 300
For large deployments this creates unnecessary load on placement.
This option should be configurable.
related with:
https://review.openstack.org/#/c/535517/
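A hedged sketch of the staleness check, with assumed names: associations are re-fetched from placement whenever the cached copy is older than the interval, so making the interval configurable would directly cut the periodic load.
```
import time

ASSOCIATION_REFRESH = 300  # hard-coded seconds in Queens

def refresh_due(last_refreshed, interval=ASSOCIATION_REFRESH):
    # Aggregates and traits are re-read from placement once the cached
    # copy is older than the interval.
    return time.time() - last_refreshed > interval
```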
** Affects: nova
Public bug reported:
The scheduler host_manager connects to all cell DBs to get compute node
info even if only a subset of compute node uuids is given by
placement.
This has a performance impact in large cloud deployments with several
cells.
Also related with:
Public bug reported:
In Newton there was a data migration to fill the "keypair" in the instance_extra
table.
The migration checks if an instance has a keypair and then adds the keypair
entry in the instance_extra table. This works if the keypair still exists in
the keypair table.
However, when
Public bug reported:
request_specs and instance_mappings in nova_api DB are not removed when an
instance is deleted.
In Queens they are removed when the instances are archived
(https://review.openstack.org/#/c/515034/)
However, for the deployments that archived instances before running
Queens
Public bug reported:
Services in nova_api cell fail to run if database/connection is not defined.
These services should only use api_database/connection.
In devstack database/connection is defined with the cell0 DB endpoint.
This shouldn't be required because cell0 is set in the nova_api DB.
Public bug reported:
Description
===
build_request not deleted when using cellsV1 and local nova_api
Placement needs to be enabled in Newton.
CellsV1 installations can deploy a placement service per child cell in order to
have more efficient scheduling during the transition to cellsV2.
).\
order_by(column).limit(max_rows)
delete_statement = DeleteFromSelect(table, query_delete, column)
(...)
conn.execute(insert)
result_delete = conn.execute(delete_statement)
** Affects: nova
Importance: Undecided
Assignee: Belmiro Moreira (moreira-belmiro-email-lists)
ils.
Environment
===
nova master (commit: 8d21d711000fff80eb367692b157d09b6532923f)
** Affects: nova
Importance: Undecided
Assignee: Belmiro Moreira (moreira-belmiro-email-lists)
Status: New
** Tags: cells
** Changed in: nova
Assignee: (unassigned) => Belmiro Moreira (moreira-belmiro-email-lists)
showing the project instances.
Environment
===
nova master (commit: 8d21d711000fff80eb367692b157d09b6532923f)
** Affects: nova
Importance: Undecided
Assignee: Belmiro Moreira (moreira-belmiro-email-lists)
Status: New
** Tags: cells
** Description changed:
Description
*** This bug is a duplicate of bug 1665719 ***
https://bugs.launchpad.net/bugs/1665719
Already fixed #1665719
** Changed in: nova
Status: New => Invalid
Expected result
===
DB migrations succeed (334)
Actual result
===
DB doesn't migrate (329)
Environment
===
Tested with "13.1.2" and "14.0.3".
** Affects: nova
Importance: Undecided
Assignee: Belmiro Moreira (moreira-belmiro-email-lists)
Statu
y_name_template" to uuids has the same
problem.
For example: (consider a random uuid)
test--
test--
** Affects: nova
Importance: Undecided
Assignee: Belmiro Moreira (moreira-belmiro-email-lists)
Status: New
** Tags: cells
** Changed in: nova
Assignee: (unassigned) => Belmiro Moreira (moreira-belmiro-email-lists)
that can fail with "No valid
host".
** Affects: nova
Importance: Undecided
Assignee: Belmiro Moreira (moreira-belmiro-email-lists)
Status: New
** Tags: cells
** Changed in: nova
Assignee: (unassigned) => Belmiro Moreira (moreira-belmiro-email-lists)
Public bug reported:
nova-scheduler is loading all instances (including deleted) at startup.
We experienced problems when each node had >6000 deleted instances, even when
using batches of 10 nodes.
Each query can take several minutes and transfer several GB of data.
This prevented nova-scheduler
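A hedged sketch, with assumed names modelled on nova's InstanceList.get_by_filters: excluding deleted rows from the startup query avoids loading millions of historical instances.
```
def load_host_instances(ctx, objects, hosts_batch):
    # 'objects' is the assumed nova objects module; filtering out
    # deleted rows keeps the startup query bounded by live instances.
    return objects.InstanceList.get_by_filters(
        ctx, filters={'host': hosts_batch, 'deleted': False},
        expected_attrs=[])
```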
Public bug reported:
In a cell setup, instances can't be created with flavors that have extra specs like:
hw:numa_nodes
hw:mem_page_size
nova-cells in the "child cell" fails with:
2015-11-17 10:51:50.574 ERROR nova.cells.scheduler
[req-f7dc64e6-a545-4c2c-bc57-4e4a2e86cf58 demo demo] Couldn't
Public bug reported:
NUMA cell overcommit can leave NUMA cells unused
When no NUMA configuration is defined for the guest (no flavor extra specs),
nova identifies the NUMA topology of the host and tries to match the cpu
placement to a NUMA cell (cpuset).
The cpuset is selected randomly.
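A minimal sketch of the behaviour described above, with assumed structures:
```
import random

def pick_cpuset(host_numa_cells):
    # With no guest NUMA constraints, the host cell backing the domain's
    # cpuset is chosen at random, so an already busy cell can be picked
    # while other cells sit idle.
    cell = random.choice(host_numa_cells)
    return cell.cpuset
```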
Public bug reported:
nova version: 2014.2.2
Using cells (parent - child setup)
How to reproduce:
nova evacuate instance_uuid target_host
ERROR: The server has either erred or is incapable of performing the requested
operation. (HTTP 500) (Request-ID: req-af20-182a-4acd-869a-1b23314b21d4)
Public bug reported:
Instance rescue gets stuck when using cells.
nova version: 2014.2.2
Using cells (parent - child setup)
How to reproduce:
nova rescue instance_uuid
- the instance task state stays in rescuing.
- nova cells log of the child shows:
2015-04-26 01:26:09.475 20672 ERROR
Public bug reported:
When a service is added and enable_new_services=False there is no disable
reason specified.
Services can be disabled for several reasons and the admins can use the API to
specify a reason. However, having services disabled with no reason specified
creates additional checks on
Public bug reported:
The cell_type option is defined in nova.conf as “api” or “compute”.
However, when creating a cell using “nova-manage” the cell type “parent” or
“child” is expected.
nova-manage cell_type should be consistent with what is allowed in nova.conf.
** Affects: nova
Public bug reported:
Server Groups don't work with cells.
Tested in Icehouse.
Using the API the server group is created in the top cell and not propagated
to children cells.
At this point booting a VM fails because schedulers in child cells are not
aware of the server group.
Creating the
Public bug reported:
When querying for the absolute limits of a specific tenant
the maxTotal* values reported aren't correct.
How to reproduce:
for example using devstack...
OS_TENANT_NAME=demo (11b2b129994844798c98f437d9809a9c)
OS_USERNAME=demo
$nova absolute-limits
Public bug reported:
Using cells and the target_cell filter.
With the scheduler hint target_cell, if the path is not valid the
instance will stay in the scheduling task state.
nova cells shows the following trace:
2014-04-13 20:25:40.237 ERROR nova.cells.messaging
[req-8bc1d2a7-92aa-48b6-afda-42f255e43904
Public bug reported:
After the Grizzly - Havana upgrade the quota_usages table was
wiped out due to bug #1245746.
Quota_usages is then updated after a user creates/deletes an instance.
The problem is that quota_usages is updated per user in a tenant.
For tenants that are shared by different users
** Affects: nova
Importance: Undecided
Assignee: Belmiro Moreira (moreira-belmiro-email-lists)
Status: New
** Tags: cells
** Description changed:
- When launching multiple instances using nova api in a cell environment
(parent-child setup)
+ When launching multiple instances
Public bug reported:
For flavors in cells it is necessary to create the same flavor manually in all
available cells using the nova API. If for some reason we need to delete a
flavor in a cell, the "instance_types" tables will then be out of sync
(different IDs for flavors).
This blocks the instance
Public bug reported:
Security groups are not working with cells using nova-network.
Only the API cell database is updated when adding rules. These are not
propagated into the child cells.
** Affects: nova
Importance: Undecided
Status: New
** Tags: cells
** Description changed:
** Changed in: nova
Status: Triaged => Invalid
https://bugs.launchpad.net/bugs/1164408
Title:
Snapshot doesn't get hypervisor_type and vm_mode