[Yahoo-eng-team] [Bug 2007635] Re: ask for large-scale deployment help

2023-02-18 Thread Belmiro Moreira
This is not a bug. I'm closing it. You can find more information about large deployments in the Large Scale SIG. https://docs.openstack.org/large-scale/journey/index.html ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of

[Yahoo-eng-team] [Bug 1951617] [NEW] "Quota exceeded" message is confusing for "resize"

2021-11-19 Thread Belmiro Moreira
Importance: Undecided Assignee: Belmiro Moreira (moreira-belmiro-email-lists) Status: New ** Changed in: nova Assignee: (unassigned) => Belmiro Moreira (moreira-belmiro-email-lists) -- You received this bug notification because you are a member of Yahoo! Engineering Tea

[Yahoo-eng-team] [Bug 1947753] [NEW] Evacuated instances are not removed from the source

2021-10-19 Thread Belmiro Moreira
Public bug reported: Instance "evacuation" is a great feature and we are trying to take advantage of it. But it has some limitations, depending on how "broken" the node is. Let me give some context... In the scenario where the compute node loses connectivity (broken switch port, loose network

[Yahoo-eng-team] [Bug 1933955] [NEW] Power sync using the Ironic driver queries all the nodes from Ironic when using Conductor Groups

2021-06-29 Thread Belmiro Moreira
Public bug reported: """ While synchronizing instance power states, found 447 instances in the database and 8712 instances on the hypervisor. """ This is the warning message that we get when using conductor groups during a power sync. Conductor groups allow to have dedicated nova-compute

[Yahoo-eng-team] [Bug 1927740] [NEW] Ironic driver persistent warn msg when running only a node per conductor group

2021-05-07 Thread Belmiro Moreira
Public bug reported: ``` 2021-05-07 13:55:12.570 3142 WARNING nova.virt.ironic.driver [req-bcca8fbe-3293-4d85-a3a3-a07328d91c17 - - - - -] This compute service (XXX) is the only service present in the [ironic]/peer_list option. Are you sure this should not include more hosts? ``` The

[Yahoo-eng-team] [Bug 1924612] [NEW] Can't list "killed" images using the CLI

2021-04-15 Thread Belmiro Moreira
Public bug reported: Doing a DB clean up I noticed that we have several images in "killed" state. But using the CLI I wasn't able to list them. However, when the image_id is known the details can be shown and they can be deleted. If a user can't list "killed" images, he doesn't know that those

[Yahoo-eng-team] [Bug 1924585] [NEW] Live Migration - if libvirt timeout the instance goes to error state but the live migration continues

2021-04-15 Thread Belmiro Moreira
Public bug reported: Recently we live migrated an entire cell to new hardware and we hit the following problem several times... During a live migration Nova monitors the state of the migration by querying libvirt every 0.5s

[Yahoo-eng-team] [Bug 1924123] [NEW] If source compute node is overcommitted instances can't be migrated

2021-04-14 Thread Belmiro Moreira
Public bug reported: I'm facing a similar issue to "https://bugs.launchpad.net/nova/+bug/1918419" but somehow different which makes me open a new bug. I'm giving some context to this bug to better explain how this affects operations. Here's the story... When a compute node needs a hardware

[Yahoo-eng-team] [Bug 1918419] [NEW] vCPU resource max_unit is hardcoded

2021-03-10 Thread Belmiro Moreira
Public bug reported: Because of the Spectre/Meltdown vulnerabilities (2018) we needed to disable SMT in all public facing compute nodes. As a result the number of available cores was reduced by half. We had flavors available with 32vCPUs that couldn't be used anymore because placement max_unit for

[Yahoo-eng-team] [Bug 1917645] [NEW] Nova can't create instances if RabbitMQ notification cluster is down

2021-03-03 Thread Belmiro Moreira
Public bug reported: We use independent RabbitMQ clusters for each OpenStack project, Nova Cells and also for notifications. Recently, I noticed in our test infrastructure that if the RabbitMQ cluster for notifications has an outage, Nova can't create new instances. Possibly other operations will

[Yahoo-eng-team] [Bug 1916031] [NEW] Wrong elapsed time logged during a live migration

2021-02-18 Thread Belmiro Moreira
some confusion when operators are debugging issues. In my opinion Nova should log the real migration time. ** Affects: nova Importance: Undecided Assignee: Belmiro Moreira (moreira-belmiro-email-lists) Status: New ** Changed in: nova Assignee: (unassigned) => Belm

[Yahoo-eng-team] [Bug 1902216] [NEW] Can't define a cpu_model from a different architecture

2020-10-30 Thread Belmiro Moreira
Other = I'm now opening target bugs for the generic issue reported in https://bugs.launchpad.net/nova/+bug/1863728 ** Affects: nova Importance: Undecided Assignee: Belmiro Moreira (moreira-belmiro-email-lists) Status: New ** Changed in: nova Assignee: (unassigne

[Yahoo-eng-team] [Bug 1902205] [NEW] UEFI loader should consider the guest architecture not the host

2020-10-30 Thread Belmiro Moreira
Other = I'm now opening target bugs for this issue. It was first reported as a generic bug in https://bugs.launchpad.net/nova/+bug/1863728 ** Affects: nova Importance: Undecided Assignee: Belmiro Moreira (moreira-belmiro-email-lists) Status: New ** Description changed:

[Yahoo-eng-team] [Bug 1902203] [NEW] Instance architecture should be reflected in the instance domain

2020-10-30 Thread Belmiro Moreira
Public bug reported: """ It would be great if Nova supports instances with a different architecture than the host. A use case would be running aarch64 guests on an x86_64 compute node. """ The issue is that nova always uses the architecture from the host when defining the instance domain and not

[Yahoo-eng-team] [Bug 1863728] [NEW] Nova can't create instances for a different arch

2020-02-18 Thread Belmiro Moreira
Public bug reported: This is more a wish feature than a bug but considering the use cases I'm surprised that it's not supported by nova. *Support to create instances for a different architecture than the host architecture* My use case: Running ARM instances in x86_64 compute nodes. This is not

[Yahoo-eng-team] [Bug 1848514] [NEW] Booting from volume providing an image fails

2019-10-17 Thread Belmiro Moreira
Public bug reported: Trying to create an instance (booting from volume when specifying an image) fails. Running Stein (19.0.1) ### When using: ### nova boot --flavor FLAVOR_ID --block-device source=image,id=IMAGE_ID,dest=volume,size=10,shutdown=preserve,bootindex=0 INSTANCE_NAME ###

[Yahoo-eng-team] [Bug 1837200] [NEW] Deleted images info should be obfuscated - OSSN-0075

2019-07-19 Thread Belmiro Moreira
Public bug reported: Because of OSSN-0075 the Cloud Operator may choose to never purge the "images" table. But regulations/policy may require that deleted data is not kept. For this case the deleted image records need to be obfuscated (except the image id). ** Affects: glance Importance:

[Yahoo-eng-team] [Bug 1817542] [NEW] nova instance-action fails if project_id=NULL

2019-02-25 Thread Belmiro Moreira
Public bug reported: nova instance-action fails if project_id=NULL Starting in api version 2.62 "an obfuscated hashed host id is returned" To generate the host_id it uses utils.generate_hostid() that uses (in this case) the project_id and the host of the action. However, we can have actions

[Yahoo-eng-team] [Bug 1816086] [NEW] Resource Tracker performance with Ironic driver

2019-02-15 Thread Belmiro Moreira
Public bug reported: The problem is in rocky. The resource tracker builds the resource provider tree and it's updated 2 times in "_update_available_resource". With "_init_compute_node" and in the "_update_available_resource" itself. The problem is that the RP tree will contain all the ironic

[Yahoo-eng-team] [Bug 1816034] [NEW] Ironic flavor migration and default resource classes

2019-02-15 Thread Belmiro Moreira
Public bug reported: The Ironic flavor migration to use resource classes happened in Pike/Queens. The flavors and the instances needed to be upgraded with the correct resource class. This was done by an online data migration. Looking into Rocky code: ironic.driver._pike_flavor_migration There

[Yahoo-eng-team] [Bug 1810342] [NEW] API unexpected exception message

2019-01-02 Thread Belmiro Moreira
the correct support page. ** Affects: nova Importance: Undecided Assignee: Belmiro Moreira (moreira-belmiro-email-lists) Status: In Progress ** Changed in: nova Assignee: (unassigned) => Belmiro Moreira (moreira-belmiro-email-lists) ** Description changed: - The &q

[Yahoo-eng-team] [Bug 1810340] [NEW] Repetitive info messages from nova-compute

2019-01-02 Thread Belmiro Moreira
structures that store log files for analytics they use significant storage space without bringing reasonable value. ** Affects: nova Importance: Undecided Assignee: Belmiro Moreira (moreira-belmiro-email-lists) Status: In Progress ** Changed in: nova Assignee: (unassigned)

[Yahoo-eng-team] [Bug 1805989] [NEW] Weight policy to stack/spread instances and "max_placement_results"

2018-11-30 Thread Belmiro Moreira
Public bug reported: Weights are applied by the scheduler. This means that if using "max_placement_results" with a number below the existing resources, the weight policy will only be applied to the subset of allocation candidates retrieved by placement. As a consequence we lose the policy to

[Yahoo-eng-team] [Bug 1805984] [NEW] Placement is not aware of disable compute nodes

2018-11-29 Thread Belmiro Moreira
Public bug reported: Placement doesn't know if a resource provider (in this particular case a compute node) is disabled. This is only filtered by the scheduler using the "ComputeFilter". However, when using the option "max_placement_results" to restrict the amount of placement results there is

[Yahoo-eng-team] [Bug 1801897] [NEW] List AVZs can take several seconds

2018-11-06 Thread Belmiro Moreira
Public bug reported: Getting the list of AVZs can take several seconds (~30 secs. in our case) This is noticeable in Horizon when creating a new instance because the user can't select an AVZ until this completes. workflow: - get all services from all cells (~1 for us) - fetch all aggregates

[Yahoo-eng-team] [Bug 1796920] [NEW] Baremetal nodes should not be exposing non-custom-resource-class (vcpu, ram, disk)

2018-10-09 Thread Belmiro Moreira
Public bug reported: Description === Baremetal nodes report CPU, RAM and DISK inventory. The issue is that allocations for baremetal nodes are only done considering the custom_resource_class. This happens because baremetal flavors are set to not consume these resources. See:

[Yahoo-eng-team] [Bug 1771810] [NEW] Quota calculation connects to all available cells

2018-05-17 Thread Belmiro Moreira
Public bug reported: Quota utilisation calculation connects to all cells DBs to get all consumed resources for a project. When having several cells this can be inefficient and can fail if one of the cell DBs is not available. To calculate the quota utilization of a project should be enough to

[Yahoo-eng-team] [Bug 1771806] [NEW] Ironic nova-compute failover creates new resource provider removing the resource_provider_aggregates link

2018-05-17 Thread Belmiro Moreira
Public bug reported: When using the request_filter functionality, aggregates are mapped to placement_aggregates. placement_provider_aggregates contains the resource providers mapped in aggregate_hosts. The problem happens when a nova-compute for ironic fails and hosts are automatically moved

[Yahoo-eng-team] [Bug 1768876] [NEW] Old instances can get AVZ from metadata

2018-05-03 Thread Belmiro Moreira
Public bug reported: Can't get AVZ for old instances: curl http://169.254.169.254/latest/meta-data/placement/availability-zone None# This is because the upcall to the nova_api DB was removed in the commit: 9f7bac2 and old instances may not have the AVZ defined. Previously, the AVZ in the

[Yahoo-eng-team] [Bug 1767309] [NEW] Placement - Make association_refresh configurable

2018-04-27 Thread Belmiro Moreira
Public bug reported: In Queens the provider-tree refresh happens every 5 min (also in master). ASSOCIATION_REFRESH = 300 For large deployments this creates unnecessary load in placement. This option should be configurable. related with: https://review.openstack.org/#/c/535517/ ** Affects: nova

[Yahoo-eng-team] [Bug 1767303] [NEW] Scheduler connects to all cells DBs to gather compute nodes info

2018-04-27 Thread Belmiro Moreira
Public bug reported: The scheduler host.manager connects to all cells DBs to get compute node info even if only a subset of compute nodes uuids are given by placement. This has a performance impact in large cloud deployments with several cells. Also related with:

[Yahoo-eng-team] [Bug 1761197] [NEW] Not defined keypairs in instance_extra cellsV1 DBs

2018-04-04 Thread Belmiro Moreira
Public bug reported: In newton there was the data migration to fill the "keypair" in instance_extra table. The migration checks if an instance has a keypair and then adds the keypair entry in the instance_extra table. This works if the keypair still exists in the keypair table. However, when

[Yahoo-eng-team] [Bug 1761198] [NEW] "Orphan" request_specs and instance_mappings

2018-04-04 Thread Belmiro Moreira
Public bug reported: request_specs and instance_mappings in nova_api DB are not removed when an instance is deleted. In Queens they are removed when the instances are archived (https://review.openstack.org/#/c/515034/) However, for the deployments that archived instances before running Queens

[Yahoo-eng-team] [Bug 1757472] [NEW] Required to define database/connection when running services for nova_api cell

2018-03-21 Thread Belmiro Moreira
Public bug reported: Services in nova_api cell fail to run if database/connection is not defined. These services should only use api_database/connection. In devstack database/connection is defined with the cell0 DB endpoint. This shouldn't be required because the cell0 is set in nova_api DB.

[Yahoo-eng-team] [Bug 1735353] [NEW] build_request not deleted when using cellsV1 and local nova_api DB

2017-11-30 Thread Belmiro Moreira
Public bug reported: Description === build_request not deleted when using cellsV1 and local nova_api Placement needs to be enabled in Newton. CellsV1 installations can deploy a placement service per child cell in order to have a more efficient schedule during the transition to cellV2.

[Yahoo-eng-team] [Bug 1727266] [NEW] archive_deleted_instances is not atomic for insert/delete

2017-10-25 Thread Belmiro Moreira
).\ order_by(column).limit(max_rows) delete_statement = DeleteFromSelect(table, query_delete, column) (...) conn.execute(insert) result_delete = conn.execute(delete_statement) ** Affects: nova Importance: Undecided Assignee: Belmiro Moreira (moreira-belmiro-email

[Yahoo-eng-team] [Bug 1726310] [NEW] nova doesn't list services if it can't connect to a cell DB

2017-10-23 Thread Belmiro Moreira
ils. Environment === nova master (commit: 8d21d711000fff80eb367692b157d09b6532923f) ** Affects: nova Importance: Undecided Assignee: Belmiro Moreira (moreira-belmiro-email-lists) Status: New ** Tags: cells ** Changed in: nova Assignee: (unassigned) => Belmiro Moreira (mo

[Yahoo-eng-team] [Bug 1726301] [NEW] Nova should list instances even if it can't connect to a cell DB

2017-10-23 Thread Belmiro Moreira
showing the project instances. Environment === nova master (commit: 8d21d711000fff80eb367692b157d09b6532923f) ** Affects: nova Importance: Undecided Assignee: Belmiro Moreira (moreira-belmiro-email-lists) Status: New ** Tags: cells ** Description changed: Description

[Yahoo-eng-team] [Bug 1681431] Re: "nova-manage db sync" fails from Mitaka to Newton because deleted compute nodes

2017-04-10 Thread Belmiro Moreira
*** This bug is a duplicate of bug 1665719 *** https://bugs.launchpad.net/bugs/1665719 Already fixed #1665719 ** Changed in: nova Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack

[Yahoo-eng-team] [Bug 1681431] [NEW] "nova-manage db sync" fails from Mitaka to Newton because deleted compute nodes

2017-04-10 Thread Belmiro Moreira
ult === DB migrations succeed (334) Actual result = DB doesn't migrate (329) Environment === Tested with "13.1.2" and "14.0.3". ** Affects: nova Importance: Undecided Assignee: Belmiro Moreira (moreira-belmiro-email-lists) Statu

[Yahoo-eng-team] [Bug 1533380] [NEW] Creating multiple instances with a single request when using cells creates wrong instance names

2016-01-12 Thread Belmiro Moreira
y_name_template" to uuids has the same problem. For example: (consider a random uuid) test-- test-- ** Affects: nova Importance: Undecided Assignee: Belmiro Moreira (moreira-belmiro-email-lists) Status: New ** Tags: cells ** Changed in: nova Assignee: (unassigned

[Yahoo-eng-team] [Bug 1532562] [NEW] Cell capacities updates include available resources of compute nodes "down"

2016-01-10 Thread Belmiro Moreira
hat can fail with "No valid host". ** Affects: nova Importance: Undecided Assignee: Belmiro Moreira (moreira-belmiro-email-lists) Status: New ** Tags: cells ** Changed in: nova Assignee: (unassigned) => Belmiro Moreira (moreira-belmiro-email-lists) --

[Yahoo-eng-team] [Bug 1524114] [NEW] nova-scheduler also loads deleted instances at startup

2015-12-08 Thread Belmiro Moreira
Public bug reported: nova-scheduler is loading all instances (including deleted) at startup. Experienced problems when each node has >6000 deleted instances, even when using batches of 10 nodes. Each query can take several minutes and transfer several GB of data. This prevented nova-scheduler

[Yahoo-eng-team] [Bug 1517006] [NEW] Can't create instances with flavors that have extra specs in a cell setup

2015-11-17 Thread Belmiro Moreira
Public bug reported: In a cell setup can't create instances with flavors that have extra specs like: hw:numa_nodes hw:mem_page_size nova-cell in the "child cell" fails with: 2015-11-17 10:51:50.574 ERROR nova.cells.scheduler [req-f7dc64e6-a545-4c2c-bc57-4e4a2e86cf58 demo demo] Couldn't

[Yahoo-eng-team] [Bug 1461777] [NEW] NUMA cell overcommit can leave NUMA cells unused

2015-06-04 Thread Belmiro Moreira
Public bug reported: NUMA cell overcommit can leave NUMA cells unused When no NUMA configuration is defined for the guest (no flavor extra specs), nova identifies the NUMA topology of the host and tries to match the cpu placement to a NUMA cell (cpuset). The cpuset is selected randomly.

[Yahoo-eng-team] [Bug 1454418] [NEW] Evacuate fails when using cells - AttributeError: 'NoneType' object has no attribute 'count'

2015-05-12 Thread Belmiro Moreira
Public bug reported: nova version: 2014.2.2 Using cells (parent - child setup) How to reproduce: nova evacuate instance_uuid target_host ERROR: The server has either erred or is incapable of performing the requested operation. (HTTP 500) (Request-ID: req-af20-182a-4acd-869a-1b23314b21d4)

[Yahoo-eng-team] [Bug 1448564] [NEW] Rescue using cells fails with: unexpected keyword argument 'expected_task_state'

2015-04-25 Thread Belmiro Moreira
Public bug reported: Instance rescue gets stuck when using cells. nova version: 2014.2.2 Using cells (parent - child setup) How to reproduce: nova rescue instance_uuid - the instance task state stays in rescuing. - nova cells log of the child shows: 2015-04-26 01:26:09.475 20672 ERROR

[Yahoo-eng-team] [Bug 1417027] [NEW] No disable reason defined for new services when enable_new_services=False

2015-02-02 Thread Belmiro Moreira
Public bug reported: When a service is added and enable_new_services=False there is no disable reason specified. Services can be disabled for several reasons and the admins can use the API to specify a reason. However, having services disabled with no reason specified creates additional checks on

[Yahoo-eng-team] [Bug 1414480] [NEW] Cell type in “nova-manage cell create” is different from what is used in nova.conf

2015-01-25 Thread Belmiro Moreira
Public bug reported: The cell_type option is defined in nova.conf as “api” or “compute”. However, when creating a cell using “nova-manage” the cell type “parent” or “child” is expected. nova-manage cell_type should be consistent with what is allowed in nova.conf. ** Affects: nova

[Yahoo-eng-team] [Bug 1369518] [NEW] Server Group Anti/Affinity functionality doesn't work with cells

2014-09-15 Thread Belmiro Moreira
Public bug reported: Server Groups don't work with cells. Tested in Icehouse. Using the API the server group is created in the top cell and not propagated to children cells. At this point booting a VM fails because schedulers in children cells are not aware of the server group. Creating the

[Yahoo-eng-team] [Bug 1334278] [NEW] limits with tenant parameter returns wrong maxTotal* values

2014-06-25 Thread Belmiro Moreira
Public bug reported: When querying for the absolute limits of a specific tenant the maxTotal* values reported aren't correct. How to reproduce: for example using devstack... OS_TENANT_NAME=demo (11b2b129994844798c98f437d9809a9c) OS_USERNAME=demo $nova absolute-limits

[Yahoo-eng-team] [Bug 1307223] [NEW] If target_cell path not valid instance stays in BUILD status

2014-04-13 Thread Belmiro Moreira
Public bug reported: Using cells and the target_cell filter. With the scheduler hint target_cell, if the path is not valid the instance will stay in the scheduling task state. nova cells shows the following trace: 2014-04-13 20:25:40.237 ERROR nova.cells.messaging [req-8bc1d2a7-92aa-48b6-afda-42f255e43904

[Yahoo-eng-team] [Bug 1286527] [NEW] Quota usages update should check all usage in tenant not only per user

2014-03-01 Thread Belmiro Moreira
Public bug reported: After the Grizzly - Havana upgrade the quota_usages table was wiped out due to bug #1245746. Quota_usages is then updated after a user creates/deletes an instance. The problem is that quota_usages is updated per user in a tenant. For tenants that are shared by different users

[Yahoo-eng-team] [Bug 1282709] [NEW] Instance names always include the first uuid in cell environment when creating multiple instances

2014-02-20 Thread Belmiro Moreira
** Affects: nova Importance: Undecided Assignee: Belmiro Moreira (moreira-belmiro-email-lists) Status: New ** Tags: cells ** Description changed: - When launching multiple instances using nova api in a cell environment (parent-child setup) + When launching multiple instances

[Yahoo-eng-team] [Bug 1274169] [NEW] Nova libvirt driver uses the instance type ID instead the flavor ID when creating instances - problematic with cells

2014-01-29 Thread Belmiro Moreira
Public bug reported: With cells, the same flavor needs to be created manually in all available cells using the nova API. If for some reason we need to delete a flavor in a cell the “instance_types” tables will then be out of sync (different IDs for flavors). This blocks the instance

[Yahoo-eng-team] [Bug 1274325] [NEW] Security-groups not working with cells using nova-network

2014-01-29 Thread Belmiro Moreira
Public bug reported: Security groups are not working with cells using nova-network. Only cell API database is updated when adding rules. These are not propagated into the children cells. ** Affects: nova Importance: Undecided Status: New ** Tags: cells ** Description changed:

[Yahoo-eng-team] [Bug 1164408] Re: Snapshot doesn't get hypervisor_type and vm_mode properties

2013-05-13 Thread Belmiro Moreira
** Changed in: nova Status: Triaged => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1164408 Title: Snapshot doesn't get hypervisor_type and vm_mode