Public bug reported:

When removing and then redeploying a compute node, the process fails due
to a pymysql.err.IntegrityError: (1062, "Duplicate entry ...") in the
Nova database.

This issue occurs because the nova-compute service deletion succeeds
even when there are instances still present on the host. The service
deletion API call removes the host mapping, the service record, and the
resource provider in Placement, but it does not delete the corresponding
compute_nodes record from the database. This leaves an orphaned
compute_nodes record.

When the compute node is subsequently reprovisioned, Nova attempts to
create a new compute_nodes record, which conflicts with the orphaned
record and violates the database's unique constraint, leading to the
"Duplicate entry" error.

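The conflict described above can be illustrated with a minimal sketch.
It uses an in-memory SQLite database as a stand-in for the Nova MySQL
database, so the exception is sqlite3.IntegrityError rather than
pymysql's error 1062. The simplified table layout, the unique
constraint on (host, hypervisor_hostname, deleted), and the helper
register_compute_node are assumptions made for illustration, not the
actual Nova schema or code.

```python
import sqlite3

# In-memory stand-in for the Nova database (assumption: simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE compute_nodes (
        id INTEGER PRIMARY KEY,
        host TEXT NOT NULL,
        hypervisor_hostname TEXT NOT NULL,
        deleted INTEGER NOT NULL DEFAULT 0,
        UNIQUE (host, hypervisor_hostname, deleted)
    )
    """
)


def register_compute_node(conn, host):
    """Hypothetical helper: insert a compute_nodes row for a host."""
    conn.execute(
        "INSERT INTO compute_nodes (host, hypervisor_hostname) VALUES (?, ?)",
        (host, host),
    )


# Initial deployment creates the record.
register_compute_node(conn, "compute-1")

# The service is deleted here, but the row above is left behind (orphaned).

# Redeployment tries to create the same record again and hits the constraint.
try:
    register_compute_node(conn, "compute-1")
    duplicate_error = None
except sqlite3.IntegrityError as exc:
    duplicate_error = exc

print("redeploy failed:", duplicate_error is not None)
```
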
Steps to Reproduce:

* Deploy a compute node and launch an instance on it.
* Disable and delete the nova-compute service for that node.
* Observe that the service deletion succeeds, despite the presence of an
  instance.
* Attempt to redeploy the same compute node.

The nova-compute service will fail to start, and the logs will show a
"Duplicate entry" error.

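For reference, the service removal step can be written out as
OpenStack CLI invocations. The sketch below only builds and prints the
commands and does not run them. The host name "compute-1" and the
"<service-id>" placeholder are assumptions; the real service ID has to
be taken from the output of the service list command.

```python
import shlex


def service_removal_commands(host, service_id):
    """Return the CLI commands for the removal step, in order."""
    return [
        # Confirm that an instance is still running on the host.
        ["openstack", "server", "list", "--all-projects", "--host", host],
        # Look up the ID of the nova-compute service on the host.
        ["openstack", "compute", "service", "list",
         "--host", host, "--service", "nova-compute"],
        # Disable the service, then delete it by ID.
        ["openstack", "compute", "service", "set",
         "--disable", host, "nova-compute"],
        ["openstack", "compute", "service", "delete", service_id],
    ]


commands = service_removal_commands("compute-1", "<service-id>")
for command in commands:
    print(shlex.join(command))
```
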
Expected Behavior:

The service deletion should fail if there are instances on the host, as per the 
check in nova/api/openstack/compute/services.py:
https://github.com/openstack/nova/blob/8b81b5f91ffe1f9c38a483d151b82316d443dbf6/nova/api/openstack/compute/services.py#L268-L274
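
The expected behavior amounts to a guard of roughly the following
shape. This is a simplified sketch and not the code at the link above:
the function delete_compute_service, the HostHasInstances exception
(standing in for an HTTP conflict response), and the two injected
callables are all hypothetical names chosen for illustration.

```python
class HostHasInstances(Exception):
    """Hypothetical error standing in for an HTTP conflict response."""


def delete_compute_service(service, count_instances_on_host, destroy_service):
    """Delete a service, refusing if a compute host still has instances."""
    if service["binary"] == "nova-compute":
        num_instances = count_instances_on_host(service["host"])
        if num_instances:
            raise HostHasInstances(
                f"{service['host']} still hosts {num_instances} instance(s)"
            )
    destroy_service(service)


# Usage with in-memory fakes: one host with an instance, one without.
instances_by_host = {"compute-1": 1, "compute-2": 0}
deleted_hosts = []

for host in ("compute-1", "compute-2"):
    service = {"binary": "nova-compute", "host": host}
    try:
        delete_compute_service(
            service,
            lambda h: instances_by_host[h],
            lambda s: deleted_hosts.append(s["host"]),
        )
    except HostHasInstances as exc:
        print("refused:", exc)

print("deleted:", deleted_hosts)
```

With a guard like this in effect, the deletion in the steps above would
be refused for the host that still has an instance, and no orphaned
record would be created.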

Actual Behavior:
The service deletion succeeds, leaving an orphaned compute_nodes record in the 
database and causing redeployment to fail.

Workaround:
A possible workaround, which I have not tested yet, is to manually delete
the orphaned compute_nodes record from the database using nova-manage
cell_v2 delete_host before attempting to redeploy the node.

Conclusion:
This is a bug in the service deletion logic. The check for existing instances 
is not functioning as expected, which leads to an inconsistent state in the 
Nova database and prevents the successful redeployment of compute nodes. This 
creates a significant operational issue for anyone needing to perform 
maintenance or hardware replacement on compute nodes.

** Affects: nova
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/2127831

Title:
  Nova service deletion succeeds despite existing instances, causing
  "Duplicate entry" error on redeployment
