Public bug reported:

This is currently based on code inspection only, but it looks like the following case should fail:

https://github.com/openstack/nova/blob/56cd608d3a199dcb02ac2ae071ff3057241259da/nova/compute/api.py#L3723

When we attach a volume to a shelved offloaded server, we create the BDM in the API. If the API is configured to point at cell0, the BDM is created in the cell0 database.

When we unshelve the instance, conductor asks the scheduler for a host (which is in some cell) and we build the instance in that cell. This could be a different cell, because the conductor task manager does not currently restrict the target cell when unshelving the way it does for migrate:

https://github.com/openstack/nova/blob/56cd608d3a199dcb02ac2ae071ff3057241259da/nova/conductor/tasks/migrate.py#L63-L66

The fact that we don't restrict where the instance goes when it's unshelved is a separate bug.

When the instance is unshelved, it gets built on some compute host and we pull the BDMs from the database configured for that cell (which should be cell1, cell2, ..., cellN - some specific non-cell0 database):

https://github.com/openstack/nova/blob/56cd608d3a199dcb02ac2ae071ff3057241259da/nova/compute/manager.py#L4513

If the BDM was created by the API in cell0, that query in the compute manager code should not return it.

What's most confusing is that Tempest has tests that attach and detach a volume on a shelved offloaded instance:

https://github.com/openstack/tempest/blob/21dd8a5ee2ab5a068cbb20d0468bd5f444fef59a/tempest/api/compute/volumes/test_attach_volume.py#L148

Those tests are passing on the devstack change that runs with multiple cells and configures the API to use cell0 for the [database] section, which is where the BDM would live:

https://review.openstack.org/#/c/473565/

Unless maybe that test is broken. We are configured to run ssh validation in the gate jobs on master (pike), though, so the test counts the partitions on the guest before and after the unshelve operation to check that the volume shows up. It also lists the volume attachments for the instance after unshelve.
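To make the suspected failure concrete, here is a toy model in Python. It is not nova code: each cell database is just a dict, and the names `CellDB`, `attach_volume_in_api`, and `get_bdms_for_instance` are hypothetical stand-ins for the API-side BDM creation and the compute-side BDM lookup described above. It only illustrates the reasoning, assuming the API writes to cell0 and the instance is unshelved into cell1.

```python
# Toy model of the suspected failure; none of these names are nova APIs.
from dataclasses import dataclass, field


@dataclass
class CellDB:
    """Stand-in for one cell's database: instance UUID -> list of volume IDs."""
    name: str
    bdms: dict = field(default_factory=dict)


def attach_volume_in_api(api_db, instance_uuid, volume_id):
    # The API writes the BDM to whichever database it is configured to use.
    api_db.bdms.setdefault(instance_uuid, []).append(volume_id)


def get_bdms_for_instance(cell_db, instance_uuid):
    # The compute service reads BDMs only from its own cell's database.
    return list(cell_db.bdms.get(instance_uuid, []))


cell0 = CellDB("cell0")
cell1 = CellDB("cell1")

# Assumption: the API's [database] section points at cell0,
# so the BDM for the shelved offloaded instance lands there.
attach_volume_in_api(cell0, "inst-1", "vol-1")

# Assumption: unshelve schedules the instance to a host in cell1,
# so the compute manager queries the cell1 database.
bdms_seen_by_compute = get_bdms_for_instance(cell1, "inst-1")
bdms_in_api_db = get_bdms_for_instance(cell0, "inst-1")

print("compute sees:", bdms_seen_by_compute)
print("cell0 holds:", bdms_in_api_db)
```

In this model the compute-side lookup comes back empty while the record sits in cell0, which is the mismatch the description expects and the passing Tempest test appears to contradict.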
** Affects: nova
     Importance: High
     Assignee: Dan Smith (danms)
         Status: In Progress

** Tags: cells shelve volumes

https://bugs.launchpad.net/bugs/1702932

Title:
  Unshelving an offloaded server with volume attachments may not attach to the guest in multi-cell env

