[Yahoo-eng-team] [Bug 1750780] Re: Race with local file systems can make open-vm-tools fail to start
The cloud-init task was just for discussion; marking it Invalid to make clear there is no cloud-init action needed.

** Changed in: cloud-init
   Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1750780

Title: Race with local file systems can make open-vm-tools fail to start

Status in cloud-init: Invalid
Status in open-vm-tools package in Ubuntu: Triaged

Bug description:
Since the change in [1], open-vm-tools.service starts very (very) early. That is not so much due to Before=cloud-init-local.service as it is to DefaultDependencies=no. It can trigger an issue that looks like:

  root@ubuntuguest:~# systemctl status -l open-vm-tools.service
  ● open-vm-tools.service - Service for virtual machines hosted on VMware
     Loaded: loaded (/lib/systemd/system/open-vm-tools.service; enabled; vendor preset: enabled)
     Active: failed (Result: resources)

As it is right now, open-vm-tools can race with the other early-start units and then fail. In detail one can find a message like:

  open-vm-tools.service: Failed to run 'start' task: Read-only file system

This is due to PrivateTmp=yes, which is also set and needs a writable /var/tmp [2]. To make this work reliably, PrivateTmp would have to be removed (not good) or some After= dependencies added. I added After=local-fs.target, which made it work for me in 3/3 tests. I'd like an ack from the cloud-init team that this does not defeat the originally intended Before=cloud-init-local.service. I think it does not: local-fs.target can complete before cloud-init-local, then open-vm-tools can initialize, and finally cloud-init-local can pick up the data.
To summarize:

  # cloud-init-local.service #
  DefaultDependencies=no
  Wants=network-pre.target
  After=systemd-remount-fs.service
  Before=NetworkManager.service
  Before=network-pre.target
  Before=shutdown.target
  Before=sysinit.target
  Conflicts=shutdown.target
  RequiresMountsFor=/var/lib/cloud

  # open-vm-tools.service #
  DefaultDependencies=no
  Before=cloud-init-local.service

Proposed is to add to the latter:

  After=local-fs.target

[1]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=859677
[2]: https://github.com/systemd/systemd/issues/5610

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1750780/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
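The proposed ordering change can be tried without patching the packaged unit, via a systemd drop-in (a sketch of standard systemd practice; the drop-in file name below is my choice, not part of the bug report):

```ini
# /etc/systemd/system/open-vm-tools.service.d/local-fs.conf
# Delay startup until local file systems are mounted, so the writable
# /var/tmp that PrivateTmp=yes requires is available.
[Unit]
After=local-fs.target
```

Run `systemctl daemon-reload` afterwards so the drop-in takes effect; `systemctl show open-vm-tools.service -p After` should then include local-fs.target.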
[Yahoo-eng-team] [Bug 1384379] Re: versions resource uses host_url which may be incorrect
Reviewed: https://review.openstack.org/15
Committed: https://git.openstack.org/cgit/openstack/trove/commit/?id=1667ad5e80be7d0bf3ac8e02410a18ce3a0ea4cd
Submitter: Zuul
Branch: master

commit 1667ad5e80be7d0bf3ac8e02410a18ce3a0ea4cd
Author: Zhao Chao
Date: Wed Feb 7 11:07:02 2018 +0800

    Allow host URL for versions to be configurable

    The versions resource constructs the links by using application_url, but it's possible that the API endpoint is behind a load balancer or SSL terminator. This means that the application_url might be incorrect. This fix provides a config option (similar to other services) which lets us override the host URL when constructing links for the versions API.

    Co-Authored-By: Nikhil Manchanda
    Change-Id: I23f06c6c2d52ba46c74e0d097c4963d2de731d30
    Closes-Bug: 1384379

** Changed in: trove
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1384379

Title: versions resource uses host_url which may be incorrect

Status in Cinder: Fix Released
Status in Glance: Fix Released
Status in Glance icehouse series: Triaged
Status in Glance juno series: Triaged
Status in OpenStack Heat: Triaged
Status in Ironic: Fix Released
Status in Manila: Fix Released
Status in OpenStack Compute (nova): Fix Released
Status in OpenStack DBaaS (Trove): Fix Released

Bug description:
The versions resource constructs the links by using host_url, but the glance api endpoint may be behind a proxy or ssl terminator. This means that host_url may be incorrect. It should have a config option to override host_url like the other services do when constructing versions links.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/1384379/+subscriptions
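The pattern the fix describes — preferring an operator-configured public URL over whatever host_url the WSGI request saw — can be sketched as follows (function and parameter names are illustrative, not the actual Trove code):

```python
def version_link(base_url_override, request_host_url, path="/v1/"):
    """Build a version link, preferring a configured public URL.

    base_url_override: operator-set URL (e.g. the address of an SSL
    terminator or load balancer), or None to fall back to the host_url
    the service itself observed.
    """
    base = (base_url_override or request_host_url).rstrip("/")
    return base + path
```

With an override set, clients behind a proxy get a link they can actually reach instead of the backend's internal address.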
[Yahoo-eng-team] [Bug 1727260] Re: Nova assumes that a volume is fully detached from the compute if the volume is not defined in the instance's libvirt definition
Reviewed: https://review.openstack.org/515008
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ce531dd1b763704b9043ddde8e80ac99cd193660
Submitter: Zuul
Branch: master

commit ce531dd1b763704b9043ddde8e80ac99cd193660
Author: Sahid Orentino Ferdjaoui
Date: Wed Oct 25 05:57:11 2017 -0400

    libvirt: disconnect volume from host during detach

    Under certain failure scenarios, although the libvirt definition for the volume has been removed for the instance, the associated storage lun on the compute server may not have been fully cleaned up yet. If users make another attempt to detach the volume, we should not stop the process just because the device is not found in the domain definition, but should still try to disconnect the logical device from the host. This commit makes the process attempt to disconnect the volume even if the device is not attached to the guest.

    Closes-Bug: #1727260
    Change-Id: I4182642aab3fd2ffb1c97d2de9bdca58982289d8
    Signed-off-by: Sahid Orentino Ferdjaoui

** Changed in: nova
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1727260

Title: Nova assumes that a volume is fully detached from the compute if the volume is not defined in the instance's libvirt definition

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) pike series: Confirmed
Status in OpenStack Compute (nova) queens series: In Progress

Bug description:
During a volume detach operation, Nova compute attempts to remove the volume from libvirt for the instance before proceeding to remove the storage lun from the underlying compute host. If Nova discovers that the volume was not found in the instance's libvirt definition, it ignores that error condition and returns (after issuing the warning message "Ignoring DiskNotFound exception while detaching").
However, under certain failure scenarios, although the libvirt definition for the volume has been removed for the instance, the associated storage lun on the compute server may not have been fully cleaned up yet.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1727260/+subscriptions
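The shape of the fix — always attempting host-side cleanup even when the device is already gone from the guest — can be sketched like this (a simplified illustration with hypothetical names, not Nova's actual code):

```python
class DiskNotFound(Exception):
    """Raised when the device is absent from the guest's domain definition."""


def detach_volume(guest, host, volume_id):
    """Detach a volume, always attempting host-side cleanup.

    Even if the device is already gone from the guest (e.g. a prior
    detach partially succeeded), still disconnect the logical device
    from the host so the storage lun does not leak.
    """
    actions = []
    try:
        guest.detach_device(volume_id)
        actions.append("guest-detached")
    except DiskNotFound:
        # Previously the process stopped here; now fall through to
        # host-side cleanup anyway.
        actions.append("guest-skip")
    host.disconnect_volume(volume_id)
    actions.append("host-disconnected")
    return actions
```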
[Yahoo-eng-team] [Bug 1750666] Re: Deleting an instance before scheduling with BFV fails to detach volume
** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Tags added: queens-rc-potential volumes

** Also affects: nova/pike
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1750666

Title: Deleting an instance before scheduling with BFV fails to detach volume

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) pike series: New
Status in OpenStack Compute (nova) queens series: New

Bug description:
If you try to boot an instance and delete it early, before scheduling, the '_delete_while_booting' codepath hits `_attempt_delete_of_buildrequest`, which tries to remove the block device mappings. However, if the cloud contains compute nodes older than Pike, no block device mappings will be present in the database (because they are only saved when using the new attachment flow), which means the attachment IDs are empty and the volume delete fails:

  2018-02-20 16:02:25,063 WARNING [nova.compute.api] Ignoring volume cleanup failure due to Object action obj_load_attr failed because: attribute attachment_id not lazy-loadable

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1750666/+subscriptions
[Yahoo-eng-team] [Bug 1404867] Re: Volume remains in-use status, if instance booted from volume is deleted in error state
** Also affects: nova/ocata
   Importance: Undecided
   Status: New

** Changed in: nova/ocata
   Assignee: (unassigned) => Mohammed Naser (mnaser)

** Changed in: nova/ocata
   Status: New => In Progress

** Changed in: nova/ocata
   Importance: Undecided => Medium

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1404867

Title: Volume remains in-use status, if instance booted from volume is deleted in error state

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) ocata series: In Progress
Status in OpenStack Compute (nova) pike series: In Progress
Status in OpenStack Compute (nova) queens series: In Progress

Bug description:
If an instance is booted from a volume and goes into the error state for some reason, the volume from which the instance was booted remains in the in-use state even after the instance is deleted. IMO, the volume should be detached so that it can be used to boot another instance.

Steps to reproduce:
1. Log in to Horizon, create a new volume.
2. Create an instance using the newly created volume.
3. Verify the instance is in the active state.

   $ source devstack/openrc demo demo
   $ nova list
   | ID                                   | Name | Status | Task State | Power State | Networks         |
   | dae3a13b-6aa8-4794-93cd-5ab7bf90f604 | nova | ACTIVE | -          | Running     | private=10.0.0.3 |

   Note: Use the shelve/unshelve API to see the instance go into the error state. Unshelving a volume-backed instance does not work and sets the instance state to error (ref: https://bugs.launchpad.net/nova/+bug/1404801).
4. Shelve the instance.

   $ nova shelve

5. Verify the status is SHELVED_OFFLOADED.

   $ nova list
   | ID                                   | Name | Status            | Task State | Power State | Networks         |
   | dae3a13b-6aa8-4794-93cd-5ab7bf90f604 | nova | SHELVED_OFFLOADED | -          | Shutdown    | private=10.0.0.3 |

6. Unshelve the instance.

   $ nova unshelve

7. Verify the instance is in the Error state.

   $ nova list
   | ID                                   | Name | Status | Task State | Power State | Networks         |
   | dae3a13b-6aa8-4794-93cd-5ab7bf90f604 | nova | Error  | unshelving | Spawning    | private=10.0.0.3 |

8. Delete the instance using Horizon.
9. Verify that the volume is still in the in-use state.

   $ cinder list
   | ID                                   | Status | Name | Size | Volume Type | Bootable | Attached to                          |
   | 4aeefd25-10aa-42c2-9a2d-1c89a95b4d4f | in-use | test | 1    | lvmdriver-1 | true     | 8f7bdc24-1891-4bbb-8f0c-732b9cbecae7 |

10. In Horizon, the volume's "Attached To" information is displayed as "Attached to None on /dev/vda".
11. The user is not able to delete this volume, or attach it to another instance, as it is still in use.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1404867/+subscriptions
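The expected behavior — releasing the boot volume when the errored instance is deleted — can be sketched as follows (dict-based stand-ins and hypothetical names, not Nova's actual delete path):

```python
def delete_instance(instance, volume_api):
    """Delete an instance and release any boot volume it holds.

    Without the detach step, a boot-from-volume instance deleted while
    in the error state leaves its volume stuck 'in-use'.
    """
    for attachment in instance.get("attachments", []):
        # Tell the volume service the attachment is gone so the volume
        # returns to 'available' and can be reused.
        volume_api.detach(attachment["volume_id"], instance["uuid"])
    instance["deleted"] = True
    return instance
```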
[Yahoo-eng-team] [Bug 1408527] Re: Delete instance without block_device_mapping record in database after schedule error
** Also affects: nova/queens  Importance: Undecided  Status: New
** Also affects: nova/ocata  Importance: Undecided  Status: New
** Also affects: nova/pike  Importance: Undecided  Status: New
** Changed in: nova/ocata  Status: New => In Progress
** Changed in: nova/pike  Status: New => In Progress
** Changed in: nova/queens  Status: New => In Progress
** Changed in: nova/ocata  Assignee: (unassigned) => Mohammed Naser (mnaser)
** Changed in: nova  Assignee: Ankit Agrawal (ankitagrawal) => melanie witt (melwitt)
** Changed in: nova/pike  Assignee: (unassigned) => Mohammed Naser (mnaser)
** Changed in: nova/queens  Assignee: (unassigned) => Mohammed Naser (mnaser)
** Changed in: nova/ocata  Importance: Undecided => Medium
** Changed in: nova  Importance: Low => Medium
** Changed in: nova/pike  Importance: Undecided => Medium
** Changed in: nova/queens  Importance: Undecided => Medium

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1408527

Title: Delete instance without block_device_mapping record in database after schedule error

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) ocata series: In Progress
Status in OpenStack Compute (nova) pike series: In Progress
Status in OpenStack Compute (nova) queens series: In Progress

Bug description:
When an instance with a cinder volume fails to be scheduled to a host, its status becomes error. I can then delete it successfully, but in the block_device_mapping table of the nova database, the volume information for the instance is still kept and not deleted.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1408527/+subscriptions
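The missing cleanup amounts to removing the instance's block_device_mapping rows when the instance itself is destroyed; a minimal sketch (plain Python containers standing in for the database tables, hypothetical names):

```python
def destroy_instance(instance_uuid, instances, block_device_mapping):
    """Delete an instance and its block_device_mapping rows together.

    Leaving the BDM rows behind (the bug) strands volume records that
    reference an instance which no longer exists.
    """
    instances.pop(instance_uuid, None)
    # Drop every mapping row that references the deleted instance.
    block_device_mapping[:] = [
        row for row in block_device_mapping
        if row["instance_uuid"] != instance_uuid
    ]
```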
[Yahoo-eng-team] [Bug 1750666] Related fix merged to nova (master)
Reviewed: https://review.openstack.org/546315
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3120627d9802ceda46c2db387fec8fbc80700338
Submitter: Zuul
Branch: master

commit 3120627d9802ceda46c2db387fec8fbc80700338
Author: Mohammed Naser
Date: Tue Feb 20 16:47:06 2018 -0500

    Add functional test for deleting BFV server with old attach flow

    When creating a new instance and deleting it before it gets scheduled with the old attachment flow (reserve_volume), the block device mappings are not persisted to the database, which means that the clean up fails because it tries to look up attachment_id, which cannot be lazy loaded. This patch adds a (failing) functional test to check for this issue, which will be addressed in a follow-up patch.

    Related-Bug: #1750666
    Change-Id: I294c54e5a22dd6e5b226a4b00e7cd116813f0704

** Changed in: nova
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1750666

Title: Deleting an instance before scheduling with BFV fails to detach volume

Status in OpenStack Compute (nova): Fix Released

Bug description:
If you try to boot an instance and delete it early, before scheduling, the '_delete_while_booting' codepath hits `_attempt_delete_of_buildrequest`, which tries to remove the block device mappings.
However, if the cloud contains compute nodes older than Pike, no block device mappings will be present in the database (because they are only saved when using the new attachment flow), which means the attachment IDs are empty and the volume delete fails:

  2018-02-20 16:02:25,063 WARNING [nova.compute.api] Ignoring volume cleanup failure due to Object action obj_load_attr failed because: attribute attachment_id not lazy-loadable

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1750666/+subscriptions
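The failure mode is cleanup code assuming a field is always set. A defensive sketch of the pattern (plain dicts standing in for Nova's BDM objects; names are illustrative):

```python
def cleanup_volume_attachments(bdms, volume_api):
    """Best-effort attachment cleanup that tolerates missing attachment IDs.

    With the old attach flow, BDMs saved before scheduling carry no
    attachment_id; skipping those instead of force-loading the attribute
    avoids the 'attachment_id not lazy-loadable' failure.
    """
    cleaned = []
    for bdm in bdms:
        attachment_id = bdm.get("attachment_id")
        if not attachment_id:
            continue  # old-flow record: nothing to delete in Cinder
        volume_api.attachment_delete(attachment_id)
        cleaned.append(attachment_id)
    return cleaned
```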
[Yahoo-eng-team] [Bug 1742963] Re: Cannot boot VM with Contrail SDN controller
Reviewed: https://review.openstack.org/533212
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1f5fe3190bf2e0987945a6ef9ec430673c9fa736
Submitter: Zuul
Branch: master

commit 1f5fe3190bf2e0987945a6ef9ec430673c9fa736
Author: Édouard Thuleau
Date: Fri Jan 12 16:20:32 2018 +0100

    Update plugs Contrail methods to work with privsep

    As privsep uses msgpack to send method arguments to the privsep daemon, we can no longer use custom data types like nova.objects.instance.Instance.

    Change-Id: I09f04d5b2f1cb39339ad7c4569186db5d361797a
    Closes-Bug: #1742963

** Changed in: nova
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1742963

Title: Cannot boot VM with Contrail SDN controller

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: In Progress

Bug description:
On the master branch, nova-compute fails to create the vif on the Contrail vrouter compute agent and the instance fails to spawn:

...
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]     six.reraise(self.type_, self.value, self.tb)
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "/opt/stack/openstack/nova/nova/virt/libvirt/driver.py", line 5238, in _create_domain_and_network
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]     self.plug_vifs(instance, network_info)
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "/opt/stack/openstack/nova/nova/virt/libvirt/driver.py", line 755, in plug_vifs
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]     self.vif_driver.plug(instance, vif)
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "/opt/stack/openstack/nova/nova/virt/libvirt/vif.py", line 769, in plug
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]     func(instance, vif)
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "/opt/stack/openstack/nova/nova/virt/libvirt/vif.py", line 727, in plug_vrouter
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]     instance, vif, ip_addr, ip6_addr, ptype)
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/priv_context.py", line 207, in _wrap
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]     return self.channel.remote_call(name, args, kwargs)
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/daemon.py", line 192, in remote_call
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]     result = self.send_recv((Message.CALL.value, name, args, kwargs))
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/comm.py", line 163, in send_recv
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]     self.writer.send((myid, msg))
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/comm.py", line 54, in send
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]     buf = msgpack.packb(msg, use_bin_type=True)
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "/usr/local/lib/python2.7/dist-packages/msgpack/__init__.py", line 47, in packb
Jan 12
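The root cause is that oslo.privsep serializes call arguments with msgpack, which can only encode primitive types; the fix pattern is to extract plain values from rich objects before crossing the privsep boundary. A sketch (the helper name and field choices are illustrative, not the actual Nova change):

```python
def vif_plug_args(instance, vif):
    """Flatten rich objects into msgpack-safe primitives.

    msgpack can serialize dicts, lists, strings, numbers, and bools, but
    not arbitrary Python objects such as nova.objects.instance.Instance,
    so only primitive fields may be passed to the privsep daemon.
    """
    return {
        "instance_uuid": str(instance["uuid"]),
        "instance_name": str(instance["name"]),
        "vif_id": str(vif["id"]),
        "address": str(vif["address"]),
    }
```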
[Yahoo-eng-team] [Bug 1750917] [NEW] Keystone returns a HTTP 500 error if xmlsec CLI is missing
Public bug reported:

Keystone's log is also unhelpful. All we get is:

  ERROR idp _sign_assertion Error when signing assertion, reason: [Errno 2] No such file or directory

when the xmlsec1 package is absent. We may need to add a check here https://github.com/openstack/keystone/blob/master/keystone/federation/idp.py#L421 to see if CONF.saml.xmlsec1_binary exists. If absent, we just need to provide a more helpful log entry.

Steps to reproduce:
1. Install devstack and enable federation.
2. Uninstall the xmlsec1 package.
3. Try to authenticate via federation; you'll get an HTTP 500 error and the corresponding log entry in keystone.log.

** Affects: keystone
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1750917

Title: Keystone returns a HTTP 500 error if xmlsec CLI is missing

Status in OpenStack Identity (keystone): New

Bug description:
Keystone's log is also unhelpful. All we get is "ERROR idp _sign_assertion Error when signing assertion, reason: [Errno 2] No such file or directory" when the xmlsec1 package is absent. We may need to add a check to see if CONF.saml.xmlsec1_binary exists and, if absent, provide a more helpful log entry (steps to reproduce above).

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1750917/+subscriptions
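The suggested check — verifying the configured xmlsec1 binary is actually resolvable before shelling out — might look like this sketch (the function and exception names are illustrative, not keystone's actual code):

```python
import shutil


class SAMLSigningError(Exception):
    """Raised with an actionable message instead of a bare ENOENT."""


def check_xmlsec1_binary(xmlsec1_binary):
    """Resolve the configured xmlsec1 executable or fail helpfully.

    Returns the absolute path if found; otherwise raises an error that
    names the missing binary and the config option controlling it.
    """
    path = shutil.which(xmlsec1_binary)
    if path is None:
        raise SAMLSigningError(
            "Cannot sign SAML assertion: %r not found in PATH; install "
            "the xmlsec1 package or set [saml] xmlsec1_binary." % xmlsec1_binary
        )
    return path
```

Calling this at startup (or just before signing) turns the opaque HTTP 500 into a log line that names the missing dependency.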
[Yahoo-eng-team] [Bug 1700748] Re: Persistent tokens are not cleaned up when removing users from projects
Now that we're in the Rocky development cycle, we've removed the uuid token provider and the sql token storage driver, since they were slated for removal this release [0].

[0] https://review.openstack.org/#/c/543060/

** Changed in: keystone
   Status: In Progress => Invalid

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1700748

Title: Persistent tokens are not cleaned up when removing users from projects

Status in OpenStack Identity (keystone): Invalid

Bug description:
When deleting a role, we should iterate over the assignments for that role and build the list of tokens we need to delete. To minimize the number of tokens to delete, any redundant user+project deletions are removed. I think simplifying the list per user alone is improper: the same user on different projects targets different tokens. At the same time, the original processing actually doesn't work, because user_ids is never added to.

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1700748/+subscriptions
[Yahoo-eng-team] [Bug 1750064] Re: multiattach volume failures are masked in compute api
Reviewed: https://review.openstack.org/545478
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5754ac0ab3670b47243b2d880aa153a7f42a3ac5
Submitter: Zuul
Branch: master

commit 5754ac0ab3670b47243b2d880aa153a7f42a3ac5
Author: Matt Riedemann
Date: Fri Feb 16 17:38:38 2018 -0500

    Fix error handling in compute API for multiattach errors

    Because of the big Exception block in _validate_bdm, the multiattach-specific errors raised out of the _check_attach_and_reserve_volume method were being lost and the very generic InvalidBDMVolume was returned to the user. For example, I hit this when trying to create a server from a multiattach volume but forgot to specify microversion 2.60, and it was just telling me it couldn't get the volume, which I knew was bogus since I could get the volume details. The fix is to handle the specific errors we want to re-raise. The tests, which missed this because of their high-level mocking, are updated so that we actually get to the problematic code and only the things we don't care about along the way are mocked out.

    Change-Id: I0b397e5bcdfd635fa562beb29819dd8c6b828e8a
    Closes-Bug: #1750064

** Changed in: nova
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1750064

Title: multiattach volume failures are masked in compute api

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: In Progress

Bug description:
While trying to create a server from a multiattach volume, I kept getting this unhelpful error:

  $ openstack server create --flavor 1 --wait --volume cirros-multiattach-volume new-server
  Block Device Mapping is Invalid: failed to get volume fde36bc8-3104-43b7-902a-3b391d7f4e12.
  (HTTP 400) (Request-ID: req-4e2dc785-82b6-4e48-93e5-9b72bd62b3ba)

By adding some debug code to nova-api, I figured out it was this:

  Feb 16 21:18:35 multiattach devstack@n-api.service[13811]: ERROR nova.compute.api [None req-88476e2e-f6ed-4b95-bed2-f6cf567b044d demo demo] Failed to get volume fde36bc8-3104-43b7-902a-3b391d7f4e12. Error: Multiattach volumes are only supported starting with compute API version 2.60.: MultiattachNotSupportedOldMicroversion: Multiattach volumes are only supported starting with compute API version 2.60.

This is the problematic exception block:
https://github.com/openstack/nova/blob/9910ac95eb9ac822ef5d38b8af2d3aff5dc4d25a/nova/compute/api.py#L1354

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1750064/+subscriptions
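The fix pattern — letting specific exceptions escape a broad handler instead of collapsing everything into one generic error — can be sketched like this (the exception classes here are simplified stand-ins for Nova's real ones):

```python
class MultiattachNotSupportedOldMicroversion(Exception):
    """Specific, actionable error (e.g. 'use microversion >= 2.60')."""


class InvalidBDMVolume(Exception):
    """Generic catch-all that masks the real cause."""


def validate_volume(check):
    """Run a volume check, preserving specific user-facing errors.

    Before the fix, a bare `except Exception` swallowed the specific
    error; re-raising it first keeps the helpful message.
    """
    try:
        check()
    except MultiattachNotSupportedOldMicroversion:
        raise  # surface the precise reason to the API user
    except Exception:
        raise InvalidBDMVolume("failed to get volume")
```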
[Yahoo-eng-team] [Bug 1750680] Re: Nova returns a traceback when it's unable to detach a volume still in use
Reviewed: https://review.openstack.org/546423
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b16c0f10539a6c6b547a70a41c75ef723fc618ce
Submitter: Zuul
Branch: master

commit b16c0f10539a6c6b547a70a41c75ef723fc618ce
Author: Dan Smith
Date: Tue Feb 20 14:41:35 2018 -0800

    Avoid exploding if guest refuses to detach a volume

    When we run detach_volume(), the guest has to respond to the ACPI eject request in order for us to proceed. It may not do this at all if the volume is mounted or in use, or may not have by the time we time out if lots of dirty data needs flushing. Right now, we let the failure exception bubble up to our caller and we log a nasty stack trace, which doesn't really convey the reason (or that it's an expected and reasonable thing to happen). Thus, this patch catches that, logs the situation at warning level, and avoids the trace.

    Change-Id: I3800b466a50b1e5f5d1e8c8a963d9a6258af67ee
    Closes-Bug: #1750680

** Changed in: nova
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1750680

Title: Nova returns a traceback when it's unable to detach a volume still in use

Status in OpenStack Compute (nova): Fix Released

Bug description:

Description
===========
If libvirt is unable to detach a volume because it's still in use by the guest (either mounted and/or a file opened), nova returns a traceback.

Steps to reproduce
==================
* Create an instance with a volume attached using heat
* Make sure there's activity on the volume
* Delete the stack

Expected result
===============
We would expect nova not to return a traceback but a clean log message about its inability to detach the volume. If possible, it would be great if that exception were raised back to either cinder or heat.
Actual result
=============
```
21495 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall [-] Dynamic interval looping call 'oslo_service.loopingcall._func' failed: DeviceDetachFailed: Device detach failed for vdf: Unable to detach from guest transient domain.
21496 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall Traceback (most recent call last):
21497 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 137, in _run_loop
21498 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall     result = func(*self.args, **self.kw)
21499 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 415, in _func
21500 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall     return self._sleep_time
21501 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
21502 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall     self.force_reraise()
21503 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
21504 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall     six.reraise(self.type_, self.value, self.tb)
21505 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 394, in _func
21506 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall     result = f(*args, **kwargs)
21507 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 462, in _do_wait_and_retry_detach
21508 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall     device=alternative_device_name, reason=reason)
21509 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall DeviceDetachFailed: Device detach failed for vdf: Unable to detach from guest transient domain.
```

Environment
===========
* Red Hat OpenStack 12

```
libvirt-3.2.0-14.el7_4.7.x86_64                          Fri Jan 26 15:28:48 2018
libvirt-client-3.2.0-14.el7_4.7.x86_64                   Fri Jan 26 15:26:07 2018
libvirt-daemon-3.2.0-14.el7_4.7.x86_64                   Fri Jan 26 15:26:02 2018
libvirt-daemon-config-network-3.2.0-14.el7_4.7.x86_64    Fri Jan 26 15:26:06 2018
libvirt-daemon-config-nwfilter-3.2.0-14.el7_4.7.x86_64   Fri Jan 26 15:26:05 2018
libvirt-daemon-driver-interface-3.2.0-14.el7_4.7.x86_64  Fri Jan 26 15:26:05 2018
libvirt-daemon-driver-lxc-3.2.0-14.el7_4.7.x86_64        Fri Jan 26 15:26:06 2018
libvirt-daemon-driver-network-3.2.0-14.el7_4.7.x86_64    Fri Jan 26 15:26:02 2018
libvirt-daemon-driver-nodedev-3.2.0-14.el7_4.7.x86_64    Fri Jan 26 15:26:05 2018
```
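The fix pattern — treating a guest's refusal to detach as an expected condition logged at warning level, rather than an unhandled traceback — can be sketched as follows (names are illustrative, not the actual Nova code):

```python
import logging

LOG = logging.getLogger(__name__)


class DeviceDetachFailed(Exception):
    """Guest did not release the device (e.g. volume still mounted)."""


def detach_with_warning(guest, device):
    """Attempt a detach; downgrade an expected failure to a warning.

    Returns True on success, False when the guest refused, instead of
    letting the exception bubble up as a scary stack trace.
    """
    try:
        guest.detach_device(device)
        return True
    except DeviceDetachFailed as exc:
        LOG.warning("Guest refused to detach %s: %s", device, exc)
        return False
```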
[Yahoo-eng-team] [Bug 1750084] Re: Report client associations include non-sharing providers
Reviewed: https://review.openstack.org/545494
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d2152f309439a9d9b054a481826242ca15f5c93e
Submitter: Zuul
Branch: master

commit d2152f309439a9d9b054a481826242ca15f5c93e
Author: Eric Fried
Date: Fri Feb 16 18:18:32 2018 -0600

    Only pull associated *sharing* providers

    It was discussed and decided [1] that we only want to be pulling down, caching, and passing to update_provider_tree providers associated via aggregate with the compute node's provider tree if they are sharing providers. Otherwise we'll get e.g. all the *other* compute nodes which are also associated with a sharing provider.

    [1] https://review.openstack.org/#/c/540111/4/specs/rocky/approved/update-provider-tree.rst@48

    Change-Id: Iab366da7623e5e31b8416e89fee7d418f7bf9b30
    Closes-Bug: #1750084

** Changed in: nova Status: In Progress => Fix Released

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1750084

Title: Report client associations include non-sharing providers

Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) queens series: In Progress

Bug description: It was discussed and decided [1] that we only want to be pulling down, caching, and passing to update_provider_tree providers associated via aggregate with the compute node's provider tree if they are sharing providers. Otherwise we'll get e.g. all the *other* compute nodes which are also associated with a sharing provider. [1] https://review.openstack.org/#/c/540111/4/specs/rocky/approved/update-provider-tree.rst@48

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1750084/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
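For illustration: placement identifies sharing providers by the `MISC_SHARES_VIA_AGGREGATE` trait, so the decision above amounts to keeping only aggregate-associated providers that carry that trait and dropping the other compute nodes in the same aggregate. A sketch over simplified provider records (the dict shapes are hypothetical, not the report client's actual structures):

```python
# Trait that placement uses to mark a resource provider as sharing its
# inventory with other providers in the same aggregate.
SHARING_TRAIT = "MISC_SHARES_VIA_AGGREGATE"


def filter_sharing_providers(providers, tree_aggregates):
    """Keep only aggregate-associated providers that actually share.

    `providers` is a list of dicts with "traits" (a set) and
    "aggregates" (a list); `tree_aggregates` is the set of aggregate
    UUIDs associated with the compute node's provider tree.
    """
    return [
        p for p in providers
        if SHARING_TRAIT in p["traits"]
        and set(p["aggregates"]) & tree_aggregates
    ]
```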
[Yahoo-eng-team] [Bug 1750892] [NEW] Image remains in queued status after location set via PATCH
Public bug reported: Pike release, with show_image_direct_url and show_multiple_locations enabled. Attempting to create an image using the HTTP backend with the glance v2 API. I create a new/blank image (goes into "queued" status), then set the location with:

curl -g -i -X PATCH -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'User-Agent: imacdonn-getting-dangerous' -H 'X-Auth-Token: xxx' -H 'Content-Type: application/openstack-images-v2.1-json-patch' -d '[{"op": "replace", "path": "/locations", "value": [{"url": "http://my_http_server/cirros.img", "metadata": {}}]}]' http://my_glance_api_endpoint:9292/v2/images/e5581f14-2d05-4ae7-8d78-9da42731a37e

This results in the direct_url getting set correctly, and the size of the image is correctly determined, but the image remains in "queued" status. It should become "active".

** Affects: glance Importance: Undecided Status: New

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1750892

Title: Image remains in queued status after location set via PATCH

Status in Glance: New

Bug description: Pike release, with show_image_direct_url and show_multiple_locations enabled. Attempting to create an image using the HTTP backend with the glance v2 API. I create a new/blank image (goes into "queued" status), then set the location with:

curl -g -i -X PATCH -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'User-Agent: imacdonn-getting-dangerous' -H 'X-Auth-Token: xxx' -H 'Content-Type: application/openstack-images-v2.1-json-patch' -d '[{"op": "replace", "path": "/locations", "value": [{"url": "http://my_http_server/cirros.img", "metadata": {}}]}]' http://my_glance_api_endpoint:9292/v2/images/e5581f14-2d05-4ae7-8d78-9da42731a37e

This results in the direct_url getting set correctly, and the size of the image is correctly determined, but the image remains in "queued" status. It should become "active".
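The same PATCH can be reproduced from Python when debugging this. A stdlib sketch that builds (but does not send) the request; the endpoint, token, and URLs are caller-supplied placeholders mirroring the curl call above:

```python
import json
import urllib.request


def build_location_patch(image_url, location_url, token):
    """Build (without sending) the v2 PATCH that replaces an image's
    locations, mirroring the curl invocation in the report."""
    body = json.dumps([{
        "op": "replace",
        "path": "/locations",
        "value": [{"url": location_url, "metadata": {}}],
    }]).encode("utf-8")
    return urllib.request.Request(
        image_url,
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/openstack-images-v2.1-json-patch",
            "X-Auth-Token": token,  # placeholder token, as in the report
        },
    )
```

Sending it against a live endpoint would then be `urllib.request.urlopen(req)`.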
To manage notifications about this bug go to: https://bugs.launchpad.net/glance/+bug/1750892/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1750884] [NEW] [2.4, bionic] /etc/resolv.conf not configured correctly in Bionic, leads to no DNS resolution
Public bug reported: When deploying Bionic, /etc/resolv.conf is not configured correctly, which leads to no DNS resolution. In the output below, you will see that the netplan config correctly points to the 10.90.90.1 nameserver, but in resolv.conf that's a local address. resolv.conf should really be configured to use the provided DNS server(s). That said, despite that fact, DNS resolution doesn't work with the local address.

== Bionic ==

ubuntu@node01:~$ cat /etc/netplan/50-cloud-init.yaml
# This file is generated from information provided by
# the datasource. Changes to it will not persist across an instance.
# To disable cloud-init's network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    version: 2
    ethernets:
        enp0s25:
            match:
                macaddress: b8:ae:ed:7d:17:d2
            mtu: 1500
            nameservers:
                addresses:
                - 10.90.90.1
                search:
                - maaslab
                - maas
            set-name: enp0s25
    bridges:
        br0:
            addresses:
            - 10.90.90.3/24
            gateway4: 10.90.90.1
            interfaces:
            - enp0s25
            parameters:
                forward-delay: 15
                stp: false

ubuntu@node01:~$ cat /etc/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.
nameserver 127.0.0.53
search maaslab maas

ubuntu@node01:~$ ping google.com
ping: google.com: Temporary failure in name resolution
[...]
ubuntu@node01:~$ sudo vim /etc/resolv.conf
ubuntu@node01:~$ cat /etc/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.
nameserver 10.90.90.1
search maaslab maas

ubuntu@node01:~$ ping google.com
PING google.com (172.217.0.174) 56(84) bytes of data.
64 bytes from mia09s16-in-f14.1e100.net (172.217.0.174): icmp_seq=1 ttl=52 time=4.46 ms
64 bytes from mia09s16-in-f14.1e100.net (172.217.0.174): icmp_seq=2 ttl=52 time=4.38 ms

== Xenial ==

ubuntu@node05:~$ cat /etc/network/interfaces.d/50-cloud-init.cfg
# This file is generated from information provided by
# the datasource. Changes to it will not persist across an instance.
# To disable cloud-init's network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
auto lo
iface lo inet loopback
    dns-nameservers 10.90.90.1
    dns-search maaslab maas

auto enp0s25
iface enp0s25 inet static
    address 10.90.90.162/24
    gateway 10.90.90.1
    mtu 1500

ubuntu@node05:~$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.90.90.1
search maaslab maas

** Affects: cloud-init Importance: Undecided Status: New
** Affects: maas Importance: Undecided Status: Invalid
** Affects: nplan (Ubuntu) Importance: Critical Status: New
** Affects: systemd (Ubuntu) Importance: Critical Status: New
** Also affects: cloud-init Importance: Undecided Status: New
** Also affects: nplan (Ubuntu) Importance: Undecided Status: New
** Also affects: systemd (Ubuntu) Importance: Undecided Status: New
** Changed in: nplan (Ubuntu) Importance: Undecided => Critical
** Changed in: systemd (Ubuntu) Importance: Undecided => Critical
** Changed in: maas Status: New => Incomplete
** Changed in: maas Status: Incomplete => Invalid

** Description changed:

  When deploying Bionic, /etc/resolv.conf is not configured correctly, which leads to no DNS resolution. In the output below, you will see that netplan config is correctly to the 10.90.90.1 nameserver, but in resolv.conf that's a local address.

+ Resolv.conf should really be configured to use the provided DNS
+ server(s)

  Bionic
  --

  ubuntu@node01:~$ cat /etc/netplan/50-cloud-init.yaml
  # This file is generated from information provided by
  # the datasource. Changes to it will not persist across an instance.
  # To disable cloud-init's network configuration capabilities, write a file
  # /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
  # network: {config: disabled}
  network:
-     version: 2
-     ethernets:
-         enp0s25:
-             match:
-                 macaddress: b8:ae:ed:7d:17:d2
-             mtu: 1500
-             nameservers:
-                 addresses:
-                 - 10.90.90.1
-                 search:
-
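The broken state above is mechanical to detect: resolv.conf lists only the systemd-resolved stub address (127.0.0.53). A small stdlib sketch (the helper names are hypothetical, not part of any of the affected projects):

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf content,
    skipping comment and search/option lines."""
    return [
        line.split()[1]
        for line in resolv_conf_text.splitlines()
        if line.strip().startswith("nameserver ")
    ]


def uses_stub_only(resolv_conf_text):
    """True when resolv.conf points solely at the systemd-resolved
    stub resolver, the failing state shown in the Bionic output."""
    return nameservers(resolv_conf_text) == ["127.0.0.53"]
```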
[Yahoo-eng-team] [Bug 1742963] Re: Cannot boot VM with Contrail SDN controller
** Also affects: nova/queens Importance: Undecided Status: New
** Changed in: nova Importance: Undecided => High
** Changed in: nova/queens Importance: Undecided => High

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1742963

Title: Cannot boot VM with Contrail SDN controller

Status in OpenStack Compute (nova): In Progress
Status in OpenStack Compute (nova) queens series: New

Bug description: On the master branch, nova-compute fails to create vif on the Contrail vrouter compute agent and the instance fails to spawn:

...
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   six.reraise(self.type_, self.value, self.tb)
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "/opt/stack/openstack/nova/nova/virt/libvirt/driver.py", line 5238, in _create_domain_and_network
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]     self.plug_vifs(instance, network_info)
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "/opt/stack/openstack/nova/nova/virt/libvirt/driver.py", line 755, in plug_vifs
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]     self.vif_driver.plug(instance, vif)
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "/opt/stack/openstack/nova/nova/virt/libvirt/vif.py", line 769, in plug
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]     func(instance, vif)
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "/opt/stack/openstack/nova/nova/virt/libvirt/vif.py", line 727, in plug_vrouter
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]     instance, vif, ip_addr, ip6_addr, ptype)
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/priv_context.py", line 207, in _wrap
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]     return self.channel.remote_call(name, args, kwargs)
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/daemon.py", line 192, in remote_call
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]     result = self.send_recv((Message.CALL.value, name, args, kwargs))
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/comm.py", line 163, in send_recv
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]     self.writer.send((myid, msg))
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "/usr/local/lib/python2.7/dist-packages/oslo_privsep/comm.py", line 54, in send
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]     buf = msgpack.packb(msg, use_bin_type=True)
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "/usr/local/lib/python2.7/dist-packages/msgpack/__init__.py", line 47, in packb
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]     return Packer(**kwargs).pack(o)
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File "msgpack/_packer.pyx", line 231, in msgpack._packer.Packer.pack (msgpack/_packer.cpp:3661)
Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR nova.compute.manager [instance:
[Yahoo-eng-team] [Bug 1750084] Re: Report client associations include non-sharing providers
** Changed in: nova Importance: Undecided => High ** Also affects: nova/queens Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1750084 Title: Report client associations include non-sharing providers Status in OpenStack Compute (nova): In Progress Status in OpenStack Compute (nova) queens series: New Bug description: It was discussed and decided [1] that we only want to be pulling down, caching, and passing to update_provider_tree providers associated via aggregate with the compute node's provider tree if they are sharing providers. Otherwise we'll get e.g. all the *other* compute nodes which are also associated with a sharing provider. [1] https://review.openstack.org/#/c/540111/4/specs/rocky/approved/update-provider-tree.rst@48 To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1750084/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1750890] [NEW] Neutron db performance at scale
Public bug reported: OpenStack Neutron (like the rest of OpenStack) relies on SQLAlchemy and its ORM for database support. From our observations, Neutron is not utilizing the ORM models directly, but rather inserting an additional model layer above SQLAlchemy and manually building these models from a number of underlying DB models. We ran into significant performance issues due to the increased number of queries at large scale. For ports, the problem starts here: https://github.com/openstack/neutron/blob/master/neutron/db/db_base_plugin_common.py#L202-L219. The base dict is built from a single DB query row, and then the processing of all extensions (which is the default behaviour) leads to a sequential series of additional queries per row to augment the dict. In our opinion this causes issues from a performance perspective: it leads to the classic n+1 query anti-pattern and fundamentally does not scale (an alternate option would be to do a "joined" query with active extensions). This illustrates the type of workaround that results from this approach: https://github.com/openstack/neutron/blob/master/neutron/db/_utils.py#L95-L107. Instead of using native SQL to filter fields from the result, the whole result set has to be iterated to filter out fields; again, surely this is an anti-pattern when processing DB objects. With respect to LBaaS support, we removed the intermediate model layer with this (and a couple of previous) commit(s): https://github.com/sapcc/neutron-lbaas/commit/f71867fbf6c8a27df43aaff6046948dce60f3081. This is just an interim change, but after implementing it we saw LBaaS API requests going from > 1-5 minutes (and degrading with # of objects) to a consistent sub-second response time. Version: This is/should be present in all versions, but our testing has been done in Mitaka and above. ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. 
https://bugs.launchpad.net/bugs/1750890 Title: Neutron db performance at scale Status in neutron: New Bug description: OpenStack Neutron (like the rest of OpenStack) relies on SQLAlchemy and its ORM for database support. From our observations, Neutron is not utilizing the ORM models directly, but rather inserting an additional model layer above SQLAlchemy and manually building these models from a number of underlying DB models. We ran into significant performance issues due to the increased number of queries at large scale. For ports, the problem starts here: https://github.com/openstack/neutron/blob/master/neutron/db/db_base_plugin_common.py#L202-L219. The base dict is built from a single DB query row, and then the processing of all extensions (which is the default behaviour) leads to a sequential series of additional queries per row to augment the dict. In our opinion this causes issues from a performance perspective: it leads to the classic n+1 query anti-pattern and fundamentally does not scale (an alternate option would be to do a "joined" query with active extensions). This illustrates the type of workaround that results from this approach: https://github.com/openstack/neutron/blob/master/neutron/db/_utils.py#L95-L107. Instead of using native SQL to filter fields from the result, the whole result set has to be iterated to filter out fields; again, surely this is an anti-pattern when processing DB objects. With respect to LBaaS support, we removed the intermediate model layer with this (and a couple of previous) commit(s): https://github.com/sapcc/neutron-lbaas/commit/f71867fbf6c8a27df43aaff6046948dce60f3081. This is just an interim change, but after implementing it we saw LBaaS API requests going from > 1-5 minutes (and degrading with # of objects) to a consistent sub-second response time. Version: This is/should be present in all versions, but our testing has been done in Mitaka and above. 
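The n+1 pattern called out in this report, next to the "joined" alternative the reporters suggest, can be demonstrated with a minimal self-contained sqlite3 sketch (a hypothetical ports/extension schema, not Neutron's actual models):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ports (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE port_ext (port_id INTEGER, key TEXT, value TEXT);
    INSERT INTO ports VALUES (1, 'p1'), (2, 'p2');
    INSERT INTO port_ext VALUES (1, 'qos', 'gold'), (2, 'qos', 'silver');
""")


def ports_n_plus_one(conn):
    """N+1 anti-pattern: one query for the base rows, then one more
    query per row to augment the dict with extension data."""
    result = []
    for pid, name in conn.execute("SELECT id, name FROM ports"):
        ext = dict(conn.execute(
            "SELECT key, value FROM port_ext WHERE port_id = ?", (pid,)))
        result.append({"id": pid, "name": name, **ext})
    return result


def ports_joined(conn):
    """Joined alternative: a single query returns base rows plus
    extension data, regardless of the number of ports."""
    rows = conn.execute("""
        SELECT p.id, p.name, e.key, e.value
        FROM ports p LEFT JOIN port_ext e ON e.port_id = p.id
        ORDER BY p.id""")
    result = {}
    for pid, name, key, value in rows:
        port = result.setdefault(pid, {"id": pid, "name": name})
        if key is not None:
            port[key] = value
    return list(result.values())
```

Both functions produce the same dicts; the difference is that the first issues 1 + N queries while the second issues exactly one.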
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1750890/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1750705] Re: glance db_sync requires mysql db to have log_bin_trust_function_creators = 1
** Also affects: charm-percona-cluster Importance: Undecided Status: New

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1750705

Title: glance db_sync requires mysql db to have log_bin_trust_function_creators = 1

Status in OpenStack percona-cluster charm: New
Status in Glance: New
Status in glance package in Ubuntu: New

Bug description: Upon deploying glance via cs:~openstack-charmers-next/xenial/glance, glance appears to throw a CRIT unhandled error; so far I have experienced this on arm64. Not sure about other archs at this point in time. Decided to file a bug and will investigate further. Cloud- xenial-queens/proposed. This occurs when the shared-db-relation hook fires for mysql:shared-db.

unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed CRITI [glance] Unhandled error
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed Traceback (most recent call last):
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/bin/glance-manage", line 10, in <module>
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     sys.exit(main())
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 528, in main
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     return CONF.command.action_fn()
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 360, in sync
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     self.command_object.sync(CONF.command.version)
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 153, in sync
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     self.expand()
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 208, in expand
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     self._sync(version=expand_head)
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 168, in _sync
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     alembic_command.upgrade(a_config, version)
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/alembic/command.py", line 254, in upgrade
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     script.run_env()
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/alembic/script/base.py", line 425, in run_env
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     util.load_python_file(self.dir, 'env.py')
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/alembic/util/pyfiles.py", line 93, in load_python_file
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     module = load_module_py(module_id, path)
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/alembic/util/compat.py", line 75, in load_module_py
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     mod = imp.load_source(module_id, path, fp)
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/glance/db/sqlalchemy/alembic_migrations/env.py", line 88, in <module>
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     run_migrations_online()
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/glance/db/sqlalchemy/alembic_migrations/env.py", line 83, in run_migrations_online
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     context.run_migrations()
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "<string>", line 8, in run_migrations
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/alembic/runtime/environment.py", line 836, in run_migrations
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     self.get_context().run_migrations(**kw)
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/alembic/runtime/migration.py", line 330, in run_migrations
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     step.migration_fn(**kw)
[Yahoo-eng-team] [Bug 1750705] Re: glance CRITI Unhandled error DBError Duplicate column name
Need to tie down the exact requirements, but:

set global log_bin_trust_function_creators = 1;

and then dropping and recreating the DB resolves this issue; apparently creation of triggers which include 'unsafe' functions requires this setting.

** Project changed: charm-glance => glance (Ubuntu)
** Summary changed: - glance CRITI Unhandled error DBError Duplicate column name + glance db_sync requires mysql db to have log_bin_trust_function_creators = 1
** Also affects: glance Importance: Undecided Status: New

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1750705

Title: glance db_sync requires mysql db to have log_bin_trust_function_creators = 1

Status in Glance: New
Status in glance package in Ubuntu: New

Bug description: Upon deploying glance via cs:~openstack-charmers-next/xenial/glance, glance appears to throw a CRIT unhandled error; so far I have experienced this on arm64. Not sure about other archs at this point in time. Decided to file a bug and will investigate further. Cloud- xenial-queens/proposed. This occurs when the shared-db-relation hook fires for mysql:shared-db.
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed CRITI [glance] Unhandled error
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed Traceback (most recent call last):
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/bin/glance-manage", line 10, in <module>
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     sys.exit(main())
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 528, in main
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     return CONF.command.action_fn()
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 360, in sync
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     self.command_object.sync(CONF.command.version)
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 153, in sync
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     self.expand()
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 208, in expand
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     self._sync(version=expand_head)
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 168, in _sync
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     alembic_command.upgrade(a_config, version)
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/alembic/command.py", line 254, in upgrade
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     script.run_env()
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/alembic/script/base.py", line 425, in run_env
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     util.load_python_file(self.dir, 'env.py')
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/alembic/util/pyfiles.py", line 93, in load_python_file
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     module = load_module_py(module_id, path)
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/alembic/util/compat.py", line 75, in load_module_py
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     mod = imp.load_source(module_id, path, fp)
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/glance/db/sqlalchemy/alembic_migrations/env.py", line 88, in <module>
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     run_migrations_online()
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/glance/db/sqlalchemy/alembic_migrations/env.py", line 83, in run_migrations_online
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed     context.run_migrations()
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "<string>", line 8, in run_migrations
unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File "/usr/lib/python2.7/dist-packages/alembic/runtime/environment.py", line 836, in run_migrations
[Yahoo-eng-team] [Bug 1749397] Re: In Verify operation of the Identity service 1st step is not required as the file /etc/keystone/keystone-paste.ini doesn't contain admin_auth_token
*** This bug is a duplicate of bug 1716797 *** https://bugs.launchpad.net/bugs/1716797 Looks like this was reported right after https://bugs.launchpad.net/keystone/+bug/1716797 was committed. ** This bug has been marked a duplicate of bug 1716797 Verify operation in keystone: step 1 has already been done -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Identity (keystone). https://bugs.launchpad.net/bugs/1749397 Title: In Verify operation of the Identity service 1st step is not required as the file /etc/keystone/keystone-paste.ini doesn't contain admin_auth_token Status in OpenStack Identity (keystone): New Bug description: In the Verify operation of the Identity service, the 1st step is not required, as the file /etc/keystone/keystone-paste.ini doesn't contain admin_auth_token in the sections [pipeline:public_api], [pipeline:admin_api], and [pipeline:api_v3]. https://docs.openstack.org/keystone/pike/install/keystone-verify-ubuntu.html To manage notifications about this bug go to: https://bugs.launchpad.net/keystone/+bug/1749397/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1727260] Re: Nova assumes that a volume is fully detached from the compute if the volume is not defined in the instance's libvirt definition
** Also affects: nova/pike Importance: Undecided Status: New
** Also affects: nova/queens Importance: Undecided Status: New
** Changed in: nova/pike Status: New => Confirmed
** Changed in: nova/queens Status: New => Confirmed

-- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1727260

Title: Nova assumes that a volume is fully detached from the compute if the volume is not defined in the instance's libvirt definition

Status in OpenStack Compute (nova): In Progress
Status in OpenStack Compute (nova) pike series: Confirmed
Status in OpenStack Compute (nova) queens series: In Progress

Bug description: During a volume detach operation, Nova compute attempts to remove the volume from libvirt for the instance before proceeding to remove the storage lun from the underlying compute host. If Nova discovers that the volume was not found in the instance's libvirt definition, it ignores that error condition and returns (after issuing a warning message "Ignoring DiskNotFound exception while detaching"). However, under certain failure scenarios it may be that, although the libvirt definition for the volume has been removed for the instance, the associated storage lun on the compute server has not yet been fully cleaned up.

To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1727260/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
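The distinction the report draws can be sketched as follows: a detach involves two independent cleanup steps, and absence from the libvirt definition alone does not imply the host-side lun is gone. (Names are hypothetical; this is not nova's actual code.)

```python
def detach_cleanup_remaining(in_libvirt_domain, lun_present_on_host):
    """Return the cleanup steps still outstanding for a detaching volume.

    The failure mode described above is the case (False, True): the
    device is already gone from the instance's libvirt definition
    (DiskNotFound), yet the storage lun on the compute host remains,
    so treating the libvirt check alone as authoritative is wrong.
    """
    steps = []
    if in_libvirt_domain:
        steps.append("detach-from-domain")
    if lun_present_on_host:
        steps.append("remove-host-lun")
    return steps
```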
[Yahoo-eng-team] [Bug 1742505] Re: gre_sys set to default 1472 when using path_mtu > 1500 with ovs 2.8.x
This bug was fixed in the package openvswitch - 2.8.1-0ubuntu0.17.10.2 --- openvswitch (2.8.1-0ubuntu0.17.10.2) artful; urgency=medium * d/p/dpif-kernel-gre-mtu-workaround.patch, d/p/dpif-netlink-rtnl-Use-65000-instead-of-65535-as-tunnel-MTU.patch: Cherry pick in-flight fixes for workaround to correctly set MTU of GRE devices via netlink (LP: #1742505). openvswitch (2.8.1-0ubuntu0.17.10.1) artful; urgency=medium * New upstream stable release (LP: #1724622). -- James Page, Sat, 20 Jan 2018 10:22:31 + ** Changed in: openvswitch (Ubuntu Artful) Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1742505 Title: gre_sys set to default 1472 when using path_mtu > 1500 with ovs 2.8.x Status in Ubuntu Cloud Archive: Fix Committed Status in Ubuntu Cloud Archive pike series: Fix Committed Status in Ubuntu Cloud Archive queens series: Fix Committed Status in neutron: Invalid Status in linux package in Ubuntu: Confirmed Status in openvswitch package in Ubuntu: Fix Released Status in linux source package in Artful: Confirmed Status in openvswitch source package in Artful: Fix Released Status in linux source package in Bionic: Confirmed Status in openvswitch source package in Bionic: Fix Released Bug description: [Impact] OpenStack Clouds using GRE overlay tunnels with > 1500 MTUs will observe packet fragmentation/networking issues for traffic in overlay networks. [Test Case] Deploy OpenStack Pike (xenial + pike UCA or artful) Create tenant networks using GRE segmentation Boot instances Instance networking will be broken/slow gre_sys devices will be set to mtu=1472 on hypervisor hosts. [Regression Potential] Minimal; the fix to OVS works around an issue for GRE tunnel port setup via rtnetlink by performing a second request once the gre device is set up to set the MTU to a high value (65000).
[Original Bug Report] Setup: Pike neutron 11.0.2-0ubuntu1.1~cloud0 OVS 2.8.0 Jumbo frames settings per: https://docs.openstack.org/mitaka/networking-guide/config-mtu.html global_physnet_mtu = 9000 path_mtu = 9000 Symptoms: gre_sys MTU is 1472 Instances with MTUs > 1500 fail to communicate across GRE Temporary Workaround: ifconfig gre_sys mtu 9000 Note: When ovs rebuilds tunnels, such as on a restart, gre_sys MTU is set back to default 1472. Note: downgrading from OVS 2.8.0 to 2.6.1 resolves the issue. Previous behavior: With Ocata or Pike and OVS 2.6.x gre_sys MTU defaults to 65490 It remains at 65490 through restarts. This may be related to some combination of the following changes in OVS which seem to imply MTUs must be set in the ovs database for tunnel interfaces and patches: https://github.com/openvswitch/ovs/commit/8c319e8b73032e06c7dd1832b3b31f8a1189dcd1 https://github.com/openvswitch/ovs/commit/3a414a0a4f1901ba015ec80b917b9fb206f3c74f https://github.com/openvswitch/ovs/blob/6355db7f447c8e83efbd4971cca9265f5e0c8531/datapath/vport-internal_dev.c#L186 To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/1742505/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
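On an affected hypervisor the symptom and the temporary workaround from the report look roughly like this (a sketch; `ip` is used in place of the legacy `ifconfig`, and 9000 matches the path_mtu in this setup):

```shell
# Read the current MTU of the OVS GRE system device; 1472 indicates
# the bug, since OVS 2.8.x fell back to the default instead of 65490.
ip -o link show gre_sys | grep -o 'mtu [0-9]*'

# Temporary workaround: raise the MTU by hand. OVS resets this when it
# rebuilds its tunnels (e.g. on restart), so the patched openvswitch
# package is the real fix.
ip link set gre_sys mtu 9000
```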
[Yahoo-eng-team] [Bug 1750829] [NEW] RFE: libvirt: Add ability to configure extra CPU flags for named CPU models
Public bug reported: Motivation -- The recent "Meltdown" CVE fixes resulted in a critical performance penalty. From here[*]: [...] However, in examining both the various fixes rolled out in actual Linux distros over the past few days and doing some very informal surveying of environments I have access to, I discovered that the PCID ["process-context identifiers"] processor feature, which used to be a virtual no-op, is now a performance AND security critical item.[...] So if a Nova user has applied all the "Meltdown" CVE fixes, and is using a named CPU model (like "IvyBridge", or "Westmere" — which specifically lack the said obscure "PCID" feature), they will incur severe performance degradation[*]. Note that some Intel *physical* CPUs themselves include the 'pcid' CPU feature flag; but the named CPU models provided by libvirt & QEMU lack that flag — hence we explicitly specify it for virtual CPUs via the following proposed config attribute. [*] https://groups.google.com/forum/m/#!topic/mechanical-sympathy/L9mHTbeQLNU Proposed change --- Modify Nova's libvirt driver such that it will be possible to set granular CPU feature flags for named CPU models. E.g. to explicitly specify the 'pcid' feature flag with the Intel IvyBridge CPU model, set the following in /etc/nova.conf: ... [libvirt] cpu_model=IvyBridge cpu_model_extra_flags="pcid" ... The list of known CPU feature flags ('vmx', 'xtpr', 'pcid', et cetera) can be found in /usr/share/libvirt/cpu_map.xml. Note that before specifying extra CPU feature flags, one should check if the named CPU models (provided by libvirt) already include the said flags. E.g. the 'Broadwell', 'Haswell-noTSX' named CPU models provided by libvirt already provide the 'pcid' CPU feature flag. Other use cases --- - Nested Virtualization — an operator can specify the Intel 'vmx' or AMD 'svm' flags in the level-1 guest (i.e.
the guest hypervisor) - Ability to use 1GB huge pages with the Haswell model as one use case for extra flags (thanks: Daniel Berrangé, for mentioning this scenario): cpu_model=Haswell cpu_model_extra_flags="pdpe1gb" ** Affects: nova Importance: Undecided Assignee: Kashyap Chamarthy (kashyapc) Status: In Progress ** Tags: libvirt ** Tags added: libvirt -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1750829 Title: RFE: libvirt: Add ability to configure extra CPU flags for named CPU models Status in OpenStack Compute (nova): In Progress
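The options sketched in this RFE would sit in nova.conf like this (a sketch of the proposal as written; cpu_model_extra_flags is the proposed new option, so its final spelling may differ in the merged implementation):

```ini
[libvirt]
cpu_mode = custom
# Named model from /usr/share/libvirt/cpu_map.xml that lacks 'pcid'
cpu_model = IvyBridge
# Proposed option: extra feature flags layered on the named model
cpu_model_extra_flags = "pcid"
```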
[Yahoo-eng-team] [Bug 1591971] Re: Glance task creates failed when setting work_dir local and qemu-img version is 1.5.3
Looks like this was fixed by configuration, closing. ** Changed in: glance Status: In Progress => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1591971 Title: Glance task creates failed when setting work_dir local and qemu-img version is 1.5.3 Status in Glance: Invalid Bug description: The openstack version is mitaka. # rpm -qa |grep qemu-img qemu-img-1.5.3-105.el7_2.4.x86_64 The glance-api.conf setting is: [task] work_dir = /home/work/ [taskflow_executor] conversion_format = raw Then run the cli: glance task-create --type import --input '{"import_from":"http://10.43.177.17/cirros-0.3.2-x86_64-disk.img","import_from_format": "","image_properties":{"disk_format":"qcow2","container_format":"bare","name":"test1"}}' The log is : 2016-06-14 04:08:29.032 DEBUG oslo_concurrency.processutils [-] CMD "qemu-img info --output=json file:///home/work/90ff2129-0079-487e-a7ec-79ef23bd1c0d" returned: 1 in 0.025s from (pid=5460) execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:374 2016-06-14 04:08:29.033 DEBUG oslo_concurrency.processutils [-] None command: u'qemu-img info --output=json file:///home/work/90ff2129-0079-487e-a7ec-79ef23bd1c0d' exit code: 1 stdout: u'' stderr: u"qemu-img: Could not open 'file:///home/work/90ff2129-0079-487e-a7ec-79ef23bd1c0d': Could not open 'file:///home/work/90ff2129-0079-487e-a7ec-79ef23bd1c0d': No such file or directory\n" from (pid=5460) execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:413 2016-06-14 04:08:29.034 DEBUG oslo_concurrency.processutils [-] u'qemu-img info --output=json file:///home/work/90ff2129-0079-487e-a7ec-79ef23bd1c0d' failed. Not Retrying. 
from (pid=5460) execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:422 Command: qemu-img info --output=json file:///home/work/90ff2129-0079-487e-a7ec-79ef23bd1c0d Exit code: 1 Stdout: u'' Stderr: u"qemu-img: Could not open 'file:///home/work/90ff2129-0079-487e-a7ec-79ef23bd1c0d': Could not open 'file:///home/work/90ff2129-0079-487e-a7ec-79ef23bd1c0d': No such file or directory\n" 2016-06-14 04:08:29.072 WARNING glance.async.taskflow_executor [-] Task 'import-ImportToFS-42684807-86db-4ff5-a4a9-abf3b1998b63' (5ff9cf63-f257-48d2-9cc9-cfeffd905854) transitioned into state 'FAILURE' from state 'RUNNING' 4 predecessors (most recent first): Flow 'import' |__Atom 'import-CreateImage-42684807-86db-4ff5-a4a9-abf3b1998b63' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {}, 'provides': '90ff2129-0079-487e-a7ec-79ef23bd1c0d'} |__Atom 'import_retry' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {}, 'provides': [(None, {})]} |__Flow 'import' 2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor Traceback (most recent call last): 2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor File "/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", line 82, in _execute_task 2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor result = task.execute(**arguments) 2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor File "/opt/stack/glance/glance/async/flows/base_import.py", line 175, in execute 2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor metadata = json.loads(stdout) 2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads 2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor return _default_decoder.decode(s) 2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor File "/usr/lib64/python2.7/json/decoder.py", line 365, in decode 2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor obj, end = 
self.raw_decode(s, idx=_w(s, 0).end()) 2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor File "/usr/lib64/python2.7/json/decoder.py", line 383, in raw_decode 2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor raise ValueError("No JSON object could be decoded") 2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor ValueError: No JSON object could be decoded To manage notifications about this bug go to: https://bugs.launchpad.net/glance/+bug/1591971/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
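The traceback above comes from feeding the empty stdout of a failed `qemu-img info` run straight into `json.loads`. A minimal defensive sketch (hypothetical helpers, not glance's actual code):

```python
import json
import subprocess


def parse_qemu_img_output(returncode, stdout, stderr):
    """Turn a `qemu-img info --output=json` result into a dict.

    Guards against the failure in the traceback above: a non-zero exit
    leaves stdout empty, and passing '' to json.loads raises the
    unhelpful "No JSON object could be decoded" ValueError instead of
    surfacing qemu-img's real error message.
    """
    if returncode != 0 or not stdout.strip():
        raise RuntimeError("qemu-img info failed: %s" % stderr.strip())
    return json.loads(stdout)


def image_metadata(path):
    """Run qemu-img and parse its output (requires qemu-img on PATH)."""
    proc = subprocess.run(
        ["qemu-img", "info", "--output=json", path],
        capture_output=True, text=True,
    )
    return parse_qemu_img_output(proc.returncode, proc.stdout, proc.stderr)
```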
[Yahoo-eng-team] [Bug 1620363] Re: report nginx or non-eventlet based strategies for deploying glance
This has been addressed by https://docs.openstack.org/glance/latest/admin/apache-httpd.html , which was first merged back in Pike. Of course it's always possible to improve the docs, so annakoppad feel free to put up improvement patches if you are still interested in this. ** Changed in: glance Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1620363 Title: report nginx or non-eventlet based strategies for deploying glance Status in Glance: Fix Released Bug description: Recently, more than a few people have asked about different ways a glance service can be deployed: * eventlet based * nginx based * other ways (like say using repose) It would be good to document this in our developer docs or specs and create a FAQ page in our launchpad project page for people to refer and then discuss further. To manage notifications about this bug go to: https://bugs.launchpad.net/glance/+bug/1620363/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1750555] Re: Revisit database rolling upgrade documentation
Reviewed: https://review.openstack.org/546172 Committed: https://git.openstack.org/cgit/openstack/glance/commit/?id=d500b3f883e94a3b82f313bffe6dbeb08d7ee1e4 Submitter: Zuul Branch: master commit d500b3f883e94a3b82f313bffe6dbeb08d7ee1e4 Author: Abhishek Kekane Date: Tue Feb 20 14:47:43 2018 + Revise database rolling upgrade documentation - mark zero-downtime-db-upgrade as EXPERIMENTAL for queens - clarify the relation between the E-M-C strategy and zero-downtime db upgrades - add note that for MySQL, using the glance-manage expand or glance-manage contract command requires that glance is granted SUPER privileges - add note to contributor docs about checking the trigger flag in expand and contract scripts Co-authored-by: Abhishek Kekane Co-authored-by: Brian Rosmaita Change-Id: I5af4a1428b89ecb05a1be9c420c5f0afc05b9a95 Closes-Bug: #1750555 ** Changed in: glance Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1750555 Title: Revisit database rolling upgrade documentation Status in Glance: Fix Released Bug description: Since db_sync is now internally using the E-M-C pattern we need to revisit the entire database rolling upgrades documentation. To manage notifications about this bug go to: https://bugs.launchpad.net/glance/+bug/1750555/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1748900] Re: api-ref: value of custom property not limited to 255 chars
Reviewed: https://review.openstack.org/546021 Committed: https://git.openstack.org/cgit/openstack/glance/commit/?id=601f82ac24038a40dc48579fc3928b6e0f0373bf Submitter: Zuul Branch: master commit 601f82ac24038a40dc48579fc3928b6e0f0373bf Author: Brian Rosmaita Date: Mon Feb 19 22:11:44 2018 -0500 Correct length limit for custom property value The api-ref states that both the key and value of a custom property are limited to 255 chars. This limit applies only to the key. Change-Id: I3bacca8b25f2a8339f6d8758e45c690da9968555 Closes-bug: #1748900 ** Changed in: glance Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1748900 Title: api-ref: value of custom property not limited to 255 chars Status in Glance: Fix Released Bug description: - [x] This doc is inaccurate in this way: __ https://developer.openstack.org/api-ref/image/v2/index.html#create-an-image Where it says: "Additionally, you may include additional properties specified as key:value pairs, where the value must be a string data type. Keys and values are limited to 255 chars in length. Available key names may be limited by the cloud’s property protection configuration."
The 255 char length restriction is only for keys, not values: https://github.com/openstack/glance/blob/265659e8c34865331568b069fdb27ea272df4eaa/glance/db/sqlalchemy/models.py#L158 --- Release: 16.0.0.0rc2.dev10 on 'Sat Feb 10 21:15:25 2018, commit 262e61a' SHA: Source: https://git.openstack.org/cgit/openstack/glance/tree/api-ref/source/v2/index.rst URL: https://developer.openstack.org/api-ref/image/v2/index.html To manage notifications about this bug go to: https://bugs.launchpad.net/glance/+bug/1748900/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
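The corrected rule (keys capped at 255 characters, values merely required to be strings) could be checked client-side before an image-create call with a hypothetical helper:

```python
def validate_custom_properties(props):
    """Check image custom properties against the corrected api-ref rule:
    keys are limited to 255 characters and values must be strings, but
    values carry no 255-char cap. Returns the offending keys, if any.
    """
    bad = []
    for key, value in props.items():
        if len(key) > 255 or not isinstance(value, str):
            bad.append(key)
    return bad
```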
[Yahoo-eng-team] [Bug 1749788] Re: image import: uri filtering conf opts help text needs revision
Reviewed: https://review.openstack.org/546020 Committed: https://git.openstack.org/cgit/openstack/glance/commit/?id=d289d0d17b4e2ace80c74be80d768a3820a9da62 Submitter: Zuul Branch: master commit d289d0d17b4e2ace80c74be80d768a3820a9da62 Author: Brian Rosmaita Date: Mon Feb 19 21:55:16 2018 -0500 Revise help text for uri filtering options Clarify the help text and clean up some log messages. Includes the regenerated glance-image-import.conf.sample file. Change-Id: I7f9087aaf9c6969e15f63029cc38fe5a0939ad40 Closes-bug: #1749788 ** Changed in: glance Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1749788 Title: image import: uri filtering conf opts help text needs revision Status in Glance: Fix Released Bug description: The six whitelist/blacklist options are not easy to explain individually, so the help text could use a revision. See the original patch for some questions people had, and see Sean's comments on the cherry-pick patch for some stylistic stuff that should also be corrected. https://review.openstack.org/#/q/Ide5ace8979bb12239c99a312747b3151c1e64ce8 To manage notifications about this bug go to: https://bugs.launchpad.net/glance/+bug/1749788/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1748229] Re: revise api-ref: add info about web-download import-method
Reviewed: https://review.openstack.org/545629 Committed: https://git.openstack.org/cgit/openstack/glance/commit/?id=4cf65d57952cb5a85973822cf9d27fe97eba18b4 Submitter: Zuul Branch: master commit 4cf65d57952cb5a85973822cf9d27fe97eba18b4 Author: Brian Rosmaita Date: Sat Feb 17 16:29:51 2018 -0500 api-ref: update interoperable image import info Generalizes the discussion to include the new web-download import method and includes a new sample import request. Change-Id: Icb6cd920f31c6e8e4eecf17880dd3244e5d1a61b Closes-bug: #1748229 ** Changed in: glance Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1748229 Title: revise api-ref: add info about web-download import-method Status in Glance: Fix Released Bug description: Need to describe the web-download workflow. See TODOs in api-ref/source/v2/images-import.inc To manage notifications about this bug go to: https://bugs.launchpad.net/glance/+bug/1748229/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1750811] [NEW] resize command should work without ssh
Public bug reported: Description === VM resizing should work without ssh connections between compute nodes if they have the same shared storage. One option is to pass an additional argument if it's a resize on shared storage. The same functionality is already implemented for VM migrations. Expected result === Resizing on shared storage works without ssh connections between compute nodes. Actual result = Resizing fails because it requires an ssh connection between the source and target host. Environment === 1. Openstack version: nova-api 2:16.0.3-0ubuntu1~cloud0 nova-common 2:16.0.3-0ubuntu1~cloud0 nova-conductor 2:16.0.3-0ubuntu1~cloud0 nova-consoleauth 2:16.0.3-0ubuntu1~cloud0 nova-novncproxy 2:16.0.3-0ubuntu1~cloud0 nova-placement-api 2:16.0.3-0ubuntu1~cloud0 nova-scheduler 2:16.0.3-0ubuntu1~cloud0 python-nova 2:16.0.3-0ubuntu1~cloud0 python-novaclient 2:9.1.0-0ubuntu1~cloud0 2. Hypervisor: Libvirt + KVM ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1750811 Title: resize command should work without ssh Status in OpenStack Compute (nova): New To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1750811/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1750790] [NEW] resources on target host aren't released if resize fails
Public bug reported: Description === If we try to resize a VM to a new flavor and the resize fails due to missing ssh host keys, the resources aren't released on the target host. Steps to reproduce == 1. Check output from placement API Output from placement API before resize command: HOST A: { "resource_provider_generation": 997, "usages": { "DISK_GB": 0, "MEMORY_MB": 495616, "VCPU": 84 } } HOST B: { "resource_provider_generation": 33, "usages": { "DISK_GB": 0, "MEMORY_MB": 221184, "VCPU": 40 } } 2. Try to resize the VM and check resources from placement API: This is the output after the resize (to a flavor with 24GB and 12 CPUs) failed: HOST B: { "resource_provider_generation": 33, "usages": { "DISK_GB": 0, "MEMORY_MB": 245760, "VCPU": 52 } } 3. Delete the VM and check resources again After deleting the VM the resources have been released (on source and target host). Expected result === If resizing fails, resources must be released on the target host. Actual result = Resources aren't released on the target host. Environment === 1. Openstack version: nova-api 2:16.0.3-0ubuntu1~cloud0 nova-common 2:16.0.3-0ubuntu1~cloud0 nova-conductor 2:16.0.3-0ubuntu1~cloud0 nova-consoleauth 2:16.0.3-0ubuntu1~cloud0 nova-novncproxy 2:16.0.3-0ubuntu1~cloud0 nova-placement-api 2:16.0.3-0ubuntu1~cloud0 nova-scheduler 2:16.0.3-0ubuntu1~cloud0 python-nova 2:16.0.3-0ubuntu1~cloud0 python-novaclient 2:9.1.0-0ubuntu1~cloud0 2. Hypervisor: Libvirt + KVM ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1750790 Title: resources on target host aren't released if resize fails Status in OpenStack Compute (nova): New To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1750790/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1750780] [NEW] Race with local file systems can make open-vm-tools fail to start
Public bug reported: Since the change in [1] the open-vm-tools service starts very (very) early. Not so much due to Before=cloud-init-local.service, but much more because of DefaultDependencies=no. That can trigger an issue that looks like: root@ubuntuguest:~# systemctl status -l open-vm-tools.service ● open-vm-tools.service - Service for virtual machines hosted on VMware Loaded: loaded (/lib/systemd/system/open-vm-tools.service; enabled; vendor preset: enabled) Active: failed (Result: resources) As it is right now, open-vm-tools can race with the other early starts and then fail. In detail one can find a message like: "open-vm-tools.service: Failed to run 'start' task: Read-only file system". This is due to PrivateTmp=yes, which is also set and needs a writable /var/tmp [2]. To ensure this works, PrivateTmp would have to be removed (not good) or some After= dependencies added that make this work reliably. I added After=local-fs.target, which made it work for me in 3/3 tests. I'd like to have an ack from the cloud-init team that this does not totally kill the originally intended Before=cloud-init-local.service. I think it does not, as local-fs can complete before cloud-init-local; then open-vm-tools can initialize, and finally cloud-init-local can pick up the data.
To summarize: # cloud-init-local # DefaultDependencies=no Wants=network-pre.target After=systemd-remount-fs.service Before=NetworkManager.service Before=network-pre.target Before=shutdown.target Before=sysinit.target Conflicts=shutdown.target RequiresMountsFor=/var/lib/cloud # open-vm-tools # DefaultDependencies=no Before=cloud-init-local.service Proposed is to add to the latter: After=local-fs.target [1]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=859677 [2]: https://github.com/systemd/systemd/issues/5610 ** Affects: cloud-init Importance: Undecided Status: New ** Affects: open-vm-tools (Ubuntu) Importance: High Status: Triaged ** Also affects: cloud-init Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1750780 Title: Race with local file systems can make open-vm-tools fail to start Status in cloud-init: New Status in open-vm-tools package in Ubuntu: Triaged To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1750780/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
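The proposed one-line change can be tried without patching the package via a systemd drop-in (a sketch; the path is the standard override location used by `systemctl edit open-vm-tools.service`):

```ini
# /etc/systemd/system/open-vm-tools.service.d/override.conf
#
# Wait for local file systems, so PrivateTmp can create its private
# /tmp and /var/tmp, while keeping Before=cloud-init-local.service:
# local-fs.target can complete before cloud-init-local starts, so the
# ordering cloud-init relies on is preserved.
[Unit]
After=local-fs.target
```

After writing the file, `systemctl daemon-reload` applies the new ordering on the next boot.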
[Yahoo-eng-team] [Bug 1750777] [NEW] openvswitch agent eating CPU, time spent in ip_conntrack.py
Public bug reported: We just ran into a case where the openvswitch agent (local dev devstack, current master branch) eats 100% of CPU time. Pyflame profiling shows the time being largely spent in neutron.agent.linux.ip_conntrack, line 95. https://github.com/openstack/neutron/blob/master/neutron/agent/linux/ip_conntrack.py#L95 The code around this line is: while True: pool.spawn_n(self._process_queue) The documentation of eventlet.spawn_n says: "The same as spawn(), but it’s not possible to know how the function terminated (i.e. no return value or exceptions). This makes execution faster. See spawn_n for more details." I suspect that GreenPool.spawn_n may behave similarly. It seems plausible that spawn_n is returning very quickly because of some error, and then all time is quickly spent in a short-circuited while loop. ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1750777 Title: openvswitch agent eating CPU, time spent in ip_conntrack.py Status in neutron: New
[Yahoo-eng-team] [Bug 1750770] [NEW] installing cloud init in vmware breaks ubuntu user
Public bug reported:

When installing cloud-init in VMware without any setup for user/vendor data it breaks the ubuntu user.

Steps to reproduce:
1. take VMware (the free 30-day trial is fine)
2. install xenial (maybe newer as well, but my case was xenial)
3. set up your user to be ubuntu/ubuntu (through the VMware fast installer)
   # you now have a working system
   # no user/vendor data provider was set up (unless VMware did some internally)
4. install cloud-init
5. reboot
   # on reboot I see the cloud-init VMware data gatherer timing out (fine, as expected)
   # but after that I can't log in anymore, so it seems it changed the user

This came up while debugging another issue, so there is a chance I messed the service dependencies up enough to trigger this :-/ (we need to check that). Sorry, this one is hard to get logs from, and since I can't log in anymore ... I'll have to set up a new system with a second user to use to take a look.

** Affects: cloud-init
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1750770

Title: installing cloud init in vmware breaks ubuntu user

Status in cloud-init: New
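One plausible mechanism, offered here as an assumption rather than something confirmed in this report: Ubuntu's stock /etc/cloud/cloud.cfg ships a default_user stanza with lock_passwd enabled, and on first boot cloud-init applies it, which disables password login for the ubuntu user the VMware installer created. The stanza looks roughly like this (illustrative; exact content varies by release):

```yaml
system_info:
  default_user:
    name: ubuntu
    lock_passwd: True   # password auth for this user is locked on first boot
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]
    shell: /bin/bash
```

If that is the cause, the user is not deleted, only password-locked, which would match "can't login anymore" without other visible damage.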
[Yahoo-eng-team] [Bug 1750205] Re: image import: 500 for web-download import-method
Reviewed: https://review.openstack.org/545649
Committed: https://git.openstack.org/cgit/openstack/glance/commit/?id=156ba81c2fad2844af1ad21b24c771cf66522932
Submitter: Zuul
Branch: master

commit 156ba81c2fad2844af1ad21b24c771cf66522932
Author: Brian Rosmaita
Date: Sat Feb 17 23:48:18 2018 -0500

    Fix config group not found error

    Two parts to this fix:
    * add a call to oslo.config.cfg.import_group so that the function that
      checks a uri against the configured white/blacklists can access them
    * move the location where these options are defined into the module's
      __init__ so that they can be imported without causing a circular
      import (which happens if you import them from their current location)

    Change-Id: I6363faba0c4cbe75e6e4d0cbf0209a62c10474ef
    Closes-bug: #1750205

** Changed in: glance
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1750205

Title: image import: 500 for web-download import-method

Status in Glance: Fix Released

Bug description:
  This is in the log:

  Feb 17 23:18:00 br-virtual-machine glance-api[22952]: ERROR glance.common.wsgi [None req-59d8c68b-8fc9-4b04-9215-6f64abd55532 demo demo] Caught error: no such option import_filtering_opts in group [DEFAULT]: NoSuchOptError: no such option import_filtering_opts in group [DEFAULT]

  Pretty sure the problem is that when the uri-validating function was moved to common.utils, the import filtering options are no longer guaranteed to be registered at the point when the request hits the ImagesController.
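The underlying ordering problem can be illustrated with a minimal stand-in for oslo.config's registration model (the registry, register_opts, and get_opt names below are hypothetical, not oslo.config's real API): an option only exists in the registry once the module that defines it has run its registration code, which is exactly the import that cfg.import_group forces to happen before the first lookup:

```python
# group name -> set of registered option names
_REGISTRY = {}

class NoSuchOptError(Exception):
    """Mirrors the error class named in the glance traceback."""

def register_opts(group, names):
    """What a module does at import time to publish its options."""
    _REGISTRY.setdefault(group, set()).update(names)

def get_opt(group, name):
    """Lookup that fails, as in the bug, if registration never ran."""
    if name not in _REGISTRY.get(group, set()):
        raise NoSuchOptError(f"no such option {name} in group [{group}]")
    return name

# Before registration: the lookup raises, mirroring the 500 in the bug.
try:
    get_opt("DEFAULT", "import_filtering_opts")
    raised = False
except NoSuchOptError:
    raised = True

# Once the defining module's registration has run (what import_group
# guarantees), the same lookup succeeds.
register_opts("DEFAULT", {"import_filtering_opts"})
found = get_opt("DEFAULT", "import_filtering_opts")
```

The fix's second bullet addresses the same ordering issue from the other side: defining the options in the package's __init__ makes the registration import-safe.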
[Yahoo-eng-team] [Bug 1749640] Re: db sync fails for mysql while adding triggers
Reviewed: https://review.openstack.org/544792
Committed: https://git.openstack.org/cgit/openstack/glance/commit/?id=14e8a7b53ba7ee6e6c3b9265c819bd9acc5274a1
Submitter: Zuul
Branch: master

commit 14e8a7b53ba7ee6e6c3b9265c819bd9acc5274a1
Author: Abhishek Kekane
Date: Tue Feb 20 15:32:00 2018 +

    Triggers shouldn't be executed in offline migration

    A recent change [1] made glance-manage db_sync internally use Expand,
    Migrate and Contract (EMC). EMC is explicitly meant for online
    migration, for which glance uses triggers to sync data between old
    columns and new columns. db_sync is used for offline migration, for
    which adding triggers is not required. Made provision to execute
    triggers explicitly in the case of online migration (EMC pattern) and
    skip them in the case of offline migration (db_sync).

    [1] https://review.openstack.org/#/c/433934/

    Closes-Bug: #1749640
    Change-Id: I816c73405dd61d933182ad5efc24445a0add4eea

** Changed in: glance
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1749640

Title: db sync fails for mysql while adding triggers

Status in Glance: Fix Released

Bug description:
  glance-manage db sync fails while adding triggers to the database table with the following error:
  Feb 15 03:20:31 upstream-centos-7-2-node-rdo-cloud-tripleo-30309-6332 os-collect-config[2239]: "DBError: (pymysql.err.InternalError) (1419, u'You do not have the SUPER privilege and binary logging is enabled (you *might* want to use the less safe log_bin_trust_function_creators variable)') [SQL: u\"\\nCREATE TRIGGER insert_visibility BEFORE INSERT ON images\\nFOR EACH ROW\\nBEGIN\\n-- NOTE(abashmak):\\n-- The following IF/ELSE block implements a priority decision tree.\\n-- Strict order MUST be followed to correctly cover all the edge cases.\\n\\n-- Edge case: neither is_public nor visibility specified\\n--(or both specified as NULL):\\nIF NEW.is_public <=> NULL AND NEW.visibility <=> NULL THEN\\n SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Invalid visibility value';\\n-- Edge case: both is_public and visibility specified:\\nELSEIF NOT(NEW.is_public <=> NULL OR NEW.visibility <=> NULL) THEN\\nSIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Invalid visibility value';\\n-- Inserting with is_public, set visibility accordingly:\\nELSEIF NOT NEW.is_public <=> NULL THEN\\nIF NEW.is_public = 1 THEN\\nSET NEW.visibility = 'public';\\nELSE\\nSET NEW.visibility = 'shared';\\nEND IF;\\n-- Inserting with visibility, set is_public accordingly:\\nELSEIF NOT NEW.visibility <=> NULL THEN\\nIF NEW.visibility = 'public' THEN\\n SET NEW.is_public = 1;\\nELSE\\nSET NEW.is_public = 0;\\nEND IF;\\n-- Edge case: either one of: is_public or visibility,\\n--is explicitly set to NULL:\\n ELSE\\nSIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Invalid visibility value';\\nEND IF;\\nEND;\\n\"] (Background on this error at: http://sqlalche.me/e/2j85)",

  The reason: for MySQL, using the glance-manage db_sync or glance-manage expand command requires that you either grant your glance user SUPER privileges, or run set global log_bin_trust_function_creators=1; in mysql beforehand.
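The fix's control flow can be sketched as follows (the helper names are hypothetical, not glance's actual functions): triggers are only needed while old and new columns must be kept in sync during an online (EMC) migration, so an offline db_sync skips creating them entirely, and therefore never needs SUPER or log_bin_trust_function_creators on MySQL:

```python
# Record of executed migration steps, for illustration.
calls = []

def expand():
    calls.append("expand")

def create_triggers():
    calls.append("create_triggers")  # the step that needs MySQL SUPER

def migrate_data():
    calls.append("migrate")

def drop_triggers():
    calls.append("drop_triggers")

def contract():
    calls.append("contract")

def run_migration(online: bool):
    expand()
    if online:
        # EMC pattern: old and new columns coexist, so writes must be
        # mirrored between them while the service keeps running.
        create_triggers()
        migrate_data()
        drop_triggers()
    else:
        # Offline db_sync: no concurrent writers, so no triggers needed.
        migrate_data()
    contract()

run_migration(online=False)
offline_steps = list(calls)  # ["expand", "migrate", "contract"]
```

With online=False the trigger steps never run, which is exactly why the offline path no longer hits the privilege error above.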
  Actual logs:

  Feb 15 03:20:31 upstream-centos-7-2-node-rdo-cloud-tripleo-30309-6332 os-collect-config[2239]: "+++ [[ -n 0 ]]",
  Feb 15 03:20:31 upstream-centos-7-2-node-rdo-cloud-tripleo-30309-6332 os-collect-config[2239]: "+++ glance-manage db_sync",
  Feb 15 03:20:31 upstream-centos-7-2-node-rdo-cloud-tripleo-30309-6332 os-collect-config[2239]: "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:1334: OsloDBDeprecationWarning: EngineFacade is deprecated; please use oslo_db.sqlalchemy.enginefacade",
  Feb 15 03:20:31 upstream-centos-7-2-node-rdo-cloud-tripleo-30309-6332 os-collect-config[2239]: " expire_on_commit=expire_on_commit, _conf=conf)",
  Feb 15 03:20:31 upstream-centos-7-2-node-rdo-cloud-tripleo-30309-6332 os-collect-config[2239]: "INFO [alembic.runtime.migration] Context impl MySQLImpl.",
  Feb 15 03:20:31 upstream-centos-7-2-node-rdo-cloud-tripleo-30309-6332 os-collect-config[2239]: "INFO [alembic.runtime.migration] Will assume non-transactional DDL.",
  Feb 15 03:20:31 upstream-centos-7-2-node-rdo-cloud-tripleo-30309-6332 os-collect-config[2239]: "INFO [alembic.runtime.migration] Running upgrade -> liberty, liberty initial",
  Feb 15 03:20:31 upstream-centos-7-2-node-rdo-cloud-tripleo-30309-6332 os-collect-config[2239]: "INFO [alembic.runtime.migration] Running upgrade liberty -> mitaka01, add index on created_at and updated_at columns of 'images'