[Yahoo-eng-team] [Bug 2056377] Re: Create Role button is not visible for default admin user
Mistakenly marked openstack dashboard as not affected; this needs further analysis. The charm is configuring the backend settings to allow for role edits, but the behaviour still needs to be chased down. ** Also affects: charm-openstack-dashboard Importance: Undecided Status: New ** Changed in: charm-openstack-dashboard Status: New => Triaged ** Changed in: charm-openstack-dashboard Importance: Undecided => Medium -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Dashboard (Horizon). https://bugs.launchpad.net/bugs/2056377 Title: Create Role button is not visible for default admin user Status in OpenStack Dashboard Charm: Triaged Status in OpenStack Dashboard (Horizon): Confirmed Bug description: When I navigate to the Identity->Roles page as the admin user, I see a list of the roles in the cluster, but no buttons to create/modify/delete the roles. I am using a charmed install of a Yoga cluster and the default admin user (username: admin, domain: admin_domain, default keystone and horizon policies, i.e. no overrides). I can create/modify/delete roles from the CLI. I thought this might be a subtlety of the policy rules, so I tested some Horizon policy overrides which open up the policies to any user (setting every policy listed in the keystone/identity folders to "@"). This didn't change anything. To verify I was setting the policies correctly, I tested "identity:create_user": "!" and the "Create User" button disappeared in Horizon. This bug seems similar to [1], however I am using the default admin "power user" instead of adding domain roles. [1] https://bugs.launchpad.net/horizon/+bug/1775227 To manage notifications about this bug go to: https://bugs.launchpad.net/charm-openstack-dashboard/+bug/2056377/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
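For anyone reproducing the policy experiment described above: the overrides the reporter mentions are standard keystone policy targets, so an illustrative Horizon keystone policy override file (exact file location depends on how the dashboard is deployed) might look like the sketch below. "@" allows any user and "!" denies everyone, matching the reporter's test that made the "Create User" button disappear.

```
{
    "identity:create_role": "@",
    "identity:update_role": "@",
    "identity:delete_role": "@",
    "identity:create_user": "!"
}
```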
[Yahoo-eng-team] [Bug 1452641] Re: Static Ceph mon IP addresses in connection_info can prevent VM startup
This is not a charm bug; it's a limitation/bug in the way that nova handles the BDM devices. ** Changed in: nova Status: In Progress => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1452641 Title: Static Ceph mon IP addresses in connection_info can prevent VM startup Status in OpenStack Compute (nova): Invalid Status in nova package in Ubuntu: Triaged Bug description: The Cinder rbd driver extracts the IP addresses of the Ceph mon servers from the Ceph mon map when the instance/volume connection is established. This info is then stored in nova's block-device-mapping table and is never re-validated down the line. Changing the Ceph mon servers' IP addresses will prevent the instance from booting, as the stale connection info will enter the instance's XML. One idea to fix this would be to directly use the information from ceph.conf, which should be an alias or a load balancer. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1452641/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
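A sketch of the direction suggested in the description: pointing clients at a stable name in ceph.conf instead of frozen mon IPs. The fragment is illustrative and the alias name is an assumption:

```
[global]
# mon_host resolved via a DNS alias (or load balancer VIP) that survives
# renumbering of the individual mon daemons
mon_host = ceph-mons.example.com
```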
[Yahoo-eng-team] [Bug 2026284] Re: virtio-net-tx-queue-size reflects in nova conf but not for the vm even after a hard reboot
This does not appear to be a charm issue; rather, it appears to potentially be a nova issue. I can confirm that setting rx_queue_size and tx_queue_size results in the nova.conf file being updated by the charm, but the resulting hard-rebooted guest does not get the tx_queue_size, only the rx_queue_size. ** Also affects: nova Importance: Undecided Status: New ** Changed in: nova Status: New => Incomplete -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2026284 Title: virtio-net-tx-queue-size reflects in nova conf but not for the vm even after a hard reboot Status in OpenStack Nova Compute Charm: New Status in OpenStack Compute (nova): Incomplete Bug description: After modifying the nova compute config options, - virtio-net-rx-queue-size=512 - virtio-net-tx-queue-size=512 I hard rebooted my vm and spawned a new vm, and what I see (on both of them) is: - virsh xml
```
# virsh dumpxml 2 | grep -i queue
```
- nova.conf
```
# grep -i queue /etc/nova/nova.conf
tx_queue_size = 512
rx_queue_size = 512
```
- inside the vm
```
root@jammy-135110:~# ethtool -g ens2
Ring parameters for ens2:
Pre-set maximums:
RX:        512
RX Mini:   n/a
RX Jumbo:  n/a
TX:        256
Current hardware settings:
RX:        512
RX Mini:   n/a
RX Jumbo:  n/a
TX:        256
```
The RX config gets propagated, but the TX config does not. Please let me know if any more information is needed. -- env:
- focal ussuri
- nova-compute: charm: nova-compute, channel: ussuri/stable, revision: 669
- this is a freshly deployed openstack on vms (not on baremetal)
- libvirt: 6.0.0-0ubuntu8.16
- nova-compute-libvirt 21.2.4-0ubuntu2.5
- qemu 4.2-3ubuntu6.27
To manage notifications about this bug go to: https://bugs.launchpad.net/charm-nova-compute/+bug/2026284/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
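If both options had taken effect, the guest definition should show them on the interface driver element; an illustrative fragment of what `virsh dumpxml` would be expected to contain is below. Worth noting, as a hedged observation, that libvirt has historically honoured tx_queue_size only for vhost-user interfaces, which may be relevant to what is observed here.

```
<interface type='bridge'>
  <!-- illustrative: expected when both settings propagate to the guest -->
  <driver name='vhost' rx_queue_size='512' tx_queue_size='512'/>
</interface>
```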
[Yahoo-eng-team] [Bug 1993628] Re: Designate synchronisation inconsistencies with Neutron-API
Agree that this likely isn’t a charm issue. I’ll mark invalid for now, but feel free to reopen if evidence suggests otherwise. ** Changed in: charm-designate Status: New => Invalid ** Changed in: charm-designate Status: Invalid => Incomplete -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1993628 Title: Designate synchronisation inconsistencies with Neutron-API Status in OpenStack Designate Charm: Incomplete Status in neutron: New Bug description: When setting a network to automatically use a dns-domain, some inconsistencies were observed when deleting and recreating new instances sharing the same names and associating them to the same floating IPs from before. This has been reproduced on:
* Focal Ussuri (Neutron-api and Designate charms with Ussuri/edge branch)
* Focal Yoga (Neutron-api and Designate charms with Yoga/stable branch)
Reproducible steps:
* create a domain zone with "openstack zone create"
* configure an existing self-service network with the newly created domain: "openstack network set --dns-domain ..."
* create a router on the self-service network with an external gateway on the provider network
* create an instance on the self-service network
* create a floating ip address on the provider network
* associate the floating ip to the instance --> the DNS entry gets created
* delete the instance *WITH* the floating ip still attached --> the DNS entry is deleted
* recreate a new instance with exactly the *same* name and re-use the *same* floating ip --> the DNS entry doesn't get created
--> it doesn't seem to be related to TTL, since the issue remains permanent even after a day of testing with TTL set to the default of 1 hour
Worse inconsistencies can be seen when, instead of deleting an instance, the floating ip is moved directly to another instance:
* have 2 instances, vm-1 and vm-2
* attach the floating ip to vm-1: "openstack server add floating ip XXX vm-1" --> the DNS entry is created
* attach the same floating ip to vm-2: "openstack server add floating ip XXX vm-2" (this is permitted by the CLI and simply moves the fIP to vm-2) --> the DNS entry still uses vm-1; vm-2's doesn't get created
When you combine these 2 issues, you can be left with either false records being kept or automatic records silently failing to be created.
Workaround:
* either always remove the floating ip *before* deleting an instance, or
* remove the floating ip from the instance
* then re-add the floating ip to the instance
Eventually, when deleting the floating ip to reassign it, we are gratified with this error on the neutron-api unit (on Ussuri, but the error is similar on Yoga):
2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db [req-e6d270d2-fbde-42d7-a75b-2c8a67c42fcb 2dc4151f6dba4c3e8ba8537c9c354c13 f548268d5255424591baa8783f1cf277 - 6a71047e7d7f4e01945ec58df06ae63f 6a71047e7d7f4e01945ec58df06ae63f] Error deleting Floating IP data from external DNS service. Name: 'vm-2'. Domain: 'compute.stack.vpn.'. IP addresses '192.168.21.217'. DNS service driver message 'Name vm-2.compute.stack.vpn. is duplicated in the external DNS service': neutron_lib.exceptions.dns.DuplicateRecordSet: Name vm-2.compute.stack.vpn. is duplicated in the external DNS service
2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db Traceback (most recent call last):
2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db   File "/usr/lib/python3/dist-packages/neutron/db/dns_db.py", line 214, in _delete_floatingip_from_external_dns_service
2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db     self.dns_driver.delete_record_set(context, dns_domain, dns_name,
2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db   File "/usr/lib/python3/dist-packages/neutron/services/externaldns/drivers/designate/driver.py", line 172, in delete_record_set
2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db     ids_to_delete = self._get_ids_ips_to_delete(
2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db   File "/usr/lib/python3/dist-packages/neutron/services/externaldns/drivers/designate/driver.py", line 200, in _get_ids_ips_to_delete
2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db     raise dns_exc.DuplicateRecordSet(dns_name=name)
2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db neutron_lib.exceptions.dns.DuplicateRecordSet: Name vm-2.compute.stack.vpn. is duplicated in the external DNS service
2022-10-19 02:24:12.497 67548 ERROR neutron.db.dns_db
To manage notifications about this bug go to: https://bugs.launchpad.net/charm-designate/+bug/1993628/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
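The workaround above, expressed as commands (server names and the IP are taken from the report; adjust as needed):

```
# always detach the floating IP before deleting or re-pointing the instance
openstack server remove floating ip vm-1 192.168.21.217
openstack server delete vm-1
# the floating IP can now be re-associated without leaving a stale record
openstack server add floating ip vm-2 192.168.21.217
```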
[Yahoo-eng-team] [Bug 1880828] Re: New instance is always in "spawning" status
Marking charm tasks as invalid on this particular bug as these aren't related to the charms and were chased down to other components. ** Changed in: charm-nova-compute Status: New => Invalid ** Changed in: openstack-bundles Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1880828 Title: New instance is always in "spawning" status Status in OpenStack Nova Compute Charm: Invalid Status in OpenStack Compute (nova): Triaged Status in OpenStack Bundles: Invalid Bug description: bundle: openstack-base-bionic-train https://github.com/openstack-charmers/openstack-bundles/blob/master/development/openstack-base-bionic-train/bundle.yaml hardware: 2 d05 and 2 d06 (the log of the compute node is from one of the d06. Please note they are arm64 arch.) When trying to create new instances on the deployed openstack, the instance is always in the status of "spawning" [Steps to Reproduce] 1. Deploy with the above bundle and hardware by following the instruction of https://jaas.ai/openstack-base/bundle/67 2. Wait about 1.5 hours until the deployment is ready. By ready it means every unit shows its message as "ready", e.g. https://paste.ubuntu.com/p/k48YVnPyVZ/ 3. Follow the instruction of https://jaas.ai/openstack-base/bundle/67 until the step of "openstack server create" to create a new instance. This step is also summarized in detail in this gist code snippet https://gist.github.com/tai271828/b0c00a611e703046dd52da12a66226b0#file-02-basic-test-just-deployed-sh [Expected Behavior] An instance is created a few seconds later [Actual Behavior] The status of the instance is always (> 20 minutes) "spawning" [Additional Information] 1. [workaround] Use `ps aux | grep qemu-img` to check if a qemu-img image-converting process exists or not. The process should complete within ~20 sec. If the process exists for more than 1 minute, use `pkill -f qemu-img` to terminate the process and re-create the instance again. The image-converting process looks like this one:
```
qemu-img convert -t none -O raw -f qcow2 /var/lib/nova/instances/_base/9b8156fbecaa194804a637226c8ffded93a57489.part /var/lib/nova/instances/_base/9b8156fbecaa194804a637226c8ffded93a57489.converted
```
2. By investigating in more detail, this issue is a combination of 1) nova not timing out the instance's image-conversion process (comment #21) and 2) qemu failing to terminate the image-converting process successfully (comment #20). To manage notifications about this bug go to: https://bugs.launchpad.net/charm-nova-compute/+bug/1880828/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
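A hedged sketch automating the workaround in item 1 (the 10-minute threshold is an assumption; a healthy conversion finishes in ~20 s per the report):

```
#!/bin/sh
# kill any qemu-img convert process that has been running for more than 600 s
for pid in $(pgrep -f 'qemu-img convert'); do
    if [ "$(ps -o etimes= -p "$pid")" -gt 600 ]; then
        kill "$pid"
    fi
done
```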
[Yahoo-eng-team] [Bug 1892361] Re: SRIOV instance gets type-PF interface, libvirt kvm fails
Queens and Rocky are both in extended maintenance and have had the proposed patches merged. Updating tasks to mark as fix released. ** Changed in: nova/rocky Status: New => Fix Released ** Changed in: nova/queens Status: New => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1892361 Title: SRIOV instance gets type-PF interface, libvirt kvm fails Status in Ubuntu Cloud Archive: Fix Released Status in Ubuntu Cloud Archive queens series: Fix Released Status in Ubuntu Cloud Archive rocky series: Fix Released Status in Ubuntu Cloud Archive stein series: Fix Released Status in Ubuntu Cloud Archive train series: Fix Released Status in Ubuntu Cloud Archive ussuri series: Fix Released Status in Ubuntu Cloud Archive victoria series: Fix Released Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) queens series: Fix Released Status in OpenStack Compute (nova) rocky series: Fix Released Status in OpenStack Compute (nova) stein series: Fix Committed Status in OpenStack Compute (nova) train series: Fix Released Status in OpenStack Compute (nova) ussuri series: Fix Released Status in OpenStack Compute (nova) victoria series: Fix Released Status in nova package in Ubuntu: Fix Released Status in nova source package in Bionic: Fix Released Status in nova source package in Focal: Fix Released Status in nova source package in Groovy: Fix Released Status in nova source package in Hirsute: Fix Released Bug description: When spawning an SR-IOV enabled instance on a newly deployed host, nova attempts to spawn it with a type-PF pci device. This fails with the stack trace below. After restarting neutron-sriov-agent and nova-compute services on the compute node and spawning an SR-IOV instance again, a type-VF pci device is selected, and instance spawning succeeds.
Stack trace:
2020-08-20 08:29:09.558 7624 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 6db8011e6ecd4fd0aaa53c8f89f08b1b __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:400
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [req-e3e49d07-24c6-4c62-916e-f830f70983a2 ddcfb3640535428798aa3c8545362bd4 dd99e7950a5b46b5b924ccd1720b6257 - 015e4fd7db304665ab5378caa691bb8b 015e4fd7db304665ab5378caa691bb8b] [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Instance failed to spawn: libvirtError: unsupported configuration: Interface type hostdev is currently supported on SR-IOV Virtual Functions only
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11] Traceback (most recent call last):
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2274, in _build_resources
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     yield resources
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2054, in _build_and_run_instance
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     block_device_info=block_device_info)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3147, in spawn
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     destroy_disks_on_failure=True)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5651, in _create_domain_and_network
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     destroy_disks_on_failure)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     self.force_reraise()
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]   File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-fe88-4020-9a9e-f4c437c6de11]     six.reraise(self.type_, self.value, self.tb)
2020-08-20 08:29:09.561 7624 ERROR nova.compute.manager [instance: 9498ea75-f
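The workaround mentioned in the description, as commands (systemd unit names as packaged on Ubuntu; they may differ on other distros):

```
sudo systemctl restart neutron-sriov-agent nova-compute
# retry the boot; the PCI tracker should now hand out a type-VF device
openstack server create --nic port-id=<sriov-port> ...
```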
[Yahoo-eng-team] [Bug 1888395] Re: live migration of a vm using the single port binding work flow is broken in train as a result of the introduction of sriov live migration
** Description changed:

- it was working in queens but fails in train. nova compute at the target
- aborts with the exception:
+ [Impact]
+
+ Live migration of instances in an environment that uses neutron backends
+ that do not support multiple port bindings will fail with error
+ 'NotImplemented', effectively rendering live-migration inoperable in
+ these environments.
+
+ This is fixed by first checking to ensure the backend supports the
+ multiple port bindings before providing the port bindings.
+
+ [Test Plan]
+
+ 1. deploy a Train/Ussuri OpenStack cloud w/ at least 2 compute nodes
+ using an SDN that does not support multiple port bindings (e.g.
+ opencontrail).
+
+ 2. Attempt to perform a live migration of an instance.
+
+ 3. Observe that the live migration will fail without this fix due to the
+ trace below (NotImplementedError: Cannot load 'vif_type' in the base
+ class), and should succeed with this fix.
+
+ [Where problems could occur]
+
+ This affects the live migration code, so likely problems would arise in
+ this area. Specifically, the check introduced is guarding information
+ provided for instances using SR-IOV indirect migration.
+
+ Regressions would likely occur in the form of live migration errors
+ around features that rely on the multiple port bindings (e.g. the SR-
+ IOV) and not the more generic/common use case. Errors may be seen in
+ standard network providers that are included with distro packaging, but
+ may also be seen in scenarios where proprietary SDNs are used.
+
+ [Original Description]
+ it was working in queens but fails in train. nova compute at the target aborts with the exception:
  Traceback (most recent call last):
- File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
- res = self.dispatcher.dispatch(message)
- File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch
- return self._do_dispatch(endpoint, method, ctxt, args)
- File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
- result = func(ctxt, **new_args)
- File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 79, in wrapped
- function_name, call_dict, binary, tb)
- File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
- self.force_reraise()
- File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
- six.reraise(self.type_, self.value, self.tb)
- File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 69, in wrapped
- return f(self, context, *args, **kw)
- File "/usr/lib/python2.7/site-packages/nova/compute/utils.py", line 1372, in decorated_function
- return function(self, context, *args, **kwargs)
- File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 219, in decorated_function
- kwargs['instance'], e, sys.exc_info())
- File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
- self.force_reraise()
- File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
- six.reraise(self.type_, self.value, self.tb)
- File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 207, in decorated_function
- return function(self, context, *args, **kwargs)
- File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7007, in pre_live_migration
- bdm.save()
- File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
- self.force_reraise()
- File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
- six.reraise(self.type_, self.value, self.tb)
- File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6972, in pre_live_migration
- migrate_data)
- File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 9190, in pre_live_migration
- instance, network_info, migrate_data)
- File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 9071, in _pre_live_migration_plug_vifs
- vif_plug_nw_info.append(migrate_vif.get_dest_vif())
- File "/usr/lib/python2.7/site-packages/nova/objects/migrate_data.py", line 90, in get_dest_vif
- vif['type'] = self.vif_type
- File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 67, in getter
- self.obj_load_attr(name)
- File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 603, in obj_load_attr
- _("Cannot load '%s' in the base class") % attrname)
+ File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
+ res = self.dispatcher.dispatch(message)
+ File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 274, in dispatch
+ return self._do_dispatch(endpoint, method, ctxt, args)
+ Fi
[Yahoo-eng-team] [Bug 1915318] Re: User list cannot be retrieved when pointing user_tree_dn at top level of the root domain
Further discussion with Jeff indicated that replacing the { and } with ( and ) resolved the issue. ** Changed in: keystone Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Identity (keystone). https://bugs.launchpad.net/bugs/1915318 Title: User list cannot be retrieved when pointing user_tree_dn at top level of the root domain Status in OpenStack Identity (keystone): Invalid Bug description: Windows AD, functional level Windows Server 2012 R2. Focal + Ussuri, keystone-ldap-31. Using ldap-config-flags of:
```
ldap-config-flags: "{
  user_tree_dn: 'DC=example,DC=org',
  query_scope: sub,
  user_objectclass: person,
  user_id_attribute: cn,
  user_filter: '{|(memberOf=CN=OpenStackAdmins,OU=OpenStack,OU=Groups,DC=example,DC=org)(memberOf=CN=OpenStackUsers,OU=OpenStack,OU=Groups,DC=example,DC=org)}',
  user_name_attribute: sAMAccountName,
  user_mail_attribute: mail,
  user_pass_attribute: '',
  user_description_attribute: displayName,
  user_enabled_attribute: userAccountControl,
  user_enabled_mask: 2,
  user_enabled_invert: false,
  user_enabled_default: 512,
  group_tree_dn: 'OU=OpenStack,OU=Groups,DC=example,DC=org',
  group_objectclass: group,
  group_id_attribute: cn,
  group_name_attribute: sAMAccountName,
  group_member_attribute: member,
}"
```
The user list cannot be retrieved, but the group list can. Horizon shows an error of "Unable to retrieve user list". Running `openstack user list --domain example.org` shows "Internal Server Error (HTTP 500)". In this scenario, there are 2 sets of users that the customer wants to have access to this openstack environment. There are no logs in /var/log/keystone/keystone.log when this error occurs. The DNs for those 2 different user trees are: OU=AdminUsers,DC=example,DC=com and OU=Users,DC=example,DC=com. As can be seen, both OUs are off of the root domain, and don't share a common tree other than the root. When the user_tree_dn is changed to `OU=AdminUsers,DC=example,DC=com` then users in that user tree can log in and show up in the user list, but the users from OU=Users,DC=example,DC=com do not. And vice-versa. To manage notifications about this bug go to: https://bugs.launchpad.net/keystone/+bug/1915318/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
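For reference, the resolution amounts to using standard LDAP filter syntax (parentheses rather than braces) in user_filter:

```
user_filter: '(|(memberOf=CN=OpenStackAdmins,OU=OpenStack,OU=Groups,DC=example,DC=org)(memberOf=CN=OpenStackUsers,OU=OpenStack,OU=Groups,DC=example,DC=org))'
```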
[Yahoo-eng-team] [Bug 1912513] Re: Port creation fails with error IP already allocated but the IP is available
Since this doesn't appear to be an issue with the charms, I'm going to remove the project from being affected by this bug and the field-critical designation. However, feel free to re-add it should evidence present itself otherwise. ** No longer affects: charm-neutron-api -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1912513 Title: Port creation fails with error IP already allocated but the IP is available Status in neutron: Incomplete Bug description:
Description
===========
When trying to create a new port using an available IP in the allocation pool of a VLAN neutron network, creation fails with error: IP address 10.41.8.3 already allocated in subnet afb678c6-a152-4f1d-8d77-03b9167520cc
Precondition
============
A port using the same IP was previously created and then deleted.
How to reproduce
================
I have the following network:
$ openstack network show e30b938b-210d-45c2-894c-95c0c5d08f79
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | UP                                   |
| availability_zone_hints   |                                      |
| availability_zones        |                                      |
| created_at                | 2020-11-25T10:55:32Z                 |
| description               |                                      |
| dns_domain                |                                      |
| id                        | e30b938b-210d-45c2-894c-95c0c5d08f79 |
| ipv4_address_scope        | None                                 |
| ipv6_address_scope        | None                                 |
| is_default                | False                                |
| is_vlan_transparent       | None                                 |
| location                  | cloud='', project.domain_id=, project.domain_name=, project.id='606e529ab1bc4b18a6d5dbf8735b9815', project.name=, region_name='us-test', zone= |
| mtu                       | 1500                                 |
| name                      | test                                 |
| port_security_enabled     | True                                 |
| project_id                | 606e529ab1bc4b18a6d5dbf8735b9815     |
| provider:network_type     | vlan                                 |
| provider:physical_network | physnet1                             |
| provider:segmentation_id  | 2220                                 |
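A hedged reproduction of the failing step (port name illustrative; network and subnet IDs taken from the report):

```
openstack port create \
  --network e30b938b-210d-45c2-894c-95c0c5d08f79 \
  --fixed-ip subnet=afb678c6-a152-4f1d-8d77-03b9167520cc,ip-address=10.41.8.3 \
  test-port
```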
[Yahoo-eng-team] [Bug 1820612] Re: Logging is hard to read if there is a problem with resources during live migration
** Also affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1820612 Title: Logging is hard to read if there is a problem with resources during live migration Status in OpenStack nova-cloud-controller charm: Incomplete Status in OpenStack Compute (nova): New Bug description: Issuing the command to migrate an instance from openstack-6 to another host (openstack-17), which does have enough resources.
# logging in nova-compute.log
2019-03-18 12:56:49.111 301805 DEBUG nova.scheduler.client.report [req-786024d5-2ba8-450c-9809-bbafaf7c15bd 7a5e20f2d1fc4af18f959a4666c2265c b07f32d8f1f84ba7bbe821ee7fa4f09a - f750199c451f432f9d615a147744f4f5 f750199c451f432f9d615a147744f4f5] Doubling-up allocation request for move operation. _move_operation_alloc_request /usr/lib/python2.7/dist-packages/nova/scheduler/client/report.py:162
2019-03-18 12:56:49.112 301805 DEBUG nova.scheduler.client.report [req-786024d5-2ba8-450c-9809-bbafaf7c15bd 7a5e20f2d1fc4af18f959a4666c2265c b07f32d8f1f84ba7bbe821ee7fa4f09a - f750199c451f432f9d615a147744f4f5 f750199c451f432f9d615a147744f4f5] New allocation request containing both source and destination hosts in move operation: {'allocations': [{'resource_provider': {'uuid': u'4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec'}, 'resources': {u'VCPU': 4, u'MEMORY_MB': 2048, u'DISK_GB': 20}}, {'resource_provider': {'uuid': u'57990d7c-7b10-40ee-916f-324bf7784eed'}, 'resources': {u'VCPU': 4, u'MEMORY_MB': 2048, u'DISK_GB': 20}}]} _move_operation_alloc_request /usr/lib/python2.7/dist-packages/nova/scheduler/client/report.py:202
2019-03-18 12:56:49.146 301805 WARNING nova.scheduler.client.report [req-786024d5-2ba8-450c-9809-bbafaf7c15bd 7a5e20f2d1fc4af18f959a4666c2265c b07f32d8f1f84ba7bbe821ee7fa4f09a - f750199c451f432f9d615a147744f4f5 f750199c451f432f9d615a147744f4f5] Unable to submit allocation for instance 7e00-7913-4de9-8f45-ce13fcb8a104 (409 Conflict: There was a conflict when trying to complete your request. Unable to allocate inventory: Unable to create allocation for 'MEMORY_MB' on resource provider '4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec'. The requested amount would exceed the capacity.)
2019-03-18 12:56:49.147 301805 WARNING nova.scheduler.utils [req-786024d5-2ba8-450c-9809-bbafaf7c15bd 7a5e20f2d1fc4af18f959a4666c2265c b07f32d8f1f84ba7bbe821ee7fa4f09a - f750199c451f432f9d615a147744f4f5 f750199c451f432f9d615a147744f4f5] Failed to compute_task_migrate_server: No valid host was found. Unable to move instance 7e00-7913-4de9-8f45-ce13fcb8a104 to host openstack-17. There is not enough capacity on the host for the instance.: NoValidHost: No valid host was found. Unable to move instance 7e00-7913-4de9-8f45-ce13fcb8a104 to host openstack-17. There is not enough capacity on the host for the instance.
2019-03-18 12:56:49.148 301805 WARNING nova.scheduler.utils [req-786024d5-2ba8-450c-9809-bbafaf7c15bd 7a5e20f2d1fc4af18f959a4666c2265c b07f32d8f1f84ba7bbe821ee7fa4f09a - f750199c451f432f9d615a147744f4f5 f750199c451f432f9d615a147744f4f5] [instance: 7e00-7913-4de9-8f45-ce13fcb8a104] Setting instance to ACTIVE state.: NoValidHost: No valid host was found. Unable to move instance 7e00-7913-4de9-8f45-ce13fcb8a104 to host openstack-17. There is not enough capacity on the host for the instance.
When searching for who resource provider '4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec' is, I used the nova_api database:
select * from resource_providers where uuid='4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec';
+---------------------+---------------------+----+--------------------------------------+------------------+------------+----------+
| created_at          | updated_at          | id | uuid                                 | name             | generation | can_host |
+---------------------+---------------------+----+--------------------------------------+------------------+------------+----------+
| 2018-05-09 11:00:01 | 2019-03-14 10:47:55 | 39 | 4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec | openstack-6.maas |        171 | NULL     |
+---------------------+---------------------+----+--------------------------------------+------------------+------------+----------+
1 row in set (0.00 sec)
So that is openstack-6 and not openstack-17 as mentioned in the above logging. From the logging provided this is not clear; also, there does not seem to be a command to retrieve the resource provider based on the uuid, and that is the only thing logged. To manage notifications about this bug go to: https://bugs.launchpad.net/charm-nova-cloud-controller/+bug/1820612/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team
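On the "no command to retrieve the resource provider" point: depending on the release, the osc-placement plugin for python-openstackclient may cover this without querying the database directly; an illustrative sketch:

```
pip install osc-placement
openstack resource provider show 4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec
openstack resource provider inventory list 4ce95dcf-4c42-47cf-bd1e-48a0f4a5ecec
```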
[Yahoo-eng-team] [Bug 1713499] Re: Cannot delete a neutron network, if the currently configured MTU is lower than the network's MTU
** Also affects: neutron (Ubuntu) Importance: Undecided Status: New ** Also affects: cloud-archive Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1713499 Title: Cannot delete a neutron network, if the currently configured MTU is lower than the network's MTU Status in Ubuntu Cloud Archive: New Status in neutron: Fix Released Status in neutron package in Ubuntu: New Bug description: Currently, the neutron API returns an error [1] when trying to delete a neutron network which has a higher MTU than the configured MTU[2][3]. This issue has been noticed in Pike. [1] Error: http://paste.openstack.org/show/619627/ [2] neutron.conf: http://paste.openstack.org/show/619629/ [3] ml2_conf.ini: http://paste.openstack.org/show/619630/ To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/1713499/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
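A hedged reproduction sketch (network name illustrative; global_physnet_mtu is the relevant neutron.conf option in this era):

```
# 1. create a network while neutron.conf still has e.g. global_physnet_mtu = 9000
openstack network create bigmtu-net
# 2. lower global_physnet_mtu (e.g. to 1500) and restart neutron-server
# 3. deleting the pre-existing, higher-MTU network now returns the API error
openstack network delete bigmtu-net
```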
[Yahoo-eng-team] [Bug 1582585] Re: the speed of query user from ldap server is very slow
** Also affects: keystone (Ubuntu) Importance: Undecided Status: New ** Also affects: cloud-archive Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Identity (keystone). https://bugs.launchpad.net/bugs/1582585 Title: the speed of query user from ldap server is very slow Status in Ubuntu Cloud Archive: New Status in OpenStack Identity (keystone): Fix Released Status in keystone package in Ubuntu: New Bug description: In our project, querying users from the LDAP server is very slow; we have 12,000 LDAP users and the query costs almost 45 seconds. The reason is that keystone generates the uuid for the ldap users one by one and inserts them into the db. On subsequent queries it also goes to the db rather than using the cache. So adding a cache improves the query speed. After adding @MEMOIZE to the following function https://github.com/openstack/keystone/blob/master/keystone/identity/core.py#L580, the first query still costs almost 50 seconds, but subsequent queries only cost 7 seconds. So it is very necessary to add this improvement. To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/1582585/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
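For context on why the @MEMOIZE change helps: keystone's memoization decorators sit on top of its oslo.cache layer, which must itself be enabled for the cache to take effect. An illustrative keystone.conf fragment, assuming a local memcached:

```
[cache]
enabled = true
backend = dogpile.cache.memcached
memcache_servers = localhost:11211
```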
[Yahoo-eng-team] [Bug 1613900] Re: Unable to use 'Any' availability zone when spawning instance
** Also affects: horizon (Ubuntu) Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Dashboard (Horizon). https://bugs.launchpad.net/bugs/1613900 Title: Unable to use 'Any' availability zone when spawning instance Status in Ubuntu Cloud Archive: In Progress Status in OpenStack Dashboard (Horizon): Fix Released Status in horizon package in Ubuntu: In Progress Bug description: While using Mitaka, we found that by default, using the js backend, it is not possible to choose 'any' availability zone. The issue is not fixed in the master branch. For the python implementation the logic is: https://github.com/openstack/horizon/blob/master/openstack_dashboard/dashboards/project/instances/workflows/create_instance.py#L390 The JS implementation misses this logic if the number of AZs is > 1: https://github.com/openstack/horizon/blob/master/openstack_dashboard/dashboards/project/static/dashboard/project/workflow/launch-instance/launch-instance-model.service.js#L321 Also, the JS implementation looks ugly if you have a lot of subnets per network... To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/1613900/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
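For comparison (not part of the report), the CLI has no equivalent problem because 'Any' availability zone is expressed by simply omitting the AZ and letting the scheduler pick one (names illustrative):

```
openstack server create --image cirros --flavor m1.small --network private demo-vm
```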
[Yahoo-eng-team] [Bug 1613900] Re: Unable to use 'Any' availability zone when spawning instance
** Also affects: cloud-archive Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Dashboard (Horizon). https://bugs.launchpad.net/bugs/1613900 Title: Unable to use 'Any' availability zone when spawning instance Status in Ubuntu Cloud Archive: In Progress Status in OpenStack Dashboard (Horizon): Fix Released Status in horizon package in Ubuntu: In Progress Bug description: While using Mitaka, we found that by default, using the js backend, it is not possible to choose 'any' availability zone. The issue is not fixed in the master branch. For the python implementation the logic is: https://github.com/openstack/horizon/blob/master/openstack_dashboard/dashboards/project/instances/workflows/create_instance.py#L390 The JS implementation misses this logic if the number of AZs is > 1: https://github.com/openstack/horizon/blob/master/openstack_dashboard/dashboards/project/static/dashboard/project/workflow/launch-instance/launch-instance-model.service.js#L321 Also, the JS implementation looks ugly if you have a lot of subnets per network... To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/1613900/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1718287] [NEW] systemd mount targets fail due to device busy or already mounted
Public bug reported: [Issue] After rebooting a 16.04 AWS instance (ami-1d4e7a66) with several external disks attached, formatted, and added to /etc/fstab, systemd mount targets fail to mount with:
● media-v.mount - /media/v
   Loaded: loaded (/etc/fstab; bad; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2017-09-19 20:12:18 UTC; 1min 54s ago
    Where: /media/v
     What: /dev/xvdv
     Docs: man:fstab(5)
           man:systemd-fstab-generator(8)
  Process: 1196 ExecMount=/bin/mount /dev/xvdv /media/v -t ext4 -o defaults (code=exited, status=32)
Sep 19 20:12:17 ip-172-31-7-167 systemd[1]: Mounting /media/v...
Sep 19 20:12:17 ip-172-31-7-167 mount[1196]: mount: /dev/xvdv is already mounted or /media/v busy
Sep 19 20:12:18 ip-172-31-7-167 systemd[1]: media-v.mount: Mount process exited, code=exited status=32
Sep 19 20:12:18 ip-172-31-7-167 systemd[1]: Failed to mount /media/v.
Sep 19 20:12:18 ip-172-31-7-167 systemd[1]: media-v.mount: Unit entered failed state.
From the cloud-init logs, it appears that the OVF datasource is mounting the device to find data:
2017-09-19 20:12:17,502 - util.py[DEBUG]: Peeking at /dev/xvdv (max_bytes=512)
2017-09-19 20:12:17,502 - util.py[DEBUG]: Reading from /proc/mounts (quiet=False)
2017-09-19 20:12:17,502 - util.py[DEBUG]: Read 2570 bytes from /proc/mounts
...
2017-09-19 20:12:17,506 - util.py[DEBUG]: Running command ['mount', '-o', 'ro,sync', '-t', 'iso9660', '/dev/xvdv', '/tmp/tmpw2tyqqid'] with allowed return codes [0] (shell=False, capture=True)
2017-09-19 20:12:17,545 - util.py[DEBUG]: Failed mount of '/dev/xvdv' as 'iso9660': Unexpected error while running command.
Command: ['mount', '-o', 'ro,sync', '-t', 'iso9660', '/dev/xvdv', '/tmp/tmpw2tyqqid']
Exit code: 32
Reason: -
Stdout: -
Stderr: mount: wrong fs type, bad option, bad superblock on /dev/xvdv, missing codepage or helper program, or other error
In some cases useful info is found in syslog - try dmesg | tail or so.
2017-09-19 20:12:17,545 - util.py[DEBUG]: Recursively deleting /tmp/tmpw2tyqqid
2017-09-19 20:12:17,545 - DataSourceOVF.py[DEBUG]: /dev/xvdv not mountable as iso9660
[Vitals]
Version: 0.7.9-153-g16a7302f-0ubuntu1~16.04.2
OS: Ubuntu 16.04
Provider: AWS - ami-1d4e7a66
[Recreate] To recreate this:
1. Launch an AWS instance using AMI ami-1d4e7a66 and attach several disks (I used 25 additional disks)
2. Format and mount all 25:
mkdir /media/{b..z}
for i in {b..z}; do
  mkfs -t ext4 /dev/xvd$i
  mount /dev/xvd$i /media/$i
  echo "/dev/xvd$i /media/$i ext4 defaults,nofail 0 2" >> /etc/fstab
done
3. reboot instance
Since this is a race, multiple attempts may be necessary. A reproducer script is attached. ** Affects: cloud-init Importance: Undecided Status: New ** Tags: sts ** Attachment added: "cloud-init.tar" https://bugs.launchpad.net/bugs/1718287/+attachment/4953081/+files/cloud-init.tar -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1718287 Title: systemd mount targets fail due to device busy or already mounted Status in cloud-init: New Bug description: [Issue] After rebooting a 16.04 AWS instance (ami-1d4e7a66) with several external disks attached, formatted, and added to /etc/fstab, systemd mount targets fail to mount with:
● media-v.mount - /media/v
   Loaded: loaded (/etc/fstab; bad; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2017-09-19 20:12:18 UTC; 1min 54s ago
    Where: /media/v
     What: /dev/xvdv
     Docs: man:fstab(5)
           man:systemd-fstab-generator(8)
  Process: 1196 ExecMount=/bin/mount /dev/xvdv /media/v -t ext4 -o defaults (code=exited, status=32)
Sep 19 20:12:17 ip-172-31-7-167 systemd[1]: Mounting /media/v...
Sep 19 20:12:17 ip-172-31-7-167 mount[1196]: mount: /dev/xvdv is already mounted or /media/v busy
Sep 19 20:12:18 ip-172-31-7-167 systemd[1]: media-v.mount: Mount process exited, code=exited status=32
Sep 19 20:12:18 ip-172-31-7-167 systemd[1]: Failed to mount /media/v.
Sep 19 20:12:18 ip-172-31-7-167 systemd[1]: media-v.mount: Unit entered failed state.
From the cloud-init logs, it appears that the OVF datasource is mounting the device to find data:
2017-09-19 20:12:17,502 - util.py[DEBUG]: Peeking at /dev/xvdv (max_bytes=512)
2017-09-19 20:12:17,502 - util.py[DEBUG]: Reading from /proc/mounts (quiet=False)
2017-09-19 20:12:17,502 - util.py[DEBUG]: Read 2570 bytes from /proc/mounts
...
2017-09-19 20:12:17,506 - util.py[DEBUG]: Running command ['mount', '-o', 'ro,sync', '-t', 'iso9660', '/dev/xvdv', '/tmp/tmpw2tyqqid'] with allowed return codes [0] (shell=False, capture=True)
2017-09-19 20:12:17,545 - util.py[DEBUG]: Failed mount of '/dev/xvdv' as 'iso9660': Unexpected error while running command.
Command: [
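One hedged mitigation while the race is investigated (not from the report; filename illustrative): pin cloud-init's datasource list on EC2 so the OVF probe never mounts attached disks:

```
# /etc/cloud/cloud.cfg.d/99-ec2-only.cfg
datasource_list: [ Ec2, None ]
```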
[Yahoo-eng-team] [Bug 1668410] Re: [SRU] Infinite loop trying to delete deleted HA router
** Also affects: cloud-archive Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1668410 Title: [SRU] Infinite loop trying to delete deleted HA router Status in Ubuntu Cloud Archive: New Status in neutron: In Progress Status in OpenStack Security Advisory: Won't Fix Status in neutron package in Ubuntu: Triaged Bug description: [Impact] When deleting a router, the logfile fills up. See full log - http://paste.ubuntu.com/25429257/ I can see the error 'Error while deleting router c0dab368-5ac8-4996-88c9-f5d345a774a6' occurred 3343386 times from _safe_router_removed() [1]:
$ grep -r 'Error while deleting router c0dab368-5ac8-4996-88c9-f5d345a774a6' | wc -l
3343386
This _safe_router_removed() is invoked by L488 [2]; if _safe_router_removed() goes wrong it returns False, then self._resync_router(update) [3] makes _safe_router_removed run again and again. That is why we saw so many 'Error while deleting router X' errors. [1] https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L361 [2] https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L488 [3] https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L457 [Test Case] That's because of a race condition between the neutron server and the L3 agent: after the neutron server deletes the HA interfaces, the L3 agent may sync an HA router without HA interface info (one just needs to trigger L708 [1] after deleting the HA interfaces and before deleting the HA router). If we delete the HA router at this time, this problem will happen. So the test case we designed is as below:
1. First update the fixed package, and restart neutron-server by 'sudo service neutron-server restart'
2. Create an ha_router:
neutron router-create harouter --ha=True
3. Delete the ports associated with the ha_router before deleting the ha_router:
neutron router-port-list harouter | grep 'HA port' | awk '{print $2}' | xargs -l neutron port-delete
neutron router-port-list harouter
4. Update the ha_router to trigger the l3-agent to update the ha_router info without ha_port into self.router_info:
neutron router-update harouter --description=test
5. Delete the ha_router this time:
neutron router-delete harouter
[1] https://github.com/openstack/neutron/blob/mitaka-eol/neutron/db/l3_hamode_db.py#L708 [Regression Potential] The fixed patch [1] for neutron-server will no longer return an ha_router which is missing ha_ports, so L488 [2] will no longer have a chance to call _safe_router_removed() for an ha_router; the problem is therefore fundamentally fixed by this patch and there is no regression potential. Besides, this fixed patch is in the mitaka-eol branch now, and the neutron-server mitaka package is based on neutron-8.4.0, so we need to backport it to xenial and mitaka.
$ git tag --contains 8c77ee6b20dd38cc0246e854711cb91cffe3a069
mitaka-eol
[1] https://review.openstack.org/#/c/440799/2/neutron/db/l3_hamode_db.py [2] https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L488 To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/1668410/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1685881] Re: l3-agent-router-add doesn't error/warn about router already existing on agent
Adding neutron as it doesn't appear that this is charm related. The command that should error/warn is from the neutron cli itself. ** Also affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1685881 Title: l3-agent-router-add doesn't error/warn about router already existing on agent Status in OpenStack neutron-api charm: New Status in neutron: New Bug description: we had an incident on a network that ended up with random packet dropping between nodes within the cloud, and outside of the cloud when crossing l3-routers. Steps to reproduce:
juju set neutron-api min-agents-per-router=2
juju set neutron-api max-agents-per-router=2
juju set neutron-api l2-population=false
juju set neutron-api enable-l3ha=true
for i in $(neutron router-list -f value -c id); do
  neutron router-update $i --admin-state-up=false
  neutron router-update $i --ha=true
  neutron router-update $i --admin-state-up=true
done
juju set neutron-api max-agents-per-router=3
for i in $(neutron router-list -f value -c id); do
  neutron l3-agent-list-hosting-router $i
  for j in $(neutron agent-list -f value -c id); do
    neutron l3-agent-router-add $j $i
  done
done
sleep 120 # to settle
for i in $(neutron router-list -f value -c id); do
  neutron l3-agent-list-hosting-router $i
done
Potentially you may see two active l3-agents for a given router. (We saw this corresponded to rabbitmq messaging failures concurrent with this activity.) Our environment had 9 active routers. You'll notice that there's no error that comes out of adding a router to an agent it's already running on. After making these updates, we found that ssh and RDP sessions to the floating IPs associated with VMs across several different networks/routers were exhibiting random session drops, as if the route were hosted in multiple locations and we were getting an asymmetric-route issue. We had to revert to --ha=false and enable-l3ha=false before we could gather deeper info/SOS reports. May be able to reproduce in lab at some point in the future. To manage notifications about this bug go to: https://bugs.launchpad.net/charm-neutron-api/+bug/1685881/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
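A hedged way to spot the bad state described above (a healthy HA router should show exactly one agent with ha_state 'active'):

```
for r in $(neutron router-list -f value -c id); do
    echo "=== router $r ==="
    neutron l3-agent-list-hosting-router $r
done
```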
[Yahoo-eng-team] [Bug 1629097] Re: neutron-rootwrap processes not getting cleaned up
I saw this last night, and can indeed confirm it's related to DNS issues. It was suggested by a colleague that it may be related to the sudo call returning an error indicating that the hostname could not be found, though I spent no time exploring this option today. Restarting the openvswitch-switch service closes all the existing processes, but it seems more that the service can't stop cleanly. Also of note is that the service stop took a very long time, whereas with working DNS it took a few seconds at worst. ** Changed in: neutron Status: Expired => Confirmed -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1629097 Title: neutron-rootwrap processes not getting cleaned up Status in neutron: Confirmed Bug description: neutron-rootwrap processes aren't getting cleaned up on Newton. I'm testing with Newton rc3. I was noticing memory exhaustion on my neutron gateway units, which turned out to be due to compounding neutron-rootwrap processes:
sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ovsdb-client monitor Interface name,ofport,external_ids --format=json
$ top -n1 -b -o VIRT http://paste.ubuntu.com/23252407/
$ ps aux|grep ovsdb-client http://paste.ubuntu.com/23252658/
Restarting openvswitch cleans up the processes but they just start piling up again soon after:
sudo systemctl restart openvswitch-switch
At first I thought this was an openvswitch issue; however, I reverted the code in get_root_helper_child_pid() and neutron-rootwrap processes started getting cleaned up. See the corresponding commit for the code that possibly introduced this at [1]. This can be recreated with the openstack charms using xenial-newton-staging. On newton deploys, neutron-gateway and nova-compute units will exhaust memory due to compounding ovsdb-client processes.
[1] commit fd93e19f2a415b3803700fc491749daba01a4390
Author: Assaf Muller
Date: Fri Mar 18 16:29:26 2016 -0400
Change get_root_helper_child_pid to stop when it finds cmd
get_root_helper_child_pid recursively finds the child of pid, until it can no longer find a child. However, the intention is not to find the deepest child, but to strip away root helpers. For example 'sudo neutron-rootwrap x' is supposed to find the pid of x. However, in cases 'x' spawned quick-lived children of its own (for example: ip / brctl / ovs invocations), get_root_helper_child_pid returned those pids if called at the wrong time.
Change-Id: I582aa5c931c8bfe57f49df6899445698270bb33e
Closes-Bug: #1558819
To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1629097/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1623327] Re: openstack orchestration service list fails to return endpoint
Based on Brad's comment in #9, there were actions that were missing for the openstack orchestration service. I believe this is no longer a valid bug, therefore I'm marking the remaining tasks as invalid. ** Changed in: python-openstackclient Status: New => Invalid ** Changed in: keystone Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Identity (keystone). https://bugs.launchpad.net/bugs/1623327 Title: openstack orchestration service list fails to return endpoint Status in OpenStack Identity (keystone): Invalid Status in python-heatclient: Invalid Status in python-openstackclient: Invalid Bug description: OpenStack service endpoints are created for the heat service, but the openstack client cannot find the endpoints to issue the query against. I suspect this is because the domain auth tokens included in the initial authentication don't include any endpoints with the $(tenant_id)s in the output there. I'm not sure whether this should be a bug against the openstack client or against keystone. I believe it's intentional to exclude the endpoints with a tenant_id substitution in the endpoint, but it doesn't make any sense to me, as it seems the openstack catalog list command uses this catalog query in order to list endpoints and services, where it only gets the service but not the endpoints. Here's some output collected:
> openstack catalog list
+----------+----------------+----------------------------------------+
| Name     | Type           | Endpoints                              |
+----------+----------------+----------------------------------------+
| heat     | orchestration  |                                        |
| heat-cfn | cloudformation | RegionOne                              |
|          |                |   public: http://10.5.20.176:8000/v1   |
|          |                | RegionOne                              |
|          |                |   admin: http://10.5.20.176:8000/v1    |
|          |                | RegionOne                              |
|          |                |   internal: http://10.5.20.176:8000/v1 |
|          |                |                                        |
...
> openstack endpoint list | grep heat
| 85ee6b6e8f814856a3a547982f6b2835 | RegionOne | heat | orchestration | True | internal | http://10.5.20.176:8004/v1/$(tenant_id)s |
| 895cb2e4e5d1492e9e40c205f6b0c508 | RegionOne | heat | orchestration | True | public   | http://10.5.20.176:8004/v1/$(tenant_id)s |
| ad63a139c90749ff9d98a704200d2e49 | RegionOne | heat | orchestration | True | admin    | http://10.5.20.176:8004/v1/$(tenant_id)s |
> openstack orchestration service list
public endpoint for orchestration service not found
To manage notifications about this bug go to: https://bugs.launchpad.net/keystone/+bug/1623327/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
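A hedged way to test the tenant_id-substitution theory (credentials illustrative): repeat the query with a project-scoped token, which can resolve $(tenant_id)s in the endpoint template:

```
unset OS_DOMAIN_NAME                 # drop the domain-scoped auth
export OS_PROJECT_NAME=admin
export OS_PROJECT_DOMAIN_NAME=admin_domain
openstack catalog list               # heat should now list its endpoints
openstack orchestration service list
```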
[Yahoo-eng-team] [Bug 1623327] [NEW] openstack orchestration service list fails to return endpoint
Public bug reported:

OpenStack service endpoints are created for the heat service, but the
openstack client cannot find the endpoints to issue the query against.
I suspect this is because the domain auth token acquired during the
initial authentication doesn't include any endpoints containing
$(tenant_id)s. I'm not sure whether this should be a bug against the
openstack client or against keystone.

I believe it's intentional to exclude endpoints with a tenant_id
substitution, but it doesn't make sense to me, since the openstack
catalog list command appears to use this same catalog query to list
endpoints and services, and it gets only the service but not the
endpoints.

Here's some output collected:

> openstack catalog list
+----------+----------------+-----------------------------------------+
| Name     | Type           | Endpoints                               |
+----------+----------------+-----------------------------------------+
| heat     | orchestration  |                                         |
| heat-cfn | cloudformation | RegionOne                               |
|          |                |   public: http://10.5.20.176:8000/v1    |
|          |                | RegionOne                               |
|          |                |   admin: http://10.5.20.176:8000/v1     |
|          |                | RegionOne                               |
|          |                |   internal: http://10.5.20.176:8000/v1  |
|          |                |                                         |
...

> openstack endpoint list | grep heat
| 85ee6b6e8f814856a3a547982f6b2835 | RegionOne | heat | orchestration | True | internal | http://10.5.20.176:8004/v1/$(tenant_id)s |
| 895cb2e4e5d1492e9e40c205f6b0c508 | RegionOne | heat | orchestration | True | public   | http://10.5.20.176:8004/v1/$(tenant_id)s |
| ad63a139c90749ff9d98a704200d2e49 | RegionOne | heat | orchestration | True | admin    | http://10.5.20.176:8004/v1/$(tenant_id)s |

> openstack orchestration service list
public endpoint for orchestration service not found

** Affects: keystone
     Importance: Undecided
         Status: New

** Affects: python-openstackclient
     Importance: Undecided
         Status: New

** Tags: canonical-bootstack

** Also affects: python-openstackclient
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1623327

Title:
  openstack orchestration service list fails to return endpoint

Status in OpenStack Identity (keystone): New
Status in python-openstackclient: New

Bug description:
  OpenStack service endpoints are created for the heat service, but the
  openstack client cannot find the endpoints to issue the query
  against. I suspect this is because the domain auth token acquired
  during the initial authentication doesn't include any endpoints
  containing $(tenant_id)s. I'm not sure whether this should be a bug
  against the openstack client or against keystone.

  I believe it's intentional to exclude endpoints with a tenant_id
  substitution, but it doesn't make sense to me, since the openstack
  catalog list command appears to use this same catalog query to list
  endpoints and services, and it gets only the service but not the
  endpoints.

  Here's some output collected:

  > openstack catalog list
  +----------+----------------+-----------------------------------------+
  | Name     | Type           | Endpoints                               |
  +----------+----------------+-----------------------------------------+
  | heat     | orchestration  |                                         |
  | heat-cfn | cloudformation | RegionOne                               |
  |          |                |   public: http://10.5.20.176:8000/v1    |
  |          |                | RegionOne                               |
  |          |                |   admin: http://10.5.20.176:8000/v1     |
  |          |                | RegionOne                               |
  |          |                |   internal: http://10.5.20.176:8000/v1  |
  |          |                |                                         |
  ...

  > openstack endpoint list | grep heat
  | 85ee6b6e8f814856a3a547982f6b2835 | RegionOne | heat | orchestration | True | internal | http://10.5.20.176:8004/v1/$(tenant_id)s |
  | 895cb2e4e5d1492e9e40c205f6b0c508 | RegionOne | heat | orchestration | True | public   | http://10.5.20.176:8004/v1/$(tenant_id)s |
  | ad63a139c90749ff9d98a704200d2e49 | RegionOne | heat | orchestration | True | admin
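To illustrate the behaviour described above, here is a minimal sketch of how a catalog builder can silently drop endpoints whose URLs contain $(tenant_id)s when the token carries no project. This is illustrative only, not keystone's actual code; the function names (format_url, build_catalog) and data layout are assumptions.

# Minimal sketch (not keystone's actual code) of why domain-scoped
# tokens lose endpoints whose URLs contain $(tenant_id)s: URL
# substitution fails without a project, and the endpoint is dropped.

def format_url(url, substitutions):
    """Return the substituted URL, or None if a key is missing."""
    try:
        return url.replace('$(', '%(') % substitutions
    except KeyError:
        return None

def build_catalog(endpoints, token_data):
    catalog = []
    for ep in endpoints:
        formatted = format_url(ep['url'], token_data)
        if formatted is None:
            continue  # endpoint silently excluded from the catalog
        catalog.append(dict(ep, url=formatted))
    return catalog

# A domain-scoped token has no tenant_id, so the heat endpoint vanishes
# while the heat-cfn endpoint (no substitution) survives:
endpoints = [
    {'name': 'heat', 'url': 'http://10.5.20.176:8004/v1/$(tenant_id)s'},
    {'name': 'heat-cfn', 'url': 'http://10.5.20.176:8000/v1'},
]
print(build_catalog(endpoints, {}))                       # only heat-cfn
print(build_catalog(endpoints, {'tenant_id': 'abc123'}))  # both

Under a domain-scoped token there is no tenant_id to substitute, so the orchestration endpoints vanish from the catalog while the service entry remains, which matches the openstack catalog list output above.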
[Yahoo-eng-team] [Bug 1453264] Re: iptables_manager can run very slowly when a large number of security group rules are present
Uploading debdiff based on what is currently available in trusty-proposed, since that has been verified and is pending release.

** Description changed:

+ [Impact]
+
  We have customers that typically add a few hundred security group
  rules or more. We also typically run 30+ VMs per compute node. When
  about 10+ VMs with a large SG set all get scheduled to the same node,
  the L2 agent (OVS) can spend many minutes in the
  iptables_manager.apply() code, so much so that by the time all the
  rules are updated, the VM has already tried DHCP and failed, leaving
  it in an unusable state.

  While there have been some patches that tried to address this in Juno
  and Kilo, they've either not helped as much as necessary, or broken
  SGs completely due to re-ordering of the iptables rules. I've been
  able to show some pretty bad scaling with just a handful of VMs
  running in devstack based on today's code (May 8th, 2015) from
  upstream OpenStack.
+
+
+ [Test Case]
  Here's what I tested:

  1. I created a security group with 1000 TCP port rules (you could
     alternately have a smaller number of rules and more VMs, but it's
     quicker this way)
  2. I booted VMs, specifying both the default and "large" SGs, and
     timed from the second Neutron "learned" about the port until it
     completed its work
  3. I got a :( pretty quickly

  And here's some data:

  1-3 VM - didn't time, less than 20 seconds
  4th VM - 0:36
  5th VM - 0:53
  6th VM - 1:11
  7th VM - 1:25
  8th VM - 1:48
  9th VM - 2:14

  While it's busy adding the rules, the OVS agent is consuming pretty
  close to 100% of a CPU for most of this time (from top):

- PID   USER  PR NI VIRT   RES   SHR  S %CPU %MEM TIME+    COMMAND
+ PID   USER  PR NI  VIRT   RES   SHR  S %CPU %MEM TIME+    COMMAND
  25767 stack 20 0  157936 76572 4416 R 89.2 0.5  50:14.28 python

  And this is with only ~10K rules at this point! When we start
  crossing the 20K point VM boot failures start to happen.

  I'm filing this bug since we need to take a closer look at this in
  Liberty and fix it, it's been this way since Havana and needs some
  TLC. I've attached a simple script I've used to recreate this, and
  will start taking a look at options here.
+
+
+ [Regression Potential]
+
+ Minimal, since this has been running in upstream stable for several
+ releases now (Kilo, Liberty, Mitaka).

** Also affects: neutron (Ubuntu)
   Importance: Undecided
       Status: New

** Patch added: "trusty patch based on -proposed"
   https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1453264/+attachment/4730270/+files/lp1453264.debdiff

** Also affects: cloud-archive
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1453264

Title:
  iptables_manager can run very slowly when a large number of security
  group rules are present

Status in Ubuntu Cloud Archive: New
Status in neutron: Fix Released
Status in neutron kilo series: Fix Released
Status in neutron package in Ubuntu: New

Bug description:
  [Impact]

  We have customers that typically add a few hundred security group
  rules or more. We also typically run 30+ VMs per compute node. When
  about 10+ VMs with a large SG set all get scheduled to the same node,
  the L2 agent (OVS) can spend many minutes in the
  iptables_manager.apply() code, so much so that by the time all the
  rules are updated, the VM has already tried DHCP and failed, leaving
  it in an unusable state.

  While there have been some patches that tried to address this in Juno
  and Kilo, they've either not helped as much as necessary, or broken
  SGs completely due to re-ordering of the iptables rules. I've been
  able to show some pretty bad scaling with just a handful of VMs
  running in devstack based on today's code (May 8th, 2015) from
  upstream OpenStack.

  [Test Case]

  Here's what I tested:

  1. I created a security group with 1000 TCP port rules (you could
     alternately have a smaller number of rules and more VMs, but it's
     quicker this way)
  2. I booted VMs, specifying both the default and "large" SGs, and
     timed from the second Neutron "learned" about the port until it
     completed its work
  3. I got a :( pretty quickly

  And here's some data:

  1-3 VM - didn't time, less than 20 seconds
  4th VM - 0:36
  5th VM - 0:53
  6th VM - 1:11
  7th VM - 1:25
  8th VM - 1:48
  9th VM - 2:14

  While it's busy adding the rules, the OVS agent is consuming pretty
  close to 100% of a CPU for most of this time (from top):

  PID   USER  PR NI VIRT   RES   SHR  S %CPU %MEM TIME+    COMMAND
  25767 stack 20 0  157936 76572 4416 R 89.2 0.5  50:14.28 python

  And this is with only ~10K rules at this point! When we start
  crossing the 20K point VM boot failures start to happen.
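The reproduction script the reporter mentions is an attachment not included in this archive. As a stand-in, the following hedged sketch shows how a large security group might be created with python-neutronclient to reproduce the scaling problem; the credentials, group name, and port range are illustrative assumptions, not the reporter's actual script.

# Hypothetical reproduction sketch: create one security group with
# many TCP port rules so that every VM using it forces the OVS agent
# to program a large iptables rule set.
from neutronclient.v2_0 import client

neutron = client.Client(username='admin', password='secret',
                        tenant_name='admin',
                        auth_url='http://127.0.0.1:5000/v2.0')

sg = neutron.create_security_group(
    {'security_group': {'name': 'large-sg'}})['security_group']

for port in range(1000, 2000):  # 1000 single-port TCP rules
    neutron.create_security_group_rule(
        {'security_group_rule': {
            'security_group_id': sg['id'],
            'direction': 'ingress',
            'protocol': 'tcp',
            'port_range_min': port,
            'port_range_max': port}})

Booting several VMs with both the default and this "large" group attached, as described in the test case, should then show the per-VM apply time growing roughly linearly as the cumulative rule count on the node climbs.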
[Yahoo-eng-team] [Bug 1607039] [NEW] KVS _update_user_token_list can be more efficient
Public bug reported:

Maintaining the user token list and the revocation list in the
memcached persistence backend (kvs) is inefficient for larger numbers
of tokens due to the use of a linear algorithm for token list
maintenance. Since the list is unordered, each token within the list
must be checked, first to see whether it has expired and second to
determine whether it has been revoked. By changing to an ordered list
and using a binary search, expired tokens can be found with less
computational overhead.

The current algorithm makes the insertion of a new token into the list
O(n), since token expiration validity is checked when the list is
updated. By using an ordered list, the search and expiration
validation can be reduced to O(log n).

** Affects: keystone
     Importance: Undecided
         Status: New

** Tags: sts

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1607039

Title:
  KVS _update_user_token_list can be more efficient

Status in OpenStack Identity (keystone): New

Bug description:
  Maintaining the user token list and the revocation list in the
  memcached persistence backend (kvs) is inefficient for larger numbers
  of tokens due to the use of a linear algorithm for token list
  maintenance. Since the list is unordered, each token within the list
  must be checked, first to see whether it has expired and second to
  determine whether it has been revoked. By changing to an ordered list
  and using a binary search, expired tokens can be found with less
  computational overhead.

  The current algorithm makes the insertion of a new token into the
  list O(n), since token expiration validity is checked when the list
  is updated. By using an ordered list, the search and expiration
  validation can be reduced to O(log n).

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1607039/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
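A minimal sketch of the ordered-list idea follows. This is not keystone's actual KVS code; the function name and the (expires_at, token_id) tuple layout are assumptions. The point is that a list kept sorted by expiry lets a binary search locate the expired prefix instead of scanning every token.

# Sketch: keep each user's token list sorted by expiry so expired
# entries can be located with a binary search rather than a full scan.
import bisect
import time

def update_user_token_list(token_list, new_token_id, expires_at):
    """token_list is a list of (expires_at, token_id) tuples, kept sorted."""
    # Binary search for the first unexpired entry: O(log n) to find
    # the cut point, then one slice drops everything expired.
    now = time.time()
    first_valid = bisect.bisect_right(token_list, (now, ''))
    del token_list[:first_valid]

    # Insert the new token at its sorted position. The position is
    # found in O(log n); no per-entry expiry check is needed.
    bisect.insort(token_list, (expires_at, new_token_id))
    return token_list

tokens = []
update_user_token_list(tokens, 'tok-a', time.time() + 3600)
update_user_token_list(tokens, 'tok-b', time.time() + 7200)

Note that inserting into a Python list still costs an O(n) memory shift; the O(log n) saving is in finding the expiry cut point and the insertion position without examining every token, which is what the report identifies as the hot path.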
[Yahoo-eng-team] [Bug 1414218] Re: Remove extraneous trace in linux/dhcp.py
** Also affects: cloud-archive
   Importance: Undecided
       Status: New

** Changed in: cloud-archive
       Status: New => Confirmed

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1414218

Title:
  Remove extraneous trace in linux/dhcp.py

Status in Ubuntu Cloud Archive: Confirmed
Status in neutron: Fix Released
Status in neutron juno series: Fix Released
Status in neutron package in Ubuntu: New

Bug description:
  [Impact]

  The debug tracepoint in Dnsmasq._output_hosts_file is extraneous and
  causes unnecessary performance overhead when creating lots (> 1000)
  ports at one time. The trace point is unnecessary since the data is
  being written to disk and the file can be examined in a worst case
  scenario. The added performance overhead is an order of magnitude in
  difference (~.5 seconds versus ~.05 seconds at 1500 ports).

  [Test Case]

  1. Deploy OpenStack using neutron for networking
  2. Create 1500 ports
  3. Observe the performance degradation for each port creation

  [Regression Potential]

  Minimal. This code has been running in stable/juno, stable/kilo, and
  above for a while.

  [Other Questions]

  This is likely to occur in OpenStack deployments which have large
  networks deployed. The degradation is gradual, but the performance
  becomes unacceptable with large enough networks.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1414218/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1554227] Re: DHCP unicast requests are not responded to
** Also affects: nova (Ubuntu)
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1554227

Title:
  DHCP unicast requests are not responded to

Status in OpenStack Compute (nova): New
Status in nova package in Ubuntu: New

Bug description:
  Issue:

  We run nova-network in VLAN+multi_host mode on Kilo and notice that
  only one dnsmasq process (either the oldest or newest) on the
  hypervisor responds to unicast BOOTPREQUESTs. dhclient on VMs will
  retry until it eventually gives up and broadcasts the request, which
  is then responded to. Depending on the timing of the DHCP broadcast
  request, VMs can briefly lose connectivity as they attempt rebinding.

  According to
  http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=9380ba70d67db6b69f817d8e318de5ba1e990b12,
  it seems that passing the "--interface" argument, in addition to
  "--bind-interfaces", is necessary for dnsmasq to work correctly in
  VLAN mode.

  Reproduce steps:

  1. Create two tenants.
  2. Create a VM under each tenant, forcing the VMs to run on a single
     hypervisor. I tested with a vanilla Ubuntu cloud image, but any
     other image that uses dhclient should also work.
  3. On the hypervisor, run dhcpdump -i for each tenant's bridge
     interface. On at least one of them, you should see a unicast
     BOOTPREQUEST with no corresponding BOOTPREPLY. dnsmasq will reply
     when the request eventually hits 255.255.255.255.

  Nova/OpenStack/dnsmasq versions:

  ii  nova-api                 1:2015.1.2-0ubuntu2~cloud0  all    OpenStack Compute - API frontend
  ii  nova-common              1:2015.1.2-0ubuntu2~cloud0  all    OpenStack Compute - common files
  ii  nova-compute             1:2015.1.2-0ubuntu2~cloud0  all    OpenStack Compute - compute node base
  ii  nova-compute-libvirt     1:2015.1.2-0ubuntu2~cloud0  all    OpenStack Compute - compute node libvirt support
  ii  nova-compute-qemu        1:2015.1.2-0ubuntu2~cloud0  all    OpenStack Compute - compute node (QEmu)
  ii  nova-network             1:2015.1.2-0ubuntu2~cloud0  all    OpenStack Compute - Network manager
  ii  nova-novncproxy          1:2015.1.2-0ubuntu2~cloud0  all    OpenStack Compute - NoVNC proxy
  ii  python-nova              1:2015.1.2-0ubuntu2~cloud0  all    OpenStack Compute Python libraries
  ii  python-nova-adminclient  0.1.8-0ubuntu2              amd64  client for administering Openstack Nova
  ii  python-novaclient        1:2.22.0-0ubuntu2~cloud0    all    client library for OpenStack Compute API
  ii  dnsmasq-base             2.68-1ubuntu0.1             amd64  Small caching DNS proxy and DHCP/TFTP server
  ii  dnsmasq-utils            2.68-1ubuntu0.1             amd64  Utilities for manipulating DHCP leases

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1554227/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
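The reporter's reading of the dnsmasq commit above is that each dnsmasq instance must be pinned to its own bridge with an explicit --interface argument alongside --bind-interfaces. The sketch below shows what that looks like in a command-line builder; it is illustrative, not nova-network's actual code, and the helper name and the remaining flags are assumptions.

# Sketch of a per-bridge dnsmasq invocation: with --bind-interfaces
# alone, multiple instances can mis-handle unicast DHCP; adding an
# explicit --interface=<bridge> pins each instance to its VLAN bridge.
def build_dnsmasq_cmd(bridge, ip, dhcp_range, pid_file):
    return [
        'dnsmasq',
        '--bind-interfaces',          # bind only the addresses in use
        '--interface=%s' % bridge,    # ...and only on this bridge
        '--listen-address=%s' % ip,
        '--dhcp-range=%s' % dhcp_range,
        '--pid-file=%s' % pid_file,
        '--no-hosts',
        '--no-resolv',
    ]

print(' '.join(build_dnsmasq_cmd(
    'br100', '10.0.0.1', '10.0.0.2,static', '/var/run/dnsmasq-br100.pid')))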
[Yahoo-eng-team] [Bug 1374999] Re: iSCSI volume detach does not correctly remove the multipath device descriptors
Marking this as confirmed against the Ubuntu Cloud Archive for Kilo, Juno, and Trusty, which are still supported from the Ubuntu perspective and are known not to include the os-brick library dependency. The change to os-brick still needs to be tested to verify that the problem is fixed there, so I'm leaving that task as Incomplete.

** Changed in: cloud-archive/icehouse
       Status: Invalid => Confirmed

** Changed in: cloud-archive/juno
       Status: Invalid => Confirmed

** Changed in: cloud-archive/kilo
       Status: New => Confirmed

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1374999

Title:
  iSCSI volume detach does not correctly remove the multipath device
  descriptors

Status in Ubuntu Cloud Archive: Confirmed
Status in Ubuntu Cloud Archive icehouse series: Confirmed
Status in Ubuntu Cloud Archive juno series: Confirmed
Status in Ubuntu Cloud Archive kilo series: Confirmed
Status in OpenStack Compute (nova): Incomplete
Status in nova package in Ubuntu: Triaged
Status in nova source package in Trusty: Triaged

Bug description:
  iSCSI volume detach does not correctly remove the multipath device
  descriptors.

  Tested environment: nova-compute on Ubuntu 14.04.1,
  iscsi_use_multipath=True, and the iSCSI volume backend is an EMC VNX
  5300.

  I created 3 cinder volumes and attached them to a nova instance.
  Then I detached them one by one. The first 2 volumes detached
  successfully. The 3rd volume also detached successfully but ended up
  with failed multipaths. Here is the terminal log for the last volume
  detach.

  openstack@W1DEV103:~/devstack$ cinder list
  +--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
  | ID                                   | Status | Name | Size | Volume Type | Bootable | Attached to                          |
  +--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
  | 56a63288-5cc0-4f5c-9197-cde731172dd8 | in-use | None | 1    | None        | false    | 5bd68785-4acf-43ab-ae13-11b1edc3a62e |
  +--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+

  openstack@W1CN103:/etc/iscsi$ date; sudo multipath -l
  Fri Sep 19 21:38:13 JST 2014
  360060160cf0036002d1475f6e73fe411 dm-2 DGC,VRAID
  size=1.0G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
  |-+- policy='round-robin 0' prio=-1 status=active
  | |- 4:0:0:42  sdb 8:16  active undef running
  | |- 5:0:0:42  sdd 8:48  active undef running
  | |- 6:0:0:42  sdf 8:80  active undef running
  | `- 7:0:0:42  sdh 8:112 active undef running
  `-+- policy='round-robin 0' prio=-1 status=enabled
    |- 11:0:0:42 sdp 8:240 active undef running
    |- 8:0:0:42  sdj 8:144 active undef running
    |- 9:0:0:42  sdl 8:176 active undef running
    `- 10:0:0:42 sdn 8:208 active undef running

  openstack@W1CN103:/etc/iscsi$ date; sudo iscsiadm -m session
  Fri Sep 19 21:38:19 JST 2014
  tcp: [10] 172.23.58.228:3260,4 iqn.1992-04.com.emc:cx.fcn00133400150.a7
  tcp: [3] 172.23.58.238:3260,8 iqn.1992-04.com.emc:cx.fcn00133400150.b7
  tcp: [4] 172.23.58.235:3260,20 iqn.1992-04.com.emc:cx.fcn00133400150.b4
  tcp: [5] 172.23.58.236:3260,6 iqn.1992-04.com.emc:cx.fcn00133400150.b5
  tcp: [6] 172.23.58.237:3260,19 iqn.1992-04.com.emc:cx.fcn00133400150.b6
  tcp: [7] 172.23.58.225:3260,16 iqn.1992-04.com.emc:cx.fcn00133400150.a4
  tcp: [8] 172.23.58.226:3260,2 iqn.1992-04.com.emc:cx.fcn00133400150.a5
  tcp: [9] 172.23.58.227:3260,17 iqn.1992-04.com.emc:cx.fcn00133400150.a6

  openstack@W1DEV103:~/devstack$ nova volume-detach 5bd68785-4acf-43ab-ae13-11b1edc3a62e 56a63288-5cc0-4f5c-9197-cde731172dd8
  openstack@W1DEV103:~/devstack$ cinder list
  +--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
  | ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to                          |
  +--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
  | 56a63288-5cc0-4f5c-9197-cde731172dd8 | detaching | None | 1    | None        | false    | 5bd68785-4acf-43ab-ae13-11b1edc3a62e |
  +--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+

  openstack@W1DEV103:~/devstack$ cinder list
  +--------------------------------------+-----------+------+------+-------------+----------+-------------+
  | ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
  +--------------------------------------+-----------+------+------+-------------+----------+-------------+
  | 56a63288-5cc0-4f5c-9197-cde731172dd8 | available | None | 1    | None        | false    |             |
[Yahoo-eng-team] [Bug 1374999] Re: iSCSI volume detach does not correctly remove the multipath device descriptors
** Also affects: cloud-archive
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1374999

Title:
  iSCSI volume detach does not correctly remove the multipath device
  descriptors

Status in Ubuntu Cloud Archive: New
Status in OpenStack Compute (nova): In Progress
Status in nova package in Ubuntu: Triaged
Status in nova source package in Trusty: Triaged

Bug description:
  iSCSI volume detach does not correctly remove the multipath device
  descriptors.

  Tested environment: nova-compute on Ubuntu 14.04.1,
  iscsi_use_multipath=True, and the iSCSI volume backend is an EMC VNX
  5300.

  I created 3 cinder volumes and attached them to a nova instance.
  Then I detached them one by one. The first 2 volumes detached
  successfully. The 3rd volume also detached successfully but ended up
  with failed multipaths. Here is the terminal log for the last volume
  detach.

  openstack@W1DEV103:~/devstack$ cinder list
  +--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
  | ID                                   | Status | Name | Size | Volume Type | Bootable | Attached to                          |
  +--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
  | 56a63288-5cc0-4f5c-9197-cde731172dd8 | in-use | None | 1    | None        | false    | 5bd68785-4acf-43ab-ae13-11b1edc3a62e |
  +--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+

  openstack@W1CN103:/etc/iscsi$ date; sudo multipath -l
  Fri Sep 19 21:38:13 JST 2014
  360060160cf0036002d1475f6e73fe411 dm-2 DGC,VRAID
  size=1.0G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
  |-+- policy='round-robin 0' prio=-1 status=active
  | |- 4:0:0:42  sdb 8:16  active undef running
  | |- 5:0:0:42  sdd 8:48  active undef running
  | |- 6:0:0:42  sdf 8:80  active undef running
  | `- 7:0:0:42  sdh 8:112 active undef running
  `-+- policy='round-robin 0' prio=-1 status=enabled
    |- 11:0:0:42 sdp 8:240 active undef running
    |- 8:0:0:42  sdj 8:144 active undef running
    |- 9:0:0:42  sdl 8:176 active undef running
    `- 10:0:0:42 sdn 8:208 active undef running

  openstack@W1CN103:/etc/iscsi$ date; sudo iscsiadm -m session
  Fri Sep 19 21:38:19 JST 2014
  tcp: [10] 172.23.58.228:3260,4 iqn.1992-04.com.emc:cx.fcn00133400150.a7
  tcp: [3] 172.23.58.238:3260,8 iqn.1992-04.com.emc:cx.fcn00133400150.b7
  tcp: [4] 172.23.58.235:3260,20 iqn.1992-04.com.emc:cx.fcn00133400150.b4
  tcp: [5] 172.23.58.236:3260,6 iqn.1992-04.com.emc:cx.fcn00133400150.b5
  tcp: [6] 172.23.58.237:3260,19 iqn.1992-04.com.emc:cx.fcn00133400150.b6
  tcp: [7] 172.23.58.225:3260,16 iqn.1992-04.com.emc:cx.fcn00133400150.a4
  tcp: [8] 172.23.58.226:3260,2 iqn.1992-04.com.emc:cx.fcn00133400150.a5
  tcp: [9] 172.23.58.227:3260,17 iqn.1992-04.com.emc:cx.fcn00133400150.a6

  openstack@W1DEV103:~/devstack$ nova volume-detach 5bd68785-4acf-43ab-ae13-11b1edc3a62e 56a63288-5cc0-4f5c-9197-cde731172dd8
  openstack@W1DEV103:~/devstack$ cinder list
  +--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
  | ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to                          |
  +--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
  | 56a63288-5cc0-4f5c-9197-cde731172dd8 | detaching | None | 1    | None        | false    | 5bd68785-4acf-43ab-ae13-11b1edc3a62e |
  +--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+

  openstack@W1DEV103:~/devstack$ cinder list
  +--------------------------------------+-----------+------+------+-------------+----------+-------------+
  | ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
  +--------------------------------------+-----------+------+------+-------------+----------+-------------+
  | 56a63288-5cc0-4f5c-9197-cde731172dd8 | available | None | 1    | None        | false    |             |
  +--------------------------------------+-----------+------+------+-------------+----------+-------------+

  openstack@W1CN103:/etc/iscsi$ date; sudo multipath -l
  Fri Sep 19 21:39:23 JST 2014
  360060160cf0036002d1475f6e73fe411 dm-2 ,
  size=1.0G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
  |-+- policy='round-robin 0' prio=-1 status=active
  | |- #:#:#:# - #:# active undef running
  | |- #:#:#:# - #:# active undef running
  | |- #:#:#:# - #:# active undef running
  | `- #:#:#:# - #:# active undef running
  `-+- policy='round-robin 0' prio=-1 status=enabled
    |- #:#:#:# - #:# active undef running
    |- #:#:#:# - #:# active undef running
    |- #:#:#:# - #:# active undef running
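For context, stale maps like the #:#:#:# entries above can be removed by flushing the multipath device by WWID once its paths are gone; a sketch follows. This is not nova's or os-brick's actual cleanup code, and the helper name is an assumption; it only illustrates the cleanup step whose absence leaves the orphaned descriptors behind.

# Sketch: after the iSCSI paths for a detached volume are removed,
# the multipath map itself should be flushed by WWID.
import subprocess

def flush_multipath_device(wwid):
    # 'multipath -f <map>' flushes a single unused multipath map.
    subprocess.check_call(['multipath', '-f', wwid])

flush_multipath_device('360060160cf0036002d1475f6e73fe411')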
[Yahoo-eng-team] [Bug 1353939] Re: Rescue fails with 'Failed to terminate process: Device or resource busy' in the n-cpu log
This fix was made available in 1:2014.2.4-0ubuntu1~cloud4 of nova in the Ubuntu Cloud Archive for Juno.

** Changed in: cloud-archive/juno
       Status: In Progress => Fix Committed

** Changed in: cloud-archive/juno
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1353939

Title:
  Rescue fails with 'Failed to terminate process: Device or resource
  busy' in the n-cpu log

Status in Ubuntu Cloud Archive: Invalid
Status in Ubuntu Cloud Archive juno series: Fix Released
Status in OpenStack Compute (nova): Fix Released
Status in OpenStack Compute (nova) juno series: New
Status in OpenStack Compute (nova) kilo series: Fix Released
Status in nova package in Ubuntu: Invalid

Bug description:
  [Impact]

  * Users may sometimes fail to shut down an instance if the associated
    qemu process is in uninterruptible sleep (typically IO).

  [Test Case]

  * 1. Create some IO load in a VM.
    2. Look at the associated qemu process; make sure it has STAT D in
       ps output.
    3. Shut down the instance.
    4. With the patch in place, nova will retry calling libvirt to shut
       down the instance 3 times, to wait for the signal to be
       delivered to the qemu process.

  [Regression Potential]

  * None

  message: "Failed to terminate process" AND
  message: 'InstanceNotRescuable' AND
  message: 'Exception during message handling' AND
  tags: "screen-n-cpu.txt"

  The above logstash query reports back only the failed jobs; the
  'Failed to terminate process' message also appears close to other
  failed rescue tests, but tempest does not always report them as an
  error at the end.

  message: "Failed to terminate process" AND tags: "screen-n-cpu.txt"

  Usual console log:

  Details: (ServerRescueTestJSON:test_rescue_unrescue_instance) Server
  0573094d-53da-40a5-948a-747d181462f5 failed to reach RESCUE status
  and task state "None" within the required time (196 s). Current
  status: SHUTOFF. Current task state: None.

  http://logs.openstack.org/82/107982/2/gate/gate-tempest-dsvm-postgres-full/90726cb/console.html#_2014-08-07_03_50_26_520

  Usual n-cpu exception:
  http://logs.openstack.org/82/107982/2/gate/gate-tempest-dsvm-postgres-full/90726cb/logs/screen-n-cpu.txt.gz#_2014-08-07_03_32_02_855

  2014-08-07 03:32:02.855 ERROR oslo.messaging.rpc.dispatcher [req-39ce7a3d-5ceb-41f5-8f9f-face7e608bd1 ServerRescueTestJSON-2035684545 ServerRescueTestJSON-1017508309] Exception during message handling: Instance 0573094d-53da-40a5-948a-747d181462f5 cannot be rescued: Driver Error: Failed to terminate process 26425 with SIGKILL: Device or resource busy
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     incoming.message))
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     return self._do_dispatch(endpoint, method, ctxt, args)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     result = getattr(endpoint, method)(ctxt, **new_args)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/compute/manager.py", line 408, in decorated_function
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/exception.py", line 88, in wrapped
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     payload)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/openstack/common/excutils.py", line 82, in __exit__
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/exception.py", line 71, in wrapped
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     return f(self, context, *args, **kw)
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/new/nova/nova/compute/manager.py", line 292, in decorated_function
  2014-08-07 03:32:02.855 22829 TRACE oslo.messaging.rpc.dispatcher     pass
  2014-08-07 03:32:02.855
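Step 4 of the test case describes nova retrying the libvirt shutdown call. A simplified sketch of that retry pattern, using the python libvirt bindings, is shown below; the function name, retry count handling, and error inspection are illustrative assumptions rather than nova's exact implementation.

# Sketch of the retry behaviour described above: retry the destroy
# call a few times so the SIGKILL has a chance to be delivered once
# the qemu process leaves uninterruptible (D state) sleep.
import time

import libvirt  # python libvirt bindings

def destroy_with_retries(domain, retries=3, delay=3):
    for attempt in range(retries):
        try:
            domain.destroy()
            return
        except libvirt.libvirtError:
            # 'Failed to terminate process ... Device or resource
            # busy' means the process is stuck in D state; wait for
            # the IO to complete and retry.
            if attempt == retries - 1:
                raise
            time.sleep(delay)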
[Yahoo-eng-team] [Bug 1414218] [NEW] Remove extraneous trace in linux/dhcp.py
Public bug reported:

The debug tracepoint in Dnsmasq._output_hosts_file is extraneous and
causes unnecessary performance overhead due to string formatting when
creating lots (> 1000) ports at one time. The trace point is
unnecessary since the data is being written to disk and the file can
be examined in a worst case scenario. The added performance overhead
is an order of magnitude in difference (~.5 seconds versus ~.05
seconds at 1500 ports).

** Affects: neutron
     Importance: Undecided
       Assignee: Billy Olsen (billy-olsen)
         Status: In Progress

** Changed in: neutron
     Assignee: (unassigned) => Billy Olsen (billy-olsen)

** Changed in: neutron
       Status: New => In Progress

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1414218

Title:
  Remove extraneous trace in linux/dhcp.py

Status in OpenStack Neutron (virtual network service): In Progress

Bug description:
  The debug tracepoint in Dnsmasq._output_hosts_file is extraneous and
  causes unnecessary performance overhead due to string formatting
  when creating lots (> 1000) ports at one time. The trace point is
  unnecessary since the data is being written to disk and the file can
  be examined in a worst case scenario. The added performance overhead
  is an order of magnitude in difference (~.5 seconds versus ~.05
  seconds at 1500 ports).

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1414218/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
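To make the formatting overhead concrete, the sketch below times a hosts-file rebuild with and without an eagerly formatted debug trace. It illustrates the class of overhead this bug removes; it is not the actual neutron patch, and the entry format is invented for the demo.

# Demo: an eagerly formatted debug trace pays the string-building cost
# on every rebuild even when DEBUG logging is disabled.
import logging
import timeit

LOG = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)  # debug logging disabled

# Fake dnsmasq hosts-file entries; the format is invented for the demo.
entries = ['fa:16:3e:00:%02x:%02x,host-%d.openstacklocal,10.0.%d.%d'
           % (i // 256, i % 256, i, i // 256, i % 256)
           for i in range(1500)]

def with_trace():
    buf = '\n'.join(entries)
    # Eager interpolation: the large message string is built even
    # though DEBUG is off and the record is never emitted.
    LOG.debug('Building host file: %s' % buf)
    return buf

def without_trace():
    return '\n'.join(entries)

print('with trace:   ', timeit.timeit(with_trace, number=100))
print('without trace:', timeit.timeit(without_trace, number=100))

Since the hosts file is written to disk anyway, simply dropping the trace loses no information; lazy logging (LOG.debug('...: %s', buf)) would also defer the interpolation, but keeps nothing the file itself doesn't already record.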