[Yahoo-eng-team] [Bug 1752986] [NEW] Live migrate UnexpectedTaskStateError
Public bug reported:

Description
===========
Occasionally, when performing live migration, the instance goes to ERROR state with the message "UnexpectedTaskStateError: Conflict updating instance . Expected: {'task_state': [u'migrating']}. Actual: {'task_state': None}". So far the migration itself is always successful and the instance is always found residing on the target host. Updating the "node" and "host" columns of the instances table in the nova database with the destination host and then resetting the instance state to ACTIVE gets things back in order. The issue appears to occur randomly across 16 uniform compute nodes and looks like a race condition.

Steps to reproduce
==================
1. Boot an instance to Compute01.
2. Issue a live-migrate command for the instance targeting Compute02 (this can be done via Horizon and via python-openstackclient):

   # openstack server migrate --shared-migration --live computehost02 e8928cb2-afae-4cca-93db-f218e9f22324

3. Live-migration works, the instance remains accessible and is moved to the new host. However, ~20% of the time the instance goes to ERROR state and some cleanup must be done in the database:

   MariaDB [nova]> update instances set node='computehost02', host='computehost02' where uuid='e8928cb2-afae-4cca-93db-f218e9f22324';

   # openstack server set --state active e8928cb2-afae-4cca-93db-f218e9f22324

Expected result
===============
The migrated instance should move successfully and return to ACTIVE state.

Actual result
=============
Instances occasionally end up in ERROR state after a "successful" live-migration.

Environment
===========
1. This is a Newton environment with Nova Libvirt/KVM backed by Ceph. Networking is provided by the Neutron ML2 linux bridge agent.

root@computehost02:~# nova-compute --version
14.0.8

root@computehost02:~# dpkg -l | grep libvir
ii  libvirt-bin       1.3.1-1ubuntu10.15  amd64  programs for the libvirt library
ii  libvirt0:amd64    1.3.1-1ubuntu10.15  amd64  library for interfacing with different virtualization systems
ii  python-libvirt    1.3.1-1ubuntu1.1    amd64  libvirt Python bindings

root@computehost02:~# dpkg -l | grep qemu
ii  ipxe-qemu               1.0.0+git-20150424.a25a16d-1ubuntu1.2  all    PXE boot firmware - ROM images for qemu
ii  qemu                    1:2.5+dfsg-5ubuntu10.16                amd64  fast processor emulator
ii  qemu-block-extra:amd64  1:2.5+dfsg-5ubuntu10.16                amd64  extra block backend modules for qemu-system and qemu-utils
ii  qemu-slof               20151103+dfsg-1ubuntu1                 all    Slimline Open Firmware -- QEMU PowerPC version
ii  qemu-system             1:2.5+dfsg-5ubuntu10.16                amd64  QEMU full system emulation binaries
ii  qemu-system-arm         1:2.5+dfsg-5ubuntu10.16                amd64  QEMU full system emulation binaries (arm)
ii  qemu-system-common      1:2.5+dfsg-5ubuntu10.16                amd64  QEMU full system emulation binaries (common files)
ii  qemu-system-mips        1:2.5+dfsg-5ubuntu10.16                amd64  QEMU full system emulation binaries (mips)
ii  qemu-system-misc        1:2.5+dfsg-5ubuntu10.16                amd64  QEMU full system emulation binaries (miscelaneous)
ii  qemu-system-ppc         1:2.5+dfsg-5ubuntu10.16                amd64  QEMU full system emulation binaries (ppc)
ii  qemu-system-sparc       1:2.5+dfsg-5ubuntu10.16                amd64  QEMU full system emulation binaries (sparc)
ii  qemu-system-x86         1:2.5+dfsg-5ubuntu10.16                amd64  QEMU full system emulation binaries (x86)
ii  qemu-user               1:2.5+dfsg-5ubuntu10.16                amd64  QEMU user mode emulation binaries
ii  qemu-utils              1:2.5+dfsg-5ubuntu10.16                amd64  QEMU utilities

root@cephhost:~# dpkg -l | grep -i ceph
ii  ceph         10.2.10-1xenial  amd64  distributed storage and file system
ii  ceph-base    10.2.10-1xenial  amd64  common ceph daemon libraries and management tools
ii  ceph-common  10.2.10-1xenial  amd64  common utilities to mount and interact with a ceph storage cluster
ii  ceph-mon     10.2.10-1xenial  amd64  monitor server for the ceph
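For context, the error comes from the compare-and-swap that Nova performs on task_state when saving an instance during migration. The sketch below is a simplified illustration of that pattern (not Nova's actual implementation): if a competing update has already reset task_state to None, the save with expected_task_state=['migrating'] fails and the instance is put into ERROR.

    # Simplified illustration (not nova's actual code) of the
    # expected_task_state compare-and-swap behind UnexpectedTaskStateError.

    class UnexpectedTaskStateError(Exception):
        pass

    def save_instance(db_row, updates, expected_task_state):
        """Apply 'updates' only if the stored task_state matches expectations."""
        # Another actor (e.g. post-live-migration cleanup) may already have
        # set task_state to None; in that case this update must not proceed.
        if db_row['task_state'] not in expected_task_state:
            raise UnexpectedTaskStateError(
                "Conflict updating instance. Expected: %r. Actual: %r"
                % ({'task_state': expected_task_state},
                   {'task_state': db_row['task_state']}))
        db_row.update(updates)

    row = {'task_state': None}            # cleanup already cleared the state
    save_instance(row, {'host': 'computehost02'},
                  expected_task_state=['migrating'])   # raises; instance -> ERROR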
[Yahoo-eng-team] [Bug 1570958] Re: Need neutron-ns-metadata-proxy child ProcessMonitor for dhcp agent
** Changed in: neutron
   Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1570958

Title:
  Need neutron-ns-metadata-proxy child ProcessMonitor for dhcp agent

Status in neutron:
  Invalid

Bug description:
  Related to bug 1257775 and bug 1257524

  The l3-agent is able to periodically check that the child process
  neutron-ns-metadata-proxy is still running and respawn it if not. It
  seems we should periodically check child processes (in addition to
  dnsmasq) of the dhcp agent as well, since the dhcp agent is responsible
  for spawning the neutron-ns-metadata-proxy process for networks not
  attached to a router.
[Yahoo-eng-team] [Bug 1570958] [NEW] Need neutron-ns-metadata-proxy child ProcessMonitor for dhcp agent
Public bug reported:

Related to bug 1257775 and bug 1257524

The l3-agent is able to periodically check that the child process neutron-ns-metadata-proxy is still running and respawn it if not. It seems we should periodically check child processes (in addition to dnsmasq) of the dhcp agent as well, since the dhcp agent is responsible for spawning the neutron-ns-metadata-proxy process for networks not attached to a router.

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1570958
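The behaviour being requested is a periodic liveness check that respawns the metadata proxy if it dies. The sketch below illustrates that idea only; it is not neutron's ProcessMonitor implementation, and the proxy command line shown is hypothetical.

    # Illustrative sketch of a periodic child-process liveness check with
    # respawn, i.e. what the report asks the dhcp agent to do for its
    # neutron-ns-metadata-proxy children. Not neutron's actual code.
    import subprocess
    import time

    def spawn_proxy(cmd):
        return subprocess.Popen(cmd)

    def monitor(cmd, interval=30):
        proc = spawn_proxy(cmd)
        while True:
            time.sleep(interval)
            if proc.poll() is not None:      # child exited unexpectedly
                print("child died with rc=%s, respawning" % proc.returncode)
                proc = spawn_proxy(cmd)

    # Example (hypothetical command line):
    # monitor(["neutron-ns-metadata-proxy", "--pid_file=/tmp/proxy.pid"])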
[Yahoo-eng-team] [Bug 1484290] [NEW] Neutron migration to Juno breaks router functionality if ports in tenant other than router
Public bug reported:

During a recent upgrade from Icehouse to Juno, we lost all interfaces on the back side of the router. Plugging the appropriate values into the neutron.routerports table corrected the issue.

From /usr/local/lib/python2.7/dist-packages/neutron/db/migration/alembic_migrations/versions/544673ac99ab_add_router_port_table.py, I find the SQL statement used to populate the routerports table:

...
SQL_STATEMENT = (
    "insert into routerports "
    "select p.device_id as router_id, p.id as port_id, p.device_owner as port_type "
    "from ports p join routers r on (p.device_id=r.id) "
    "where (r.tenant_id=p.tenant_id AND p.device_owner='network:router_interface') "
    "OR (p.tenant_id='' AND p.device_owner='network:router_gateway')"
)
...

Running the same statement reflects the state of the routerports table when the issue was discovered:

MariaDB [neutron]> select p.device_id as router_id, p.id as port_id, p.device_owner as port_type
    -> from ports p join routers r on (p.device_id=r.id)
    -> where (r.tenant_id=p.tenant_id AND p.device_owner='network:router_interface')
    -> OR (p.tenant_id='' AND p.device_owner='network:router_gateway');
+--------------------------------------+--------------------------------------+--------------------------+
| router_id                            | port_id                              | port_type                |
+--------------------------------------+--------------------------------------+--------------------------+
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | 7b280662-37eb-435e-bc20-37b9b824c0b1 | network:router_gateway   |
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | 8db9bfd3-bd4a-4863-92ef-9c2dbdcbbc0f | network:router_interface |
+--------------------------------------+--------------------------------------+--------------------------+
2 rows in set (0.00 sec)

Removing tenant_id from the WHERE clause seems to be what should have happened; not sure why we care what the tenant_id is:

MariaDB [neutron]> select p.device_id as router_id, p.id as port_id, p.device_owner as port_type
    -> from ports p join routers r on (p.device_id=r.id)
    -> where (p.device_owner='network:router_interface') OR (p.device_owner='network:router_gateway');
+--------------------------------------+--------------------------------------+--------------------------+
| router_id                            | port_id                              | port_type                |
+--------------------------------------+--------------------------------------+--------------------------+
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | 23f676d6-7e71-473a-9685-6955ad02d566 | network:router_interface |
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | 31412033-c2ac-4843-9cbe-0a580bac2463 | network:router_interface |
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | 39dbb96e-3862-4246-990f-4dc5d2d0e524 | network:router_interface |
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | 5c2f484f-75cc-4499-b464-7b44eea70376 | network:router_interface |
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | 630e7601-2997-447f-ad2c-c14fe5915fc3 | network:router_interface |
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | 6af57709-c40a-4070-a92a-b5c4e4895a05 | network:router_interface |
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | 796e8e72-0cde-4c02-9a3c-6eb7afc8393f | network:router_interface |
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | 7b280662-37eb-435e-bc20-37b9b824c0b1 | network:router_gateway   |
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | 8db9bfd3-bd4a-4863-92ef-9c2dbdcbbc0f | network:router_interface |
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | 95fdb58d-1eb4-421f-8515-692c5bd22056 | network:router_interface |
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | b19f4f0e-a31c-4fc7-b1f4-7d888ba2786d | network:router_interface |
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | cf1423db-ed37-4028-be57-a83e9e63803a | network:router_interface |
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | d0021fde-8cb2-4f52-88ef-a01cdb7104ea | network:router_interface |
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | edbc477f-a92b-46a5-9ae0-0a529541c248 | network:router_interface |
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | f07d22db-bda7-4b88-aeae-07f63c7e28f4 | network:router_interface |
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | f2e1b38b-f432-4579-8f14-dfeb9a3a4593 | network:router_interface |
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | ffcc8003-f331-47d9-9619-bbaf35d9cb60 | network:router_interface |
+--------------------------------------+--------------------------------------+--------------------------+
17 rows in set (0.00 sec)

MariaDB [neutron]> select * from routerports;
+--------------------------------------+--------------------------------------+--------------------------+
| router_id                            | port_id                              | port_type                |
+--------------------------------------+--------------------------------------+--------------------------+
| 8c5ca5e2-5dc1-4586-8bbe-601a394029fb | 23f676d6-7e71-473a-9685-6955ad02d566 | network:router_interface |
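For operators hitting the same gap after the migration has already run, the following is a minimal sketch of repopulating routerports using the relaxed WHERE clause shown above. It mirrors the reporter's suggestion rather than the upstream fix, and the pymysql library, host, and credentials are assumptions to adjust for your deployment.

    # Sketch: repopulate neutron.routerports with the relaxed WHERE clause
    # from this report (drops the tenant_id comparison). The NOT EXISTS guard
    # avoids inserting rows that are already present.
    import pymysql

    REPOPULATE_SQL = """
    insert into routerports (router_id, port_id, port_type)
    select p.device_id, p.id, p.device_owner
    from ports p join routers r on (p.device_id = r.id)
    where p.device_owner in ('network:router_interface', 'network:router_gateway')
      and not exists (select 1 from routerports rp where rp.port_id = p.id)
    """

    conn = pymysql.connect(host="localhost", user="neutron",
                           password="secret", database="neutron")
    with conn.cursor() as cur:
        cur.execute(REPOPULATE_SQL)
    conn.commit()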
[Yahoo-eng-team] [Bug 1474467] [NEW] default_schedule_zone should be list
Public bug reported:

I'd like to re-open or re-state the issue reported in https://bugs.launchpad.net/nova/+bug/1037371.

Let us say that I have 3 availability zones: nova, az1, az2. I do not care whether I land in nova or az1 if no AZ is specified on boot, but az2 is special and I do *not* want to land there by default. The only way around this that I can think of would be to disable the hypervisors in the az2 AZ and boot to them manually. However, if I disable the nodes in az2 I cannot simply boot to az2 and let the scheduler make the appropriate choice about where to schedule the instance.

It seems like it would make sense for default_schedule_zone to be a list option or, since that might be a pain to keep track of, for there to be a sort of inverse option like excluded_schedule_zones.

** Affects: nova
   Importance: Undecided
   Status: New

** Tags: scheduler

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1474467
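To make the proposal concrete, here is a hedged sketch of what the requested inverse option could look like; excluded_schedule_zones is not an existing Nova config option, and the zone-picking function is purely illustrative of the desired behaviour.

    # Hypothetical sketch of the proposed excluded_schedule_zones option.
    import random
    from oslo_config import cfg

    opts = [
        cfg.ListOpt('excluded_schedule_zones', default=[],
                    help='AZs never chosen when the user does not request one'),
    ]
    CONF = cfg.CONF
    CONF.register_opts(opts)

    def pick_default_zone(all_zones, requested_zone=None):
        """Honour an explicit AZ request; otherwise avoid excluded zones."""
        if requested_zone:
            return requested_zone
        candidates = [z for z in all_zones
                      if z not in CONF.excluded_schedule_zones]
        return random.choice(candidates) if candidates else None

    # With excluded_schedule_zones = ['az2'], an unspecified boot would only
    # land in 'nova' or 'az1', while "--availability-zone az2" still works.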
[Yahoo-eng-team] [Bug 1432873] [NEW] Add FDB bridge entry fails if old entry not removed
Public bug reported:

Running on Ubuntu 14.04 with the Linuxbridge agent and L2pop with vxlan networks.

In situations where remove_fdb_entries messages are lost/never consumed, future add_fdb_bridge_entry attempts will fail with the following example error message:

2015-03-16 21:10:08.520 30207 ERROR neutron.agent.linux.utils [req-390ab63a-9d3c-4d0e-b75b-200e9f5b97c6 None]
Command: ['sudo', '/usr/local/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'bridge', 'fdb', 'add', 'fa:16:3e:a5:15:35', 'dev', 'vxlan-15', 'dst', '172.30.100.60']
Exit code: 2
Stdout: ''
Stderr: 'RTNETLINK answers: File exists\n'

In our case, instances were unable to communicate with their Neutron router because vxlan traffic was being forwarded to the wrong vxlan endpoint. This was corrected either by migrating the router to a new agent or by executing a "bridge fdb del" for the fdb entry corresponding to the Neutron router's MAC address. Once deleted, the LB agent added the appropriate fdb entry at the next polling event.

If anything is unclear, please let me know.

** Affects: neutron
   Importance: Undecided
   Status: New

** Tags: l2-pop lb linuxbridge vxlan

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1432873
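Along the lines of the manual workaround above, a sketch of an idempotent fdb add is shown below: if the kernel already holds a stale entry for the MAC, delete it and re-add with the correct VTEP. This is not the upstream neutron fix; the function name is illustrative and the command strings simply mirror the failing rootwrap call in the log.

    # Sketch of an idempotent "bridge fdb add" that tolerates a stale entry.
    import subprocess

    def add_fdb_entry(mac, dev, dst_ip):
        add_cmd = ['bridge', 'fdb', 'add', mac, 'dev', dev, 'dst', dst_ip]
        try:
            subprocess.check_call(add_cmd)
        except subprocess.CalledProcessError:
            # "RTNETLINK answers: File exists" -> a stale entry is present;
            # remove it and retry so traffic reaches the right endpoint.
            subprocess.check_call(['bridge', 'fdb', 'del', mac, 'dev', dev])
            subprocess.check_call(add_cmd)

    # add_fdb_entry('fa:16:3e:a5:15:35', 'vxlan-15', '172.30.100.60')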
[Yahoo-eng-team] [Bug 1313009] [NEW] Memory reported improperly in admin dashboard
Public bug reported:

The admin dashboard works with memory totals and usages as integers. This means that, for example, if you have a total of 1.95 TB of memory in your hypervisors you'll see it reported as 1 TB.

** Affects: horizon
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1313009
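The arithmetic behind the complaint is just integer truncation when converting the MB total reported by the hypervisors; this small example illustrates it and is not Horizon's actual code.

    # Illustration of the truncation: integer division drops the fraction.
    total_mb = 2_044_723                      # roughly 1.95 TB of RAM in MB
    tb_int   = total_mb // (1024 * 1024)      # -> 1 (what an integer UI shows)
    tb_float = total_mb / (1024 * 1024)       # -> ~1.95 (what users expect)
    print(tb_int, round(tb_float, 2))         # 1 1.95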
[Yahoo-eng-team] [Bug 1302106] [NEW] LDAP non-URL safe characters cause auth failure
Public bug reported:

An OpenStack user attempting to integrate Keystone with AD has reported that when his user contains a comma (full name CN='Doe, John'), a 'Bad search filter' error is thrown. If the full name CN is instead 'John Doe', authorization succeeds.

# dpkg -l | grep keystone
ii  keystone               1:2013.2.2-0ubuntu1~cloud0  OpenStack identity service - Daemons
ii  python-keystone         1:2013.2.2-0ubuntu1~cloud0  OpenStack identity service - Python library
ii  python-keystoneclient   1:0.3.2-0ubuntu1~cloud0     Client library for OpenStack Identity API

Relevant error message:

Authorization Failed: An unexpected error prevented the server from fulfilling your request. {'desc': 'Bad search filter'} (HTTP 500)

Relevant stack trace:

2014-03-31 15:44:27.459 3018 ERROR keystone.common.wsgi [-] {'desc': 'Bad search filter'}
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi Traceback (most recent call last):
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/keystone/common/wsgi.py", line 238, in __call__
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi     result = method(context, **params)
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/keystone/token/controllers.py", line 94, in authenticate
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi     context, auth)
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/keystone/token/controllers.py", line 272, in _authenticate_local
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi     user_id, tenant_id)
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/keystone/token/controllers.py", line 369, in _get_project_roles_and_ref
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi     user_id, tenant_id)
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/keystone/identity/core.py", line 475, in get_roles_for_user_and_project
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi     user_id, tenant_id)
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/keystone/assignment/core.py", line 160, in get_roles_for_user_and_project
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi     group_role_list = _get_group_project_roles(user_id, project_ref)
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/keystone/assignment/core.py", line 111, in _get_group_project_roles
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi     group_refs = self.identity_api.list_groups_for_user(user_id)
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/keystone/identity/core.py", line 177, in wrapper
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi     return f(self, *args, **kwargs)
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/keystone/identity/core.py", line 425, in list_groups_for_user
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi     group_list = driver.list_groups_for_user(user_id)
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/keystone/identity/backends/ldap.py", line 154, in list_groups_for_user
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi     return self.group.list_user_groups(user_dn)
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/keystone/identity/backends/ldap.py", line 334, in list_user_groups
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi     memberships = self.get_all(query)
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/keystone/common/ldap/core.py", line 388, in get_all
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi     for x in self._ldap_get_all(filter)]
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/keystone/common/ldap/core.py", line 364, in _ldap_get_all
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi     self.attribute_mapping.values())
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/keystone/common/ldap/core.py", line 571, in search_s
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi     res = self.conn.search_s(dn, scope, query, attrlist)
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/ldap/ldapobject.py", line 502, in search_s
2014-03-31 15:44:27.459 3018 TRACE keystone.common.wsgi     return self.search_ext_s(base,scope,filterstr,attrlist,attrsonly,None,None,timeout=self.timeout)
2014-03-31 15:44:27.459 3018 TRACE
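A likely mechanism is that a CN containing a comma is written into the DN with a backslash escape (CN=Doe\, John,...), and the backslash is an LDAP filter metacharacter, so interpolating the raw DN into the group-membership filter yields a syntactically invalid filter. python-ldap provides ldap.filter.escape_filter_chars for this; the sketch below is illustrative only, the DN and filter template are hypothetical and are not Keystone's exact query.

    # Minimal sketch (not Keystone's actual code) of escaping a user DN
    # before it is embedded in an LDAP search filter.
    import ldap.filter

    user_dn = r"CN=Doe\, John,OU=Users,DC=example,DC=com"   # hypothetical AD DN

    # Raw interpolation leaves the backslash unescaped inside the filter,
    # which is what produces "Bad search filter":
    bad_query = "(&(objectClass=group)(member=%s))" % user_dn

    # Escaping the value first produces a syntactically valid filter:
    good_query = "(&(objectClass=group)(member=%s))" % \
        ldap.filter.escape_filter_chars(user_dn)

    print(bad_query)
    print(good_query)   # the backslash is emitted as \5c, so the filter parses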