[Yahoo-eng-team] [Bug 1785425] Re: tempest-full-py3 runs on test-only changes
Reviewed:  https://review.openstack.org/589039
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=65cdcf4aaa9aec2fbeb29c9734b07efff32a2c49
Submitter: Zuul
Branch:    master

commit 65cdcf4aaa9aec2fbeb29c9734b07efff32a2c49
Author: ghanshyam
Date:   Mon Aug 6 07:23:16 2018 +

    Define irrelevant-files for tempest-full-py3 job

    The tempest-full-py3 job runs in the nova check and gate pipelines as
    part of the 'integrated-gate-py35' template. Unlike tempest-full and
    other jobs, the nova job definition on the project-config side [1]
    does not define irrelevant-files for tempest-full-py3, which causes
    the job to run even on doc/test-file-only changes. This commit
    defines irrelevant-files for tempest-full-py3 in nova's .zuul.yaml.

    Closes-Bug: #1785425
    [1] https://github.com/openstack-infra/project-config/blob/57907fbf04c3f9a69d390efee800d42697faae16/zuul.d/projects.yaml#L9283

    Change-Id: I887177c078a53c53e84289a6b134d7ea357dfbef

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1785425

Title:
  tempest-full-py3 runs on test-only changes

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  I'm not sure if this is due to the tempest-full rename to
  tempest-full-py3, but it seems like this didn't used to be an issue.

  But we now run tempest-full-py3 even on test-only changes, like this
  change: https://review.openstack.org/#/c/588935/

  My guess is we had this handled for tempest-full because of the
  irrelevant-files list in project-config:
  https://github.com/openstack-infra/project-config/blob/3d9c5399fdb9b16d3d0391f1d4f32e904db16388/zuul.d/projects.yaml#L9369

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1785425/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
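For readers unfamiliar with the mechanism, an irrelevant-files stanza in a project's .zuul.yaml looks roughly like this; the file patterns below are illustrative, not the exact list that merged:

```yaml
# Sketch of a per-job irrelevant-files override in .zuul.yaml.
# A change touching only files matching these patterns skips the job.
- project:
    check:
      jobs:
        - tempest-full-py3:
            irrelevant-files:
              - ^doc/.*$
              - ^.*\.rst$
              - ^nova/tests/unit/.*$
```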
[Yahoo-eng-team] [Bug 1785382] Re: GET /resource_providers/{uuid}/allocations doesn't get all the allocations
Reviewed:  https://review.openstack.org/57
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e13b765e55758019c243167f0a99c92a0d740400
Submitter: Zuul
Branch:    master

commit e13b765e55758019c243167f0a99c92a0d740400
Author: Tetsuro Nakamura
Date:   Sat Aug 4 19:22:50 2018 +0900

    Not use project table for user table

    The `GET /resource_provider/{uuid}/allocations` API didn't return
    all the allocations made by multiple users. This was because
    placement wrongly used the project table for the user table. This
    patch fixes it and adds a test case.

    Change-Id: I7c808dec5de1204ced0d1f1b31d8398de8c51679
    Closes-Bug: #1785382

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1785382

Title:
  GET /resource_providers/{uuid}/allocations doesn't get all the
  allocations

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===========
  GET /resource_providers/{uuid}/allocations doesn't get all the
  allocations.

  Reproduce
  =========
  1. Set up 1 resource provider with some inventories.
  2. A user (userA) in a project (projectX) makes 1 consumer
     (Consumer1) allocate on the rp.
  3. The same user (userA) in the project (projectX) makes another
     consumer (Consumer2) allocate on the rp.
  4. Another user (userB) in the project (projectX) makes another
     consumer (Consumer3) allocate on the rp.
  5. An admin uses `GET /resource_providers/{rp_uuid}/allocations` to
     get the consumers allocated.

  Expected
  ========
  The admin gets 3 consumers in the response: Consumer1, 2 and 3.

  Actual
  ======
  The admin gets 2 consumers in the response: Consumer1 and 2.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1785382/+subscriptions
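The class of bug is easy to demonstrate with a self-contained query sketch: joining consumers against the wrong lookup table silently drops rows. The schema and column names below are invented for illustration and are not placement's actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE projects (id INTEGER PRIMARY KEY, external_id TEXT);
CREATE TABLE users    (id INTEGER PRIMARY KEY, external_id TEXT);
CREATE TABLE consumers(id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO projects  VALUES (1, 'projectX');
INSERT INTO users     VALUES (1, 'userA'), (2, 'userB');
INSERT INTO consumers VALUES (1, 1), (2, 1), (3, 2);
""")

# Buggy shape: consumers.user_id joined against the *projects* table.
# Consumer3 (owned by userB, who has no matching project row) vanishes.
buggy = con.execute(
    "SELECT c.id FROM consumers c JOIN projects p ON c.user_id = p.id"
).fetchall()

# Fixed shape: join against the users table, so all 3 consumers survive.
fixed = con.execute(
    "SELECT c.id FROM consumers c JOIN users u ON c.user_id = u.id"
).fetchall()

print(len(buggy), len(fixed))  # → 2 3
```

The admin in the reproduce steps sees exactly the "buggy" result: two consumers instead of three.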
[Yahoo-eng-team] [Bug 1785511] Re: cpu_quota does not throttle cpu usage
Consider it ignored! :)

** Changed in: nova
       Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1785511

Title:
  cpu_quota does not throttle cpu usage

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Environment is Kolla-Ansible 6.1.0 deployed (Queens) on physical
  hardware (Xeon Gold 6154 compute nodes). Deploying an image with the
  following flavor (for a "nano" sized instance) does not throttle the
  single vCPU to 1/10th of a physical thread or core. The VM has full
  performance of a single physical thread.

  openstack flavor create --property cpu_quota=1 --property
  cpu_period=10 --vcpus=1 --ram 512 --disk 0 --public "t5.test"

  virsh on the compute node lists the following scheduling info:

  virsh # edit 25
  Domain instance-001e XML configuration not changed.
  virsh # schedinfo 25
  Scheduler      : posix
  cpu_shares     : 1024
  vcpu_period    : 10
  vcpu_quota     : -1
  emulator_period: 10
  emulator_quota : -1
  global_period  : 10
  global_quota   : -1
  iothread_period: 10
  iothread_quota : -1

  Manually forcing the vcpu_quota to 1 (instead of 10) results in the
  VM being throttled to 10% of a physical thread or core:

  schedinfo 25 --set vcpu_quota=1

  NOTE that Libvirt uses vcpu_quota (with a "v") whereas the openstack
  flavor properties use cpu_quota. So, I tried to use vcpu_quota and
  vcpu_period in the OpenStack flavor properties, but this did not
  affect Libvirt's XML config file.

  Am I missing something or is this a bug?

  Thanks!
  Eric

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1785511/+subscriptions
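The bug was likely closed Invalid because the properties were set outside the `quota:` extra-spec namespace that nova's libvirt driver reads. A sketch of the likely-intended flavor follows; the values assume libvirt's microsecond units for vcpu_period/vcpu_quota, and this command requires a live cloud, so it is shown as a CLI fragment only:

    # Hypothetical flavor: cap the single vCPU at ~10% of one host thread
    # (quota/period = 10000/100000 microseconds). A bare "cpu_quota"
    # property, without the "quota:" prefix, is not a recognized extra spec
    # and is silently ignored.
    openstack flavor create \
      --property quota:cpu_quota=10000 \
      --property quota:cpu_period=100000 \
      --vcpus 1 --ram 512 --disk 0 --public t5.test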
[Yahoo-eng-team] [Bug 1785193] Re: changing a node's cell results in duplicate hypervisors
Changing the database like this isn't really supported. You'd likely
need to delete the old cell and map the hosts to a new cell using the
nova-manage cell_v2 * commands like create_cell, map_instances and
discover_hosts.

** Tags added: cells

** Changed in: nova
       Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1785193

Title:
  changing a node's cell results in duplicate hypervisors

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Description
  ===========
  Changing a node's cell through nova.conf and then restarting
  nova-compute will result in duplicate hypervisor records for that
  node.

  Steps to reproduce
  ==================
  1. Edit nova.conf on a compute node to update [database].connection
     and [DEFAULT].transport_url to another cell's.
  2. Restart the nova-compute service.
  3. nova hypervisor-list

  Expected result
  ===============
  Not sure what to expect yet.

  Actual result
  =============
  +--------------------------------------+---------------------+-------+---------+
  | ID                                   | Hypervisor hostname | State | Status  |
  +--------------------------------------+---------------------+-------+---------+
  | a40d3d87-21c0-5728-a9ea-c2bdc544a148 | compute             | up    | enabled |
  | b87f790c-4160-4540-b4a9-2dbcfbe9d019 | compute             | up    | enabled |
  +--------------------------------------+---------------------+-------+---------+

  The following error is also captured in nova-compute.log:

  nova.scheduler.client.report [req-ee9c94db-9ea6-7358-97a7-444a31db3a2c
  - - - - -] [req-7b4fa3d6-3f17-5895-a3c5-54562d02b10b] Failed to create
  resource provider record in placement API for UUID
  a40d3d87-21c0-5728-a9ea-c2bdc544a148. Got 409: {"errors": [{"status":
  409, "request_id": "req-7b4fa3d6-3f17-5895-a3c5-54562d02b10b",
  "detail": "There was a conflict when trying to complete your
  request.\n\n Conflicting resource provider name: compute already
  exists. ", "title": "Conflict"}]}.

  Environment
  ===========
  openstack-nova-compute-17.0.5-1.el7.noarch (queens release)

  Logs & Configs
  ==============
  not related

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1785193/+subscriptions
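The supported flow the triager describes can be sketched with the nova-manage cell_v2 commands; the cell name, message-queue URL, and database URL below are placeholders, and these commands need a deployed control plane, so this is a CLI fragment only:

    # Create the new cell record with its transport URL and DB connection
    # (placeholder credentials/hosts).
    nova-manage cell_v2 create_cell --name cell2 \
      --transport-url rabbit://nova:secret@new-rabbit:5672/ \
      --database_connection mysql+pymysql://nova:secret@new-db/nova_cell2

    # Map compute hosts that have registered against the new cell.
    nova-manage cell_v2 discover_hosts

    # Map existing instances into the cell (UUID elided here).
    nova-manage cell_v2 map_instances --cell_uuid <cell-uuid>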
[Yahoo-eng-team] [Bug 1785164] Re: Identity API v3: POST method for "Create Limits" is abnormal for a domain-id
Reviewed:  https://review.openstack.org/588460
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=bf3a8c09a095c2b037f114ba4770b59d6351c73b
Submitter: Zuul
Branch:    master

commit bf3a8c09a095c2b037f114ba4770b59d6351c73b
Author: wangxiyuan
Date:   Fri Aug 3 15:33:44 2018 +0800

    Do not allow create limits for domain

    Keystone doesn't currently support domain-level limits. When
    creating limits, if the input project_id is a domain id, the
    request should not be allowed.

    Change-Id: Ifafd96113499d533341870960f294dd5fada477d
    Closes-Bug: #1785164

** Changed in: keystone
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1785164

Title:
  Identity API v3: POST method for "Create Limits" is abnormal for a
  domain-id

Status in OpenStack Identity (keystone):
  Fix Released

Bug description:
  POST /v3/limits ("Create Limits")

  When setting a domain id as "project_id" in the request body, it
  still successfully creates a limit for the domain. This is strange,
  since a "project_id" is being used to create limits for a domain.

  My test request was:

  {
      "limits": [
          {
              "service_id": "10656fdd41e1429f8cb57f097935f327",
              "project_id": "default",
              "region_id": "RegionOne",
              "resource_name": "snapshot",
              "resource_limit": 5
          },
          {
              "service_id": "10656fdd41e1429f8cb57f097935f327",
              "project_id": "default",
              "resource_name": "volume",
              "resource_limit": 10,
              "description": "Number of volumes for project"
          }
      ]
  }

  and it successfully returned 201 - Created.

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1785164/+subscriptions
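The fix's behavior boils down to a validation step before creating the limit. A minimal sketch follows; the function name and arguments are invented for illustration and are not keystone's internal API:

```python
def validate_limit_create(limit, domain_ids):
    """Reject limit creation when project_id is actually a domain id."""
    if limit["project_id"] in domain_ids:
        raise ValueError(
            "project_id %r refers to a domain; domain-level limits "
            "are not supported" % limit["project_id"])
    return limit

# "default" is the id of keystone's default domain, so a request like
# the one in the bug report must now be rejected instead of returning 201.
try:
    validate_limit_create({"project_id": "default"}, {"default"})
except ValueError as exc:
    print("rejected:", exc)
```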
[Yahoo-eng-team] [Bug 1785189] Re: Floatingip and router bandwidth speed limit failure
Hi,

As already pointed out by Liu Yulong, the Newton release does not
support this feature. Marking the bug invalid. Please don't change it
again unless an explanation and further details are added to this bug
justifying that action.

** Changed in: neutron
       Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1785189

Title:
  Floatingip and router bandwidth speed limit failure

Status in neutron:
  Invalid

Bug description:
  Environment version: centos7.4
  Neutron version: newton

  I have added these L3 QoS patches into the newton branch:
  https://review.openstack.org/#/c/453458/
  https://review.openstack.org/#/c/424466/
  https://review.openstack.org/#/c/521079/

  But these patches do not seem to work. For large bandwidths, the
  speed limit does not work at all. As soon as the router and
  floatingip speed limits are applied, scp throughput keeps falling
  from 2 Mbps and the transfer is finally interrupted. The iperf test
  is extremely unstable: sometimes 10 Mbps, sometimes 0 bps.

  For example, the rate limit rule of the router is set to 1 Gbps;
  the router netns is the iperf client and the controller node is the
  iperf server.
  Here is the test result:

  [root@node-1 ~]# ip netns exec qrouter-bf800d13-9ce6-4aa7-9259-fab54ec5ac05 tc -s -p filter show dev qg-d2e58140-fa
  filter parent 1: protocol ip pref 1 u32
  filter parent 1: protocol ip pref 1 u32 fh 800: ht divisor 1
  filter parent 1: protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid :1  (rule hit 7557 success 7525)
    match IP src 172.18.0.133/32 (success 7525)
    police 0x15a rate 1024Mbit burst 100Mb mtu 2Kb action drop overhead 0b
    ref 1 bind 1
    Sent 12795449 bytes 8549 pkts (dropped 969, overlimits 969)

  iperf tests:

  [root@node-1 ~]# ip netns exec qrouter-bf800d13-9ce6-4aa7-9259-fab54ec5ac05 iperf3 -c 172.18.0.4 -i 1
  Connecting to host 172.18.0.4, port 5201
  [  4] local 172.18.0.133 port 51674 connected to 172.18.0.4 port 5201
  [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
  [  4]   0.00-1.00   sec   119 KBytes   972 Kbits/sec   18   2.83 KBytes
  [  4]   1.00-2.00   sec  0.00 Bytes    0.00 bits/sec    5   2.83 KBytes
  [  4]   2.00-3.00   sec  0.00 Bytes    0.00 bits/sec    5   2.83 KBytes
  [  4]   3.00-4.00   sec  0.00 Bytes    0.00 bits/sec    5   2.83 KBytes
  [  4]   4.00-5.00   sec  0.00 Bytes    0.00 bits/sec    5   2.83 KBytes
  [  4]   5.00-6.00   sec  63.6 KBytes   522 Kbits/sec   37   2.83 KBytes
  [  4]   6.00-7.00   sec  1.64 MBytes  13.7 Mbits/sec  336   4.24 KBytes
  [  4]   7.00-8.00   sec  1.34 MBytes  11.2 Mbits/sec  279   2.83 KBytes
  [  4]   8.00-9.00   sec  1.96 MBytes  16.5 Mbits/sec  406   2.83 KBytes
  [  4]   9.00-10.00  sec   334 KBytes  2.73 Mbits/sec   75   2.83 KBytes
  - - - - - - - - - - - - - - - - - - - - - - - - -
  [ ID] Interval           Transfer     Bandwidth       Retr
  [  4]   0.00-10.00  sec  5.44 MBytes  4.56 Mbits/sec  1171          sender
  [  4]   0.00-10.00  sec  5.34 MBytes  4.48 Mbits/sec                receiver

  iperf Done.

  It is normal to use the command to delete the tc rule and then run
  the bandwidth test.
  [root@node-1 ~]# ip netns exec qrouter-bf800d13-9ce6-4aa7-9259-fab54ec5ac05 tc filter del dev qg-d2e58140-fa parent 1: prio 1 handle 800::800 u32
  [root@node-1 ~]# ip netns exec qrouter-bf800d13-9ce6-4aa7-9259-fab54ec5ac05 tc -s -p filter show dev qg-d2e58140-fa
  [root@node-1 ~]# ip netns exec qrouter-bf800d13-9ce6-4aa7-9259-fab54ec5ac05 iperf3 -c 172.18.0.4 -i 1
  Connecting to host 172.18.0.4, port 5201
  [  4] local 172.18.0.133 port 47530 connected to 172.18.0.4 port 5201
  [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
  [  4]   0.00-1.00   sec  88.2 MBytes   740 Mbits/sec     1    407 KBytes
  [  4]   1.00-2.00   sec   287 MBytes  2.41 Gbits/sec   354    491 KBytes
  [  4]   2.00-3.00   sec  1.04 GBytes  8.94 Gbits/sec  1695    932 KBytes
  [  4]   3.00-4.00   sec  1008 MBytes  8.45 Gbits/sec  4233    475 KBytes
  [  4]   4.00-5.00   sec  1.03 GBytes  8.85 Gbits/sec  1542    925 KBytes
  [  4]   5.00-6.00   sec  1008 MBytes  8.45 Gbits/sec  4507    748 KBytes
  [  4]   6.00-7.00   sec  1.05 GBytes  9.06 Gbits/sec  1550    798 KBytes
  [  4]   7.00-8.00   sec  1.06 GBytes  9.08 Gbits/sec  1251    933 KBytes
  [  4]   8.00-9.00   sec  1.02 GBytes  8.77 Gbits/sec  3595    942 KBytes
  [  4]   9.00-10.00  sec  1024 MBytes  8.59 Gbits/sec  3867    897 KBytes
  - - - - - - - - - - - - - - - - - - - - - - - - -
  [ ID] Interval           Transfer     Bandwidth       Retr
  [  4]   0.00-10.00  sec  8.54 GBytes  7.33 Gbits/sec  22595         sender
  [  4]   0.00-10.00  sec  8.54 GBytes  7.33 Gbits/sec               receiver

  iperf Done.

  I am not sure if it i
[Yahoo-eng-team] [Bug 1785668] [NEW] nova-compute doesn't check image signature if imagecache exists
Public bug reported:

Description
===========
nova-compute doesn't verify the image signature/certificate in the
barbican component if a local imagecache exists for this image on the
compute node.

Steps to reproduce
==================
Preconditions: Nova, Glance and Barbican components (Pike) are
installed with default settings and policy.json. The environment has
1 compute node (to simplify the case).

* Create a signed glance image. Please follow
  https://docs.openstack.org/glance/pike/user/signature.html
* Create a separate project and user with the "member" role in it.
* Login as the member user and try to boot a VM from your signed image.

  Actual and expected result: VM is not booted.
  Error: Server failed to build and is in ERROR status
  Details: {u'message': u'Build of instance aborted: Signature
  verification for the image failed: Unable to retrieve certificate
  with ID: .', u'code': 500, u'created': u'2018-07-18T15:53:15Z'}

* Login as admin. Boot a VM from the image.

  Actual and expected result: VM is Active.

* Login as the member user again. Boot a VM from the image.

  Actual result: VM is Active.
  Expected result: The user doesn't have enough rights to boot the VM,
  because the image cannot be verified (the certificate cannot be
  retrieved from barbican). However, since the compute node has an
  imagecache of this image, nova-compute boots the VM.

On the compute node:

ls -la /var/lib/nova/instances/_base/
total 38424
drwxr-xr-x 2 nova         nova     4096 Aug  5 17:12 .
drwxr-xr-x 7 nova         nova     4096 Aug  6 16:34 ..
-rw-r--r-- 1 libvirt-qemu kvm  41126400 Aug  6 16:32 5dfc15a8b8ab3ac68ff5d442fed2564adbaa4149

Environment
===========
OpenStack Pike
nova 2:16.1.3-1~u16.04
python-novaclient 2:9.1.1-1~u16.04
qemu-kvm 1:2.11+dfsg-1.4~u16.04
libvirt 4.0.0-1.7~u16.04
python-libvirt 3.5.0-1.1~u16.04

** Affects: nova
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1785668

Title:
  nova-compute doesn't check image signature if imagecache exists

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1785668/+subscriptions
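The bypass follows from how the cache is keyed: the _base file name in the listing above is a hash of the glance image ID, so a cache hit tells nova nothing about who verified the image or whether it was ever verified. A simplified sketch follows; the function names are invented for illustration, and the sha1-of-image-id naming is an assumption matching the listing above:

```python
import hashlib
import os

def cached_base_path(image_id, base_dir="/var/lib/nova/instances/_base"):
    # The cache file name is just a hash of the glance image ID: nothing
    # in it records whether the image's signature was verified, or by whom.
    return os.path.join(base_dir, hashlib.sha1(image_id.encode()).hexdigest())

def fetch_image(image_id, verify_signature):
    """Illustrative fetch flow: verification happens only on a cache miss."""
    path = cached_base_path(image_id)
    if os.path.exists(path):      # cache hit: verification is skipped
        return path
    verify_signature(image_id)    # only the first boot pays this cost
    # ... download the image to `path` ...
    return path
```

This is exactly the reported sequence: the admin's boot populates the cache, and the member user's second boot hits it and skips verification.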
[Yahoo-eng-team] [Bug 1785606] Re: When working with python3 header banner is not rendered properly
** Also affects: vitrage-dashboard
   Importance: Undecided
       Status: New

** Changed in: vitrage-dashboard
     Assignee: (unassigned) => Yuval Adar (yadar)

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1785606

Title:
  When working with python3 header banner is not rendered properly

Status in OpenStack Dashboard (Horizon):
  In Progress
Status in Vitrage Dashboard:
  New

Bug description:
  When running horizon under Python 3, the header banner response
  returned is in bytes format, which Python 3 interprets differently
  from Python 2, breaking the banner completely.

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1785606/+subscriptions
[Yahoo-eng-team] [Bug 1785656] [NEW] test_internal_dns.InternalDNSTest fails even though dns-integration extension isn't loaded
Public bug reported:

We're seeing this on the Networking-ODL CI [1]. The test
neutron_tempest_plugin.scenario.test_internal_dns.InternalDNSTest is
being executed even though there's a decorator to prevent it from
running [2].

Either the checker isn't working or something is missing, since other
DNS tests are being skipped automatically due to the extension not
being loaded.

[1] http://logs.openstack.org/91/584591/5/check/networking-odl-tempest-oxygen/df17c02/
[2] http://git.openstack.org/cgit/openstack/neutron-tempest-plugin/tree/neutron_tempest_plugin/scenario/test_internal_dns.py#n28

** Affects: networking-odl
     Importance: Critical
         Status: Confirmed

** Affects: neutron
     Importance: High
         Status: Confirmed

** Also affects: networking-odl
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1785656

Title:
  test_internal_dns.InternalDNSTest fails even though dns-integration
  extension isn't loaded

Status in networking-odl:
  Confirmed
Status in neutron:
  Confirmed
To manage notifications about this bug go to:
https://bugs.launchpad.net/networking-odl/+bug/1785656/+subscriptions
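The gating pattern the test relies on can be sketched in a few lines; the names below are illustrative (the real check lives in neutron_tempest_plugin's utils and queries the running cloud's extension list):

```python
import functools
import unittest

# Assumed extension list for the sketch: dns-integration is NOT loaded.
LOADED_EXTENSIONS = {"router", "qos"}

def requires_ext(extension):
    """Skip the decorated test unless the named API extension is loaded."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, *args, **kwargs):
            if extension not in LOADED_EXTENSIONS:
                raise unittest.SkipTest("extension %s not enabled" % extension)
            return fn(self, *args, **kwargs)
        return wrapper
    return decorator

class InternalDNSTest(unittest.TestCase):
    @requires_ext("dns-integration")
    def test_dns_domain(self):
        self.fail("should have been skipped, not executed")
```

When the decorator works as intended, the test is reported as skipped rather than run; the bug report is that the real test executed (and failed) anyway.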
[Yahoo-eng-team] [Bug 1773102] Re: Abnormal request id in logs
Reviewed:  https://review.openstack.org/587772
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=233ea582f7b58b3188bfa523264e9052eefd00e5
Submitter: Zuul
Branch:    master

commit 233ea582f7b58b3188bfa523264e9052eefd00e5
Author: Radoslav Gerganov
Date:   Wed Aug 1 13:54:31 2018 +0300

    Reload oslo_context after calling monkey_patch()

    oslo.context stores a global thread-local variable which keeps the
    request context for the current thread. If oslo.context is imported
    before calling monkey_patch(), then this thread-local won't be
    green, and instead of having one request per green thread we will
    have one request object which is overwritten every time a new
    context is created. To work around the problem, always reload
    oslo_context.context after calling monkey_patch() to make sure it
    uses green thread locals.

    Change-Id: Id059e5576c3fc78dd893fde15c963e182f1157f6
    Closes-Bug: #1773102

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1773102

Title:
  Abnormal request id in logs

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  Triaged
Status in OpenStack Compute (nova) queens series:
  Triaged

Bug description:
  Description
  ===========
  After VM creation, the request id related to periodic tasks in
  nova-compute.log is changed to the same as the request id related to
  the VM creation task.

  Steps to reproduce
  ==================
  * nova boot xxx
  * check the nova-compute.log on the compute node hosting the VM
  * search for the request id related to the VM creation task

  Expected result
  ===============
  The request id related to periodic tasks should be different from
  the request id related to the VM creation task.

  Actual result
  =============
  The request id related to periodic tasks is changed to the same as
  the request id related to the VM creation task after the VM creation
  task is handled.

  Environment
  ===========
  1.
OpenStack version
     OS: CentOS
     nova version: openstack-nova-compute-17.0.2-1.el7.noarch
  2. hypervisor
     Libvirt + QEMU
  3. storage type
     LVM
  4. Which networking type did you use?
     Neutron with Linuxbridge

  Logs & Configs
  ==============
  1. Before nova-compute handles the VM creation task:

  2018-05-24 03:08:15.264 27469 DEBUG oslo_service.periodic_task [req-c63d0555-7bf1-42da-abb7-556cc9eede8c 809cb6c22acc445c843db1a806d4e817 68b078adaf13420391fdb0fde1608816 - default default] Running periodic task ComputeManager._reclaim_queued_deletes run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215

  2018-05-24 03:08:15.265 27469 DEBUG nova.compute.manager [req-c63d0555-7bf1-42da-abb7-556cc9eede8c 809cb6c22acc445c843db1a806d4e817 68b078adaf13420391fdb0fde1608816 - default default] CONF.reclaim_instance_interval <= 0, skipping... _reclaim_queued_deletes /usr/lib/python2.7/site-packages/nova/compute/manager.py:7238

  2018-05-24 03:08:18.269 27469 DEBUG oslo_service.periodic_task [req-c63d0555-7bf1-42da-abb7-556cc9eede8c 809cb6c22acc445c843db1a806d4e817 68b078adaf13420391fdb0fde1608816 - default default] Running periodic task ComputeManager._sync_scheduler_instance_info run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215

  2.
Begin to handle the VM creation task:

  2018-05-24 03:08:26.244 27469 DEBUG oslo_concurrency.lockutils [req-2d5b3957-9749-40ba-9b94-e8260c7145bf 9de813eb53ba4ac982a37df462783d5d 3ce4f026aed1411baa6e8013b13f9257 - default default] Lock "a0ded3b0-0e60-4d82-a516-588871c4917c" acquired by "nova.compute.manager._locked_do_build_and_run_instance" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273

  2018-05-24 03:08:26.312 27469 DEBUG oslo_service.periodic_task [req-2d5b3957-9749-40ba-9b94-e8260c7145bf 9de813eb53ba4ac982a37df462783d5d 3ce4f026aed1411baa6e8013b13f9257 - default default] Running periodic task ComputeManager._heal_instance_info_cache run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215

  2018-05-24 03:08:26.312 27469 DEBUG nova.compute.manager [req-2d5b3957-9749-40ba-9b94-e8260c7145bf 9de813eb53ba4ac982a37df462783d5d 3ce4f026aed1411baa6e8013b13f9257 - default default] Starting heal instance info cache _heal_instance_info_cache /usr/lib/python2.7/site-packages/nova/compute/manager.py:6572

  2018-05-24 03:08:26.312 27469 DEBUG nova.compute.manager [req-2d5b3957-9749-40ba-9b94-e8260c7145bf 9de813eb53ba4ac982a37df462783d5d 3ce4f026aed1411baa6e8013b13f9257 - default default] Rebuilding the list of instances to heal _heal_instance_info_cache /usr/lib/python2.7/site-packages/nova/compute/manager.py:6576

  2018-05-24 03:08:26.334 27469 DEBUG nova.comp
[Yahoo-eng-team] [Bug 1667756] Re: Backup HA router sending traffic, traffic from switch interrupted
** Also affects: neutron (Ubuntu)
   Importance: Undecided
       Status: New

** Also affects: neutron (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Changed in: neutron (Ubuntu Xenial)
       Status: New => Triaged

** Changed in: neutron (Ubuntu Xenial)
   Importance: Undecided => High

** Changed in: neutron (Ubuntu)
       Status: New => Invalid

** Also affects: cloud-archive
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/mitaka
   Importance: Undecided
       Status: New

** Changed in: cloud-archive/mitaka
       Status: New => Triaged

** Changed in: cloud-archive/mitaka
   Importance: Undecided => High

** Changed in: cloud-archive
       Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1667756

Title:
  Backup HA router sending traffic, traffic from switch interrupted

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive mitaka series:
  Triaged
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Invalid
Status in neutron source package in Xenial:
  Triaged

Bug description:
  As outlined in https://review.openstack.org/#/c/142843/, backup HA
  routers should not send any traffic. Any traffic will cause the
  connected switch to learn a new port for the associated src mac
  address, since the mac address will be in use on the primary HA
  router.

  We are observing backup routers sending IPv6 RA and RS messages,
  probably in response to incoming IPv6 RA messages. The subnets
  associated with the HA routers are not intended for IPv6 traffic.

  A typical traffic sequence is:

  Packet from external switch...
  08:81:f4:a6:dc:01 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 110: (hlim 255, next-header ICMPv6 (58) payload length: 56) fe80:52:0:136c::fe > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 56

  Immediately followed by a packet from the backup HA router...
  fa:16:3e:a7:ae:63 > 33:33:ff:a7:ae:63, ethertype IPv6 (0x86dd), length 86: (hlim 1, next-header Options (0) payload length: 32) :: > ff02::1:ffa7:ae63: HBH (rtalert: 0x) (padn) [icmp6 sum ok] ICMP6, multicast listener report max resp delay: 0 addr: ff02::1:ffa7:ae63

  Another pkt...
  fa:16:3e:a7:ae:63 > 33:33:ff:a7:ae:63, ethertype IPv6 (0x86dd), length 78: (hlim 255, next-header ICMPv6 (58) payload length: 24) :: > ff02::1:ffa7:ae63: [icmp6 sum ok] ICMP6, neighbor solicitation, length 24, who has 2620:52:0:136c:f816:3eff:fea7:ae63

  Another pkt...
  fa:16:3e:a7:ae:63 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 86: (hlim 255, next-header ICMPv6 (58) payload length: 32)

  At this point, the switch has updated its mac table and traffic to
  the fa:16:3e:a7:ae:63 address has been redirected to the backup
  host. SSH/ping traffic resumes at a later time when the primary
  router node sends traffic with the fa:16:3e:a7:ae:63 source address.

  This problem is reproducible in our environment as follows:
  1. Deploy OSP10
  2. Create external network
  3. Create external subnet (IPv4)
  4. Create an internal network and VM
  5. Attach floating ip
  6. ssh into the VM through the FIP or ping the FIP
  7. you will start to see ssh freeze or the ping fail occasionally

  Additional info:
  Setting accept_ra=0 on the backup host routers stops the problem
  from happening. Unfortunately, on a reboot, we lose the setting. The
  current sysctl files have accept_ra=0.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1667756/+subscriptions
[Yahoo-eng-team] [Bug 1784713] Re: cloud-init profile.d files use bash-specific builtin "local"
** Also affects: cloud-init
   Importance: Undecided
   Status: New

** Changed in: cloud-init
   Importance: Undecided => Low

** Changed in: cloud-init
   Status: New => Fix Committed

** Changed in: cloud-init
   Assignee: (unassigned) => Scott Moser (smoser)

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1784713

Title:
  cloud-init profile.d files use bash-specific builtin "local"

Status in cloud-init:
  Fix Committed
Status in cloud-init package in Ubuntu:
  Confirmed

Bug description:
  /etc/profile, which is read by the Bourne-like shells (including ksh),
  runs the scripts in /etc/profile.d. The scripts from the cloud-init
  package, Z99-cloudinit-warnings.sh and Z99-cloud-locale-test.sh, use
  the bash builtin "local", which is not recognized by ksh, resulting in
  these errors at login:

  /etc/profile[23]: .[141]: local: not found [No such file or directory]
  /etc/profile[23]: .[142]: local: not found [No such file or directory]
  /etc/profile[23]: .[143]: local: not found [No such file or directory]
  /etc/profile[23]: .[178]: local: not found [No such file or directory]
  /etc/profile[23]: .[179]: local: not found [No such file or directory]

  $ grep -n local\ Z99*
  Z99-cloudinit-warnings.sh:7:local warning="" idir="/var/lib/cloud/instance" n=0
  Z99-cloudinit-warnings.sh:8:local warndir="$idir/warnings"
  Z99-cloudinit-warnings.sh:9:local ufile="$HOME/.cloud-warnings.skip" sfile="$warndir/.skip"
  Z99-cloud-locale-test.sh:14:local bad_names="" bad_lcs="" key="" val="" var="" vars="" bad_kv=""
  Z99-cloud-locale-test.sh:15:local w1 w2 w3 w4 remain
  Z99-cloud-locale-test.sh:56:local bad invalid="" to_gen="" sfile="/usr/share/i18n/SUPPORTED"
  Z99-cloud-locale-test.sh:57:local pkgs=""
  Z99-cloud-locale-test.sh:70:local pkgs=""

  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: cloud-init 18.2-27-g6ef92c98-0ubuntu1~18.04.1
  ProcVersionSignature: Ubuntu 4.15.0-29.31-generic 4.15.18
  Uname: Linux 4.15.0-29-generic x86_64
  ApportVersion: 2.20.9-0ubuntu7.2
  Architecture: amd64
  CloudName: Other
  Date: Tue Jul 31 20:04:30 2018
  PackageArchitecture: all
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/ksh
  SourcePackage: cloud-init
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1784713/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
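The reported failure mode is easy to sidestep: POSIX sh does not specify `local`, so ksh rejects it in sourced scripts. A minimal, hypothetical sketch (not the actual cloud-init patch) of the portable pattern: use uniquely prefixed plain variables inside the function and unset them before returning.

```shell
# Hypothetical portable rewrite sketch. Instead of:
#   local warning="" idir="/var/lib/cloud/instance" n=0
# (which ksh rejects), use prefixed globals and clean them up.
cloudinit_warn() {
    ci_idir="/var/lib/cloud/instance"
    ci_warndir="$ci_idir/warnings"
    ci_n=0
    echo "warndir=$ci_warndir n=$ci_n"
    # Avoid leaking names into the interactive login shell.
    unset ci_idir ci_warndir ci_n
}
cloudinit_warn
```

Because /etc/profile.d scripts are sourced into the user's login shell, the unset step matters: anything the function defines otherwise stays in the session environment.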
[Yahoo-eng-team] [Bug 1785615] [NEW] DNS resolution through eventlet contact nameservers if there's an IPv4 or IPv6 entry present in hosts file
Public bug reported:

When trying to resolve a hostname on a node with no nameservers configured and only one entry present for it in /etc/hosts (IPv4 or IPv6), eventlet will try to fetch the other entry over the network. This changes the behavior from what the original getaddrinfo() implementation does and causes 30-second delays, and often timeouts, when, for example, the metadata agent tries to contact Nova [0].

Here is a simple reproducer that shows the behavior when we do the monkey patching:

  import eventlet
  import socket
  import time

  print socket.getaddrinfo('overcloud.internalapi.localdomain', 80, 0, socket.SOCK_STREAM)
  print time.time()
  eventlet.monkey_patch()
  print socket.getaddrinfo('overcloud.internalapi.localdomain', 80, 0, socket.SOCK_STREAM)
  print time.time()

The eventlet issue was reported here [1] and the fix got merged in the master branch.

[0] https://github.com/openstack/neutron/blob/13.0.0.0b3/neutron/agent/metadata/agent.py#L189
[1] https://github.com/eventlet/eventlet/issues/511

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1785615

Title:
  DNS resolution through eventlet contact nameservers if there's an IPv4 or IPv6 entry present in hosts file

Status in neutron:
  New

Bug description:
  When trying to resolve a hostname on a node with no nameservers
  configured and only one entry present for it in /etc/hosts (IPv4 or
  IPv6), eventlet will try to fetch the other entry over the network.
  This changes the behavior from what the original getaddrinfo()
  implementation does and causes 30-second delays, and often timeouts,
  when, for example, the metadata agent tries to contact Nova [0].

  Here is a simple reproducer that shows the behavior when we do the
  monkey patching:

    import eventlet
    import socket
    import time

    print socket.getaddrinfo('overcloud.internalapi.localdomain', 80, 0, socket.SOCK_STREAM)
    print time.time()
    eventlet.monkey_patch()
    print socket.getaddrinfo('overcloud.internalapi.localdomain', 80, 0, socket.SOCK_STREAM)
    print time.time()

  The eventlet issue was reported here [1] and the fix got merged in the
  master branch.

  [0] https://github.com/openstack/neutron/blob/13.0.0.0b3/neutron/agent/metadata/agent.py#L189
  [1] https://github.com/eventlet/eventlet/issues/511

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1785615/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
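The mechanism the reproducer exercises can be illustrated without eventlet: getaddrinfo() with family 0 asks for both IPv4 and IPv6 answers, which is what pushes eventlet's DNS resolver onto the network for the missing family. Passing an explicit address family restricts the query to the records that exist locally. This sketch uses 'localhost' (assumed to be in /etc/hosts) rather than the bug's overcloud hostname, and shows the family-restriction pattern, not the merged eventlet fix itself.

```python
import socket

# Family 0 ("unspecified") requests both IPv4 and IPv6 results; with a
# monkey-patched resolver this is the path that can go to nameservers
# for whichever family is missing from /etc/hosts.
both = socket.getaddrinfo('localhost', 80, 0, socket.SOCK_STREAM)

# Restricting to AF_INET means only the IPv4 entry needs resolving, so
# there is no second-family lookup to time out on.
v4_only = socket.getaddrinfo('localhost', 80, socket.AF_INET,
                             socket.SOCK_STREAM)

# Every returned tuple is (family, type, proto, canonname, sockaddr).
assert all(entry[0] == socket.AF_INET for entry in v4_only)
```

Callers that know which family they need (the metadata agent talks to a configured Nova endpoint) can use this restriction as a mitigation even on eventlet versions without the upstream fix.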
[Yahoo-eng-team] [Bug 1777475] Re: Undercloud vm in state error after update of the undercloud.
** Changed in: tripleo
   Status: Triaged => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1777475

Title:
  Undercloud vm in state error after update of the undercloud.

Status in OpenStack Compute (nova):
  New
Status in tripleo:
  Fix Released

Bug description:
  Hi, after an update of the undercloud, the undercloud vm is in error:

  [stack@undercloud-0 ~]$ openstack server list
  +--------------------------------------+--------------+--------+------------------------+----------------+------------+
  | ID                                   | Name         | Status | Networks               | Image          | Flavor     |
  +--------------------------------------+--------------+--------+------------------------+----------------+------------+
  | 9f80c38a-9f33-4a18-88e0-b89776e62150 | compute-0    | ERROR  | ctlplane=192.168.24.18 | overcloud-full | compute    |
  | e87efe17-b939-4df2-af0c-8e2effd58c95 | controller-1 | ERROR  | ctlplane=192.168.24.9  | overcloud-full | controller |
  | 5a3ea20c-75e8-49fe-90b6-edad01fc0a48 | controller-2 | ERROR  | ctlplane=192.168.24.13 | overcloud-full | controller |
  | ba0f26e7-ec2c-4e61-be8e-05edf00ce78a | controller-0 | ERROR  | ctlplane=192.168.24.8  | overcloud-full | controller |
  +--------------------------------------+--------------+--------+------------------------+----------------+------------+

  Originally found starting there:
  https://bugzilla.redhat.com/show_bug.cgi?id=1590297#c14

  It boils down to an ordering issue between openstack-ironic-conductor
  and openstack-nova-compute; a simple reproducer is:

    sudo systemctl stop openstack-ironic-conductor
    sudo systemctl restart openstack-nova-compute

  on the undercloud.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1777475/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
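Ordering constraints like the one in the reproducer are normally expressed to systemd as unit dependencies. A hypothetical drop-in sketch (this is not the actual TripleO fix, and the file path and directives are illustrative) showing how nova-compute could be made to start after, and restart with, the ironic conductor:

```ini
# Hypothetical drop-in, e.g.
# /etc/systemd/system/openstack-nova-compute.service.d/ironic-ordering.conf
[Unit]
# Start nova-compute only after ironic-conductor is up...
After=openstack-ironic-conductor.service
# ...and pull it in (non-fatally) when nova-compute is started.
Wants=openstack-ironic-conductor.service
```

With `After=` alone the two services may still be stopped independently, which is exactly the state the `systemctl stop` / `systemctl restart` reproducer creates; the drop-in only constrains start ordering, so the real fix also had to handle the conductor being down at nova-compute startup.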
[Yahoo-eng-team] [Bug 1785608] [NEW] [RFE] neutron ovs agent support baremetal port using smart nic
Public bug reported:

Problem description
===================

While Ironic today supports Neutron-provisioned network connectivity for bare-metal servers through an ML2 mechanism driver, the existing support is based largely on configuration of ToRs through vendor-specific mechanism drivers, with limited capabilities.

Proposed change
===============

There is a wide range of smart/intelligent NICs emerging on the market. These NICs generally incorporate one or more general-purpose CPU cores along with data-plane packet processing accelerations, and can efficiently run virtual switches such as OVS, while maintaining the existing interfaces to the SDN layer. The goal is to enable running the standard Neutron Open vSwitch L2 agent, providing a generic, vendor-agnostic bare metal networking service with feature parity compared to the virtualization use case.

* Neutron ml2 ovs changes:
  Update the neutron ml2 ovs driver to bind bare metal ports that carry the
  smart NIC flag in the binding profile.

* Neutron ovs agent changes:
  Example of the smart NIC model:

    +-----------+
    | Server    |
    |      +A   |
    +------|----+
           |
    +------|----+
    | SmartNIC  |
    | +---B---+ |
    | |  OVS  | |
    | +---C---+ |
    +------|----+
           |

  A - port on the bare metal server
  B - port that represents the bare metal port in the SmartNIC
  C - port to the wire

  Add/remove port B to the ovs br-int with external-ids. This mimics how
  nova-compute plugs the port into the ovs bridge. The external-ids
  information is:

    'external-ids:iface-id=%s' % port_id
    'external-ids:iface-status=active'
    'external-ids:attached-mac=%s' % ironic_port.address
    'external-ids:node-uuid=%s' % node_uuid

** Affects: neutron
   Importance: Undecided
   Status: New

** Description changed:
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1785608

Title:
  [RFE] neutron ovs agent support baremetal port using smart nic

Status in neutron:
  New

Bug description:
  Problem description
  ===================

  While Ironic today supports Neutron-provisioned network connectivity
  for bare-metal servers through an ML2 mechanism driver
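The external-ids listed in the proposal can be assembled exactly as the RFE writes them. In this sketch the port id, MAC address and node uuid values are made-up placeholders, and the ovs-vsctl invocation shown in the comment is an assumption about how the agent would apply them, not code from the RFE.

```python
# Placeholder inputs; real values would come from the Ironic port and node.
port_id = 'port-1234'
ironic_port_address = 'fa:16:3e:00:00:01'
node_uuid = 'node-5678'

# The four external-ids settings named in the RFE, built the same way.
external_ids = [
    'external-ids:iface-id=%s' % port_id,
    'external-ids:iface-status=active',
    'external-ids:attached-mac=%s' % ironic_port_address,
    'external-ids:node-uuid=%s' % node_uuid,
]

# These would then be applied to port B on br-int, e.g. (hypothetical):
#   ovs-vsctl add-port br-int <port-B> -- set Interface <port-B> \
#       external-ids:iface-id=... external-ids:iface-status=active ...
```

The iface-id/iface-status/attached-mac triple is the same convention nova-compute uses when plugging VM ports, which is what lets the unmodified OVS agent recognize port B and wire it up; node-uuid is the addition that ties the port back to the Ironic node.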
[Yahoo-eng-team] [Bug 1785606] [NEW] When working with python3 header banner is not rendered properly
Public bug reported:

When running Horizon under Python 3, the header banner response is returned in bytes format, and Python 3 interprets it differently than Python 2, thus breaking the banner completely.

** Affects: horizon
   Importance: Undecided
   Assignee: Yuval Adar (yadar)
   Status: In Progress

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1785606

Title:
  When working with python3 header banner is not rendered properly

Status in OpenStack Dashboard (Horizon):
  In Progress

Bug description:
  When running Horizon under Python 3, the header banner response is
  returned in bytes format, and Python 3 interprets it differently than
  Python 2, thus breaking the banner completely.

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1785606/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
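The bytes-vs-str difference the report describes is easy to demonstrate in isolation. This is a generic sketch of the failure mode, not Horizon's actual banner code: interpolating a bytes object into a Python 3 string produces its repr (`b'...'`), mangling the markup, while decoding first yields the intended text.

```python
# A banner payload arriving as bytes, as the bug describes.
banner = b"<div>maintenance window tonight</div>"

# Python 3: "%s" on bytes gives the repr, i.e. "b'<div>...'" - under
# Python 2 (where str is bytes) the same code rendered the markup as-is.
broken = "%s" % banner

# Decoding to str before interpolation restores the intended output.
fixed = "%s" % banner.decode("utf-8")

assert broken.startswith("b'")
assert fixed == "<div>maintenance window tonight</div>"
```

This is why code that worked silently under Python 2 breaks only at render time under Python 3: no exception is raised, the repr is simply emitted into the page.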
[Yahoo-eng-team] [Bug 1745405] Re: tempest-full job triggered for irrelevant changes
Marking it Invalid in Nova as it was fixed by zuul.

** Changed in: nova
   Status: In Progress => Invalid

** Changed in: nova
   Importance: Undecided => Low

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1745405

Title:
  tempest-full job triggered for irrelevant changes

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  The nova section of the project-config defines a list of
  irrelevant-files for the tempest-full job execution [1]. The
  tempest-full job's grandparent job (ancestor hierarchy: tempest-full
  -> devstack-tempest -> devstack) in the devstack repo defines a
  different list of irrelevant-files [2]. Job inheritance does not allow
  overriding the irrelevant-files in child jobs. This means that the
  tempest-full job is now triggered on patches changing only irrelevant
  files (like nova/tests/...) in nova. See an example in [3] where a
  change touching only irrelevant files actually triggered tempest-full.

  [1] https://github.com/openstack-infra/project-config/blob/5ddbd62a46e17dd2fdee07bec32aa65e3b637ff3/zuul.d/projects.yaml#L10674-L10688
  [2] https://github.com/openstack-dev/devstack/blob/614cab33c40159f0bc10d92c9f8dc3f9783708d9/.zuul.yaml#L83-L90
  [3] https://review.openstack.org/#/c/537936/

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1745405/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
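The remedy that later landed for this class of bug (see Bug 1785425 above) is to define irrelevant-files on the in-repo job variant instead of relying on project-config. A minimal sketch of such a .zuul.yaml entry; the file patterns here are illustrative examples, not nova's actual list:

```yaml
# Sketch of an in-repo .zuul.yaml project stanza: attaching
# irrelevant-files directly to the job variant so doc/test-only
# changes skip it. Patterns are illustrative.
- project:
    check:
      jobs:
        - tempest-full:
            irrelevant-files:
              - ^doc/.*$
              - ^nova/tests/.*$
              - ^.*\.rst$
```

Because zuul matches irrelevant-files on the job variant it actually runs, keeping the list next to the job definition in the project's own repo avoids the inheritance/override surprises described in this bug.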
[Yahoo-eng-team] [Bug 1745431] Re: neutron-grenade job is triggered for irrelevant changes
Marking as invalid in Nova as it is fixed by zuul.

** Changed in: nova
   Importance: Undecided => Low

** Changed in: nova
   Status: In Progress => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1745431

Title:
  neutron-grenade job is triggered for irrelevant changes

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  The nova section of the project-config defines a list of
  irrelevant-files for the neutron-grenade job execution [1]. The
  original neutron-grenade definition in the neutron repo defines a
  different list of irrelevant-files [2]. Job redefinition does not
  allow overriding the irrelevant-files in the new job. This means that
  the neutron-grenade job is now triggered on patches changing only
  irrelevant files (like nova/tests/...) in nova. See an example in [3]
  where a change touching only irrelevant files actually triggered
  neutron-grenade.

  [1] https://github.com/openstack-infra/project-config/blob/5ddbd62a46e17dd2fdee07bec32aa65e3b637ff3/zuul.d/projects.yaml#L10689-L10703
  [2] https://github.com/openstack/neutron/blob/7e3d6a18fb928bcd303a44c1736d0d6ca9c7f0ab/.zuul.yaml#L261-L270
  [3] https://review.openstack.org/#/c/537936/

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1745431/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1785582] [NEW] Connectivity to instance after L3 router migration from Legacy to HA fails
Public bug reported:

Scenario test neutron.tests.tempest.scenario.test_migration.NetworkMigrationFromLegacy.test_from_legacy_to_ha fails because of no connectivity to the VM after migration. We observed it mostly on the Pike version, but I think the same issue might also exist in newer versions.

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/neutron/tests/tempest/scenario/test_migration.py", line 68, in test_from_legacy_to_ha
    after_dvr=False, after_ha=True)
  File "/usr/lib/python2.7/site-packages/neutron/tests/tempest/scenario/test_migration.py", line 55, in _test_migration
    self._check_connectivity()
  File "/usr/lib/python2.7/site-packages/neutron/tests/tempest/scenario/test_dvr.py", line 29, in _check_connectivity
    self.keypair['private_key'])
  File "/usr/lib/python2.7/site-packages/neutron/tests/tempest/scenario/base.py", line 204, in check_connectivity
    ssh_client.test_connection_auth()
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/ssh.py", line 207, in test_connection_auth
    connection = self._get_ssh_connection()
  File "/usr/lib/python2.7/site-packages/tempest/lib/common/ssh.py", line 121, in _get_ssh_connection
    password=self.password)
tempest.lib.exceptions.SSHTimeout: Connection to the 10.0.0.224 via SSH timed out. User: cirros, Password: None

From my investigation it looks like it is caused by a race between two different operations on the router:

1. The router is switched to admin_state down, so the port is set to DOWN also.
2. neutron-server gets info from the ovs agent that the port is down.
3. But now another thread changes the router from legacy to HA, so the owner of this port changes from DEVICE_OWNER_ROUTER_INTF to DEVICE_OWNER_HA_REPLICATED_INT, and the router is still "on" this host (as it's now the backup node for the router). So in https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/l2pop/mech_driver.py#L258 l2pop says: ok, I'm not sending remove_fdb_entries for this mac address on this port, and the old entries stay on the other nodes :/ Later, when this port comes up on a different host (the new master node), add_fdb_entries is also not sent to the hosts because of https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/l2pop/mech_driver.py#L307 which was added in https://github.com/openstack/neutron/commit/26d8702b9d7cc5a4293b97bc435fa85983be9f01

I tried to run this test with waiting until the router's port is really down before calling the migration to HA, and then it passed 151 times for me. So it clearly shows that this is the issue here. I think it should be fixed in neutron's code instead of the test, as this isn't a test-only issue.

** Affects: neutron
   Importance: Medium
   Assignee: Slawek Kaplonski (slaweq)
   Status: Confirmed

** Tags: l3-ha

** Changed in: neutron
   Assignee: (unassigned) => Slawek Kaplonski (slaweq)

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1785582

Title:
  Connectivity to instance after L3 router migration from Legacy to HA fails

Status in neutron:
  Confirmed

Bug description:
  Scenario test
  neutron.tests.tempest.scenario.test_migration.NetworkMigrationFromLegacy.test_from_legacy_to_ha
  fails because of no connectivity to the VM after migration. We
  observed it mostly on the Pike version, but I think the same issue
  might also exist in newer versions.

  Traceback (most recent call last):
    File "/usr/lib/python2.7/site-packages/neutron/tests/tempest/scenario/test_migration.py", line 68, in test_from_legacy_to_ha
      after_dvr=False, after_ha=True)
    File "/usr/lib/python2.7/site-packages/neutron/tests/tempest/scenario/test_migration.py", line 55, in _test_migration
      self._check_connectivity()
    File "/usr/lib/python2.7/site-packages/neutron/tests/tempest/scenario/test_dvr.py", line 29, in _check_connectivity
      self.keypair['private_key'])
    File "/usr/lib/python2.7/site-packages/neutron/tests/tempest/scenario/base.py", line 204, in check_connectivity
      ssh_client.test_connection_auth()
    File "/usr/lib/python2.7/site-packages/tempest/lib/common/ssh.py", line 207, in test_connection_auth
      connection = self._get_ssh_connection()
    File "/usr/lib/python2.7/site-packages/tempest/lib/common/ssh.py", line 121, in _get_ssh_connection
      password=self.password)
  tempest.lib.exceptions.SSHTimeout: Connection to the 10.0.0.224 via SSH timed out. User: cirros, Password: None

  From my investigation it looks like it is caused by a race between two
  different operations on the router:

  1. The router is switched to admin_state down, so the port is set to DOWN also.
  2. neutron-server gets info from the ovs agent that the port is down.
  3. But now another thread changes the router from legacy to HA, so the
     owner of this port changes from DEVICE_OWNER_ROUTER_INTF to
     DEVICE_OWNER_HA_REPLICATED_INT, and the router is still "on" this
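The experiment the reporter describes (waiting until the router's port is really DOWN before triggering the migration) amounts to a polling helper. A sketch of that pattern; `get_status` here is a stand-in for a real neutron client call such as reading a port's status from show_port(), not an actual API:

```python
import time

def wait_for_port_down(get_status, timeout=60, interval=1,
                       _sleep=time.sleep):
    """Poll get_status() until it returns 'DOWN' or the timeout expires.

    get_status is a zero-argument callable standing in for a neutron
    client lookup of the router port's status.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status() == 'DOWN':
            return True
        _sleep(interval)
    return False

# Usage with a stub whose status goes DOWN on the third poll:
statuses = iter(['ACTIVE', 'ACTIVE', 'DOWN'])
assert wait_for_port_down(lambda: next(statuses),
                          timeout=5, _sleep=lambda s: None)
```

As the report notes, gating the test this way made it pass 151 times in a row, which is what points the finger at the port-status race rather than the test itself; the real fix belongs in the l2pop handling, not in the test's wait logic.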