[Yahoo-eng-team] [Bug 2054799] Re: Issue with Project administration at Cloud Admin level
** Also affects: cloud-archive/yoga
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/2054799

Title:
  Issue with Project administration at Cloud Admin level

Status in Ubuntu Cloud Archive: New
Status in Ubuntu Cloud Archive yoga series: New
Status in OpenStack Dashboard (Horizon): Fix Released
[Yahoo-eng-team] [Bug 2054799] Re: Issue with Project administration at Cloud Admin level
** Also affects: cloud-archive
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/2054799

Title:
  Issue with Project administration at Cloud Admin level

Status in Ubuntu Cloud Archive: New
Status in Ubuntu Cloud Archive yoga series: New
Status in OpenStack Dashboard (Horizon): Fix Released
[Yahoo-eng-team] [Bug 2054799] [NEW] Issue with Project administration at Cloud Admin level
Public bug reported:

We are not able to see the list of users assigned to a project in
Horizon.

Scenario:
- Log in as Cloud Admin
- Set Domain Context (k8s)
- Go to the Projects section
- Click on project Permissions_Roles_Test
- Go to Users

Expectation: a table with the users assigned to this project.
Result: an error - https://i.imgur.com/TminwUy.png

[Test steps]

1. Create an ordinary OpenStack test env with Horizon.

2. Prepare some test data (e.g. one domain k8s, one project k8s, and one
user k8s-admin with the role k8s-admin-role):

    openstack domain create k8s
    openstack role create k8s-admin-role
    openstack project create --domain k8s k8s
    openstack user create --project-domain k8s --project k8s --domain k8s --password password k8s-admin
    openstack role add --user k8s-admin --user-domain k8s --project k8s --project-domain k8s k8s-admin-role

    $ openstack role assignment list --project k8s --names
    +----------------+---------------+-------+---------+--------+--------+-----------+
    | Role           | User          | Group | Project | Domain | System | Inherited |
    +----------------+---------------+-------+---------+--------+--------+-----------+
    | k8s-admin-role | k8s-admin@k8s |       | k8s@k8s |        |        | False     |
    +----------------+---------------+-------+---------+--------+--------+-----------+

3. Log in to the Horizon dashboard with the admin user (e.g.
admin/openstack/admin_domain).

4. Click 'Identity -> Domains' to set the domain context to the domain
'k8s'.

5. Click 'Identity -> Projects -> k8s project -> Users'.

6. This is the result; it says 'Unable to display the users of this
project' - https://i.imgur.com/TminwUy.png

7. These are some logs:

    ==> /var/log/apache2/error.log <==
    [Fri Feb 23 10:03:12.201024 2024] [wsgi:error] [pid 47342:tid 140254008985152] [remote 10.5.3.120:58978] Recoverable error: 'e900b8934d11458b8eb9db21671c1b11'

    ==> /var/log/apache2/ssl_access.log <==
    10.5.3.120 - - [23/Feb/2024:10:03:11 +] "GET /identity/07123041ee0544e0ab32e50dde780afd/detail/?tab=project_details__users HTTP/1.1" 200 1125 "https://10.5.3.120/identity/07123041ee0544e0ab32e50dde780afd/detail/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"

[Some Analyses]

This action calls this function in Horizon [1]. The function first gets
a list of users (api.keystone.user_list) [2], then the role assignment
list (api.keystone.get_project_users_roles) [3]. Without setting a
domain context, this works fine. However, when a domain context is set,
the project displayed is in a different domain: the user list from [2]
only contains users of the logged-in user's own domain, while the role
assignment list from [3] includes users of the other domain, since the
project is in that domain.

From horizon's debug log, here is an example of the user list:

{"users": [{"email": "juju@localhost", "id": "8cd8f92ac2f94149a91488ad66f02382", "name": "admin", "domain_id": "103a4eb1712f4eb9873240d5a7f66599", "enabled": true, "password_expires_at": null, "options": {}, "links": {"self": "https://192.168.1.59:5000/v3/users/8cd8f92ac2f94149a91488ad66f02382"}}], "links": {"next": null, "self": "https://192.168.1.59:5000/v3/users", "previous": null}}

Here is an example of the role assignment list:

{"role_assignments": [{"links": {"assignment": "https://192.168.1.59:5000/v3/projects/82e250e8492b49a1a05467994d33ea1b/users/a70745ed9ac047ad88b917f24df3c873/roles/f606fafcb4fd47018aeffec2b07b7e84"}, "scope": {"project": {"id": "82e250e8492b49a1a05467994d33ea1b"}}, "user": {"id": "a70745ed9ac047ad88b917f24df3c873"}, "role": {"id": "f606fafcb4fd47018aeffec2b07b7e84"}}, {"links": {"assignment": "https://192.168.1.59:5000/v3/projects/82e250e8492b49a1a05467994d33ea1b/users/fd7a79e2a4044c17873c08daa9ed37a1/roles/b936a9d998be4500900a5a9174b16b42"}, "scope": {"project": {"id": "82e250e8492b49a1a05467994d33ea1b"}}, "user": {"id": "fd7a79e2a4044c17873c08daa9ed37a1"}, "role": {"id": "b936a9d998be4500900a5a9174b16b42"}}], "links": {"next": null, "self": "https://192.168.1.59:5000/v3/role_assignments?scope.project.id=82e250e8492b49a1a05467994d33ea1b&include_subtree=True", "previous": null}}

Later, the Horizon function tries to look up each user from the role
assignment list in the user list [4], and fails, because those users
don't exist in the user list. Horizon throws an error like:

[Fri Feb 23 10:03:12.201024 2024] [wsgi:error] [pid 47342:tid 140254008985152] [remote 10.5.3.120:58978] Recoverable error: 'e900b8934d11458b8eb9db21671c1b11'

This id is the id of a user, used as a key to find the user in the user
list. The user list doesn't have this id, so the lookup fails.

[1] https://github.com/openstack/horizon/blob/master/openstack_dashboard/dashboards/identity/projects/tabs.py#L85
[2] https://github.com/openstack/horizon/blob/master/openstack_dashboard/dashboards/identity/projects/tabs.py#L96
[3] https://github.com/openstack/horizon/blob/master/opens
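To make the failure mode concrete, here is a minimal sketch of the
lookup and a domain-aware fallback. The helper names (users_with_roles,
fetch_user) are mine, not Horizon's, and the per-user fallback fetch is
an illustrative assumption rather than the exact upstream fix:

    def users_with_roles(users, project_users_roles, fetch_user):
        """Map user ids from role assignments to user records.

        users: list of dicts as returned by user_list(); under a domain
        context this is scoped to the admin's own domain, which is the
        bug described above.
        project_users_roles: {user_id: [role_id, ...]} from the role
        assignment list.
        fetch_user: callable(user_id) -> dict, fallback for users that
        are missing from `users` because they live in another domain.
        """
        users_by_id = {u["id"]: u for u in users}
        result = []
        for user_id, roles in project_users_roles.items():
            user = users_by_id.get(user_id)
            if user is None:
                # Before the fix, this miss surfaced as the
                # "Recoverable error: '<user id>'" seen in the logs.
                user = fetch_user(user_id)
            result.append((user, roles))
        return result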
[Yahoo-eng-team] [Bug 1996594] [NEW] OVN metadata randomly stops working
Public bug reported:

We found that OVN metadata randomly stops working when OVN is writing a
snapshot.

1. At 12:30:35, OVN started to transfer leadership to write a snapshot:

    $ find sosreport-juju-2752e1-*/var/log/ovn/* |xargs zgrep -i -E 'Transferring leadership'
    sosreport-juju-2752e1-6-lxd-24-xxx-2022-08-18-entowko/var/log/ovn/ovsdb-server-sb.log:2022-08-18T12:30:35.322Z|80962|raft|INFO|Transferring leadership to write a snapshot.
    sosreport-juju-2752e1-6-lxd-24-xxx-2022-08-18-entowko/var/log/ovn/ovsdb-server-sb.log:2022-08-18T17:52:53.024Z|82382|raft|INFO|Transferring leadership to write a snapshot.
    sosreport-juju-2752e1-7-lxd-27-xxx-2022-08-18-hhxxqci/var/log/ovn/ovsdb-server-sb.log:2022-08-18T12:30:35.330Z|92698|raft|INFO|Transferring leadership to write a snapshot.

2. At 12:30:36, neutron-ovn-metadata-agent reported an OVSDB error:

    $ find sosreport-srv1*/var/log/neutron/* |xargs zgrep -i -E 'OVSDB Error'
    sosreport-srv1xxx2d-xxx-2022-08-18-cuvkufw/var/log/neutron/neutron-ovn-metadata-agent.log:2022-08-18 12:30:36.103 75556 ERROR ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error: no error details available
    sosreport-srv1xxx6d-xxx-2022-08-18-bgnovqu/var/log/neutron/neutron-ovn-metadata-agent.log:2022-08-18 12:30:36.104 2171 ERROR ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error: no error details available

3. At 12:57:53, we saw the error 'No port found in network'; from then
on we hit the problem that OVN metadata randomly does not work:

    2022-08-18 12:57:53.800 3730 ERROR neutron.agent.ovn.metadata.server [-] No port found in network 63e2c276-60dd-40e3-baa1-c16342eacce2 with IP address 100.94.98.135

After the problem occurs, restarting neutron-ovn-metadata-agent, or
restarting the haproxy instance as follows, can be used as a
workaround:

    /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ip netns exec ovnmeta-63e2c276-60dd-40e3-baa1-c16342eacce2 haproxy -f /var/lib/neutron/ovn-metadata-proxy/63e2c276-60dd-40e3-baa1-c16342eacce2.conf

LP bug #1990978 [1] tries to reduce the frequency of leadership
transfers, which should help with this problem, but it only reduces how
often it occurs rather than avoiding it completely. I wonder if we need
to add some retry logic on the neutron side.

NOTE: The openstack version we are using is focal-xena, and
openvswitch's version is 2.16.0-0ubuntu2.1~cloud0.

[1] https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1990978

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1996594

Title:
  OVN metadata randomly stops working

Status in neutron: New
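To make the "retry logic on the neutron side" idea concrete, here is a
minimal, hypothetical sketch of retrying an OVSDB operation that fails
transiently while the raft leader is transferring; the OVSDBError class
and with_retries helper are stand-ins of mine, not ovsdbapp or neutron
APIs:

    import time

    class OVSDBError(Exception):
        """Stand-in for the transaction failure logged above
        ("OVSDB Error: no error details available")."""

    def with_retries(operation, attempts=5, backoff=0.5):
        # Retry a transiently failing OVSDB operation, e.g. one that
        # races with a raft leadership transfer during snapshotting.
        for attempt in range(1, attempts + 1):
            try:
                return operation()
            except OVSDBError:
                if attempt == attempts:
                    raise
                time.sleep(backoff * attempt)

    # Usage (hypothetical): with_retries(lambda: txn.commit())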
[Yahoo-eng-team] [Bug 1947127] Re: Some DNS extensions not working with OVN
** Patch added: "focal-xena.debdiff"
   https://bugs.launchpad.net/neutron/+bug/1947127/+attachment/5586824/+files/focal-xena.debdiff

** Summary changed:

- Some DNS extensions not working with OVN
+ [SRU] Some DNS extensions not working with OVN

** Changed in: cloud-archive
       Status: Confirmed => Fix Released

** Tags added: sts sts-sru-needed

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1947127

Title:
  [SRU] Some DNS extensions not working with OVN

Status in Ubuntu Cloud Archive: Fix Released
Status in neutron: Fix Released
Status in neutron package in Ubuntu: New

Bug description:

[Impact]

On a fresh devstack install with the q-dns service enabled from the
neutron devstack plugin, some features still don't work, e.g.:

    $ openstack subnet set private-subnet --dns-publish-fixed-ip
    BadRequestException: 400: Client Error for url: https://10.250.8.102:9696/v2.0/subnets/9f50c79e-6396-4c5b-be92-f64aa0f25beb, Unrecognized attribute(s) 'dns_publish_fixed_ip'

    $ openstack port create p1 --network private --dns-name p1 --dns-domain a.b.
    BadRequestException: 400: Client Error for url: https://10.250.8.102:9696/v2.0/ports, Unrecognized attribute(s) 'dns_domain'

The reason seems to be that
https://review.opendev.org/c/openstack/neutron/+/686343/31/neutron/common/ovn/extensions.py
only added dns_domain_keywords, but not e.g. dns_domain_ports as
supported by OVN.

[Test Case]

Create a normal OpenStack neutron test environment and check that the
following commands run successfully:

    openstack subnet set private_subnet --dns-publish-fixed-ip
    openstack port create p1 --network private --dns-name p1 --dns-domain a.b.

[Regression Potential]

The fix has merged into the upstream stable/xena branch [1]; this is
just an SRU into the 19.1.0 branch of UCA xena, so it is a clean
backport and should be helpful for deployments migrating to OVN.

[1] https://review.opendev.org/c/openstack/neutron/+/838650
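For context, the change described above amounts to listing the missing
DNS extension aliases as supported by the OVN driver. A sketch of the
shape of that list, using the real extension aliases but an abbreviated,
illustrative list (the actual code in neutron/common/ovn/extensions.py
uses api-definition constants rather than literal strings):

    # Extensions the ML2/OVN driver advertises as supported (sketch).
    ML2_SUPPORTED_API_EXTENSIONS = [
        # ... existing aliases ...
        'dns-domain-keywords',          # already present before the fix
        'subnet-dns-publish-fixed-ip',  # enables --dns-publish-fixed-ip
        'dns-domain-ports',             # enables per-port --dns-domain
    ]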
[Yahoo-eng-team] [Bug 1948656] Re: toggling explicitly_egress_direct from true to false does not clean the openflow flows on the integration bridge
Successfully found a better workaround, taking advantage of
delete_accepted_egress_direct_flow in
_unbind_distributed_router_interface_port [1]:

    # eg: mac of the old snat-xxx port is fa:16:3e:7a:11:7d
    neutron router-interface-delete provider-router $(openstack subnet show private_subnet -cid -fvalue)
    # eg: mac of the new snat-xxx port is fa:16:3e:e6:f9:b2
    neutron router-interface-add provider-router $(openstack subnet show private_subnet -cid -fvalue)
    openstack port list --device-owner network:router_centralized_snat

The code path is: process_deleted_ports -> port_unbound ->
unbind_port_from_dvr -> _unbind_centralized_snat_port_on_dvr_subnet ->
delete_accepted_egress_direct_flow

The egress direct flows for the old snat-xxx port won't disappear:

    # ovs-ofctl dump-flows br-int |grep fa:16:3e:7a:11:7d |grep -E 'priority=12|priority=10'
    cookie=0x59874eed7c9fa42a, duration=76882.302s, table=94, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=12,reg6=0x1,dl_dst=fa:16:3e:7a:11:7d actions=output:16
    cookie=0x59874eed7c9fa42a, duration=76882.302s, table=94, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=10,reg6=0x1,dl_src=fa:16:3e:7a:11:7d,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=mod_vlan_vid:1,output:2

but no egress direct flows are produced for the new port either:

    # ovs-ofctl dump-flows br-int |grep 'fa:16:3e:e6:f9:b2' |grep -E 'priority=12|priority=10'

So north-south traffic resumes working:

    # ip netns exec snat-10140acd-28e6-4110-ae67-76115b72b37c ping -c1 192.168.21.114
    PING 192.168.21.114 (192.168.21.114) 56(84) bytes of data.
    64 bytes from 192.168.21.114: icmp_seq=1 ttl=64 time=1.86 ms

[1] https://review.opendev.org/c/openstack/neutron/+/704506/1/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#678

** Changed in: neutron
       Status: Triaged => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1948656

Title:
  toggling explicitly_egress_direct from true to false does not clean
  the openflow flows on the integration bridge

Status in neutron: Invalid
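Alternatively, the stale table=94 flows could be removed directly with
ovs-ofctl instead of bouncing the router interface. A hedged sketch of
doing that from Python; the helper name is mine and the match fields
are taken from the flow dump above:

    import subprocess

    def delete_stale_egress_direct_flows(mac, bridge="br-int", table=94):
        # Delete the leftover accepted-egress-direct flows for a stale
        # snat-xxx port MAC. Non-strict del-flows removes every flow
        # whose match is covered by the given fields, so this catches
        # both the dl_dst entry and the masked dl_src entry shown above.
        for match in ("table=%d,reg6=0x1,dl_dst=%s" % (table, mac),
                      "table=%d,reg6=0x1,dl_src=%s" % (table, mac)):
            subprocess.check_call(["ovs-ofctl", "del-flows", bridge, match])

    # delete_stale_egress_direct_flows("fa:16:3e:7a:11:7d")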
[Yahoo-eng-team] [Bug 1948656] [NEW] toggling explicitly_egress_direct from true to false does not clean flows
Public bug reported:

As the comment [1] says, the following flows are not cleaned up after
explicitly_egress_direct is toggled from true to false:

    # ovs-ofctl dump-flows br-int |grep fa:16:3e:7a:11:7d |grep -E 'priority=12|priority=10'
    cookie=0x59874eed7c9fa42a, duration=1372.227s, table=94, n_packets=0, n_bytes=0, idle_age=2148, priority=12,reg6=0x1,dl_dst=fa:16:3e:7a:11:7d actions=output:16
    cookie=0x59874eed7c9fa42a, duration=1372.227s, table=94, n_packets=0, n_bytes=0, idle_age=2148, priority=10,reg6=0x1,dl_src=fa:16:3e:7a:11:7d,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=mod_vlan_vid:1,output:2

There seems to be no way to trigger delete_accepted_egress_direct_flow
[2] for the above snat-xxx port (fa:16:3e:7a:11:7d).

[1] https://bugs.launchpad.net/neutron/+bug/1945306/comments/9
[2] https://review.opendev.org/c/openstack/neutron/+/704506/1/neutron/agent/linux/openvswitch_firewall/firewall.py#1140

** Affects: neutron
   Importance: Undecided
   Status: New

** Tags: sts

** Tags added: sts

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1948656

Title:
  toggling explicitly_egress_direct from true to false does not clean
  flows

Status in neutron: New
[Yahoo-eng-team] [Bug 1945306] [NEW] north-south traffic not working when VM and main router are not on the same host
Public bug reported:

Some newly created VMs are not able to reach "outside" resources (e.g.
apt repositories) on the l3ha + dvr env. This problem can be easily
reproduced as long as the VM and the main router are not on the same
host; the 'apt update' command cannot be run inside the VM, so
north-south traffic is broken.

Here are steps to easily reproduce it:

1. Set up a wallaby or ussuri vrrp + dvr env (it works on train, not on
   ussuri and wallaby).
2. Create a test VM; query its host by: nova show |grep host
3. Query the main router by: neutron l3-agent-list-hosting-router $(openstack router show provider-router -fvalue -cid)
4. Make sure the VM and the main router are not on the same host.
5. On the main router host, the following will fail: ip netns exec snat-xxx ping -c1

I've done some bisecting, and found:

    15.3.4 (bionic-train) - no problem
    1c2e10f859 - no problem
    16.4.0 (bionic-ussuri) - has problem
    16.0.0-0ubuntu3 - has problem, and also has the multiple active routers problem
    16.0.0~b3~git2020041516.5f42488a9a-0ubuntu2 - BAD version, all routers are in standby state so we can't do any test
    16.1.0 (focal) - has problem, and also has the multiple active routers problem
    16.2.0 (focal) - has problem
    16.3.0 (focal) - has problem
    16.4.0 (focal-ussuri) - has problem
    focal-wallaby - has problem

Because I often hit the multiple-standby issue with some commit ids
(e.g. 14dd3e95ca), I can't continue the bisect.

I also used 'ovs-appctl ofproto/trace' and tcpdump to do some
debugging; the results are as follows.

train - works
    sg-xxx -> vm - https://pastebin.ubuntu.com/p/MHNVf8wXtb/
    tcpdump on sg-xxx - https://pastebin.ubuntu.com/p/Fqxp4mvkgV/
    tcpdump on vm's tap - https://pastebin.ubuntu.com/p/YppWc2Pg33/
    tcpdump on qr-xxx - https://pastebin.ubuntu.com/p/MPmQ5xbnT2/ - can get icmp reply

ussuri - not work
    sg-xxx -> vm - https://pastebin.ubuntu.com/p/hKfSB9gmd9/
    tcpdump on sg-xxx - https://pastebin.ubuntu.com/p/NCcnGS4gdj/ - sg-xxx can't get icmp reply
    tcpdump on vm's tap - https://pastebin.ubuntu.com/p/DHdVbB66NT/ - VM can't get sg-xxx's arp reply
    tcpdump on qr-xxx - https://pastebin.ubuntu.com/p/4hJ7vdRRC4/ - can't get arp reply

It looks like the VM can't get an ARP reply for the sg-xxx interface.

** Affects: neutron
   Importance: Undecided
   Status: New

** Description changed:

  Some newly created VM's are not able to reach "outside" resources (e.g.
- apt repositories) on then l3ha + dvr env, I can easily reproduce this
- problem as long as VM and main router are not on the same host, and 'apt
- update' command can not be run inside VM, so the north-south traffic is
- broken.
+ apt repositories) on the l3ha + dvr env, this problem can be easily
+ reproduced as long as VM and main router are not on the same host, and
+ 'apt update' command can not be run inside VM, so the north-south
+ traffic is broken.

  Here are steps to easily reproduce it.
[Yahoo-eng-team] [Bug 1681627] Re: [SRU] Page not found error on refreshing browser (in AngularJS-based detail page)
** Changed in: cloud-archive/pike
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1681627

Title:
  [SRU] Page not found error on refreshing browser (in AngularJS-based
  detail page)

Status in Ubuntu Cloud Archive: New
Status in Ubuntu Cloud Archive ocata series: New
Status in Ubuntu Cloud Archive pike series: Fix Released
Status in OpenStack Dashboard (Horizon): Fix Released
Status in Zun UI: Fix Released

Bug description:

[Impact]

When clicking an instance snapshot's detail on the Images menu and then
refreshing the page, you get an error:

    The page you were looking for doesn't exist
    You may have mistyped the address or the page may have moved.

[Test Case]

1. Deploy an OpenStack env with horizon.
2. Click an instance snapshot's detail on the Images menu.
3. Refresh the page.
4. Check if you see the error 'The page you were looking for doesn't exist'.

[Regression Potential]

This problem has been fixed in Queens with two patches [1][2]; we need
to backport them into Ocata as well. But directly backporting these two
primitive patches [1][2] into Ocata will not work, because:

1. In the Ocata release, getDetailsPath returns
   'project/ngdetails/OS::Glance::Image/' + item.id:
   https://github.com/openstack/horizon/blob/stable/ocata/openstack_dashboard/static/app/core/images/images.service.js#L59

    function getDetailsPath(item) {
      return 'project/ngdetails/OS::Glance::Image/' + item.id;
    }

2. In releases after Ocata, e.g. Pike, getDetailsPath returns
   detailRoute + 'OS::Glance::Image/' + item.id:
   https://github.com/openstack/horizon/blob/stable/pike/openstack_dashboard/static/app/core/images/images.service.js#L69

    function getDetailsPath(item) {
      return detailRoute + 'OS::Glance::Image/' + item.id;
    }

So with only the two primitive patches backported into Ocata, we would
see the error 'The current URL,
project/ngdetails/OS::Glance::Image/46ef8cab-dfc3-4690-8abb-d416978d237e,
didn't match any of these.' The following simple change to urls.py is
needed in addition to the primitive backport patches:

    -ngdetails_url = url(r'^ngdetails/',
    +ngdetails_url = url(r'^project/ngdetails/',

[1] https://review.openstack.org/#/c/541676/
[2] https://review.openstack.org/#/c/553970/

[Original Bug Report]

Once I get into the container detail view, refreshing the browser will
show a page not found error:

    The current URL, ngdetails/OS::Zun::Container/c54ba416-a955-45b2-848b-aee57b748e08, didn't match any of these

Full output: http://paste.openstack.org/show/605296/
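For completeness, a sketch of what the adjusted Ocata urls.py pattern
could look like; the placeholder view is mine for illustration (Horizon
wires its own AngularJS details view here), and only the regex change
is taken from the diff above:

    from django.conf.urls import url
    from django.views.generic import TemplateView

    # Placeholder view standing in for Horizon's NG details view.
    ng_details_view = TemplateView.as_view(template_name='ngdetails.html')

    # Before: url(r'^ngdetails/', ...) -- no longer matches once
    # getDetailsPath() links under 'project/ngdetails/...'.
    ngdetails_url = url(r'^project/ngdetails/', ng_details_view,
                        name='ngdetails')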
[Yahoo-eng-team] [Bug 1793102] Re: ha_vrrp_health_check_interval causes constant VRRP transitions
I can't reproduce this problem today; luckily, everything is running
well. It's a pity that I lost my last test env, so today I am using a
new one and can't compare the two. I believe that's because my new test
env includes the following patch. Thanks all.

https://github.com/acassen/keepalived/commit/e90a633c34fbe6ebbb891aa98bf29ce579b8b45c

** Changed in: neutron
       Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1793102

Title:
  ha_vrrp_health_check_interval causes constant VRRP transitions

Status in neutron: Invalid
[Yahoo-eng-team] [Bug 1744079] Re: [SRU] disk over-commit still not correctly calculated during live migration
** Also affects: nova (Ubuntu)
   Importance: Undecided
   Status: New

** No longer affects: nova (Ubuntu)

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1744079

Title:
  [SRU] disk over-commit still not correctly calculated during live
  migration

Status in Ubuntu Cloud Archive: New
Status in OpenStack Compute (nova): Fix Released

Bug description:

[Impact]

nova compares disk space against the disk_available_least field, which
can be negative due to overcommit. So a migration may fail with
"Migration pre-check error: Unable to migrate
dfcd087a-5dff-439d-8875-2f702f081539: Disk of instance is too
large(available on destination host:-3221225472 < need:22806528)" when
migrating to a compute host that has plenty of free disk space.

[Test Case]

Deploy an openstack environment. Make sure one test compute node has a
negative disk_available_least and an adequate free_disk_gb, then
migrate a VM to it with disk overcommit (openstack server migrate
--live --block-migration --disk-overcommit). You will see the above
migration pre-check error.

This is the formula that computes disk_available_least and
free_disk_gb:

    disk_free_gb = disk_info_dict['free']
    disk_over_committed = self._get_disk_over_committed_size_total()
    available_least = disk_free_gb * units.Gi - disk_over_committed
    data['disk_available_least'] = available_least / units.Gi

The following command can be used to query the value of
disk_available_least:

    nova hypervisor-show |grep disk

Steps to Reproduce:
1. Set the disk_allocation_ratio config option > 1.0.
2. qemu-img resize cirros-0.3.0-x86_64-disk.img +40G
3. glance image-create --disk-format qcow2 ...
4. Boot VMs based on the resized image.
5. We see disk_available_least become negative.

[Regression Potential]

Minimal - we're just changing from the following line:

    disk_available_gb = dst_compute_info['disk_available_least']

to the following code:

    if disk_over_commit:
        disk_available_gb = dst_compute_info['free_disk_gb']
    else:
        disk_available_gb = dst_compute_info['disk_available_least']

When overcommit is enabled, disk_available_least can be negative, so we
should use free_disk_gb instead, by backporting the following two
fixes:

https://git.openstack.org/cgit/openstack/nova/commit/?id=e097c001c8e0efe8879da57264fcb7bdfdf2
https://git.openstack.org/cgit/openstack/nova/commit/?id=e2cc275063658b23ed88824100919a6dfccb760d

This is the code path for check_can_live_migrate_destination:

    _migrate_live (os-migrateLive API, migrate_server.py) -> migrate_server
    -> _live_migrate -> _build_live_migrate_task
    -> _call_livem_checks_on_host -> check_can_live_migrate_destination

BTW, redhat also has the same bug -
https://bugzilla.redhat.com/show_bug.cgi?id=1477706

[Original Bug Report]

Change I8a705114d47384fcd00955d4a4f204072fed57c2 (written by me...
sigh) addressed a bug which prevented live migration to a target host
with overcommitted disk when made with microversion <2.25. It achieved
this, but the fix is still not correct. We now do:

    if disk_over_commit:
        disk_available_gb = dst_compute_info['local_gb']

Unfortunately local_gb is *total* disk, not available disk. We actually
want free_disk_gb. Fun fact: due to the way we calculate this for
filesystems, without taking into account reserved space, this can also
be negative.

The test we're currently running is: could we fit this guest's
allocated disks on the target if the target disk was empty? This is at
least better than it was before, as we don't spuriously fail early. In
fact, we're effectively disabling a test which is disabled for
microversion >=2.25 anyway. IOW we should fix it, but it's probably not
a high priority.
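A runnable distillation of the corrected pre-check, using the values
from the error above (the helper name is mine; the field choice is
exactly the backported change):

    def destination_disk_available_gb(dst_compute_info, disk_over_commit):
        # With overcommit, disk_available_least can legitimately be
        # negative, so the pre-check should look at free_disk_gb.
        if disk_over_commit:
            return dst_compute_info['free_disk_gb']
        return dst_compute_info['disk_available_least']

    # A disk_available_least of -3 GiB no longer blocks an overcommitted
    # migration to a host that actually has 40 GiB free:
    info = {'free_disk_gb': 40, 'disk_available_least': -3}
    assert destination_disk_available_gb(info, disk_over_commit=True) == 40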
[Yahoo-eng-team] [Bug 1744079] Re: [SRU] disk over-commit still not correctly calculated during live migration
** Tags added: sts-sponsor

** Also affects: cloud-archive
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1744079

Title:
  [SRU] disk over-commit still not correctly calculated during live
  migration

Status in Ubuntu Cloud Archive: New
Status in OpenStack Compute (nova): Fix Released
[Yahoo-eng-team] [Bug 1793102] [NEW] ha_vrrp_health_check_interval causes constant VRRP transitions
Public bug reported:

Commit 185d6cbc648fd041402a5034b04b818da5c7136e added support for
keepalived VRRP health checks, but it causes constant VRRP transitions
if you actually enable the option ha_vrrp_health_check_interval. It
seems to be because keepalived can't run ha_check_script_1.sh
successfully, while we can run ha_check_script_1.sh fine by hand.

    Sep 18 08:19:41 juju-23f84c-queens-dvr-5 Keepalived_vrrp[8448]: /var/lib/neutron/ha_confs/909c6b55-9bc6-476f-9d28-c32d031c41d7/ha_check_script_1.sh exited with status 1
    Sep 18 08:19:41 juju-23f84c-queens-dvr-5 Keepalived_vrrp[8448]: VRRP_Script(ha_health_check_1) failed
    Sep 18 08:19:43 juju-23f84c-queens-dvr-5 Keepalived_vrrp[8448]: VRRP_Instance(VR_1) Entering FAULT STATE
    Sep 18 08:19:43 juju-23f84c-queens-dvr-5 Keepalived_vrrp[8448]: VRRP_Instance(VR_1) removing protocol Virtual Routes
    Sep 18 08:19:43 juju-23f84c-queens-dvr-5 Keepalived_vrrp[8448]: VRRP_Instance(VR_1) removing protocol VIPs.
    Sep 18 08:19:43 juju-23f84c-queens-dvr-5 Keepalived_vrrp[8448]: VRRP_Instance(VR_1) removing protocol E-VIPs.
    Sep 18 08:19:43 juju-23f84c-queens-dvr-5 Keepalived_vrrp[8448]: VRRP_Instance(VR_1) Now in FAULT state

    root@juju-23f84c-queens-dvr-5:~# ll /var/lib/neutron/ha_confs/909c6b55-9bc6-476f-9d28-c32d031c41d7/ha_check_script_1.sh
    -r-x-w 1 neutron neutron 109 Sep 18 03:45 /var/lib/neutron/ha_confs/909c6b55-9bc6-476f-9d28-c32d031c41d7/ha_check_script_1.sh*

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1793102

Title:
  ha_vrrp_health_check_interval causes constant VRRP transitions

Status in neutron: New
[Yahoo-eng-team] [Bug 1374508] Re: Mismatch happens between BDM and domain XML if instance does not respond to ACPI hotplug during detach/attach.
** Also affects: nova (Ubuntu)
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1374508

Title:
  Mismatch happens between BDM and domain XML if instance does not
  respond to ACPI hotplug during detach/attach.

Status in OpenStack Compute (nova): Fix Released
Status in nova package in Ubuntu: New
Status in nova source package in Trusty: New

Bug description:

tempest.api.compute.servers.test_server_rescue_negative:ServerRescueNegativeTestJSON.test_rescued_vm_detach_volume

This test passes, but it fails to properly clean up after itself - the
detach completes but without running the necessary iscsiadm commands.
In nova.virt.libvirt.volume.LibvirtISCSIVolumeDriver.disconnect_volume,
the list returned by self.connection._get_all_block_devices includes
the host_device, which means that self._disconnect_from_iscsi_portal is
never run. You can see evidence of this in /etc/iscsi/nodes as well as
errors logged in /var/log/syslog.

I'm guessing there is a race between the unrescue and the detach within
libvirt. In nova.virt.libvirt.driver.LibvirtDriver.detach_volume, if I
put in a sleep before virt_dom.detachDeviceFlags(xml, flags) the detach
appears to work properly; however, if I sleep after that line it does
not appear to have any effect.
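The sleep observation suggests the guest needs time to acknowledge the
ACPI unplug before the detach request takes effect. A hedged sketch of
a poll-and-retry wrapper around the libvirt call; the device_gone
callable is an illustrative stand-in for re-reading the live domain
XML, and the retry policy is mine, not nova's exact fix:

    import time

    def detach_with_retry(virt_dom, xml, flags, device_gone,
                          attempts=8, wait=1.0):
        # Ask the guest to detach, then poll until the device actually
        # disappears from the live domain XML; re-issue the request if
        # the guest never acknowledged the ACPI hotplug event.
        for _ in range(attempts):
            virt_dom.detachDeviceFlags(xml, flags)
            deadline = time.time() + wait
            while time.time() < deadline:
                if device_gone(virt_dom):
                    return True
                time.sleep(0.1)
        return False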
[Yahoo-eng-team] [Bug 1635554] Re: Delete Router / race condition
** Changed in: neutron
       Status: Invalid => Confirmed

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1635554

Title:
  Delete Router / race condition

Status in neutron: Confirmed

Bug description:

When deleting a router, the log file fills up with the following error.
CentOS7, Newton (RDO).

2016-10-21 09:45:02.526 16200 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:140
2016-10-21 09:45:02.526 16200 WARNING neutron.agent.l3.namespaces [-] Namespace qrouter-8cf5-5c5c-461c-84f3-c8abeca8f79a does not exist. Skipping delete
2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent [-] Error while deleting router 8cf5-5c5c-461c-84f3-c8abeca8f79a
2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 357, in _safe_router_removed
2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     self._router_removed(router_id)
2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 376, in _router_removed
2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     ri.delete(self)
2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 381, in delete
2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     self.destroy_state_change_monitor(self.process_monitor)
2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 325, in destroy_state_change_monitor
2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     pm = self._get_state_change_monitor_process_manager()
2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 296, in _get_state_change_monitor_process_manager
2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     default_cmd_callback=self._get_state_change_monitor_callback())
2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 299, in _get_state_change_monitor_callback
2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     ha_device = self.get_ha_device_name()
2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 137, in get_ha_device_name
2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent     return (HA_DEV_PREFIX + self.ha_port['id'])[:self.driver.DEV_NAME_LEN]
2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent TypeError: 'NoneType' object has no attribute '__getitem__'
2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent
2016-10-21 09:45:02.528 16200 DEBUG neutron.agent.l3.agent [-] Finished a router update for 8cf5-5c5c-461c-84f3-c8abeca8f79a _process_router_update /usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py:504

See full log: http://paste.openstack.org/show/586656/
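The traceback bottoms out in get_ha_device_name() dereferencing
self.ha_port while it is None, i.e. the HA port was already torn down
by a concurrent delete. A minimal guarded sketch, rendered as a
standalone function from the method in the traceback; the skip-on-None
behavior is an assumption, not necessarily the upstream fix:

    import logging

    LOG = logging.getLogger(__name__)

    def safe_ha_device_name(ha_port, dev_name_len, prefix='ha-'):
        # Guarded version of HaRouter.get_ha_device_name(): ha_port can
        # be None when router deletion races with the l3-agent update,
        # which is what raised the TypeError above.
        if not ha_port:
            LOG.debug('ha_port is unset; teardown raced with delete')
            return None
        return (prefix + ha_port['id'])[:dev_name_len]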
[Yahoo-eng-team] [Bug 1681998] [NEW] Bypass the dirty BDM entry no matter how it is produced
Public bug reported: Sometimes the following dirty BDM enty (1.row) can be seen in the database that multiple BDMs with the same image_id and instance_uuid. mysql> select * from block_device_mapping where volume_id='153bcab4-1f88-440c-9782-3c661a7502a8' \G *** 1. row *** created_at: 2017-02-02 02:28:45 updated_at: NULL deleted_at: NULL id: 9754 device_name: /dev/vdb delete_on_termination: 0 snapshot_id: NULL volume_id: 153bcab4-1f88-440c-9782-3c661a7502a8 volume_size: NULL no_device: NULL connection_info: NULL instance_uuid: b52f9264-d8b3-406a-bf9b-d7d7471b13fc deleted: 0 source_type: volume destination_type: volume guest_format: NULL device_type: NULL disk_bus: NULL boot_index: NULL image_id: NULL *** 2. row *** created_at: 2017-02-02 02:29:31 updated_at: 2017-02-27 10:59:42 deleted_at: NULL id: 9757 device_name: /dev/vdc delete_on_termination: 0 snapshot_id: NULL volume_id: 153bcab4-1f88-440c-9782-3c661a7502a8 volume_size: NULL no_device: NULL connection_info: {"driver_volume_type": "rbd", "serial": "153bcab4-1f88-440c-9782-3c661a7502a8", "data": {"secret_type": "ceph", "name": "cinder-ceph/volume-153bcab4-1f88-440c-9782-3c661a7502a8", "secret_uuid": null, "qos_specs": null, "hosts": ["10.7.1.202", "10.7.1.203", "10.7.1.204"], "auth_enabled": true, "access_mode": "rw", "auth_username": "cinder-ceph", "ports": ["6789", "6789", "6789"]}} instance_uuid: b52f9264-d8b3-406a-bf9b-d7d7471b13fc deleted: 0 source_type: volume destination_type: volume guest_format: NULL device_type: disk disk_bus: virtio boot_index: NULL image_id: NULL then it cause we fail to detach the volume and see the following error since connection_info of row 1 is NULL. 2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher self._detach_volume(context, instance, bdm) 2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 4801, in _detach_volume 2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher connection_info = jsonutils.loads(bdm.connection_info) 2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_serialization/jsonutils.py", line 215, in loads 2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher return json.loads(encodeutils.safe_decode(s, encoding), **kwargs) 2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_utils/encodeutils.py", line 33, in safe_decode 2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher raise TypeError("%s can't be decoded" % type(text)) 2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher TypeError: can't be decoded This kind of dirty data can be produced when happened to fail to run this line _attach_volume()#volume_bdm.destroy() [1], I think these conditions may cause it to happen: 1, lose the database during the operation volume_bdm.destroy() 2, lose an MQ connection or RPC timing out during the operation volume_bdm.destroy() If you lose the database during any operation, things are going to be bad, so in general I'm not sure how realistic guarding for that case is. Losing an MQ connection or RPC timing out is probably more realistic. Seems the fix [2] is trying to solve the point 2. However, I'm thinking if we can bypass the dirty BDM entry according to the condition that connection_info is NULL no matter how it is produced. 
[1] https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3724 [2] https://review.openstack.org/#/c/290793 ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1681998 Title: Bypass the dirty BDM entry no matter how it is produced Status in OpenStack Compute (nova): New Bug description: Sometimes the following dirty BDM entry (row 1 below) can be seen in the database: multiple BDMs with the same volume_id and instance_uuid. mysql> select * from block_device_mapping where volume_id='153bcab4-1f88-440c-9782-3c661a7502a8' \G *** 1. row *** created_at: 2017-02-02 02:28:45 updated_at: NULL deleted_at: NULL id: 9754 device_name: /dev/vdb delete_on_termination: 0
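[Editor's note] A minimal sketch of the proposed bypass: at detach time, prefer the BDM row whose connection_info is populated and treat NULL-connection_info rows as dirty leftovers. The helper below is invented for illustration and is not a nova API; nova's actual detach path differs:

    # Hypothetical helper, not nova code: pick the usable BDM row so the
    # caller can delete the dirty duplicates instead of crashing in
    # jsonutils.loads() on a NULL connection_info.
    def pick_detachable_bdm(bdms, volume_id):
        for bdm in bdms:
            if bdm.volume_id == volume_id and bdm.connection_info is not None:
                return bdm  # the "real" attachment row
        return None  # only dirty rows remain; caller can simply destroy them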
[Yahoo-eng-team] [Bug 1515896] Re: Updating admin state to False for a neutron floating IP port does not take effect
** Changed in: neutron Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1515896 Title: Updating admin state to False for a neutron floating IP port does not take effect Status in neutron: Invalid Bug description: It is expected that when the admin state of a port is down, all operations specific to that port should cease. But this is not the case with a floating IP port: when the port's admin-state is set to False, the floating IP continues to be operational. root@controller:~# neutron port-show bc3f8bf6-c4bf-451a-8b5a-0d0b7624b5ca +---+--+ | Field | Value | +---+--+ | admin_state_up| False | | allowed_address_pairs | | | binding:host_id | | | binding:profile | {} | | binding:vif_details | {} | | binding:vif_type | unbound | | binding:vnic_type | normal | | device_id | ca0d0355-9ebe-46c5-9a31-2e2253da2d40 | | device_owner | network:floatingip | | extra_dhcp_opts | | | fixed_ips | {"subnet_id": "eb1339bd-b552-4207-8856-ccff1de04f47", "ip_address": "10.0.2.18"} | | id| bc3f8bf6-c4bf-451a-8b5a-0d0b7624b5ca | | mac_address | fa:16:3e:c7:0a:98 | | name | | | network_id| dda8f089-25b0-4e13-886e-b0b1bc8f5801 | | security_groups | | | status| DOWN | | tenant_id | | +---+--+ root@controller:~# ping 10.0.2.18 PING 10.0.2.18 (10.0.2.18) 56(84) bytes of data. 64 bytes from 10.0.2.18: icmp_seq=1 ttl=63 time=2.46 ms 64 bytes from 10.0.2.18: icmp_seq=2 ttl=63 time=23.4 ms ^C --- 10.0.2.18 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 1001ms rtt min/avg/max/mdev = 2.466/12.949/23.433/10.484 ms Observation: 1) neutron port-update FLOATING_IP_PORT --admin-state-up False 2) neutron port-update PRIVATE_IP_PORT --admin-state-up False 3) neutron port-update PRIVATE_IP_PORT --admin-state-up True 4) Ping of Floating IP does not work now. 5) neutron port-update FLOATING_IP_PORT --admin-state-up True 6) Ping of Floating IP should start working now. Also, this results in the failure of tempest test: tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_update_instance_port_admin_state To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1515896/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1450294] [NEW] Enable password support for vnc session
Public bug reported: qemu supports password-based authentication for client connections by adding the password option to -vnc, as below [1]. -vnc 0.0.0.0:1,password -k en-us The qemu XML configuration file provides the VNC password in clear text, but OpenStack doesn't support configuring a VNC password; see the following code: if ((CONF.vnc_enabled and virt_type not in ('lxc', 'uml'))): graphics = vconfig.LibvirtConfigGuestGraphics() graphics.type = "vnc" graphics.keymap = CONF.vnc_keymap graphics.listen = CONF.vncserver_listen guest.add_device(graphics) add_video_driver = True [1], http://www.cyberciti.biz/faq/linux-kvm-vnc-for-guest-machine/ ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1450294 Title: Enable password support for vnc session Status in OpenStack Compute (Nova): New Bug description: qemu supports password-based authentication for client connections by adding the password option to -vnc, as below [1]. -vnc 0.0.0.0:1,password -k en-us The qemu XML configuration file provides the VNC password in clear text, but OpenStack doesn't support configuring a VNC password; see the following code: if ((CONF.vnc_enabled and virt_type not in ('lxc', 'uml'))): graphics = vconfig.LibvirtConfigGuestGraphics() graphics.type = "vnc" graphics.keymap = CONF.vnc_keymap graphics.listen = CONF.vncserver_listen guest.add_device(graphics) add_video_driver = True [1], http://www.cyberciti.biz/faq/linux-kvm-vnc-for-guest-machine/ To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1450294/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
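[Editor's note] For illustration, a hedged sketch of what password support could look like in this code path. libvirt's <graphics> element does accept a passwd attribute, but CONF.vnc_password and a passwd field on LibvirtConfigGuestGraphics are assumptions here, not existing nova options:

    # Sketch only: CONF.vnc_password and graphics.passwd are assumed
    # additions, not existing nova config options or object fields.
    if CONF.vnc_enabled and virt_type not in ('lxc', 'uml'):
        graphics = vconfig.LibvirtConfigGuestGraphics()
        graphics.type = "vnc"
        graphics.keymap = CONF.vnc_keymap
        graphics.listen = CONF.vncserver_listen
        if CONF.vnc_password:
            # Would be rendered as passwd='...' in the generated XML.
            graphics.passwd = CONF.vnc_password
        guest.add_device(graphics)
        add_video_driver = True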
[Yahoo-eng-team] [Bug 1433226] [NEW] Add some unit tests for ipsec strongswan vpnaas driver
Public bug reported: Add some unit tests for ipsec strongswan vpnaas driver ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1433226 Title: Add some unit tests for ipsec strongswan vpnaas driver Status in OpenStack Neutron (virtual network service): New Bug description: Add some unit tests for ipsec strongswan vpnaas driver To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1433226/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1433223] [NEW] Add functional tests for ipsec strongswan vpnaas driver
Public bug reported: Add functional tests for ipsec strongswan vpnaas driver ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1433223 Title: Add functional tests for ipsec strongswan vpnaas driver Status in OpenStack Neutron (virtual network service): New Bug description: Add functional tests for ipsec strongswan vpnaas driver To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1433223/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1418656] Re: Sometimes vpnservice's status can't be updated
I can't reproduce this issue recently; it was probably caused by a problem in my development environment. ** Changed in: neutron Status: In Progress => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1418656 Title: Sometimes vpnservice's status can't be updated Status in OpenStack Neutron (virtual network service): Invalid Bug description: 2015-02-05 23:56:11.890 12178 ERROR neutron.openstack.common.loopingcall [-] in fixed duration looping call 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall Traceback (most recent call last): 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall File "/bak/openstack/neutron/neutron/openstack/common/loopingcall.py", line 81, in _inner 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall self.f(*self.args, **self.kw) 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/services/vpn/device_drivers/ipsec.py", line 674, in report_status 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall previous_status = self.get_process_status_cache(process) 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/services/vpn/device_drivers/ipsec.py", line 628, in get_process_status_cache 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall 'id': process.vpnservice['id'], 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall TypeError: 'NoneType' object has no attribute '__getitem__' 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1418656/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1430166] [NEW] cisco.l3.plugging_drivers has been moved out from neutron repo
Public bug reported: The following commit in neutron moves cisco.l3.plugging_drivers out of neutron. commit 41166d533383e3490ffe6c2b1b200053d90e0b83 Merge: 4663a15 b6ba733 Author: Jenkins Date: Mon Mar 9 17:52:19 2015 + Merge "Vendor decomposition to move CSR1000v support to the networking-cisco repo" but neutron-vpnaas still refers to it, so the unit tests fail as below: hua@hua-ThinkPad-T440p:/bak/openstack/neutron-vpnaas$ python setup.py testr --slowest --testr-args= running testr running=OS_STDOUT_CAPTURE=1 OS_STDERR_CAPTURE=1 OS_LOG_CAPTURE=1 ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron_vpnaas/tests/unit} --list --- import errors --- Failed to import test module: neutron_vpnaas.tests.unit.services.vpn.service_drivers.test_cisco_ipsec Traceback (most recent call last): File "/usr/local/lib/python2.7/dist-packages/unittest2/loader.py", line 445, in _find_test_path module = self._get_module_from_name(name) File "/usr/local/lib/python2.7/dist-packages/unittest2/loader.py", line 384, in _get_module_from_name __import__(name) File "neutron_vpnaas/tests/unit/services/vpn/service_drivers/test_cisco_ipsec.py", line 27, in from neutron_vpnaas.services.vpn.service_drivers \ File "neutron_vpnaas/services/vpn/service_drivers/cisco_ipsec.py", line 16, in from neutron.plugins.cisco.l3.plugging_drivers import ( ImportError: No module named l3.plugging_drivers Non-zero exit code (2) from test listing. error: testr failed (3) ** Affects: neutron Importance: Undecided Assignee: Hua Zhang (zhhuabj) Status: In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1430166 Title: cisco.l3.plugging_drivers has been moved out from neutron repo Status in OpenStack Neutron (virtual network service): In Progress Bug description: The following commit in neutron moves cisco.l3.plugging_drivers out of neutron. commit 41166d533383e3490ffe6c2b1b200053d90e0b83 Merge: 4663a15 b6ba733 Author: Jenkins Date: Mon Mar 9 17:52:19 2015 + Merge "Vendor decomposition to move CSR1000v support to the networking-cisco repo" but neutron-vpnaas still refers to it, so the unit tests fail as below: hua@hua-ThinkPad-T440p:/bak/openstack/neutron-vpnaas$ python setup.py testr --slowest --testr-args= running testr running=OS_STDOUT_CAPTURE=1 OS_STDERR_CAPTURE=1 OS_LOG_CAPTURE=1 ${PYTHON:-python} -m subunit.run discover -t ./ ${OS_TEST_PATH:-./neutron_vpnaas/tests/unit} --list --- import errors --- Failed to import test module: neutron_vpnaas.tests.unit.services.vpn.service_drivers.test_cisco_ipsec Traceback (most recent call last): File "/usr/local/lib/python2.7/dist-packages/unittest2/loader.py", line 445, in _find_test_path module = self._get_module_from_name(name) File "/usr/local/lib/python2.7/dist-packages/unittest2/loader.py", line 384, in _get_module_from_name __import__(name) File "neutron_vpnaas/tests/unit/services/vpn/service_drivers/test_cisco_ipsec.py", line 27, in from neutron_vpnaas.services.vpn.service_drivers \ File "neutron_vpnaas/services/vpn/service_drivers/cisco_ipsec.py", line 16, in from neutron.plugins.cisco.l3.plugging_drivers import ( ImportError: No module named l3.plugging_drivers Non-zero exit code (2) from test listing. 
error: testr failed (3) To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1430166/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
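[Editor's note] A hedged sketch of the kind of fix this implies in cisco_ipsec.py: import from the relocated networking-cisco package first, falling back to the old neutron path for older trees. The networking_cisco path is assumed from the vendor-decomposition commit message, and the original import target is truncated in the log, so the package-level import below is illustrative:

    # Assumed relocation target after the vendor decomposition; the
    # fallback keeps older trees working. Illustrative only.
    try:
        from networking_cisco.plugins.cisco.l3 import plugging_drivers
    except ImportError:
        from neutron.plugins.cisco.l3 import plugging_drivers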
[Yahoo-eng-team] [Bug 1430100] [NEW] vpnaas service broken by a refactoring commit
Public bug reported: The refactoring commit 56fd82 moves the router_info and NAT rules stuff from the l3-agent into the vpn device driver, which causes two problems: 1, the router is maintained in the driver, not the VPN service, so the router instance should not be deleted; 2, NAT rules have been moved from the l3-agent into the vpn device driver, but parts of the vpn device driver still refer to NAT-rules-related methods in the l3-agent. ** Affects: neutron Importance: Undecided Assignee: Hua Zhang (zhhuabj) Status: New ** Changed in: neutron Assignee: (unassigned) => Hua Zhang (zhhuabj) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1430100 Title: vpnaas service broken by a refactoring commit Status in OpenStack Neutron (virtual network service): New Bug description: The refactoring commit 56fd82 moves the router_info and NAT rules stuff from the l3-agent into the vpn device driver, which causes two problems: 1, the router is maintained in the driver, not the VPN service, so the router instance should not be deleted; 2, NAT rules have been moved from the l3-agent into the vpn device driver, but parts of the vpn device driver still refer to NAT-rules-related methods in the l3-agent. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1430100/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1420139] [NEW] VPNPluginDbTestCase unit test failed with upstream submit I16b5e5b2
Public bug reported: Today I found that the unit test case VPNPluginDbTestCase fails, as the error log below shows. I debugged the code and found the reason: the upstream submit I16b5e5b2 ( https://review.openstack.org/#/c/151375/7/neutron/services/provider_configuration.py ) tries to read service_provider configuration items from the neutron-{service}.conf file, while VPNPluginDbTestCase still tries to override service_provider, so the error 'Invalid: Driver neutron_vpnaas.services.vpn.service_drivers.ipsec.IPsecVPNDriver is not unique across providers' is raised. if not vpnaas_provider: vpnaas_provider = ( constants.VPN + ':vpnaas:neutron_vpnaas.services.vpn.' 'service_drivers.ipsec.IPsecVPNDriver:default') cfg.CONF.set_override('service_provider', [vpnaas_provider], 'service_providers') Traceback (most recent call last): File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/tests/unit/services/vpn/test_vpnaas_driver_plugin.py", line 47, in setUp vpnaas_plugin=VPN_DRIVER_CLASS) File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/tests/unit/db/vpn/test_db_vpnaas.py", line 437, in setUp service_plugins=service_plugins File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/tests/base.py", line 53, in setUp plugin, service_plugins, ext_mgr) File "/bak/openstack/neutron/neutron/tests/unit/test_db_plugin.py", line 120, in setUp self.api = router.APIRouter() File "/bak/openstack/neutron/neutron/api/v2/router.py", line 74, in __init__ plugin = manager.NeutronManager.get_plugin() File "/bak/openstack/neutron/neutron/manager.py", line 222, in get_plugin return weakref.proxy(cls.get_instance().plugin) File "/bak/openstack/neutron/neutron/manager.py", line 216, in get_instance cls._create_instance() File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 431, in inner return f(*args, **kwargs) File "/bak/openstack/neutron/neutron/manager.py", line 202, in _create_instance cls._instance = cls() File "/bak/openstack/neutron/neutron/manager.py", line 128, in __init__ self._load_service_plugins() File "/bak/openstack/neutron/neutron/manager.py", line 175, in _load_service_plugins provider) File "/bak/openstack/neutron/neutron/manager.py", line 143, in _get_plugin_instance return plugin_class() File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/services/vpn/plugin.py", line 44, in __init__ constants.VPN, self) File "/bak/openstack/neutron/neutron/services/service_base.py", line 64, in load_drivers service_type_manager = sdb.ServiceTypeManager.get_instance() File "/bak/openstack/neutron/neutron/db/servicetype_db.py", line 41, in get_instance cls._instance = cls() File "/bak/openstack/neutron/neutron/db/servicetype_db.py", line 45, in __init__ self._load_conf() File "/bak/openstack/neutron/neutron/db/servicetype_db.py", line 49, in _load_conf pconf.parse_service_provider_opt()) File "/bak/openstack/neutron/neutron/services/provider_configuration.py", line 139, in __init__ self.add_provider(prov) File "/bak/openstack/neutron/neutron/services/provider_configuration.py", line 160, in add_provider self._ensure_driver_unique(provider['driver']) File "/bak/openstack/neutron/neutron/services/provider_configuration.py", line 147, in _ensure_driver_unique raise n_exc.Invalid(msg) Invalid: Driver neutron_vpnaas.services.vpn.service_drivers.ipsec.IPsecVPNDriver is not unique across providers ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. 
https://bugs.launchpad.net/bugs/1420139 Title: VPNPluginDbTestCase unit test failed with upstream submit I16b5e5b2 Status in OpenStack Neutron (virtual network service): New Bug description: Today I found that the unit test case VPNPluginDbTestCase fails, as the error log below shows. I debugged the code and found the reason: the upstream submit I16b5e5b2 ( https://review.openstack.org/#/c/151375/7/neutron/services/provider_configuration.py ) tries to read service_provider configuration items from the neutron-{service}.conf file, while VPNPluginDbTestCase still tries to override service_provider, so the error 'Invalid: Driver neutron_vpnaas.services.vpn.service_drivers.ipsec.IPsecVPNDriver is not unique across providers' is raised. if not vpnaas_provider: vpnaas_provider = ( constants.VPN + ':vpnaas:neutron_vpnaas.services.vpn.' 'service_drivers.ipsec.IPsecVPNDriver:default') cfg.CONF.set_override('service_provider', [vpnaas_provider], 'service_provid
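[Editor's note] One hedged test-side workaround is to drop the cached ServiceTypeManager singleton before applying the override, so the provider list is rebuilt from the override alone instead of accumulating the conf-file driver as a duplicate. Resetting the private _instance attribute is an assumption about a usable hook, not the upstream fix:

    # Hypothetical fragment of VPNPluginDbTestCase.setUp(); continues the
    # snippet quoted above, so vpnaas_provider is assumed already computed.
    from oslo_config import cfg
    from neutron.db import servicetype_db as sdb

    sdb.ServiceTypeManager._instance = None  # drop cached provider registry
    cfg.CONF.set_override('service_provider', [vpnaas_provider],
                          'service_providers')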
[Yahoo-eng-team] [Bug 1418798] [NEW] upstream RouterInfo refactor causes vpnaas unit test failure
Public bug reported: Unit tests for PS25 of the strongswan driver (https://review.openstack.org/#/c/144391/ ) failed; this is caused by the upstream RouterInfo refactor. == ERROR: test_actions_after_router_added (neutron_vpnaas.tests.unit.services.vpn.test_vpn_service.TestVPNServiceEventHandlers) neutron_vpnaas.tests.unit.services.vpn.test_vpn_service.TestVPNServiceEventHandlers.test_actions_after_router_added -- _StringException: Empty attachments: pythonlogging:'' pythonlogging:'neutron.api.extensions' Traceback (most recent call last): File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/tests/unit/services/vpn/test_vpn_service.py", line 206, in test_actions_after_router_added FAKE_ROUTER_ID, self.conf.root_helper, {}) TypeError: __init__() takes at least 6 arguments (4 given) ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1418798 Title: upstream RouterInfo refactor causes vpnaas unit test failure Status in OpenStack Neutron (virtual network service): New Bug description: Unit tests for PS25 of the strongswan driver (https://review.openstack.org/#/c/144391/ ) failed; this is caused by the upstream RouterInfo refactor. == ERROR: test_actions_after_router_added (neutron_vpnaas.tests.unit.services.vpn.test_vpn_service.TestVPNServiceEventHandlers) neutron_vpnaas.tests.unit.services.vpn.test_vpn_service.TestVPNServiceEventHandlers.test_actions_after_router_added -- _StringException: Empty attachments: pythonlogging:'' pythonlogging:'neutron.api.extensions' Traceback (most recent call last): File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/tests/unit/services/vpn/test_vpn_service.py", line 206, in test_actions_after_router_added FAKE_ROUTER_ID, self.conf.root_helper, {}) TypeError: __init__() takes at least 6 arguments (4 given) To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1418798/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
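[Editor's note] The error means the test still passes the old three positional arguments while the refactored RouterInfo.__init__ requires more. A sketch of the test-side update follows; the parameter names are placeholders and should be taken from neutron's post-refactor router_info.py, not from this note:

    # Hypothetical fixture update; 'router', 'agent_conf' and
    # 'interface_driver' are assumed parameter names, not verified ones.
    import mock
    from neutron.agent.l3 import router_info

    ri = router_info.RouterInfo(
        FAKE_ROUTER_ID,                # router id, as before
        router={},                     # assumed: router dict now required
        agent_conf=self.conf,          # assumed: agent config now required
        interface_driver=mock.Mock())  # assumed: driver now required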
[Yahoo-eng-team] [Bug 1418656] [NEW] Sometimes vpnservice's status can't be updated
Public bug reported: 2015-02-05 23:56:11.890 12178 ERROR neutron.openstack.common.loopingcall [-] in fixed duration looping call 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall Traceback (most recent call last): 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall File "/bak/openstack/neutron/neutron/openstack/common/loopingcall.py", line 81, in _inner 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall self.f(*self.args, **self.kw) 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/services/vpn/device_drivers/ipsec.py", line 674, in report_status 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall previous_status = self.get_process_status_cache(process) 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/services/vpn/device_drivers/ipsec.py", line 628, in get_process_status_cache 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall 'id': process.vpnservice['id'], 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall TypeError: 'NoneType' object has no attribute '__getitem__' 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1418656 Title: Sometimes vpnservice's status can't be updated Status in OpenStack Neutron (virtual network service): New Bug description: 2015-02-05 23:56:11.890 12178 ERROR neutron.openstack.common.loopingcall [-] in fixed duration looping call 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall Traceback (most recent call last): 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall File "/bak/openstack/neutron/neutron/openstack/common/loopingcall.py", line 81, in _inner 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall self.f(*self.args, **self.kw) 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/services/vpn/device_drivers/ipsec.py", line 674, in report_status 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall previous_status = self.get_process_status_cache(process) 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/services/vpn/device_drivers/ipsec.py", line 628, in get_process_status_cache 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall 'id': process.vpnservice['id'], 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall TypeError: 'NoneType' object has no attribute '__getitem__' 2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1418656/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
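[Editor's note] The loop dies because process.vpnservice is None by the time the status report runs, i.e. the service was deleted between iterations. A minimal guard sketch for get_process_status_cache() in neutron_vpnaas/services/vpn/device_drivers/ipsec.py; the cache-entry fields are reconstructed from the trace and may not match the file exactly:

    def get_process_status_cache(self, process):
        # The vpnservice can be deleted mid-loop; skip it instead of
        # raising TypeError on None['id'].
        if not process.vpnservice:
            return None
        if not self.process_status_cache.get(process.id):
            self.process_status_cache[process.id] = {
                'status': None,
                'id': process.vpnservice['id'],
                'updated_pending_status': False,
                'ipsec_site_connections': {}}
        return self.process_status_cache[process.id]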
[Yahoo-eng-team] [Bug 1414253] [NEW] separate openswan-specific stuff from the general vpnaas framework
Public bug reported: The initial vpnaas effort put the general VPN framework and the openswan-specific stuff into a single file (device_drivers.ipsec.py), which forces other VPN driver implementations to import this file and thus pull in a bunch of openswan stuff. So we had better refactor the openswan code out into its own files and give some symmetry to these files. ** Affects: neutron Importance: Undecided Assignee: Hua Zhang (zhhuabj) Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1414253 Title: separate openswan-specific stuff from the general vpnaas framework Status in OpenStack Neutron (virtual network service): New Bug description: The initial vpnaas effort put the general VPN framework and the openswan-specific stuff into a single file (device_drivers.ipsec.py), which forces other VPN driver implementations to import this file and thus pull in a bunch of openswan stuff. So we had better refactor the openswan code out into its own files and give some symmetry to these files. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1414253/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
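[Editor's note] A hedged sketch of the shape this refactoring suggests; module and class names below are illustrative, not the final layout:

    # device_drivers/ipsec.py -- vendor-neutral framework, importable
    # without dragging in openswan details.
    class BaseSwanProcess(object):
        """Shared lifecycle logic for *swan-based ipsec processes."""
        def ensure_configs(self):
            raise NotImplementedError

    # device_drivers/openswan_ipsec.py -- openswan-only code moves here,
    # so other drivers (e.g. strongswan) never import openswan specifics.
    class OpenSwanProcess(BaseSwanProcess):
        def ensure_configs(self):
            pass  # render ipsec.conf / ipsec.secrets in openswan format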