[Yahoo-eng-team] [Bug 2054799] Re: Issue with Project administration at Cloud Admin level

2024-04-29 Thread Hua Zhang
** Also affects: cloud-archive/yoga
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/2054799

Title:
  Issue with Project administration at Cloud Admin level

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive yoga series:
  New
Status in OpenStack Dashboard (Horizon):
  Fix Released

[Yahoo-eng-team] [Bug 2054799] Re: Issue with Project administration at Cloud Admin level

2024-04-29 Thread Hua Zhang
** Also affects: cloud-archive
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/2054799

Title:
  Issue with Project administration at Cloud Admin level

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive yoga series:
  New
Status in OpenStack Dashboard (Horizon):
  Fix Released


[Yahoo-eng-team] [Bug 2054799] [NEW] Issue with Project administration at Cloud Admin level

2024-02-23 Thread Hua Zhang
Public bug reported:

We are not able to see the list of users assigned to a project in Horizon.
Scenario:
- Log in as Cloud Admin
- Set Domain Context (k8s)
- Go to projects section
- Click on project Permissions_Roles_Test
- Go to Users

Expectation: Get a table with the users assigned to this project.
Result: Get an error - https://i.imgur.com/TminwUy.png


[Test steps]

1, Create an ordinary openstack test env with horizon.

2, Prepare some test data (eg: one domain k8s, one project k8s, and one
user k8s-admin with the role k8s-admin-role)

openstack domain create k8s
openstack role create k8s-admin-role
openstack project create --domain k8s k8s
openstack user create --project-domain k8s --project k8s --domain k8s --password password k8s-admin
openstack role add --user k8s-admin --user-domain k8s --project k8s --project-domain k8s k8s-admin-role
$ openstack role assignment list --project k8s --names
+----------------+---------------+-------+---------+--------+--------+-----------+
| Role           | User          | Group | Project | Domain | System | Inherited |
+----------------+---------------+-------+---------+--------+--------+-----------+
| k8s-admin-role | k8s-admin@k8s |       | k8s@k8s |        |        | False     |
+----------------+---------------+-------+---------+--------+--------+-----------+

3, Log in to the horizon dashboard with the admin user (eg:
admin/openstack/admin_domain).

4, Click 'Identity -> Domains' to set domain context to the domain
'k8s'.

5, Click 'Identity -> Projects -> k8s project -> Users'.

6, This is the result; it says 'Unable to display the users of this
project' - https://i.imgur.com/TminwUy.png

7, These are some logs

==> /var/log/apache2/error.log <==
[Fri Feb 23 10:03:12.201024 2024] [wsgi:error] [pid 47342:tid 140254008985152] 
[remote 10.5.3.120:58978] Recoverable error: 'e900b8934d11458b8eb9db21671c1b11'
==> /var/log/apache2/ssl_access.log <==
10.5.3.120 - - [23/Feb/2024:10:03:11 +0000] "GET /identity/07123041ee0544e0ab32e50dde780afd/detail/?tab=project_details__users HTTP/1.1" 200 1125 "https://10.5.3.120/identity/07123041ee0544e0ab32e50dde780afd/detail/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"


[Some Analyses]

This action calls this function in horizon [1].
The function first gets a list of users (api.keystone.user_list) [2], then
the role assignment list (api.keystone.get_project_users_roles) [3].
Without a domain context set, this works fine.
However, with a domain context set, the project displayed is in a
different domain.
The user list from [2] only contains users of the admin user's own domain,
while the role assignment list [3] includes users of another domain, since
the project is in another domain.

From horizon's debug log, here is an example of user list:
{"users": [{"email": "juju@localhost", "id": 
"8cd8f92ac2f94149a91488ad66f02382", "name": "admin", "domain_id": 
"103a4eb1712f4eb9873240d5a7f66599", "enabled": true, "password_expires_at": 
null, "options": {}, "links": {"self": 
"https://192.168.1.59:5000/v3/users/8cd8f92ac2f94149a91488ad66f02382"}}], 
"links": {"next": null, "self": "https://192.168.1.59:5000/v3/users;, 
"previous": null}}

Here is an example of role assignment list:
{"role_assignments": [{"links": {"assignment": 
"https://192.168.1.59:5000/v3/projects/82e250e8492b49a1a05467994d33ea1b/users/a70745ed9ac047ad88b917f24df3c873/roles/f606fafcb4fd47018aeffec2b07b7e84"},
 "scope": {"project": {"id": "82e250e8492b49a1a05467994d33ea1b"}}, "user": 
{"id": "a70745ed9ac047ad88b917f24df3c873"}, "role": {"id": 
"f606fafcb4fd47018aeffec2b07b7e84"}}, {"links": {"assignment": 
"https://192.168.1.59:5000/v3/projects/82e250e8492b49a1a05467994d33ea1b/users/fd7a79e2a4044c17873c08daa9ed37a1/roles/b936a9d998be4500900a5a9174b16b42"},
 "scope": {"project": {"id": "82e250e8492b49a1a05467994d33ea1b"}}, "user": 
{"id": "fd7a79e2a4044c17873c08daa9ed37a1"}, "role": {"id": 
"b936a9d998be4500900a5a9174b16b42"}}], "links": {"next": null, "self": 
"https://192.168.1.59:5000/v3/role_assignments?scope.project.id=82e250e8492b49a1a05467994d33ea1b_subtree=True;,
 "previous": null}}

Then later in the horizon function, it tries to get user details from the
user list for the users in the role assignment list [4], and fails,
because the users in the role assignment list don't exist in the user
list.

Horizon throws an error like:
[Fri Feb 23 10:03:12.201024 2024] [wsgi:error] [pid 47342:tid 140254008985152] 
[remote 10.5.3.120:58978] Recoverable error: 'e900b8934d11458b8eb9db21671c1b11'

This ID is a user's ID, which is used as a key to find the user in the
user list.
But the user list doesn't have this ID, so the lookup fails.
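
To make the failure mode concrete, here is a minimal standalone sketch
(not Horizon's actual code) of the lookup that breaks; the IDs are taken
from the debug output above:

# user_list only covers the admin's own domain, while the role
# assignments reference users from the k8s domain.
user_list = [
    {"id": "8cd8f92ac2f94149a91488ad66f02382", "name": "admin"},
]
role_assignments = [
    {"user": {"id": "a70745ed9ac047ad88b917f24df3c873"},
     "role": {"id": "f606fafcb4fd47018aeffec2b07b7e84"}},
]

users_by_id = {u["id"]: u for u in user_list}
for assignment in role_assignments:
    # raises KeyError: 'a70745ed9ac047ad88b917f24df3c873' -- the same
    # class of failure horizon logs as "Recoverable error: '<user id>'"
    user = users_by_id[assignment["user"]["id"]]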

[1] 
https://github.com/openstack/horizon/blob/master/openstack_dashboard/dashboards/identity/projects/tabs.py#L85
[2] 
https://github.com/openstack/horizon/blob/master/openstack_dashboard/dashboards/identity/projects/tabs.py#L96
[3] 

[Yahoo-eng-team] [Bug 1996594] [NEW] OVN metadata randomly stops working

2022-11-15 Thread Hua Zhang
Public bug reported:

We found that OVN metadata randomly stops working when OVN is writing a
snapshot.

1, At 12:30:35, OVN started to transfer leadership to write a snapshot

$ find sosreport-juju-2752e1-*/var/log/ovn/* |xargs zgrep -i -E 'Transferring 
leadership'
sosreport-juju-2752e1-6-lxd-24-xxx-2022-08-18-entowko/var/log/ovn/ovsdb-server-sb.log:2022-08-18T12:30:35.322Z|80962|raft|INFO|Transferring
 leadership to write a snapshot.
sosreport-juju-2752e1-6-lxd-24-xxx-2022-08-18-entowko/var/log/ovn/ovsdb-server-sb.log:2022-08-18T17:52:53.024Z|82382|raft|INFO|Transferring
 leadership to write a snapshot.
sosreport-juju-2752e1-7-lxd-27-xxx-2022-08-18-hhxxqci/var/log/ovn/ovsdb-server-sb.log:2022-08-18T12:30:35.330Z|92698|raft|INFO|Transferring
 leadership to write a snapshot.

2, At 12:30:36, neutron-ovn-metadata-agent reported OVSDB Error

$ find sosreport-srv1*/var/log/neutron/* |xargs zgrep -i -E 'OVSDB Error'
sosreport-srv1xxx2d-xxx-2022-08-18-cuvkufw/var/log/neutron/neutron-ovn-metadata-agent.log:2022-08-18
 12:30:36.103 75556 ERROR ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error: 
no error details available
sosreport-srv1xxx6d-xxx-2022-08-18-bgnovqu/var/log/neutron/neutron-ovn-metadata-agent.log:2022-08-18
 12:30:36.104 2171 ERROR ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error: 
no error details available

3, At 12:57:53, we saw the error 'No port found in network'; after that
we hit the problem of OVN metadata randomly not working

2022-08-18 12:57:53.800 3730 ERROR neutron.agent.ovn.metadata.server [-]
No port found in network 63e2c276-60dd-40e3-baa1-c16342eacce2 with IP
address 100.94.98.135

After the problem occurs, restarting neutron-ovn-metadata-agent or
restarting the haproxy instance as follows can be used as a workaround.

/usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ip netns exec ovnmeta-63e2c276-60dd-40e3-baa1-c16342eacce2 haproxy -f /var/lib/neutron/ovn-metadata-proxy/63e2c276-60dd-40e3-baa1-c16342eacce2.conf

LP bug #1990978 [1] is trying to reduce the frequency of leadership
transfers, which should be beneficial for this problem.
But it only reduces the occurrence of the problem rather than avoiding it
completely. I wonder if we need to add some retry logic on the neutron
side.

NOTE: The openstack version we are using is focal-xena, and
openvswitch's version is 2.16.0-0ubuntu2.1~cloud0

[1] https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1990978
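
For example, a minimal sketch of such retry logic (self-contained; the
commit callable and the exception type are placeholders for the real
ovsdbapp transaction objects, which are not wired up here):

import time

class OVSDBError(Exception):
    """Stand-in for the ovsdbapp transaction error seen in the logs."""

def commit_with_retry(commit, attempts=5, delay=1.0):
    # retry a transaction commit across a short OVN leadership transfer,
    # backing off a little more on each attempt
    for attempt in range(1, attempts + 1):
        try:
            return commit()
        except OVSDBError:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)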

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1996594

Title:
  OVN metadata randomly stops working

Status in neutron:
  New


[Yahoo-eng-team] [Bug 1947127] Re: Some DNS extensions not working with OVN

2022-05-04 Thread Hua Zhang
** Patch added: "focal-xena.debdiff"
   
https://bugs.launchpad.net/neutron/+bug/1947127/+attachment/5586824/+files/focal-xena.debdiff

** Summary changed:

- Some DNS extensions not working with OVN
+ [SRU] Some DNS extensions not working with OVN

** Changed in: cloud-archive
   Status: Confirmed => Fix Released

** Tags added: sts sts-sru-needed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1947127

Title:
  [SRU] Some DNS extensions not working with OVN

Status in Ubuntu Cloud Archive:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  New

Bug description:
  [Impact]

  On a fresh devstack install with the q-dns service enabled from the
  neutron devstack plugin, some features still don't work, e.g.:

  $ openstack subnet set private-subnet --dns-publish-fixed-ip
  BadRequestException: 400: Client Error for url: 
https://10.250.8.102:9696/v2.0/subnets/9f50c79e-6396-4c5b-be92-f64aa0f25beb, 
Unrecognized attribute(s) 'dns_publish_fixed_ip'

  $ openstack port create p1 --network private --dns-name p1 --dns-domain a.b.
  BadRequestException: 400: Client Error for url: 
https://10.250.8.102:9696/v2.0/ports, Unrecognized attribute(s) 'dns_domain'

  The reason seems to be that
  
https://review.opendev.org/c/openstack/neutron/+/686343/31/neutron/common/ovn/extensions.py
  only added dns_domain_keywords, but not e.g. dns_domain_ports as
  supported by OVN
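
  For illustration, the shape of the change would be something like this
  (a sketch only; the exact alias spellings below are assumptions, not
  copied from the actual fix):

  # neutron/common/ovn/extensions.py (sketch)
  ML2_SUPPORTED_API_EXTENSIONS = [
      # ...
      'dns-domain-keywords',          # already listed per this report
      'dns-domain-ports',             # the kind of entry the fix adds
      'subnet-dns-publish-fixed-ip',  # needed for --dns-publish-fixed-ip
  ]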

  [Test Case]

  Create a normal OpenStack neutron test environment to see if we can
  successfully run the following commands:

  openstack subnet set private_subnet --dns-publish-fixed-ip
  openstack port create p1 --network private --dns-name p1 --dns-domain a.b.

  [Regression Potential]

  The fix has been merged into the upstream stable/xena branch [1]; this
  is just an SRU into the 19.1.0 branch of UCA xena, so it is a clean
  backport and might be helpful for deployments migrating to OVN.

  [1] https://review.opendev.org/c/openstack/neutron/+/838650

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1947127/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1948656] Re: toggling explicitly_egress_direct from true to false does not clean the openflow flows on the integration bridge

2021-10-26 Thread Hua Zhang
Successfully found a better workaround that takes advantage of
delete_accepted_egress_direct_flow in
_unbind_distributed_router_interface_port [1].

# eg: mac of the old snat-xxx port is fa:16:3e:7a:11:7d
neutron router-interface-delete provider-router $(openstack subnet show 
private_subnet -cid -fvalue)
# eg: mac of the new snat-xxx port is fa:16:3e:e6:f9:b2
neutron router-interface-add provider-router $(openstack subnet show 
private_subnet -cid -fvalue)
openstack port list --device-owner network:router_centralized_snat

The code path is:

process_deleted_ports -> port_unbound -> unbind_port_from_dvr ->
_unbind_centralized_snat_port_on_dvr_subnet ->
delete_accepted_egress_direct_flow

The egress direct flows for the old snat-xxx port won't disappear:

# ovs-ofctl dump-flows br-int |grep fa:16:3e:7a:11:7d |grep -E 
'priority=12|priority=10'
 cookie=0x59874eed7c9fa42a, duration=76882.302s, table=94, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, 
priority=12,reg6=0x1,dl_dst=fa:16:3e:7a:11:7d actions=output:16
 cookie=0x59874eed7c9fa42a, duration=76882.302s, table=94, n_packets=0, 
n_bytes=0, idle_age=65534, hard_age=65534, 
priority=10,reg6=0x1,dl_src=fa:16:3e:7a:11:7d,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00
 actions=mod_vlan_vid:1,output:2

but no egress direct flows are produced for the new port either:

# ovs-ofctl dump-flows br-int |grep 'fa:16:3e:e6:f9:b2' |grep -E
'priority=12|priority=10'

So north-south traffic resumes working again.

# ip netns exec snat-10140acd-28e6-4110-ae67-76115b72b37c ping -c1 
192.168.21.114
PING 192.168.21.114 (192.168.21.114) 56(84) bytes of data.
64 bytes from 192.168.21.114: icmp_seq=1 ttl=64 time=1.86 ms

[1]
https://review.opendev.org/c/openstack/neutron/+/704506/1/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#678

** Changed in: neutron
   Status: Triaged => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1948656

Title:
  toggling explicitly_egress_direct from true to false does not clean
  the openflow flows on the integration bridge

Status in neutron:
  Invalid

Bug description:
  As the comment [1] says, the following flows are not clearup after
  explicitly_egress_direct is toggled from true to false

  # ovs-ofctl dump-flows br-int |grep fa:16:3e:7a:11:7d |grep -E 
'priority=12|priority=10' 
   cookie=0x59874eed7c9fa42a, duration=1372.227s, table=94, n_packets=0, 
n_bytes=0, idle_age=2148, priority=12,reg6=0x1,dl_dst=fa:16:3e:7a:11:7d 
actions=output:16
   cookie=0x59874eed7c9fa42a, duration=1372.227s, table=94, n_packets=0, 
n_bytes=0, idle_age=2148, 
priority=10,reg6=0x1,dl_src=fa:16:3e:7a:11:7d,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00
 actions=mod_vlan_vid:1,output:2

  There seems to be no way to trigger delete_accepted_egress_direct_flow
  [2] for above snat-xxx port (fa:16:3e:7a:11:7d).

  [1] https://bugs.launchpad.net/neutron/+bug/1945306/comments/9
  [2] 
https://review.opendev.org/c/openstack/neutron/+/704506/1/neutron/agent/linux/openvswitch_firewall/firewall.py#1140

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1948656/+subscriptions




[Yahoo-eng-team] [Bug 1948656] [NEW] toggling explicitly_egress_direct from true to false does not clean flows

2021-10-25 Thread Hua Zhang
Public bug reported:

As the comment [1] says, the following flows are not cleaned up after
explicitly_egress_direct is toggled from true to false:

# ovs-ofctl dump-flows br-int |grep fa:16:3e:7a:11:7d |grep -E 
'priority=12|priority=10' 
 cookie=0x59874eed7c9fa42a, duration=1372.227s, table=94, n_packets=0, 
n_bytes=0, idle_age=2148, priority=12,reg6=0x1,dl_dst=fa:16:3e:7a:11:7d 
actions=output:16
 cookie=0x59874eed7c9fa42a, duration=1372.227s, table=94, n_packets=0, 
n_bytes=0, idle_age=2148, 
priority=10,reg6=0x1,dl_src=fa:16:3e:7a:11:7d,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00
 actions=mod_vlan_vid:1,output:2

There seems to be no way to trigger delete_accepted_egress_direct_flow
[2] for the above snat-xxx port (fa:16:3e:7a:11:7d).

[1] https://bugs.launchpad.net/neutron/+bug/1945306/comments/9
[2] 
https://review.opendev.org/c/openstack/neutron/+/704506/1/neutron/agent/linux/openvswitch_firewall/firewall.py#1140
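
As a stopgap, the stale table=94 entries can presumably be deleted by
hand, matching on the port's MAC (a hypothetical cleanup, using the MAC
from the example above):

# ovs-ofctl del-flows br-int "table=94,dl_dst=fa:16:3e:7a:11:7d"
# ovs-ofctl del-flows br-int "table=94,dl_src=fa:16:3e:7a:11:7d"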

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: sts

** Tags added: sts

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1948656

Title:
  toggling explicitly_egress_direct from true to false does not clean
  flows

Status in neutron:
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1948656/+subscriptions




[Yahoo-eng-team] [Bug 1945306] [NEW] north-south traffic not working when VM and main router are not on the same host

2021-09-28 Thread Hua Zhang
Public bug reported:

Some newly created VMs are not able to reach "outside" resources (e.g.
apt repositories) on an l3ha + dvr env. This problem can be easily
reproduced as long as the VM and the main router are not on the same
host; the 'apt update' command cannot be run inside the VM, so the
north-south traffic is broken.
Here are steps to easily reproduce it.

1, set up a wallaby or ussuri vrrp + dvr env (it works on train, but not
on ussuri and wallaby)
2, create a test vm, query host by: nova show  |grep host
3, query main router by: neutron l3-agent-list-hosting-router $(openstack 
router show provider-router -fvalue -cid)
4, make sure VM and main router are not on the same host
5, on main router host, it will fail to run: ip netns exec snat-xxx ping 
 -c1

I've done some bisecting, and I found:

15.3.4 (bionic-train)  - no problem
1c2e10f859 - no problem
16.4.0 (bionic-ussuri) - has problem
16.0.0-0ubuntu3- has problem, and also have multiple active routers 
problem
16.0.0~b3~git2020041516.5f42488a9a-0ubuntu2 - BAD version, all routers are in 
standby state so we can't do any test
16.1.0 (focal) - has problem, and also have multiple active routers problem
16.2.0 (focal) - has problem
16.3.0 (focal) - has problem
16.4.0 (focal-ussuri) - has problem
focal-wallaby - has problem

Because I often hit the multiple-standby issue with some commit ids (eg:
14dd3e95ca), I can't continue the bisect.

I also used 'ovs-appctl ofproto/trace' and tcpdump to do some debugging;
the results are as follows.

train - works
sg-xxx -> vm - https://pastebin.ubuntu.com/p/MHNVf8wXtb/
tcpdump on sg-xxx - https://pastebin.ubuntu.com/p/Fqxp4mvkgV/
tcpdump on vm's tap - https://pastebin.ubuntu.com/p/YppWc2Pg33/
tcpdump on qr-xxx - https://pastebin.ubuntu.com/p/MPmQ5xbnT2/ - can get 
icmp reply

ussuri - doesn't work
sg-xxx -> vm - https://pastebin.ubuntu.com/p/hKfSB9gmd9/
tcpdump on sg-xxx - https://pastebin.ubuntu.com/p/NCcnGS4gdj/ - sg-xxx 
can't get icmp reply
tcpdump on vm's tap - https://pastebin.ubuntu.com/p/DHdVbB66NT/   - VM can't 
get sg-xxx's arp reply
tcpdump on qr-xxx - https://pastebin.ubuntu.com/p/4hJ7vdRRC4/ - can't get 
arp reply

It looks like the VM can't get an ARP reply for the sg-xxx interface.
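
For reference, the trace runs were of this general shape (an illustrative
invocation only; the in_port and MACs are placeholders, not values from
this env):

# ovs-appctl ofproto/trace br-int "in_port=2,dl_src=fa:16:3e:00:00:01,dl_dst=ff:ff:ff:ff:ff:ff,arp"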

** Affects: neutron
 Importance: Undecided
 Status: New

** Description changed:

-- 
You received this bug notification because you are 

[Yahoo-eng-team] [Bug 1681627] Re: [SRU] Page not found error on refreshing browser (in AngularJS-based detail page)

2019-04-22 Thread Hua Zhang
** Changed in: cloud-archive/pike
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1681627

Title:
  [SRU] Page not found error on refreshing browser (in AngularJS-based
  detail page)

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive ocata series:
  New
Status in Ubuntu Cloud Archive pike series:
  Fix Released
Status in OpenStack Dashboard (Horizon):
  Fix Released
Status in Zun UI:
  Fix Released

Bug description:
  [Impact]

  When clicking an instance snapshot's detail on the Images menu and then
  refreshing the page, you will get an error:
  ```
  The page you were looking for doesn't exist
  You may have mistyped the address or the page may have moved.
  ```

  [Test Case]

  1. Deploy an OpenStack env with horizon
  2. Click an instance snapshot's detail on the Images menu
  3. Refresh the page
  4. Check whether you see the error 'The page you were looking for
  doesn't exist'

  [Regression Potential]

  This problem has been fixed in Queens with two patches [1][2]; we need
  to backport them into Ocata as well.

  But in fact, directly backporting these two original patches [1][2]
  into Ocata will not work, because:

  1, In Ocata release, getDetailsPath returns
  "'project/ngdetails/OS::Glance::Image/' + item.id;"

  
https://github.com/openstack/horizon/blob/stable/ocata/openstack_dashboard/static/app/core/images/images.service.js#L59

  function getDetailsPath(item) {
    return 'project/ngdetails/OS::Glance::Image/' + item.id;
  }

  2, In releases after Ocata (eg: the Pike release), getDetailsPath
  returns "detailRoute + 'OS::Glance::Image/' + item.id"

  
https://github.com/openstack/horizon/blob/stable/pike/openstack_dashboard/static/app/core/images/images.service.js#L69

  function getDetailsPath(item) {
    return detailRoute + 'OS::Glance::Image/' + item.id;
  }

  So we would see the error 'The current URL,
  project/ngdetails/OS::Glance::Image/46ef8cab-dfc3-4690-8abb-
  d416978d237e, didn't match any of these.' when backporting the two
  original patches into Ocata. So the following simple change needs to be
  made in urls.py in addition to the original backport patches:

  -ngdetails_url = url(r'^ngdetails/',
  +ngdetails_url = url(r'^project/ngdetails/',

  [1] https://review.openstack.org/#/c/541676/
  [2] https://review.openstack.org/#/c/553970/

  [Original Bug Report]

  Once I get into the container detail view, refreshing the browser will
  show a page-not-found error:

The current URL, ngdetails/OS::Zun::Container/c54ba416-a955-45b2
  -848b-aee57b748e08, didn't match any of these

  Full output: http://paste.openstack.org/show/605296/

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1681627/+subscriptions



[Yahoo-eng-team] [Bug 1744079] Re: [SRU] disk over-commit still not correctly calculated during live migration

2018-10-24 Thread Hua Zhang
** Also affects: nova (Ubuntu)
   Importance: Undecided
   Status: New

** No longer affects: nova (Ubuntu)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1744079

Title:
  [SRU] disk over-commit still not correctly calculated during live
  migration

Status in Ubuntu Cloud Archive:
  New
Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  [Impact]
  nova compares disk space with the disk_available_least field, which can
  be negative due to overcommit.

  So the migration may fail with a "Migration pre-check error:
  Unable to migrate dfcd087a-5dff-439d-8875-2f702f081539: Disk of
  instance is too large(available on destination host:-3221225472 <
  need:22806528)" when trying a migration to another compute node that
  has plenty of free space on its disk.

  [Test Case]
  Deploy an openstack environment. Make sure there is a negative
  disk_available_least and an adequate free_disk_gb on one test compute
  node, then migrate a VM to it with disk overcommit (openstack server
  migrate --live  --block-migration --disk-overcommit ). You will see the
  above migration pre-check error.

  This is the formula to compute disk_available_least and free_disk_gb.

  disk_free_gb = disk_info_dict['free']
  disk_over_committed = self._get_disk_over_committed_size_total()
  available_least = disk_free_gb * units.Gi - disk_over_committed
  data['disk_available_least'] = available_least / units.Gi

  The following command can be used to query the value of
  disk_available_least

  nova hypervisor-show  |grep disk
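
  As a toy illustration (made-up numbers, not from a real deployment) of
  how the formula above can go negative:

  # 100 GiB free on disk, but 140 GiB already promised to guests
  Gi = 2 ** 30
  disk_free_gb = 100
  disk_over_committed = 140 * Gi
  available_least = disk_free_gb * Gi - disk_over_committed
  print(available_least // Gi)  # -40, i.e. disk_available_least < 0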

  Steps to Reproduce:
  1. set disk_allocation_ratio config option > 1.0 
  2. qemu-img resize cirros-0.3.0-x86_64-disk.img +40G
  3. glance image-create --disk-format qcow2 ...
  4. boot VMs based on resized image
  5. we see disk_available_least becomes negative

  [Regression Potential]
  Minimal - we're just changing from the following line:

  disk_available_gb = dst_compute_info['disk_available_least']

  to the following code:

  if disk_over_commit:
      disk_available_gb = dst_compute_info['free_disk_gb']
  else:
      disk_available_gb = dst_compute_info['disk_available_least']

  When overcommit is enabled, disk_available_least can be negative, so we
  should use free_disk_gb instead of it, by backporting the following two
  fixes.

  
https://git.openstack.org/cgit/openstack/nova/commit/?id=e097c001c8e0efe8879da57264fcb7bdfdf2
  
https://git.openstack.org/cgit/openstack/nova/commit/?id=e2cc275063658b23ed88824100919a6dfccb760d

  This is the code path for check_can_live_migrate_destination:

  _migrate_live(os-migrateLive API, migrate_server.py) -> migrate_server
  -> _live_migrate -> _build_live_migrate_task ->
  _call_livem_checks_on_host -> check_can_live_migrate_destination

  BTW, Red Hat also has the same bug -
  https://bugzilla.redhat.com/show_bug.cgi?id=1477706

  
  [Original Bug Report]
  Change I8a705114d47384fcd00955d4a4f204072fed57c2 (written by me... sigh) 
addressed a bug which prevented live migration to a target host with 
overcommitted disk when made with microversion <2.25. It achieved this, but the 
fix is still not correct. We now do:

  if disk_over_commit:
      disk_available_gb = dst_compute_info['local_gb']

  Unfortunately local_gb is *total* disk, not available disk. We
  actually want free_disk_gb. Fun fact: due to the way we calculate this
  for filesystems, without taking into account reserved space, this can
  also be negative.

  The test we're currently running is: could we fit this guest's
  allocated disks on the target if the target disk was empty. This is at
  least better than it was before, as we don't spuriously fail early. In
  fact, we're effectively disabling a test which is disabled for
  microversion >=2.25 anyway. IOW we should fix it, but it's probably
  not a high priority.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1744079/+subscriptions



[Yahoo-eng-team] [Bug 1744079] Re: [SRU] disk over-commit still not correctly calculated during live migration

2018-10-23 Thread Hua Zhang
** Tags added: sts-sponsor

** Also affects: cloud-archive
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1744079

Title:
  [SRU] disk over-commit still not correctly calculated during live
  migration

Status in Ubuntu Cloud Archive:
  New
Status in OpenStack Compute (nova):
  Fix Released


To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1744079/+subscriptions



[Yahoo-eng-team] [Bug 1793102] [NEW] ha_vrrp_health_check_interval causes constantly VRRP transitions

2018-09-18 Thread Hua Zhang
Public bug reported:

Commit 185d6cbc648fd041402a5034b04b818da5c7136e added support for
keepalived VRRP health checks, but it causes constant VRRP transitions
if you actually enable the option ha_vrrp_health_check_interval.

It seems to be because keepalived can't run ha_check_script_1.sh
successfully, even though we can run it successfully by hand.

Sep 18 08:19:41 juju-23f84c-queens-dvr-5 Keepalived_vrrp[8448]: 
/var/lib/neutron/ha_confs/909c6b55-9bc6-476f-9d28-c32d031c41d7/ha_check_script_1.sh
 exited with status 1
Sep 18 08:19:41 juju-23f84c-queens-dvr-5 Keepalived_vrrp[8448]: 
VRRP_Script(ha_health_check_1) failed
Sep 18 08:19:43 juju-23f84c-queens-dvr-5 Keepalived_vrrp[8448]: 
VRRP_Instance(VR_1) Entering FAULT STATE
Sep 18 08:19:43 juju-23f84c-queens-dvr-5 Keepalived_vrrp[8448]: 
VRRP_Instance(VR_1) removing protocol Virtual Routes
Sep 18 08:19:43 juju-23f84c-queens-dvr-5 Keepalived_vrrp[8448]: 
VRRP_Instance(VR_1) removing protocol VIPs.
Sep 18 08:19:43 juju-23f84c-queens-dvr-5 Keepalived_vrrp[8448]: 
VRRP_Instance(VR_1) removing protocol E-VIPs.
Sep 18 08:19:43 juju-23f84c-queens-dvr-5 Keepalived_vrrp[8448]: 
VRRP_Instance(VR_1) Now in FAULT state

root@juju-23f84c-queens-dvr-5:~# ll 
/var/lib/neutron/ha_confs/909c6b55-9bc6-476f-9d28-c32d031c41d7/ha_check_script_1.sh
-r-x-w 1 neutron neutron 109 Sep 18 03:45 
/var/lib/neutron/ha_confs/909c6b55-9bc6-476f-9d28-c32d031c41d7/ha_check_script_1.sh*
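
For what it's worth, running the check script the way keepalived would
(rather than by hand as root) may reproduce the failure; a hypothetical
check:

# sudo -u neutron /var/lib/neutron/ha_confs/909c6b55-9bc6-476f-9d28-c32d031c41d7/ha_check_script_1.sh; echo $?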

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1793102

Title:
  ha_vrrp_health_check_interval causes constantly VRRP transitions

Status in neutron:
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1793102/+subscriptions



[Yahoo-eng-team] [Bug 1374508] Re: Mismatch happens between BDM and domain XML If instance does not respond to ACPI hotplug during detach/attach.

2018-01-22 Thread Hua Zhang
** Also affects: nova (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1374508

Title:
  Mismatch happens between BDM and domain XML If instance does not
  respond to ACPI hotplug during detach/attach.

Status in OpenStack Compute (nova):
  Fix Released
Status in nova package in Ubuntu:
  New
Status in nova source package in Trusty:
  New

Bug description:
  
tempest.api.compute.servers.test_server_rescue_negative:ServerRescueNegativeTestJSON.test_rescued_vm_detach_volume

  This test passes; however, it fails to properly clean up after itself -
  the detach completes but without running the necessary iscsiadm
  commands.

  In nova.virt.libvirt.volume.LibvirtISCSIVolumeDriver.disconnect_volume
  the list returned by self.connection._get_all_block_devices includes
  the host_device which means that self._disconnect_from_iscsi_portal is
  never run.

  
  You can see evidence of this in /etc/iscsi/nodes as well as errors logged in 
/var/log/syslog

  I'm guessing there is a race between the unrescue and the detach
  within libvirt. In
  nova.virt.libvirt.driver.LibvirtDriver.detach_volume, if I put in a
  sleep before virt_dom.detachDeviceFlags(xml, flags), the detach appears
  to work properly; however, if I sleep after that line it does not
  appear to have any effect.
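
  The experiment described amounts to something like this (an
  illustration only, not a fix; virt_dom, xml and flags come from the
  surrounding driver code, and the sleep length is arbitrary):

  import time

  time.sleep(5)  # give the unrescue a window to settle before detaching
  virt_dom.detachDeviceFlags(xml, flags)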

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1374508/+subscriptions



[Yahoo-eng-team] [Bug 1635554] Re: Delete Router / race condition

2017-08-11 Thread Hua Zhang
** Changed in: neutron
   Status: Invalid => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1635554

Title:
  Delete Router / race condition

Status in neutron:
  Confirmed

Bug description:
  When deleting a router the logfile is filled up.

  
  CentOS7
  Newton(RDO)


  2016-10-21 09:45:02.526 16200 DEBUG neutron.agent.linux.utils [-] Exit code: 
0 execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:140
  2016-10-21 09:45:02.526 16200 WARNING neutron.agent.l3.namespaces [-] 
Namespace qrouter-8cf5-5c5c-461c-84f3-c8abeca8f79a does not exist. Skipping 
delete
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent [-] Error while 
deleting router 8cf5-5c5c-461c-84f3-c8abeca8f79a
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent Traceback (most 
recent call last):
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 357, in 
_safe_router_removed
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent 
self._router_removed(router_id)
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 376, in 
_router_removed
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent ri.delete(self)
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 381, in 
delete
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent 
self.destroy_state_change_monitor(self.process_monitor)
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 325, in 
destroy_state_change_monitor
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent pm = 
self._get_state_change_monitor_process_manager()
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 296, in 
_get_state_change_monitor_process_manager
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent 
default_cmd_callback=self._get_state_change_monitor_callback())
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 299, in 
_get_state_change_monitor_callback
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent ha_device = 
self.get_ha_device_name()
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent   File 
"/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 137, in 
get_ha_device_name
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent return 
(HA_DEV_PREFIX + self.ha_port['id'])[:self.driver.DEV_NAME_LEN]
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent TypeError: 
'NoneType' object has no attribute '__getitem__'
  2016-10-21 09:45:02.527 16200 ERROR neutron.agent.l3.agent
  2016-10-21 09:45:02.528 16200 DEBUG neutron.agent.l3.agent [-] Finished a 
router update for 8cf5-5c5c-461c-84f3-c8abeca8f79a _process_router_update 
/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py:504

  
  See full log
  http://paste.openstack.org/show/586656/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1635554/+subscriptions



[Yahoo-eng-team] [Bug 1681998] [NEW] Bypass the dirty BDM entry no matter how it is produced

2017-04-11 Thread Hua Zhang
Public bug reported:

Sometimes the following dirty BDM entry (row 1) can be seen in the
database, where multiple BDMs have the same volume_id and instance_uuid.

mysql> select * from block_device_mapping where 
volume_id='153bcab4-1f88-440c-9782-3c661a7502a8' \G
*** 1. row ***
   created_at: 2017-02-02 02:28:45
   updated_at: NULL
   deleted_at: NULL
   id: 9754
  device_name: /dev/vdb
delete_on_termination: 0
  snapshot_id: NULL
volume_id: 153bcab4-1f88-440c-9782-3c661a7502a8
  volume_size: NULL
no_device: NULL
  connection_info: NULL
instance_uuid: b52f9264-d8b3-406a-bf9b-d7d7471b13fc
  deleted: 0
  source_type: volume
 destination_type: volume
 guest_format: NULL
  device_type: NULL
 disk_bus: NULL
   boot_index: NULL
 image_id: NULL
*** 2. row ***
   created_at: 2017-02-02 02:29:31
   updated_at: 2017-02-27 10:59:42
   deleted_at: NULL
   id: 9757
  device_name: /dev/vdc
delete_on_termination: 0
  snapshot_id: NULL
volume_id: 153bcab4-1f88-440c-9782-3c661a7502a8
  volume_size: NULL
no_device: NULL
  connection_info: {"driver_volume_type": "rbd", "serial": 
"153bcab4-1f88-440c-9782-3c661a7502a8", "data": {"secret_type": "ceph", "name": 
"cinder-ceph/volume-153bcab4-1f88-440c-9782-3c661a7502a8", "secret_uuid": null, 
"qos_specs": null, "hosts": ["10.7.1.202", "10.7.1.203", "10.7.1.204"], 
"auth_enabled": true, "access_mode": "rw", "auth_username": "cinder-ceph", 
"ports": ["6789", "6789", "6789"]}}
instance_uuid: b52f9264-d8b3-406a-bf9b-d7d7471b13fc
  deleted: 0
  source_type: volume
 destination_type: volume
 guest_format: NULL
  device_type: disk
 disk_bus: virtio
   boot_index: NULL
 image_id: NULL

This then causes us to fail to detach the volume and see the following
error, since connection_info of row 1 is NULL.

2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher 
self._detach_volume(context, instance, bdm)
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher File 
"/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 4801, in 
_detach_volume
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher 
connection_info = jsonutils.loads(bdm.connection_info)
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher File 
"/usr/lib/python2.7/dist-packages/oslo_serialization/jsonutils.py", line 215, 
in loads
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher return 
json.loads(encodeutils.safe_decode(s, encoding), **kwargs)
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher File 
"/usr/lib/python2.7/dist-packages/oslo_utils/encodeutils.py", line 33, in 
safe_decode
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher raise 
TypeError("%s can't be decoded" % type(text))
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher TypeError: <type 'NoneType'> can't be decoded

This kind of dirty data can be produced when the call
_attach_volume()#volume_bdm.destroy() [1] happens to fail. I think these
conditions may cause it to happen:
1, losing the database connection during the volume_bdm.destroy() operation
2, losing the MQ connection, or an RPC timeout, during the
volume_bdm.destroy() operation

If you lose the database during any operation, things are going to be
bad, so in general I'm not sure how realistic guarding for that case is.
Losing an MQ connection or RPC timing out is probably more realistic.
The fix [2] seems to be trying to solve point 2.

However, I'm wondering whether we can simply bypass the dirty BDM entry
whenever its connection_info is NULL, no matter how it was produced.
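
As a rough illustration of that idea, the detach path could filter out such
rows up front (a sketch only, with an assumed helper name and no claim to be
actual nova code):

    # Hypothetical helper: given the BDM rows of an instance, pick the one
    # usable for detaching volume_id, skipping dirty rows whose
    # connection_info was never populated.
    def pick_bdm_for_detach(bdms, volume_id):
        candidates = [bdm for bdm in bdms
                      if bdm.volume_id == volume_id and
                      bdm.connection_info is not None]
        if not candidates:
            raise ValueError('no usable BDM for volume %s' % volume_id)
        # A dirty leftover was never updated, so prefer the freshest row.
        return sorted(candidates,
                      key=lambda bdm: bdm.updated_at or bdm.created_at)[-1]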


[1] https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3724
[2] https://review.openstack.org/#/c/290793

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1681998

Title:
  Bypass the dirty BDM entry no matter how it is produced

Status in OpenStack Compute (nova):
  New

Bug description:
  Sometimes the following dirty BDM entry (row 1) can be seen in the
  database: multiple BDMs with the same image_id and instance_uuid.

  mysql> select * from block_device_mapping where volume_id='153bcab4-1f88-440c-9782-3c661a7502a8' \G
  *************************** 1. row ***************************
             created_at: 2017-02-02 02:28:45
             updated_at: NULL
             deleted_at: NULL
                     id: 9754
            device_name: /dev/vdb
  delete_on_termination: 0
  

[Yahoo-eng-team] [Bug 1515896] Re: Update of port admin state to False for neutron floating IP port does not take effect

2015-11-16 Thread Hua Zhang
** Changed in: neutron
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1515896

Title:
  Update of port admin state to False for neutron floating IP port does
  not take effect

Status in neutron:
  Invalid

Bug description:
  It is expected that when the admin-state of a port is down, it
  should cease all operations specific to that port. But this is not
  the case with a floating IP port. When the port's admin-state is made
  False, the floating IP continues to be operational.

  root@controller:~# neutron port-show bc3f8bf6-c4bf-451a-8b5a-0d0b7624b5ca
  +-----------------------+-----------------------------------------------------------------------------------+
  | Field                 | Value                                                                             |
  +-----------------------+-----------------------------------------------------------------------------------+
  | admin_state_up        | False                                                                             |
  | allowed_address_pairs |                                                                                   |
  | binding:host_id       |                                                                                   |
  | binding:profile       | {}                                                                                |
  | binding:vif_details   | {}                                                                                |
  | binding:vif_type      | unbound                                                                           |
  | binding:vnic_type     | normal                                                                            |
  | device_id             | ca0d0355-9ebe-46c5-9a31-2e2253da2d40                                              |
  | device_owner          | network:floatingip                                                               |
  | extra_dhcp_opts       |                                                                                   |
  | fixed_ips             | {"subnet_id": "eb1339bd-b552-4207-8856-ccff1de04f47", "ip_address": "10.0.2.18"} |
  | id                    | bc3f8bf6-c4bf-451a-8b5a-0d0b7624b5ca                                              |
  | mac_address           | fa:16:3e:c7:0a:98                                                                 |
  | name                  |                                                                                   |
  | network_id            | dda8f089-25b0-4e13-886e-b0b1bc8f5801                                              |
  | security_groups       |                                                                                   |
  | status                | DOWN                                                                              |
  | tenant_id             |                                                                                   |
  +-----------------------+-----------------------------------------------------------------------------------+
  root@controller:~# ping 10.0.2.18
  PING 10.0.2.18 (10.0.2.18) 56(84) bytes of data.
  64 bytes from 10.0.2.18: icmp_seq=1 ttl=63 time=2.46 ms
  64 bytes from 10.0.2.18: icmp_seq=2 ttl=63 time=23.4 ms
  ^C
  --- 10.0.2.18 ping statistics ---
  2 packets transmitted, 2 received, 0% packet loss, time 1001ms
  rtt min/avg/max/mdev = 2.466/12.949/23.433/10.484 ms

  Observation:

  1) neutron port-update FLOATING_IP_PORT --admin-state-up False
  2) neutron port-update PRIVATE_IP_PORT --admin-state-up False
  3) neutron port-update PRIVATE_IP_PORT --admin-state-up True
  4) Ping of Floating IP does not work now. 
  5) neutron port-update FLOATING_IP_PORT --admin-state-up True
  6) Ping of Floating IP should start working now.
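
  The toggling in steps 1-6 can be scripted for quicker verification (a
  sketch using python-neutronclient; the credentials and auth_url are
  placeholders, the port UUID is the floating IP port shown above):

    # Hypothetical reproduction script; adjust credentials/endpoint to
    # the environment above before running.
    from neutronclient.v2_0 import client

    neutron = client.Client(username='admin', password='secret',
                            tenant_name='admin',
                            auth_url='http://controller:5000/v2.0')
    FLOATING_IP_PORT = 'bc3f8bf6-c4bf-451a-8b5a-0d0b7624b5ca'
    neutron.update_port(FLOATING_IP_PORT,
                        {'port': {'admin_state_up': False}})
    # Expected: the floating IP stops answering; observed: ping still works.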

  Also, this results in the failure of tempest test:
  
tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_update_instance_port_admin_state

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1515896/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1450294] [NEW] Enable password support for vnc session

2015-04-29 Thread Hua Zhang
Public bug reported:

qemu supports password-based authentication for client connections by adding
the password option to -vnc, as below [1]:
-vnc 0.0.0.0:1,password -k en-us
The qemu XML configuration file provides the VNC password in clear text:
<graphics type='vnc' port='-1' autoport='yes' listen='192.168.1.5' passwd='YOUR-PASSWORD-HERE' keymap='en-us'/>

but OpenStack doesn't support configuring a VNC password; see the following
code:
if ((CONF.vnc_enabled and
     virt_type not in ('lxc', 'uml'))):
    graphics = vconfig.LibvirtConfigGuestGraphics()
    graphics.type = "vnc"
    graphics.keymap = CONF.vnc_keymap
    graphics.listen = CONF.vncserver_listen
    guest.add_device(graphics)
    add_video_driver = True


[1], http://www.cyberciti.biz/faq/linux-kvm-vnc-for-guest-machine/
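
For illustration, a minimal sketch of what such support might look like (NOT
merged nova code; the vnc_password option and the passwd attribute on
LibvirtConfigGuestGraphics are assumptions):

    # Hypothetical sketch: wire an assumed CONF.vnc_password option through
    # to the libvirt graphics device, which would serialize to passwd= in
    # the generated XML.
    if ((CONF.vnc_enabled and
         virt_type not in ('lxc', 'uml'))):
        graphics = vconfig.LibvirtConfigGuestGraphics()
        graphics.type = "vnc"
        graphics.keymap = CONF.vnc_keymap
        graphics.listen = CONF.vncserver_listen
        if getattr(CONF, 'vnc_password', None):   # assumed new StrOpt
            graphics.passwd = CONF.vnc_password   # assumed new attribute
        guest.add_device(graphics)
        add_video_driver = True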

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1450294

Title:
  Enable password support for vnc session

Status in OpenStack Compute (Nova):
  New

Bug description:
  qemu supports password-based authentication for client connections by
  adding the password option to -vnc, as below [1]:
  -vnc 0.0.0.0:1,password -k en-us
  The qemu XML configuration file provides the VNC password in clear text:
  <graphics type='vnc' port='-1' autoport='yes' listen='192.168.1.5' passwd='YOUR-PASSWORD-HERE' keymap='en-us'/>

  but OpenStack doesn't support configuring a VNC password; see the
  following code:
  if ((CONF.vnc_enabled and
       virt_type not in ('lxc', 'uml'))):
      graphics = vconfig.LibvirtConfigGuestGraphics()
      graphics.type = "vnc"
      graphics.keymap = CONF.vnc_keymap
      graphics.listen = CONF.vncserver_listen
      guest.add_device(graphics)
      add_video_driver = True

  
  [1], http://www.cyberciti.biz/faq/linux-kvm-vnc-for-guest-machine/

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1450294/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1433223] [NEW] Add functional tests for ipsec strongswan vpnaas driver

2015-03-17 Thread Hua Zhang
Public bug reported:

Add functional tests for ipsec strongswan vpnaas driver

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1433223

Title:
  Add functional tests for ipsec strongswan vpnaas driver

Status in OpenStack Neutron (virtual network service):
  New

Bug description:
  Add functional tests for ipsec strongswan vpnaas driver

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1433223/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1433226] [NEW] Add some unit tests for ipsec strongswan vpnaas driver

2015-03-17 Thread Hua Zhang
Public bug reported:

Add some unit tests for ipsec strongswan vpnaas driver

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1433226

Title:
  Add some unit tests for ipsec strongswan vpnaas driver

Status in OpenStack Neutron (virtual network service):
  New

Bug description:
  Add some unit tests for ipsec strongswan vpnaas driver

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1433226/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1418656] Re: Sometimes vpnservice's status can't be updated

2015-03-11 Thread Hua Zhang
I haven't been able to reproduce this issue recently; maybe it was caused
by a problem in my development env.

** Changed in: neutron
   Status: In Progress => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1418656

Title:
  Sometimes vpnservice's status can't be updated

Status in OpenStack Neutron (virtual network service):
  Invalid

Bug description:
  2015-02-05 23:56:11.890 12178 ERROR neutron.openstack.common.loopingcall [-] in fixed duration looping call
  2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall Traceback (most recent call last):
  2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall   File "/bak/openstack/neutron/neutron/openstack/common/loopingcall.py", line 81, in _inner
  2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall     self.f(*self.args, **self.kw)
  2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall   File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/services/vpn/device_drivers/ipsec.py", line 674, in report_status
  2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall     previous_status = self.get_process_status_cache(process)
  2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall   File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/services/vpn/device_drivers/ipsec.py", line 628, in get_process_status_cache
  2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall     'id': process.vpnservice['id'],
  2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall TypeError: 'NoneType' object has no attribute '__getitem__'
  2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1418656/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1430166] [NEW] cisco.l3.plugging_drivers has been moved out of the neutron repo

2015-03-10 Thread Hua Zhang
Public bug reported:

The following commit in neutron moves cisco.l3.plugging_drivers out of
the neutron repo.

commit 41166d533383e3490ffe6c2b1b200053d90e0b83
Merge: 4663a15 b6ba733
Author: Jenkins jenk...@review.openstack.org
Date:   Mon Mar 9 17:52:19 2015 +

Merge Vendor decomposition to move CSR1000v support to the
networking-cisco repo


but neutron-vpnaas still refers to it, so the unit tests fail as below:

hua@hua-ThinkPad-T440p:/bak/openstack/neutron-vpnaas$ python setup.py testr 
--slowest --testr-args=
running testr
running=OS_STDOUT_CAPTURE=1 OS_STDERR_CAPTURE=1 OS_LOG_CAPTURE=1 
${PYTHON:-python} -m subunit.run discover -t ./ 
${OS_TEST_PATH:-./neutron_vpnaas/tests/unit} --list 
--- import errors ---
Failed to import test module: neutron_vpnaas.tests.unit.services.vpn.service_drivers.test_cisco_ipsec
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/unittest2/loader.py", line 445, in _find_test_path
    module = self._get_module_from_name(name)
  File "/usr/local/lib/python2.7/dist-packages/unittest2/loader.py", line 384, in _get_module_from_name
    __import__(name)
  File "neutron_vpnaas/tests/unit/services/vpn/service_drivers/test_cisco_ipsec.py", line 27, in <module>
    from neutron_vpnaas.services.vpn.service_drivers \
  File "neutron_vpnaas/services/vpn/service_drivers/cisco_ipsec.py", line 16, in <module>
    from neutron.plugins.cisco.l3.plugging_drivers import (
ImportError: No module named l3.plugging_drivers
Non-zero exit code (2) from test listing.
error: testr failed (3)
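
Until the reference is fixed, an import guard along these lines would at
least make the failure mode actionable (a sketch only; the networking_cisco
module path is an assumption):

    # Hypothetical sketch for cisco_ipsec.py: try the decomposed package
    # first and fail with a clear message if it is not installed.
    try:
        from networking_cisco.plugins.cisco.l3 import plugging_drivers
    except ImportError:
        raise ImportError('cisco.l3.plugging_drivers has moved to the '
                          'networking-cisco repo; please install it')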

** Affects: neutron
 Importance: Undecided
 Assignee: Hua Zhang (zhhuabj)
 Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1430166

Title:
  cisco.l3.plugging_drivers has been moved out of the neutron repo

Status in OpenStack Neutron (virtual network service):
  In Progress

Bug description:
  The following commit in neutron moves cisco.l3.plugging_drivers out of
  the neutron repo.

  commit 41166d533383e3490ffe6c2b1b200053d90e0b83
  Merge: 4663a15 b6ba733
  Author: Jenkins jenk...@review.openstack.org
  Date:   Mon Mar 9 17:52:19 2015 +

  Merge Vendor decomposition to move CSR1000v support to the
  networking-cisco repo


  but neutron-vpnaas still refers to it, so the unit tests fail as below:

  hua@hua-ThinkPad-T440p:/bak/openstack/neutron-vpnaas$ python setup.py testr 
--slowest --testr-args=
  running testr
  running=OS_STDOUT_CAPTURE=1 OS_STDERR_CAPTURE=1 OS_LOG_CAPTURE=1 
${PYTHON:-python} -m subunit.run discover -t ./ 
${OS_TEST_PATH:-./neutron_vpnaas/tests/unit} --list 
  --- import errors ---
  Failed to import test module: neutron_vpnaas.tests.unit.services.vpn.service_drivers.test_cisco_ipsec
  Traceback (most recent call last):
    File "/usr/local/lib/python2.7/dist-packages/unittest2/loader.py", line 445, in _find_test_path
      module = self._get_module_from_name(name)
    File "/usr/local/lib/python2.7/dist-packages/unittest2/loader.py", line 384, in _get_module_from_name
      __import__(name)
    File "neutron_vpnaas/tests/unit/services/vpn/service_drivers/test_cisco_ipsec.py", line 27, in <module>
      from neutron_vpnaas.services.vpn.service_drivers \
    File "neutron_vpnaas/services/vpn/service_drivers/cisco_ipsec.py", line 16, in <module>
      from neutron.plugins.cisco.l3.plugging_drivers import (
  ImportError: No module named l3.plugging_drivers
  Non-zero exit code (2) from test listing.
  error: testr failed (3)

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1430166/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1430100] [NEW] vpnaas service doesn't work due to a refactoring commit

2015-03-09 Thread Hua Zhang
Public bug reported:

The refactoring commit 56fd82 moves the router_info and NAT rules stuff
from the l3-agent into the vpn device driver, which causes two problems:

1, The router is maintained in the driver, and not the VPN service. The
router instance should not be deleted.

2, NAT rules have been moved from the l3-agent into the vpn device driver,
but something in the vpn device driver still refers to NAT-rules-related
methods in the l3-agent.

** Affects: neutron
 Importance: Undecided
 Assignee: Hua Zhang (zhhuabj)
 Status: New

** Changed in: neutron
  Assignee: (unassigned) => Hua Zhang (zhhuabj)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1430100

Title:
  vpnaas service doesn't work due to a refactoring commit

Status in OpenStack Neutron (virtual network service):
  New

Bug description:
  The refactoring commit 56fd82 moves the router_info and NAT rules stuff
  from the l3-agent into the vpn device driver, which causes two problems:

  1, The router is maintained in the driver, and not the VPN service.
  The router instance should not be deleted.

  2, NAT rules have been moved from the l3-agent into the vpn device driver,
  but something in the vpn device driver still refers to NAT-rules-related
  methods in the l3-agent.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1430100/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1420139] [NEW] VPNPluginDbTestCase unit test failed with upstream submit I16b5e5b2

2015-02-09 Thread Hua Zhang
Public bug reported:

Today I found that the unit test case VPNPluginDbTestCase doesn't work, as
the error log below shows.

I debugged the code and found the reason: the upstream submission I16b5e5b2
(https://review.openstack.org/#/c/151375/7/neutron/services/provider_configuration.py)
tries to read service_provider configuration items from the
neutron-{service}.conf file.

On the other hand, VPNPluginDbTestCase still tries to override
service_provider, so the error 'Invalid: Driver
neutron_vpnaas.services.vpn.service_drivers.ipsec.IPsecVPNDriver is not
unique across providers' is thrown.

if not vpnaas_provider:
    vpnaas_provider = (
        constants.VPN +
        ':vpnaas:neutron_vpnaas.services.vpn.'
        'service_drivers.ipsec.IPsecVPNDriver:default')

cfg.CONF.set_override('service_provider',
                      [vpnaas_provider],
                      'service_providers')
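
One possible test-side workaround (a sketch only; the _instance attribute
name is taken from the traceback below) is to drop the cached service-type
state before applying the override:

    # Hypothetical fixture code: reset the cached service-type state so the
    # conf-file providers and the test override are not both registered.
    from neutron.db import servicetype_db as sdb

    sdb.ServiceTypeManager._instance = None  # force providers to be re-read
    cfg.CONF.set_override('service_provider',
                          [vpnaas_provider],
                          'service_providers')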


Traceback (most recent call last):
  File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/tests/unit/services/vpn/test_vpnaas_driver_plugin.py", line 47, in setUp
    vpnaas_plugin=VPN_DRIVER_CLASS)
  File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/tests/unit/db/vpn/test_db_vpnaas.py", line 437, in setUp
    service_plugins=service_plugins
  File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/tests/base.py", line 53, in setUp
    plugin, service_plugins, ext_mgr)
  File "/bak/openstack/neutron/neutron/tests/unit/test_db_plugin.py", line 120, in setUp
    self.api = router.APIRouter()
  File "/bak/openstack/neutron/neutron/api/v2/router.py", line 74, in __init__
    plugin = manager.NeutronManager.get_plugin()
  File "/bak/openstack/neutron/neutron/manager.py", line 222, in get_plugin
    return weakref.proxy(cls.get_instance().plugin)
  File "/bak/openstack/neutron/neutron/manager.py", line 216, in get_instance
    cls._create_instance()
  File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 431, in inner
    return f(*args, **kwargs)
  File "/bak/openstack/neutron/neutron/manager.py", line 202, in _create_instance
    cls._instance = cls()
  File "/bak/openstack/neutron/neutron/manager.py", line 128, in __init__
    self._load_service_plugins()
  File "/bak/openstack/neutron/neutron/manager.py", line 175, in _load_service_plugins
    provider)
  File "/bak/openstack/neutron/neutron/manager.py", line 143, in _get_plugin_instance
    return plugin_class()
  File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/services/vpn/plugin.py", line 44, in __init__
    constants.VPN, self)
  File "/bak/openstack/neutron/neutron/services/service_base.py", line 64, in load_drivers
    service_type_manager = sdb.ServiceTypeManager.get_instance()
  File "/bak/openstack/neutron/neutron/db/servicetype_db.py", line 41, in get_instance
    cls._instance = cls()
  File "/bak/openstack/neutron/neutron/db/servicetype_db.py", line 45, in __init__
    self._load_conf()
  File "/bak/openstack/neutron/neutron/db/servicetype_db.py", line 49, in _load_conf
    pconf.parse_service_provider_opt())
  File "/bak/openstack/neutron/neutron/services/provider_configuration.py", line 139, in __init__
    self.add_provider(prov)
  File "/bak/openstack/neutron/neutron/services/provider_configuration.py", line 160, in add_provider
    self._ensure_driver_unique(provider['driver'])
  File "/bak/openstack/neutron/neutron/services/provider_configuration.py", line 147, in _ensure_driver_unique
    raise n_exc.Invalid(msg)
Invalid: Driver neutron_vpnaas.services.vpn.service_drivers.ipsec.IPsecVPNDriver is not unique across providers

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1420139

Title:
  VPNPluginDbTestCase unit test failed with upstream submit I16b5e5b2

Status in OpenStack Neutron (virtual network service):
  New

Bug description:
  Today I found that the unit test case VPNPluginDbTestCase doesn't work,
  as the error log below shows.

  I debugged the code and found the reason: the upstream submission
  I16b5e5b2 (
  https://review.openstack.org/#/c/151375/7/neutron/services/provider_configuration.py
  ) tries to read service_provider configuration items from the
  neutron-{service}.conf file.

  On the other hand, VPNPluginDbTestCase still tries to override
  service_provider, so the error 'Invalid: Driver
  neutron_vpnaas.services.vpn.service_drivers.ipsec.IPsecVPNDriver is
  not unique across providers' is thrown.

  if not vpnaas_provider:
      vpnaas_provider = (
          constants.VPN +
          ':vpnaas:neutron_vpnaas.services.vpn.'
          'service_drivers.ipsec.IPsecVPNDriver:default')

  cfg.CONF.set_override('service_provider',
                        [vpnaas_provider],
                        'service_providers')


  Traceback (most recent call 

[Yahoo-eng-team] [Bug 1418656] [NEW] Sometimes vpnservice's status can't be updated

2015-02-05 Thread Hua Zhang
Public bug reported:

2015-02-05 23:56:11.890 12178 ERROR neutron.openstack.common.loopingcall [-] in fixed duration looping call
2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall Traceback (most recent call last):
2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall   File "/bak/openstack/neutron/neutron/openstack/common/loopingcall.py", line 81, in _inner
2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall     self.f(*self.args, **self.kw)
2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall   File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/services/vpn/device_drivers/ipsec.py", line 674, in report_status
2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall     previous_status = self.get_process_status_cache(process)
2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall   File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/services/vpn/device_drivers/ipsec.py", line 628, in get_process_status_cache
2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall     'id': process.vpnservice['id'],
2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall TypeError: 'NoneType' object has no attribute '__getitem__'
2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall
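
A defensive guard like the following might avoid the crash (a sketch only,
not the upstream fix; the cache layout is inferred from the traceback and
may not match the real code):

    # Hypothetical sketch for get_process_status_cache(): tolerate a process
    # whose vpnservice has not been populated yet instead of crashing.
    def get_process_status_cache(self, process):
        if not getattr(process, 'vpnservice', None):
            return None  # nothing to report for this process yet
        if not self.process_status_cache.get(process.id):
            self.process_status_cache[process.id] = {
                'status': None,
                'id': process.vpnservice['id'],
                'updated_pending_status': False,
                'ipsec_site_connections': {}}
        return self.process_status_cache[process.id]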

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1418656

Title:
  Sometimes vpnservice's status can't be updated

Status in OpenStack Neutron (virtual network service):
  New

Bug description:
  2015-02-05 23:56:11.890 12178 ERROR neutron.openstack.common.loopingcall [-] in fixed duration looping call
  2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall Traceback (most recent call last):
  2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall   File "/bak/openstack/neutron/neutron/openstack/common/loopingcall.py", line 81, in _inner
  2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall     self.f(*self.args, **self.kw)
  2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall   File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/services/vpn/device_drivers/ipsec.py", line 674, in report_status
  2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall     previous_status = self.get_process_status_cache(process)
  2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall   File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/services/vpn/device_drivers/ipsec.py", line 628, in get_process_status_cache
  2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall     'id': process.vpnservice['id'],
  2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall TypeError: 'NoneType' object has no attribute '__getitem__'
  2015-02-05 23:56:11.890 12178 TRACE neutron.openstack.common.loopingcall

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1418656/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1418798] [NEW] upstream RouterInfo refactor causes vpnaas unit test failure

2015-02-05 Thread Hua Zhang
Public bug reported:

Unit tests for PS25 of the strongswan driver
(https://review.openstack.org/#/c/144391/) failed; this is caused by the
upstream RouterInfo refactor.


==
ERROR: test_actions_after_router_added 
(neutron_vpnaas.tests.unit.services.vpn.test_vpn_service.TestVPNServiceEventHandlers)
neutron_vpnaas.tests.unit.services.vpn.test_vpn_service.TestVPNServiceEventHandlers.test_actions_after_router_added
--
_StringException: Empty attachments:
  pythonlogging:''
  pythonlogging:'neutron.api.extensions'

Traceback (most recent call last):
  File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/tests/unit/services/vpn/test_vpn_service.py", line 206, in test_actions_after_router_added
    FAKE_ROUTER_ID, self.conf.root_helper, {})
TypeError: __init__() takes at least 6 arguments (4 given)

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1418798

Title:
  upstream RouterInfo refactor causes vpnaas unit test failure

Status in OpenStack Neutron (virtual network service):
  New

Bug description:
  Unit tests for PS25 of the strongswan driver
  (https://review.openstack.org/#/c/144391/) failed; this is caused by the
  upstream RouterInfo refactor.

  
  ==
  ERROR: test_actions_after_router_added 
(neutron_vpnaas.tests.unit.services.vpn.test_vpn_service.TestVPNServiceEventHandlers)
  
neutron_vpnaas.tests.unit.services.vpn.test_vpn_service.TestVPNServiceEventHandlers.test_actions_after_router_added
  --
  _StringException: Empty attachments:
pythonlogging:''
pythonlogging:'neutron.api.extensions'

  Traceback (most recent call last):
    File "/bak/openstack/neutron-vpnaas/neutron_vpnaas/tests/unit/services/vpn/test_vpn_service.py", line 206, in test_actions_after_router_added
      FAKE_ROUTER_ID, self.conf.root_helper, {})
  TypeError: __init__() takes at least 6 arguments (4 given)

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1418798/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1414253] [NEW] separate openswan special stuff from general vpnaas framework

2015-01-23 Thread Hua Zhang
Public bug reported:

The initial vpnaas effort puts the general vpn framework and the openswan
stuff into one file (device_drivers.ipsec.py), which causes other vpn
driver implementations to import this file and thus pull in a bunch of
openswan stuff. So we had better refactor openswan out into its own files
and give some symmetry to these files.

** Affects: neutron
 Importance: Undecided
 Assignee: Hua Zhang (zhhuabj)
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1414253

Title:
  separate openswan special stuff from general vpnaas framework

Status in OpenStack Neutron (virtual network service):
  New

Bug description:
  The initial vpnaas effort puts the general vpn framework and the openswan
  stuff into one file (device_drivers.ipsec.py), which causes other vpn
  driver implementations to import this file and thus pull in a bunch of
  openswan stuff. So we had better refactor openswan out into its own files
  and give some symmetry to these files.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1414253/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp