[Yahoo-eng-team] [Bug 1869389] Re: [OVN] SRIOV (external) ports flapping
Reviewed:  https://review.opendev.org/715445
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ea999564a5b80dcf13c0c43f107165f0754210b7
Submitter: Zuul
Branch:    master

commit ea999564a5b80dcf13c0c43f107165f0754210b7
Author: Lucas Alvares Gomes
Date:   Fri Mar 27 15:23:49 2020 +

    [OVN] HA Chassis Group: Ignore UPDATES when external_ids hasn't changed

    The "old" parameter passed to the handle_ha_chassis_group_changes()
    method is a delta object and sometimes it does not contain the
    "external_ids" column (because it hasn't changed). The absence of that
    column was misleading the method into believing that the "old" object
    was no longer a gateway chassis, and that triggered some changes in
    the HA group. Changing the HA group caused the SRIOV (external in OVN)
    ports to start flapping between the gateway chassis.

    This patch adds a check to verify that the "external_ids" column has
    changed before acting on it; otherwise the update is ignored.

    Closes-Bug: #1869389
    Change-Id: I3f7de633e5546dc78c3546b9c34ea81d0a0524d3
    Signed-off-by: Lucas Alvares Gomes

** Changed in: neutron
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1869389

Title:
  [OVN] SRIOV (external) ports flapping

Status in neutron:
  Fix Released

Bug description:
  The "old" parameter passed to the handle_ha_chassis_group_changes()
  method is a delta object and sometimes it does not contain the
  "external_ids" column (because it hasn't changed). The absence of the
  "external_ids" column led the method to believe that the "old" object
  was no longer a gateway chassis (since the external_ids column wasn't
  present, is_gateway_chassis() returned False), which triggered changes
  in the default HA group in which the external ports live.
  The combination of the agents' health check (which triggers updates to
  the chassis) and the absence of the "external_ids" column in the old
  object for certain updates causes the SRIOV (external in OVN) ports to
  flap between the gateway chassis.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1869389/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
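The core of the fix in the commit above can be sketched in a few lines. This is a hedged illustration, not the actual neutron code: `Delta` here is a hypothetical stand-in for an OVSDB row delta, where only changed columns appear as attributes.

```python
class Delta:
    """Minimal stand-in for an OVSDB row delta: only the columns that
    actually changed are set as attributes (hypothetical helper)."""
    def __init__(self, **changed_columns):
        self.__dict__.update(changed_columns)


def external_ids_changed(old_delta):
    # The gist of the patch: a column missing from the "old" delta did
    # NOT change, so an absent external_ids must not be read as "this
    # chassis is no longer a gateway". Such updates are simply ignored.
    return hasattr(old_delta, 'external_ids')
```

With this guard, an update that only touched an unrelated column (say, a heartbeat counter) no longer looks like a gateway-chassis change, which is exactly the flapping trigger the bug describes.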
[Yahoo-eng-team] [Bug 1869430] [NEW] cloud-init persists in running state on Kali in AWS
Public bug reported:

Hello,

We're trying to customize published Kali AMIs using packer & cloud-init. The entire process works with Ubuntu, CentOS, and Amazon Linux 2 targets, but seemingly breaks with Kali. We've tried it with both the 2020.01 and 2019.03 releases.

We're also experiencing a long timeout for the ec2 data source:

  root@kali:~# cloud-init status --long
  status: running
  time: Fri, 27 Mar 2020 20:06:54 +
  detail:
  DataSourceEc2Local

  root@kali:~# cloud-init analyze blame
  -- Boot Record 01 --
  51.20500s (init-local/search-Ec2Local)
  00.91700s (init-network/config-users-groups)
  00.67200s (init-network/config-growpart)
  00.27400s (init-network/config-resizefs)
  00.24800s (init-network/config-ssh)
  00.00600s (init-network/consume-user-data)
  00.00300s (init-network/check-cache)

Attached is the log tarball produced by cloud-init. We'd appreciate any hints as to what may be happening.

It's worth noting that these targets start in a VPC without a direct connection to the outside world, but a squid proxy is available for web traffic. We have the relevant parts set up to use that proxy.

Thanks!

** Affects: cloud-init
   Importance: Undecided
   Status: New

** Attachment added: "cloud-init.tar.gz"
   https://bugs.launchpad.net/bugs/1869430/+attachment/5342340/+files/cloud-init.tar.gz

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1869430

Title:
  cloud-init persists in running state on Kali in AWS

Status in cloud-init:
  New

Bug description:
  Hello,

  We're trying to customize published Kali AMIs using packer &
  cloud-init. The entire process works with Ubuntu, CentOS, and Amazon
  Linux 2 targets, but seemingly breaks with Kali. We've tried it with
  both the 2020.01 and 2019.03 releases.
  We're also experiencing a long timeout for the ec2 data source:

    root@kali:~# cloud-init status --long
    status: running
    time: Fri, 27 Mar 2020 20:06:54 +
    detail:
    DataSourceEc2Local

    root@kali:~# cloud-init analyze blame
    -- Boot Record 01 --
    51.20500s (init-local/search-Ec2Local)
    00.91700s (init-network/config-users-groups)
    00.67200s (init-network/config-growpart)
    00.27400s (init-network/config-resizefs)
    00.24800s (init-network/config-ssh)
    00.00600s (init-network/consume-user-data)
    00.00300s (init-network/check-cache)

  Attached is the log tarball produced by cloud-init. We'd appreciate
  any hints as to what may be happening.

  It's worth noting that these targets start in a VPC without a direct
  connection to the outside world, but a squid proxy is available for
  web traffic. We have the relevant parts set up to use that proxy.

  Thanks!

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1869430/+subscriptions
[Yahoo-eng-team] [Bug 1869396] [NEW] os-ips API policy is allowed for everyone even though the policy default is admin_or_owner
Public bug reported:

The os-ips API policy defaults to admin_or_owner [1], but the API is allowed for everyone. We can see that the test trying to access the API with another project's context succeeds:
https://review.opendev.org/#/c/715477/

This is because the API does not pass the server's project_id in the policy target:
https://github.com/openstack/nova/blob/96f6622316993fb41f4c5f37852d4c879c9716a5/nova/api/openstack/compute/ips.py#L41

and if no target is passed, policy.py adds the default target, which is nothing but context.project_id (so everyone who tries to access is allowed):
https://github.com/openstack/nova/blob/c16315165ce307c605cf4b608b2df3aa06f46982/nova/policy.py#L191

[1] https://github.com/openstack/nova/blob/eaf08c0b7b8250408e5d10c6471f2e3155cc0edb/nova/policies/ips.py#L27

** Affects: nova
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1869396

Title:
  os-ips API policy is allowed for everyone even though the policy default is admin_or_owner

Status in OpenStack Compute (nova):
  New

Bug description:
  The os-ips API policy defaults to admin_or_owner [1], but the API is
  allowed for everyone.
  We can see that the test trying to access the API with another
  project's context succeeds:
  https://review.opendev.org/#/c/715477/

  This is because the API does not pass the server's project_id in the
  policy target:
  https://github.com/openstack/nova/blob/96f6622316993fb41f4c5f37852d4c879c9716a5/nova/api/openstack/compute/ips.py#L41

  and if no target is passed, policy.py adds the default target, which
  is nothing but context.project_id (so everyone who tries to access is
  allowed):
  https://github.com/openstack/nova/blob/c16315165ce307c605cf4b608b2df3aa06f46982/nova/policy.py#L191

  [1] https://github.com/openstack/nova/blob/eaf08c0b7b8250408e5d10c6471f2e3155cc0edb/nova/policies/ips.py#L27

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1869396/+subscriptions
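The failure mode described above can be illustrated with a minimal sketch. The names here are illustrative, not nova's actual enforcer: when the default target is built from the caller's own context, an "owner" rule compares the caller's project against itself and always passes.

```python
def enforce_owner(context_project_id, target=None):
    """Sketch of an admin_or_owner-style check (hypothetical helper)."""
    if target is None:
        # Default target built from the caller's own context -- this is
        # what makes the check trivially true for every caller.
        target = {'project_id': context_project_id}
    # The "owner" half of the rule: does the target belong to the caller?
    return target['project_id'] == context_project_id
```

Passing the server's real project_id as the target restores the intended behavior: `enforce_owner('tenant-a', {'project_id': 'tenant-b'})` is False, while the no-target call is always True regardless of who asks.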
[Yahoo-eng-team] [Bug 1225922] Re: Support static network configuration even on already configured devices
** Also affects: linux-gcp (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: linux-gcp (Ubuntu)
   Status: New => Confirmed

** Changed in: linux-gcp (Ubuntu)
   Assignee: (unassigned) => Roufique Hossain (roufique)

** Changed in: cloud-init
   Assignee: (unassigned) => Roufique Hossain (roufique)

** Changed in: cloud-init
   Status: Confirmed => Fix Committed

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1225922

Title:
  Support static network configuration even on already configured devices

Status in cloud-init:
  Fix Committed
Status in linux-gcp package in Ubuntu:
  Confirmed

Bug description:
  Some datasources (e.g. OpenNebula) support full static network
  configuration. It is done in the local execution phase by pushing the
  new interfaces configuration to *distro.apply_network*. This new
  configuration is written to disk and activated by calling ifup on the
  particular devices.

  Unfortunately, it can't be guaranteed that the full local phase is
  executed before any network configuration is done by the system. The
  steps above are OK only for devices not present in the former network
  configuration or not configured to start on boot (e.g. with eth0
  configured on boot to take an address from DHCP, the new static
  configuration is not applied to this device; it's already up and ifup
  just passes). It would be good to first take the interfaces down
  before writing the new configuration.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1225922/+subscriptions
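The ordering proposed in the description can be sketched as follows. This is pure illustration under assumed names (`plan_apply_network` is hypothetical; cloud-init's distro.apply_network works differently): take each already-configured device down before writing the new static configuration, so a subsequent ifup actually re-applies it instead of seeing the device already up and passing.

```python
def plan_apply_network(devices, write_config):
    """Return the planned command sequence: ifdown first, write the new
    config, then ifup. Real code would run these via subprocess rather
    than returning them; returning makes the ordering easy to inspect."""
    commands = [['ifdown', dev] for dev in devices]  # device may be up already
    write_config()  # write the new static stanzas to disk
    commands += [['ifup', dev] for dev in devices]   # re-applies the new config
    return commands
```

The point of the sketch is the sequence, not the commands themselves: config is written only after the interfaces are planned for shutdown, so ifup cannot be a no-op on a device that was brought up by the system's earlier boot-time configuration.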
[Yahoo-eng-team] [Bug 1869389] [NEW] [OVN] SRIOV (external) ports flapping
Public bug reported:

The "old" parameter passed to the handle_ha_chassis_group_changes() method is a delta object and sometimes it does not contain the "external_ids" column (because it hasn't changed). The absence of the "external_ids" column led the method to believe that the "old" object was no longer a gateway chassis (since the external_ids column wasn't present, is_gateway_chassis() returned False), which triggered changes in the default HA group in which the external ports live.

The combination of the agents' health check (which triggers updates to the chassis) and the absence of the "external_ids" column in the old object for certain updates causes the SRIOV (external in OVN) ports to flap between the gateway chassis.

** Affects: neutron
   Importance: High
   Assignee: Lucas Alvares Gomes (lucasagomes)
   Status: In Progress

** Tags: ovn

** Tags added: ovn

** Changed in: neutron
   Status: New => Confirmed

** Changed in: neutron
   Importance: Undecided => High

** Changed in: neutron
   Assignee: (unassigned) => Lucas Alvares Gomes (lucasagomes)

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1869389

Title:
  [OVN] SRIOV (external) ports flapping

Status in neutron:
  In Progress

Bug description:
  The "old" parameter passed to the handle_ha_chassis_group_changes()
  method is a delta object and sometimes it does not contain the
  "external_ids" column (because it hasn't changed). The absence of the
  "external_ids" column led the method to believe that the "old" object
  was no longer a gateway chassis (since the external_ids column wasn't
  present, is_gateway_chassis() returned False), which triggered changes
  in the default HA group in which the external ports live.
  The combination of the agents' health check (which triggers updates to
  the chassis) and the absence of the "external_ids" column in the old
  object for certain updates causes the SRIOV (external in OVN) ports to
  flap between the gateway chassis.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1869389/+subscriptions
[Yahoo-eng-team] [Bug 1864675] Re: DHCP agent should prioritize new ports when sending RPC messages to server
Reviewed:  https://review.opendev.org/709824
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=113dfac6083c2bda36f5186e123e4e5c82fa8097
Submitter: Zuul
Branch:    master

commit 113dfac6083c2bda36f5186e123e4e5c82fa8097
Author: Brian Haley
Date:   Tue Feb 25 15:01:47 2020 -0500

    Prioritize port create and update ready messages

    The DHCP agent prioritizes RPC messages based on the priority field
    sent from neutron-server, but then groups them all in the same
    dhcp_ready_ports set when sending them back to the server to clear
    the provisioning block(s). Priority should be given to new and
    changed ports, since those are most likely to be associated with new
    instances, which can fail to boot if they are not handled quickly
    when the agent is very busy, for example right after it was
    restarted.

    Change-Id: Ib5074abadd7189bb4bdd5e46c677f1bfb071221e
    Closes-bug: #1864675

** Changed in: neutron
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1864675

Title:
  DHCP agent should prioritize new ports when sending RPC messages to server

Status in neutron:
  Fix Released

Bug description:
  When a port is provisioned in the dhcp-agent, for example via a
  port_create, it is just added to the dhcp_ready_ports set and sent to
  neutron-server in _dhcp_ready_ports_loop() by popping elements off the
  set. So although the port was prioritized when it was received, it is
  not prioritized when sent to the server to clear the provisioning
  block. It seems like these ports should be sent first, with others
  behind them if there is still room in the RPC message. This could be
  done with a second set(), unless we want to make it more complicated
  by using the priority sent from the server to place ports in different
  queues.
  This should decrease the time it takes to clear the port provisioning
  block when an agent is restarted and gets a port_create message, as it
  would help even if the message was sent with PRIORITY_PORT_CREATE_HIGH
  to a single agent, since the chosen agent could still be in the middle
  of a full sync.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1864675/+subscriptions
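The "second set" idea from the description above can be sketched like this (illustrative names, not the agent's actual code): drain the high-priority set first when assembling the batch for the RPC message, then backfill any remaining room from the normal set.

```python
def build_ready_batch(high_priority_ports, normal_ports, max_batch):
    """Pop up to max_batch port ids, preferring the high-priority set.

    Both arguments are sets, mirroring the dhcp_ready_ports set the bug
    describes; ports are removed from them as they are batched.
    """
    batch = []
    while high_priority_ports and len(batch) < max_batch:
        batch.append(high_priority_ports.pop())  # new/updated ports first
    while normal_ports and len(batch) < max_batch:
        batch.append(normal_ports.pop())         # backfill with the rest
    return batch
```

The design choice is the same one the report suggests: a second set keeps the change small, while per-priority queues would preserve finer ordering at the cost of more bookkeeping.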
[Yahoo-eng-team] [Bug 1869354] [NEW] FIP isn't properly created for octavia loadbalancers
Public bug reported:

After upgrading from an earlier version of stable/stein to a newer version (released around 2020-02-20), FIPs on newly created Octavia loadbalancers stopped working. What we can see is that everything works, except the snat namespace never gets the IP assigned via rootwrap, nor the iptables rules; it simply never happens, and there are no errors around this. Any pre-existing loadbalancers still work, but we can't create new ones. The loadbalancers themselves work great, whether on shared networks or tenant networks. We asked about this in the Octavia IRC channel, but they pointed us to neutron. Instances can be assigned FIPs normally without issues. I could use some help/pointers on where to troubleshoot this.

Debug is enabled on neutron but there are simply no errors; everything seems fine with what's happening. I'm using DVR HA with OVS. I'm not even sure which logs to attach since nothing is complaining or spitting out any errors.

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1869354

Title:
  FIP isn't properly created for octavia loadbalancers

Status in neutron:
  New

Bug description:
  After upgrading from an earlier version of stable/stein to a newer
  version (released around 2020-02-20), FIPs on newly created Octavia
  loadbalancers stopped working. What we can see is that everything
  works, except the snat namespace never gets the IP assigned via
  rootwrap, nor the iptables rules; it simply never happens, and there
  are no errors around this. Any pre-existing loadbalancers still work,
  but we can't create new ones. The loadbalancers themselves work great,
  whether on shared networks or tenant networks. We asked about this in
  the Octavia IRC channel, but they pointed us to neutron. Instances can
  be assigned FIPs normally without issues. I could use some
  help/pointers on where to troubleshoot this.
  Debug is enabled on neutron but there are simply no errors; everything
  seems fine with what's happening. I'm using DVR HA with OVS. I'm not
  even sure which logs to attach since nothing is complaining or
  spitting out any errors.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1869354/+subscriptions
[Yahoo-eng-team] [Bug 1869342] [NEW] OVNMechanismDriver _ovn_client is a read-only property
Public bug reported:

OVNMechanismDriver "_ovn_client" is a read-only property and can't be assigned in the "ovn_client" property:
https://github.com/openstack/neutron/blob/805fb5c970c8b761ce7f4877052ffef74b524e41/neutron/cmd/ovn/neutron_ovn_db_sync_util.py#L58

** Affects: neutron
   Importance: Undecided
   Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
   Status: In Progress

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1869342

Title:
  OVNMechanismDriver _ovn_client is a read-only property

Status in neutron:
  In Progress

Bug description:
  OVNMechanismDriver "_ovn_client" is a read-only property and can't be
  assigned in the "ovn_client" property:
  https://github.com/openstack/neutron/blob/805fb5c970c8b761ce7f4877052ffef74b524e41/neutron/cmd/ovn/neutron_ovn_db_sync_util.py#L58

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1869342/+subscriptions
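The error class here can be reproduced, and one possible fix sketched, with plain Python (this is illustrative; the actual neutron patch may take a different approach): a @property defined without a setter raises AttributeError on assignment, and adding a setter makes the attribute writable.

```python
class DriverSketch:
    """Hypothetical stand-in for a driver exposing a client property."""
    def __init__(self):
        self._client = None

    @property
    def _ovn_client(self):
        return self._client

    # Without the setter below, `driver._ovn_client = x` raises
    # "AttributeError: can't set attribute".
    @_ovn_client.setter
    def _ovn_client(self, value):
        self._client = value
```

An alternative fix is to have the caller assign the backing attribute (here `_client`) directly instead of going through the property; which approach is right depends on how the sync utility uses the driver.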
[Yahoo-eng-team] [Bug 1866961] Re: ImportError: cannot import name 'Feature'
Ussuri got it fixed with the PyScss 1.3.6 release (and a later update to 1.3.7). Train and Stein are in progress.

** Changed in: kolla/ussuri
   Status: Triaged => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1866961

Title:
  ImportError: cannot import name 'Feature'

Status in devstack:
  Triaged
Status in OpenStack Dashboard (Horizon):
  Confirmed
Status in kolla:
  Fix Released
Status in kolla train series:
  Triaged
Status in kolla ussuri series:
  Fix Released

Bug description:
  One of Horizon's requirements is the pyScss package, which had its
  last release over 4 years ago. Two days ago setuptools v46 was
  released; one of its changes was the removal of the Features feature.
  Now Kolla builds fail:

  INFO:kolla.common.utils.horizon:Collecting pyScss===1.3.4
  INFO:kolla.common.utils.horizon:  Downloading http://mirror.ord.rax.opendev.org:8080/pypifiles/packages/1d/4a/221ae7561c8f51c4f28b2b172366ccd0820b14bb947350df82428dfce381/pyScss-1.3.4.tar.gz (120 kB)
  INFO:kolla.common.utils.horizon:ERROR: Command errored out with exit status 1:
  INFO:kolla.common.utils.horizon:  command: /var/lib/kolla/venv/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-rr0db3qs/pyScss/setup.py'"'"'; __file__='"'"'/tmp/pip-install-rr0db3qs/pyScss/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-rr0db3qs/pyScss/pip-egg-info
  INFO:kolla.common.utils.horizon:  cwd: /tmp/pip-install-rr0db3qs/pyScss/
  INFO:kolla.common.utils.horizon:  Complete output (5 lines):
  INFO:kolla.common.utils.horizon:  Traceback (most recent call last):
  INFO:kolla.common.utils.horizon:    File "<string>", line 1, in <module>
  INFO:kolla.common.utils.horizon:    File "/tmp/pip-install-rr0db3qs/pyScss/setup.py", line 9, in <module>
  INFO:kolla.common.utils.horizon:      from setuptools import setup, Extension, Feature
  INFO:kolla.common.utils.horizon:  ImportError: cannot import name 'Feature'

  Devstack also has the same problem. Are there any plans to fix it?

  The pyscss project has an issue for this:
  https://github.com/Kronuz/pyScss/issues/385

  What are the plans of the Horizon team?

To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack/+bug/1866961/+subscriptions
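A quick way to check whether an environment is affected is to probe for the import that pyScss 1.3.4's setup.py performs (a hedged sketch; the actual fixes, per the comments above, are upgrading to pyScss 1.3.6+ or pinning setuptools below v46):

```python
def setuptools_has_feature():
    """Return True if setuptools still exports Feature (removed in v46),
    which pyScss 1.3.4's setup.py imports at build time."""
    try:
        from setuptools import Feature  # noqa: F401
        return True
    except ImportError:
        return False
```

On setuptools >= 46 this returns False, and any sdist whose setup.py does `from setuptools import ... Feature` will fail to build exactly as the log shows.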
[Yahoo-eng-team] [Bug 1869306] [NEW] Users module errors for users of same SSH key type with existing user
Public bug reported:

I'm starting an instance (tried both centos and ubuntu) in AWS with user_data similar to the following:

  users:
    - name: bob
      sudo: ALL=(ALL) NOPASSWD:ALL
      groups: users
      lock_passwd: true
      ssh_authorized_keys:
        - ssh-rsa some-ssh-pubkey-x
    - name: alice
      sudo: ALL=(ALL) NOPASSWD:ALL
      groups: users
      lock_passwd: true
      ssh_authorized_keys:
        - ssh-rsa some-ssh-pubkey-x
    - name: mallory
      sudo: ALL=(ALL) NOPASSWD:ALL
      groups: users
      lock_passwd: true
      ssh_authorized_keys:
        - ssh-rsa some-ssh-pubkey-x
    - name: trent
      sudo: ALL=(ALL) NOPASSWD:ALL
      groups: users
      lock_passwd: true
      ssh_authorized_keys:
        - ssh-ed25519 some-ssh-pubkey-x

Two things are special in this case. Mallory made herself a user account on the box before baking the original image, and Trent has an ECC key (the rest are using RSA).

Upon running this in AWS, only Trent gets created. The only discernible error I have seen is:

  File "/usr/lib/python2.7/site-packages/cloudinit/ssh_util.py", line 208, in users_ssh_info
    pw_ent = pwd.getpwnam(username)
  KeyError: 'getpwnam(): name not found: alice'

Trent can log in and see that his key has been created, but literally every other user who is using an RSA SSH key hasn't had their user created. Compounding it, Mallory doesn't have a login but still retains her home directory.

The fix for this entails making a user "mallory2" and leaving mallory alone. When this happens, all users get created (though mallory's original account is missing other than /home). I've also tried making a mallory user with a custom homedir of /home/mallorytoo, but the same error happens.

** Affects: cloud-init
   Importance: Undecided
   Status: New

** Attachment added: "cloud-init.log"
   https://bugs.launchpad.net/bugs/1869306/+attachment/5342009/+files/cloud-init.log

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1869306

Title:
  Users module errors for users of same SSH key type with existing user

Status in cloud-init:
  New

Bug description:
  I'm starting an instance (tried both centos and ubuntu) in AWS with
  user_data similar to the following:

    users:
      - name: bob
        sudo: ALL=(ALL) NOPASSWD:ALL
        groups: users
        lock_passwd: true
        ssh_authorized_keys:
          - ssh-rsa some-ssh-pubkey-x
      - name: alice
        sudo: ALL=(ALL) NOPASSWD:ALL
        groups: users
        lock_passwd: true
        ssh_authorized_keys:
          - ssh-rsa some-ssh-pubkey-x
      - name: mallory
        sudo: ALL=(ALL) NOPASSWD:ALL
        groups: users
        lock_passwd: true
        ssh_authorized_keys:
          - ssh-rsa some-ssh-pubkey-x
      - name: trent
        sudo: ALL=(ALL) NOPASSWD:ALL
        groups: users
        lock_passwd: true
        ssh_authorized_keys:
          - ssh-ed25519 some-ssh-pubkey-x

  Two things are special in this case. Mallory made herself a user
  account on the box before baking the original image, and Trent has an
  ECC key (the rest are using RSA).

  Upon running this in AWS, only Trent gets created. The only
  discernible error I have seen is:

    File "/usr/lib/python2.7/site-packages/cloudinit/ssh_util.py", line 208, in users_ssh_info
      pw_ent = pwd.getpwnam(username)
    KeyError: 'getpwnam(): name not found: alice'

  Trent can log in and see that his key has been created, but literally
  every other user who is using an RSA SSH key hasn't had their user
  created. Compounding it, Mallory doesn't have a login but still
  retains her home directory.

  The fix for this entails making a user "mallory2" and leaving mallory
  alone. When this happens, all users get created (though mallory's
  original account is missing other than /home). I've also tried making
  a mallory user with a custom homedir of /home/mallorytoo, but the same
  error happens.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1869306/+subscriptions
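The traceback above suggests the lookup in users_ssh_info aborts on the first name pwd cannot resolve, taking the remaining users down with it. A defensive variant can be sketched like this (illustrative only; not cloud-init's actual code or its fix):

```python
import pwd

def users_ssh_info_safe(usernames):
    """Collect home directories for users that exist, skipping (rather
    than aborting on) names pwd cannot resolve yet."""
    info = {}
    for name in usernames:
        try:
            info[name] = pwd.getpwnam(name).pw_dir
        except KeyError:
            # User not created (yet): keep processing the others instead
            # of letting the KeyError abort the whole list.
            continue
    return info
```

Whether skipping is the right policy depends on why the user is missing; the point of the sketch is only that one unresolvable name should not prevent the remaining users from being processed.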