[Yahoo-eng-team] [Bug 2020028] [NEW] evacuate an instance on non-shared storage succeeded and boot image is rebuilt

2023-05-17 Thread norman shen
Public bug reported: Description === evacuate an instance on non-shared storage succeeded and boot image is rebuilt Steps to reproduce == 1. Create a two-compute-node cluster without shared storage 2. boot an image-backed virtual machine 3. shut down the compute

[Yahoo-eng-team] [Bug 1999126] [NEW] resize to the same host unexpectedly clears host_info cache

2022-12-07 Thread norman shen
Public bug reported: Description === We are using victoria nova and find that after a same-host cold migrate, a subsequent cold migrate could break the anti-affinity policy. Steps to reproduce == 1 provision an openstack cluster with 2 compute nodes 2 create a server group

[Yahoo-eng-team] [Bug 1996966] [NEW] get_machine_ips took too long to complete

2022-11-17 Thread norman shen
Public bug reported: Description === I found that get_machine_ips could take too long before returning IP addresses. There are around 160 instances with about 200 nics, which results in around 1000 network adapters on the host. calling netifaces.ifaddresses took around 0.2

[Yahoo-eng-team] [Bug 1995229] [NEW] [Opinion] Update instance availability_zone when reset host AZ

2022-10-30 Thread norman shen
Public bug reported: Description === Instance.availability_zone is set in nova.conductor while scheduling. A host's availability_zone can be modified when the host is added to an aggregate, but instance.availability_zone will not be changed; instead 'availability_zone' will be cached in

[Yahoo-eng-team] [Bug 1995028] [NEW] list os-service causing reconnects to memcached all the time

2022-10-27 Thread norman shen
Public bug reported: Description === We are running a victoria openstack cluster (python3), and I observe that every time an openstack compute service list is executed, nova-api creates a new connection to memcache. Actually there are several reasons for this behavior 1. when

[Yahoo-eng-team] [Bug 1995029] [NEW] list os-service causing reconnects to memcached all the time

2022-10-27 Thread norman shen
Public bug reported: Description === We are running a victoria openstack cluster (python3), and I observe that every time an openstack compute service list is executed, nova-api creates a new connection to memcache. Actually there are several reasons for this behavior 1. when

[Yahoo-eng-team] [Bug 1991380] [NEW] centos 7.6 cannot access 169.254.169.254

2022-09-30 Thread norman shen
Public bug reported: Hello, I am testing centos 7.6 on a Victoria OpenStack. In the virtual machine, the routes look like below # ip r default via 172.31.0.1 dev eth0 192.168.0.0/16 dev eth1 proto kernel scope link src 192.168.0.9 169.254.0.0/16 dev eth0 scope link metric

[Yahoo-eng-team] [Bug 1988281] [NEW] neutron dhcp agent state not consistent with real status

2022-08-31 Thread norman shen
Public bug reported: We are observing that neutron-dhcp-agent's state is deviating from the "real state"; by real state, I mean all hosted dnsmasq are running and configured. For example, agent A is hosting 1,000 networks; if I reboot agent A then all dnsmasq processes are gone, and dhcp

[Yahoo-eng-team] [Bug 1982902] [NEW] umount /run/cloud-init/tmp/tmpl5n7csdd failed

2022-07-26 Thread norman shen
Public bug reported: Hello, I am using cloud-init version: /usr/bin/cloud-init 20.4.1-0ubuntu1~18.04.1, ubuntu version is root@ubuntu:~# lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description:Ubuntu 18.04.5 LTS Release:18.04 Codename: bionic I found

[Yahoo-eng-team] [Bug 1978827] [NEW] rebuild instance continues to flush old mpath on failure

2022-06-15 Thread norman shen
Public bug reported: Description === When rebuilding an instance fails due to a potentially problematic cinder API, then on the next rebuild attempt nova will try to disconnect the volume again although the path has already been cleared. It is generally OK for rbd backend, but it could cause

[Yahoo-eng-team] [Bug 1973656] [NEW] meaning of option "router_auto_schedule" is ambiguous

2022-05-16 Thread norman shen
Public bug reported: I found the meaning of option "router_auto_schedule" hard to follow. A quick code review finds it is only used at (tests excluded) ```python def get_router_ids(self, context, host): """Returns IDs of routers scheduled to l3 agent on This will

[Yahoo-eng-team] [Bug 1973576] [NEW] remove eager subquery load for DistributedPortBinding

2022-05-16 Thread norman shen
Public bug reported: We observe excessive DB calls to load DistributedPortBindings. We have enabled DVR and have some huge virtual routers with around 60 router interfaces scheduled on around 200 compute nodes. We saw something like ```console 2022-05-12 05:59:06.406 50 ERROR

[Yahoo-eng-team] [Bug 1968837] [NEW] too many l3 dvr agents got notifications after a server got deleted

2022-04-13 Thread norman shen
Public bug reported: We are using Rocky 13.0.6 neutron, which seems to remove the router namespace if the retry limit is hit. After some investigation, it seems that deleting a server which is already associated with a floating ip address causes a broadcast notification to all related routers. In

[Yahoo-eng-team] [Bug 1964587] Re: default video driver

2022-03-11 Thread norman shen
** Changed in: nova Status: Incomplete => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1964587 Title: default video driver Status in OpenStack Compute

[Yahoo-eng-team] [Bug 1964587] [NEW] default video driver

2022-03-11 Thread norman shen
Public bug reported: Hello, I saw that on the amd64 platform nova defaults to cirrus as the video driver, and a windows virtual machine gets a small resolution. The virtio video driver could allow a larger resolution. And it looks like the driver type cannot be set by the user. My question is why using cirrus as

[Yahoo-eng-team] [Bug 1954619] Re: device_name is too narrow

2022-01-06 Thread norman shen
Thank you, I saw a patch has been merged upstream for new releases, and this should be fixed. ** Changed in: horizon Status: New => Invalid

[Yahoo-eng-team] [Bug 1954619] [NEW] device_name is too narrow

2021-12-12 Thread norman shen
Public bug reported: Horizon will auto fill a device_name "vda" by default. But vda only makes sense for a virtio-blk block device. For a scsi device, sda makes more sense. Nova will take care of the device name if not specified, so why not make this field null by default and let nova choose a better

[Yahoo-eng-team] [Bug 1953718] [NEW] nova compute failed to update placement if mdev max available is 0

2021-12-08 Thread norman shen
Public bug reported: Description === nova compute will fail to update vgpu mdev placement data if the mdev type changed while there are some previously created mdev devices with different types. For nvidia, under such circumstances max available instances will be 0. Steps to reproduce

[Yahoo-eng-team] [Bug 1946546] [NEW] nova-compute endlessly waits for snapshot completes

2021-10-09 Thread norman shen
Public bug reported: Description === When trying to create a server image, nova compute will endlessly wait for the snapshot to be created. This is quite dangerous because the server's file system has already been frozen and IO operations have been disabled. ** Affects: nova Importance:

[Yahoo-eng-team] [Bug 1940641] [NEW] nova compute with allocated vgpu device failed to start after host reboot

2021-08-20 Thread norman shen
Public bug reported: Description = nova compute service failed to start after reboot, if there are vgpu virtual machines beforehand. Error log 2021-08-20 09:37:30.331 284159 DEBUG nova.virt.libvirt.volume.mount [None req-6ad4e06c-980e-4759-8b36-6c696e596dab - - - - -]

[Yahoo-eng-team] [Bug 1940012] [NEW] allow attaching pci devices as different functions

2021-08-15 Thread norman shen
Public bug reported: Description === We have a use case to attach an FPGA device to a virtual machine. This FPGA card has two functions, and we can attach both of them using an alias. After both of them are passed through to the virtual machine, we found that they are not appearing as different

[Yahoo-eng-team] [Bug 1934203] [NEW] cannot multi attach enabled volume after swap volume

2021-06-30 Thread norman shen
Public bug reported: Description === Detaching a multi-attach enabled volume failed after swapping volume. Steps to reproduce == 1. Create two volume types with multi attach enabled (A, B) 2. Create a new volume using type A 3. attach it to a server 4. Retype this volume to

[Yahoo-eng-team] [Bug 1931209] [NEW] Circular reference detected during cold migration

2021-06-08 Thread norman shen
Public bug reported: Description === cold migration failed when server is specified with a numa topology Steps to reproduce == create server from a flavor specified with numa topology parameters and then do a cold migrate or resize Expected success Actual

[Yahoo-eng-team] [Bug 1929480] [NEW] cloud-init for ubuntu 18.04

2021-05-24 Thread norman shen
Public bug reported: ubuntu 18.04 uses netplan to manage networks, netplan could either use NetworkManager or systemd-networkd internally, but it does not use networking. cloud-init.service explicitly depends on networking.service to complete which might be problematic because network service

[Yahoo-eng-team] [Bug 1927747] [NEW] neutron ovs agent apply openvswitch security group slow

2021-05-07 Thread norman shen
Public bug reported: I am running neutron-ovs-agent with the openvswitch firewall; there are around 40 ports with the same security group on the same compute node. It seems updating the security group for each port consumes nearly 3 seconds, which sums up to around 100 seconds in total. This significantly

[Yahoo-eng-team] [Bug 1926049] [NEW] check_changed_vlans failed

2021-04-24 Thread norman shen
Public bug reported: 2021-04-25 03:19:37.303 1 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-413ff802-0c14-47ad-8221-14d7e972bad3 - - - - -] Error while processing VIF ports: TypeError: %d format: a number is required, not list 2021-04-25 03:19:37.303 1 ERROR

[Yahoo-eng-team] [Bug 1925144] [NEW] timeout in rados connect does not take effect

2021-04-20 Thread norman shen
Public bug reported: Description === from https://github.com/ceph/ceph/blob/0be78da368f2dc1c891e3caafac38f7aa96d3c49/src/pybind/rados/rados.pyx#L660, it looks like function connect in object rados will ignore the timeout input, so the current configuration does not take effect.

[Yahoo-eng-team] [Bug 1925143] [NEW] timeout in rados connect does not take effect

2021-04-20 Thread norman shen
Public bug reported: Description === from https://github.com/ceph/ceph/blob/0be78da368f2dc1c891e3caafac38f7aa96d3c49/src/pybind/rados/rados.pyx#L660, it looks like function connect in object rados will ignore the timeout input, so the current configuration does not take effect.

[Yahoo-eng-team] [Bug 1923560] [NEW] retrieving security group is slow for server detail

2021-04-13 Thread norman shen
Public bug reported: Description === Querying a large number of vms through server detail is slow, and a lot of time is spent calling the neutron api to obtain security group info. Expected result === obtaining security group info should not consume half of total query time

[Yahoo-eng-team] [Bug 1922222] [NEW] allow using tap device on netdev enabled host

2021-04-01 Thread norman shen
Public bug reported: Hello, after reading the code, it seems nova-compute can only use vhostuser mode if netdev is enabled on the ovs bridge. An internal use case requires us to allow using tap devices as well as vhostuser devices on the same host. Does this sound like a valid use case? ** Affects:

[Yahoo-eng-team] [Bug 1921804] [NEW] leftover bdm when rabbitmq unstable

2021-03-29 Thread norman shen
Public bug reported: Description === When RabbitMQ is unstable, there is a chance that method https://github.com/openstack/nova/blob/7a1222a8654684262a8e589d91e67f2b9a9da336/nova/compute/api.py#L4741 will time out but bdm is successfully created. In such cases, volume will be shown

[Yahoo-eng-team] [Bug 1914522] [NEW] migrate from iptables firewall to ovs firewall

2021-02-03 Thread norman shen
Public bug reported: Sorry, this is not actually a bug report but a discussion for better clarification in the documentation. Currently, we are running the iptables firewall in production and saw performance degrade, thus we plan to upgrade to the ovs firewall in place. By reading the doc I found upgrading process is

[Yahoo-eng-team] [Bug 1910946] [NEW] ovs is dead but ovs agent is up

2021-01-10 Thread norman shen
Public bug reported: We are using openstack-neutron rocky with openvswitch version 2.10.0. We are using ubuntu 18.04, which shipped with a libc6 bug, reported here https://github.com/openvswitch/ovs-issues/issues/175. My question is that when this bug happens ovs agent will not work and

[Yahoo-eng-team] [Bug 1910947] [NEW] ovs is dead but ovs agent is up

2021-01-10 Thread norman shen
Public bug reported: We are using openstack-neutron rocky with openvswitch version 2.10.0. We are using ubuntu 18.04, which shipped with a libc6 bug, reported here https://github.com/openvswitch/ovs-issues/issues/175. My question is that when this bug happens ovs agent will not work and

[Yahoo-eng-team] [Bug 1909160] Re: high cpu usage when listing security groups

2020-12-27 Thread norman shen
Ok, I'll try out Victoria and compare the result. Thank you for the reply. ** Changed in: neutron Status: New => Opinion -- https://bugs.launchpad.net/bugs/1909160 Title:

[Yahoo-eng-team] [Bug 1909160] [NEW] high cpu usage when listing security groups

2020-12-23 Thread norman shen
Public bug reported: I saw that listing security groups is slow and causes unexpected cpu spikes. I run a Rocky neutron-server with api workers set to 1, when executing command like ```console root@mgt01:~# time curl -H "x-auth-token: $token"

[Yahoo-eng-team] [Bug 1908957] [NEW] iptable rules collision deployed with k8s iptables kube-proxy enabled

2020-12-21 Thread norman shen
Public bug reported: Maybe it's a k8s kube-proxy related bug, but maybe it is easier to solve on neutron's side... In k8s either NodePort or ExternalIP will generate iptables rules which will affect vm traffic when the hybrid iptables plugin is enabled. The problem is: Chain PREROUTING (policy

[Yahoo-eng-team] [Bug 1902806] [NEW] only 7 iscsi disk could be attached

2020-11-03 Thread norman shen
Public bug reported: For libvirt version 4.0.0, a scsi disk with a unit equal to 7 cannot be attached due to libvirt's own limitation. ** Affects: nova Importance: Undecided Status: New

[Yahoo-eng-team] [Bug 1901124] [NEW] memcached cache not get expired

2020-10-22 Thread norman shen
Public bug reported: We are using openstack rocky and when I check memcached, I found root@compute:~# telnet compute 11211 Trying 192.168.0.17... Connected to compute. Escape character is '^]'. stats cachedump 15 1 ITEM c9067b617ec1e6e7f78318c19e7ce2c7f4f9dcd6 [2034 b; 0 s] expiration time

[Yahoo-eng-team] [Bug 1897236] [NEW] create port in a shared network failed for user with member role

2020-09-25 Thread norman shen
Public bug reported: Create a port on a shared network using a user with member role on another project fails. ** Affects: horizon Importance: Undecided Status: New

[Yahoo-eng-team] [Bug 1896574] Re: how to deal with hypervisor name changing

2020-09-24 Thread norman shen
I think the previous title is misleading. Actually the hostname itself is still A; what changes is the fqdn seen by hostname --fqdn. ** Changed in: nova Status: Invalid => New ** Summary changed: - how to deal with hypervisor name changing + how to deal with hypervisor host fqdn name changing

[Yahoo-eng-team] [Bug 1896574] [NEW] how to deal with hypervisor name changing

2020-09-22 Thread norman shen
Public bug reported: nova fails to correctly account for resources after the hypervisor name changes. For example, if previously the hypervisor name is A, and some time later it switches to A.B, then all of the instances which belong to A will not be included in the resource computation for A.B although

[Yahoo-eng-team] [Bug 1542032] Re: IP reassembly issue on the Linux bridges in Openstack

2020-09-17 Thread norman shen
** Changed in: neutron Status: Confirmed => Invalid -- https://bugs.launchpad.net/bugs/1542032 Title: IP reassembly issue on the Linux bridges in Openstack Status in

[Yahoo-eng-team] [Bug 1895063] [NEW] Allow rescue volume backed instance

2020-09-09 Thread norman shen
Public bug reported: Should we offer support for volume backed instance? ** Affects: nova Importance: Undecided Status: New

[Yahoo-eng-team] [Bug 1893015] [NEW] ping with large package size fails

2020-08-26 Thread norman shen
Public bug reported: We are using neutron rocky, with security driver set to iptables_hybrid, the cluster is deployed on top of a kubernetes cluster. And all the networks are set to mtu 1500 The problem I am facing right now is that ping across compute nodes fails with a packet size larger than

[Yahoo-eng-team] [Bug 1892582] [NEW] image creation does not fail immediately if volume not created

2020-08-22 Thread norman shen
Public bug reported: Cinder backend Image creation failed after long waiting when the volume is still creating root@mgt01:~# openstack volume list --all | grep fb8aee1b-e19e-4336-8fa2-864f1664b834 | b1e021bd-974d-4974-961b-47ab7f9b0a16 | image-fb8aee1b-e19e-4336-8fa2-864f1664b834

[Yahoo-eng-team] [Bug 1887108] [NEW] wrong l2pop flows on vlan network

2020-07-09 Thread norman shen
Public bug reported: I saw l2pop rules for a vlan network which cause problems for mac learning. There is no dvr router associated with it. It is a pure vlan network. root@compute02:/tmp# ovs-ofctl dump-flows br-tun table=21 cookie=0xcd381baa7a6d5b5c, duration=1703630.319s, table=21,

[Yahoo-eng-team] [Bug 1886355] [NEW] glance upload image to rbd backend stuck

2020-07-05 Thread norman shen
Public bug reported: Uploading an image to the rbd backend is stuck at saving state; the rbd du command shows the image size is not increasing, and ceph osd pool stats shows that there is no client io. A tcpdump shows the program is actually trying to receive from the client with a rather small window size

[Yahoo-eng-team] [Bug 1884695] [NEW] allow less strict cpu flag comparison

2020-06-22 Thread norman shen
Public bug reported: Description === Nova uses strict cpu flag comparison during live migration; this introduces problems when migrating with some cpu flags which do not actually affect migration. For example, the `monitoring` flag could be neglected safely. So I think it might be

[Yahoo-eng-team] [Bug 1884532] [NEW] inconsistent data in ipamallocations

2020-06-22 Thread norman shen
Public bug reported: Sometimes I see that the database is inconsistent for some reason, for example, as shown below MariaDB [neutron]> select * from ipamsubnets where neutron_subnet_id='9a8fd2b0-743c-4500-8978-9e5bf9b38347' -> ;

[Yahoo-eng-team] [Bug 1881455] [NEW] migrate server reporting list index of out bound

2020-05-30 Thread norman shen
Public bug reported: Description When resize to local host is enabled, a cold migration sometimes fails with 1. migrating to same host failed 2. and then a list index out of bound error Steps to reproduce === deploy two compute nodes and make workload

[Yahoo-eng-team] [Bug 1880455] [NEW] interrupted vlan connection after live migration

2020-05-24 Thread norman shen
Public bug reported: After this patch, https://github.com/openstack/neutron/commit/efa8dd08957b5b6b1a05f0ed412ff00462a9f216, I saw unexpected vlan interruption after live migration. The steps to reproduce the problem are simple: first create two VMs, vm01 and vm02, on compute01 and compute02 respectively,

[Yahoo-eng-team] [Bug 1870866] [NEW] inconsistent connection info data after live migration

2020-04-04 Thread norman shen
Public bug reported: Description === After live migration, block device mapping's connection stays at "attaching", which is a confusing piece of information. The root cause seems to be the different code paths between live migration and attach volume. Steps to reproduce ==

[Yahoo-eng-team] [Bug 1869808] [NEW] reboot neutron-ovs-agent introduces a short interrupt of vlan traffic

2020-03-30 Thread norman shen
Public bug reported: We are using OpenStack Neutron 13.0.6, deployed using OpenStack-helm. I tested pinging servers in the same vlan while rebooting neutron-ovs-agent. The result shows root@mgt01:~# openstack server list

[Yahoo-eng-team] [Bug 1866288] [NEW] tox pep8 fails on ubuntu 18.04.3

2020-03-05 Thread norman shen
Public bug reported: pep8 checking fails for rocky branch on ubuntu 18.04.3 root@mgt02:~/src/nova# tox -epep8 -vvv   removing /root/src/nova/.tox/log using tox.ini: /root/src/nova/tox.ini using tox-3.1.0 from /usr/local/lib/python2.7/dist-packages/tox/__init__.pyc skipping sdist step pep8 start:

[Yahoo-eng-team] [Bug 1865120] [NEW] arm64 vm boot failed when set num_pcie_ports to 28

2020-02-28 Thread norman shen
Public bug reported: We are testing OpenStack on Phytium,FT2000PLUS root@compute01:~# lscpu Architecture: aarch64 Byte Order:Little Endian CPU(s):64 On-line CPU(s) list: 0-63 Thread(s) per core:1 Core(s) per socket:4 Socket(s): 16 NUMA

[Yahoo-eng-team] [Bug 1856962] [NEW] openid method failed when federation_group_ids is empty list

2019-12-19 Thread norman shen
"{0}" } }, { "projects":[ { "name":"{1}", "roles":[ { "name":"member" }

[Yahoo-eng-team] [Bug 1856312] [NEW] RuntimeError during calling log_opts_values

2019-12-13 Thread norman shen
Public bug reported: During starting up nova-compute service, we are hit by the following error message + sed -i s/HOST_IP// /tmp/logging-nova-compute.conf + exec nova-compute --config-file /etc/nova/nova.conf --config-file /tmp/pod-shared/nova-console.conf --config-file

[Yahoo-eng-team] [Bug 1840579] [NEW] excessive number of dvrs where vm got a fixed ip on floating network

2019-08-18 Thread norman shen
with fixed ip on floating network Then call `routers_updated_on_host` manually, then this dvr will be created on the host where vm resides on, but actually it should be there. ** Affects: neutron Importance: Undecided Assignee: norman shen (jshen28) Status: In Progress

[Yahoo-eng-team] [Bug 1836680] [NEW] attach volume succeeded but device not found on guest machine

2019-07-15 Thread norman shen
Public bug reported: Sorry, posted the bug in the wrong place. ** Affects: neutron Importance: Undecided Status: Invalid ** Changed in: neutron Status: New => Invalid ** Description changed: - we are using OpenStack Queens: - nova-common/xenial,now 2:17.0.9-6~u16.01+mcp189 all

[Yahoo-eng-team] [Bug 1836681] [NEW] attach volume succeeded but device not found on guest machine

2019-07-15 Thread norman shen
Public bug reported: we are using OpenStack Queens: nova-common/xenial,now 2:17.0.9-6~u16.01+mcp189 all [installed] nova-compute/xenial,now 2:17.0.9-6~u16.01+mcp189 all [installed,automatic] nova-compute-kvm/xenial,now 2:17.0.9-6~u16.01+mcp189 all [installed] guest vm uses windows 2012

[Yahoo-eng-team] [Bug 1830456] [NEW] dvr router slow response during port update

2019-05-24 Thread norman shen
Public bug reported: We have a distributed router which is used by hundreds of virtual machines scattered across around 150 compute nodes. When nova sends a port update request to neutron, it generally takes nearly 4 min to complete. Neutron version is openstack Queens 12.0.5. I found