[Yahoo-eng-team] [Bug 1504641] [NEW] Listing volumes respects osapi_max_limit but does not provide a link to the next element

2015-10-09 Thread Artom Lifshitz
Public bug reported: When GETting os-volumes, the returned list of volumes respects the osapi_max_limit configuration parameter but does not provide a link to the next element in the list. For example, with two volumes configured and osapi_max_limit set to 1, GETting volumes results in the

[Yahoo-eng-team] [Bug 1520633] [NEW] Exception in compute log when booting an instance with max_concurrent_builds=0

2015-11-27 Thread Artom Lifshitz
use someone debugging an unrelated issue. ** Affects: nova Importance: Undecided Assignee: Artom Lifshitz (notartom) Status: In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).

[Yahoo-eng-team] [Bug 1585601] [NEW] Deleting a live-migrated instance causes its fixed IP to remain reserved

2016-05-25 Thread Artom Lifshitz
a new instance with the same fixed IP. This has been reported against Icehouse and has been reproduced in master, and is therefore presumably present in all versions in-between. ** Affects: nova Importance: Undecided Assignee: Artom Lifshitz (notartom) Status: New ** Changed

[Yahoo-eng-team] [Bug 1599201] [NEW] Device tagging does not work with block_device_mapping with source=blank, destination=local

2016-07-05 Thread Artom Lifshitz
Public bug reported: When booting instance with a block_device_mapping with source=blank,destination=local,tag=foo no entry is created in the block_device_mapping table and the volume is not tagged. While the former is existing normal behaviour, the latter is definitely a bug. ** Affects: nova

[Yahoo-eng-team] [Bug 1624052] [NEW] Evacuation fails with VirtualInterfaceCreateException

2016-09-15 Thread Artom Lifshitz
Public bug reported: Description: With Neutron, evacuating an instance results in a VirtualInterfaceCreateException and the evacuation fails. Steps to reproduce: 1. Boot a VM with a Neutron NIC. 2. Cause the underlying host to be down, for example by stopping the compute service. 3. Evacuate

[Yahoo-eng-team] [Bug 1624052] Re: Evacuation fails with VirtualInterfaceCreateException

2016-10-08 Thread Artom Lifshitz
*** This bug is a duplicate of bug 1535918 *** https://bugs.launchpad.net/bugs/1535918 I spent some time adding LOG.debug statements all over the place while trying to reproduce this bug, and then read bug 1535918. They're the same bug. To summarise: a. When spawning a VM, the libvirt

[Yahoo-eng-team] [Bug 1630161] Re: nova image-list is deprecated, but it should work even now

2016-11-28 Thread Artom Lifshitz
I think this is actually a valid bug in openstackclient. Specifically, the following use case *should* work: $ export OS_COMPUTE_API_VERSION=2.latest $ openstack server create --flavor m1.tiny --image cirros-0.3.4-x86_64 --key-name stack --security-group default --nic net-

[Yahoo-eng-team] [Bug 1691195] [NEW] Can't live-migrate after "round-trip" volume-update

2017-05-16 Thread Artom Lifshitz
Public bug reported: Description === If an instance has had an attached volume volume-updated twice in a "round-trip" - ie, volume-update $vol1 $vol2, then volume-update $vol2 $vol1 - it cannot be live-migrated. Steps to reproduce == 1. Create two iscsi volumes. #

[Yahoo-eng-team] [Bug 1692893] [NEW] 'nova usage' returns 500 when deleted row in instance_extra is archived

2017-05-23 Thread Artom Lifshitz
Public bug reported: Description === In Newton and earlier, 'nova usage' returns 500 as soon as the first row in instance_extra that's marked deleted is archived by 'nova-manage db archive_deleted_rows'. Steps to reproduce == 1. Create a few VMs: $ for i in `seq 1

[Yahoo-eng-team] [Bug 1743458] [NEW] Device tagging does not work for PF passthrough

2018-01-15 Thread Artom Lifshitz
Public bug reported: Description === When booting an instance with a PF passed through, device tags applied to the PF do not appear in the metadata. Steps to reproduce == Create a PF neutron port: $ neutron port-create sriov --binding:vnic-type direct-physical --name

[Yahoo-eng-team] [Bug 1744325] [NEW] If a rebuild is refused by the scheduler, the instance's imageref is not rolled back

2018-01-19 Thread Artom Lifshitz
Public bug reported: Description === Since CVE-2017-16239, we now go through the scheduler for rebuilds. If the scheduler refuses a rebuild with a new image because of filter constraints (for example IsolatedHostsFilter), the instance's imageref is set to the new image and never rolled

[Yahoo-eng-team] [Bug 1746032] [NEW] By rebuilding twice with the same "forbidden" image one can circumvent scheduler rebuild restrictions

2018-01-29 Thread Artom Lifshitz
Public bug reported: Description === Since CVE-2017-16239, we call to the scheduler when doing a rebuild with a new image. If the scheduler refuses a rebuild because a filter forbids the new image on the instance's host (for example, IsolatedHostsFilter), at first there was no indication

[Yahoo-eng-team] [Bug 1832028] [NEW] revert resize: vif-plugged external event sent too soon if Neutron is using OVS hybrid plug

2019-06-07 Thread Artom Lifshitz
/show_bug.cgi?id=1678681 [2] https://review.opendev.org/#/c/660782/ [3] https://review.opendev.org/#/c/653498/ ** Affects: nova Importance: Undecided Assignee: Artom Lifshitz (notartom) Status: In Progress

[Yahoo-eng-team] [Bug 1831538] [NEW] IDE config drive CDROM doesn't work with q35 machine type

2019-06-03 Thread Artom Lifshitz
Public bug reported: Description === Setting [libvirt]/hw_machine_type=x86_64=q35 in nova-cpu.conf causes any instance booted with a config drive to fail to spawn. Steps to reproduce == 1. Set [libvirt]/hw_machine_type=x86_64=q35 in nova-cpu.conf 2. Boot an

[Yahoo-eng-team] [Bug 1826519] Re: Ephemeral disk volume was not mounted after resizing from non-ephemeral flavor

2019-06-17 Thread Artom Lifshitz
I don't think this is Nova's responsibility - as long as the new block device shows up with `lsblk`, it's up to the guest OS/tooling to mount it (or not). I've changed the component to cloud-init in case there's something actionable for them in this bug. Thanks! ** Project changed: nova =>

[Yahoo-eng-team] [Bug 1836389] [NEW] Device role tagging doesn't work for SRIOV PF

2019-07-12 Thread Artom Lifshitz
Public bug reported: Description === Setting a device role tag on a PF interface has no effect on metadata - IOW, the PF and its tag don't appear in the device metadata at all. Steps to reproduce == 1. Create a PF port: openstack port show

[Yahoo-eng-team] [Bug 1836595] [NEW] test_server_connectivity_cold_migration_revert failing

2019-07-15 Thread Artom Lifshitz
Public bug reported: test_server_connectivity_cold_migration_revert has started failing now that we've un-skipped it [1]. It appears as though Nova is doing everything right in terms of external events and VIF plugging (see analysis on PS6 of [2]), so the thinking is that it's something with the

[Yahoo-eng-team] [Bug 1836754] [NEW] Conflict when deleting allocations for an instance that hasn't finished building

2019-07-16 Thread Artom Lifshitz
Public bug reported: Description === When deleting an instance that hasn't finished building, we'll sometimes get a 409 from placement as such: Failed to delete allocations for consumer 6494d4d3-013e-478f-9ac1-37ca7a67b776. Error: {"errors": [{"status": 409, "title": "Conflict",

[Yahoo-eng-team] [Bug 1842616] Re: NUMA vcpus not correctly allocated against numa regions

2019-09-05 Thread Artom Lifshitz
Couple of things here: You're not using hw:cpu_policy=dedicated, which means multiple instances' vCPUs can be pinned to the same host CPUs. This explains why they're all landing on the same NUMA node. You're not setting hw:mem_page_size, which means Nova does not do per-NUMA-cell accounting of

[Yahoo-eng-team] [Bug 1836945] [NEW] Deleting a CPU-pinned instance after changing vcpu_pin_set causes it to go to ERROR

2019-07-17 Thread Artom Lifshitz
Public bug reported: Description === If you boot an instance with pinned CPUs (for example by using the 'dedicated' CPU policy), change the vcpu_pin_set option on its compute host, then attempt to delete the instance, it will ERROR out instead of deleting successfully. Subsequent delete

[Yahoo-eng-team] [Bug 1837075] [NEW] Evacuation takes too long when destination host has a large number of NICs

2019-07-18 Thread Artom Lifshitz
Public bug reported: Description === Evacuation takes a long time if the destination host has a large number of network interfaces. Steps to reproduce == 1. Have a host down, or force it down. 2. Evacuate instances to a host with a large number of network interfaces.

[Yahoo-eng-team] [Bug 1850694] [NEW] shelve doesn't handle UnexpectedTaskStateError

2019-10-30 Thread Artom Lifshitz
sgi Additional Info === This is obviously minor, as the difference between a 500 and a 409 is purely semantic, but we're being told this is an SLA thing. An SLA defines 5xx as being "down", while 4xx is user error and therefore "up". [1] https://bugzilla.redhat.

[Yahoo-eng-team] [Bug 1289064] Re: live migration of instance should claim resources on target compute node

2019-12-02 Thread Artom Lifshitz
Based on comments #31 and #30, we consider this as 'Fix released' in Train. ** Changed in: nova Status: In Progress => Fix Released

[Yahoo-eng-team] [Bug 1840869] Re: VNC Server Unauthenticated Access

2020-01-14 Thread Artom Lifshitz
You mean the VNC server(s) that are created on the compute hosts for their instances? Those are not supposed to be publicly accessible. Access to those is done via the consoles API [1] which provides an authentication token to the client. The client then connects to the publicly-facing console

[Yahoo-eng-team] [Bug 1859403] Re: The instance needs to supports dongle devices

2020-01-14 Thread Artom Lifshitz
Hello, thanks for the feature request. Unfortunately, it is very unlikely that someone will work on it (unless you yourself choose to do so), so I'm going to close this bug in order to set realistic expectations. If you want to work on this yourself, please file a blueprint. ** Changed in: nova

[Yahoo-eng-team] [Bug 1869804] [NEW] Live migration with Train-style cpu_shared_set not updating CPU pinning

2020-03-30 Thread Artom Lifshitz
Public bug reported: In pre-Train times, when live migrating an instance without a CPU policy (and therefore without a NUMA topology) to a dest with a vcpu_pin_set, or to a dest with a vcpu_pin_set different from the source, the instance's CPU pinning information was not updated. Now that CPU

[Yahoo-eng-team] [Bug 1791224] Re: Live migration failed when the instance is booted with volume and config drive

2020-04-23 Thread Artom Lifshitz
Closing, as per comment #3. ** Also affects: nova/pike Importance: Undecided Status: New ** Changed in: nova/pike Status: New => Fix Released ** Changed in: nova Status: New => Fix Released

[Yahoo-eng-team] [Bug 1800204] Re: n-cpu.service consuming 100% of CPU indeterminately

2020-04-23 Thread Artom Lifshitz
I'm going to say the same thing as bug 1801733 - this is super nifty and interesting, but realistically is not a concern and will most likely never get addressed. ** Changed in: nova Status: New => Won't Fix

[Yahoo-eng-team] [Bug 1777475] Re: Undercloud vm in state error after update of the undercloud.

2020-04-23 Thread Artom Lifshitz
Looks like the patch in comment #12 has addressed this from Nova's POV, so I'm setting this as Fix Released. ** Changed in: nova Status: New => Fix Released

[Yahoo-eng-team] [Bug 1801733] Re: nova-compute consuming 100% of cpu after rebuilding with invalid data parameters

2020-04-23 Thread Artom Lifshitz
This is definitely nifty and impressive, but realistically will never be addressed. We recommend folks deploy with TLS on all internal services, including the message queue used for RPC. TLS makes this kind of in-flight RPC hacking even less of a concern in practice - because let's face it, if

[Yahoo-eng-team] [Bug 1879787] [NEW] post_live_migration does not handle Neutron errors

2020-05-20 Thread Artom Lifshitz
] https://bugzilla.redhat.com/show_bug.cgi?id=1818829 ** Affects: nova Importance: Medium Assignee: Artom Lifshitz (notartom) Status: In Progress ** Tags: live-migration

[Yahoo-eng-team] [Bug 1895322] [NEW] Nova is not actually disabling greendns

2020-09-11 Thread Artom Lifshitz
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1860818 ** Affects: nova Importance: Undecided Assignee: Artom Lifshitz (notartom) Status: In Progress

[Yahoo-eng-team] [Bug 1887746] Re: Install and configure a compute node for Red Hat Enterprise Linux and CentOS in nova

2020-07-16 Thread Artom Lifshitz
Hi, thanks for the bug report! This looks more like an issue with RDO packaging than with Nova itself [1]. RDO is meant to be installed either via TripleO [2], or you can use their CloudSIG GA repo directly [3]. If this doesn't work for you, you can always file a bug with RDO directly [4]. [1]

[Yahoo-eng-team] [Bug 1887751] Re: Disabling an unreachable Node raises StackTrace

2020-07-16 Thread Artom Lifshitz
> when/how Nova decides that a compute node is really down there. The compute services update their status in the database every `report_interval` seconds [1]. If a service doesn't report after `service_down_time`, it's considered down. So if a compute service goes down, it'll be at most
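The liveness rule described in this reply can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Nova's actual implementation: the function name `is_service_up` and the 60-second threshold used in the example calls are hypothetical, chosen only to show how a last-report timestamp is compared against `service_down_time`.

```python
from datetime import datetime, timedelta


def is_service_up(last_report: datetime, now: datetime,
                  service_down_time: int) -> bool:
    """Hypothetical helper: a service counts as up if its last
    status report is no older than service_down_time seconds."""
    elapsed = (now - last_report).total_seconds()
    return abs(elapsed) <= service_down_time


# Illustrative values only (assumed, not taken from the bug report).
now = datetime(2020, 7, 16, 12, 0, 0)
recent = now - timedelta(seconds=30)   # reported 30 seconds ago
stale = now - timedelta(seconds=90)    # reported 90 seconds ago

print(is_service_up(recent, now, service_down_time=60))
print(is_service_up(stale, now, service_down_time=60))
```

With the assumed 60-second threshold, the first service is treated as up and the second as down, even though neither has been explicitly marked either way.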

[Yahoo-eng-team] [Bug 1930866] Re: locked instance can be rendered broken by deleting port

2021-06-22 Thread Artom Lifshitz
It's a valid bug, but as ports are Neutron's responsibility, I'm not sure what can be done in this case. Neutron is free to delete a port without checking anything about the instance it's attached to. Perhaps this can be changed to the Neutron component, to see if folks there have an idea? **

[Yahoo-eng-team] [Bug 1931476] [NEW] pep8 job failing because of mypy

2021-06-09 Thread Artom Lifshitz
Public bug reported: When running `tox -e pep8` (either in CI or locally), there are mypy failures on nova/crypto.py. In CI, it looks like (from https://zuul.opendev.org/t/openstack/build/1e32efc260b84a94b6a9cfeba3c80976) pep8 start: run-test pep8 run-test: commands[0] | bash tools/mypywrap.sh

[Yahoo-eng-team] [Bug 1941005] [NEW] instance.request_spec not updated upon resize

2021-08-24 Thread Artom Lifshitz
Public bug reported: Description === We don't update instance.request_spec when we resize to a new flavor. If the flavor's extra specs change, this is not reflected in instance.request_spec. This goes generally unnoticed when *adding* stuff to the instance - like PCI devices, for

[Yahoo-eng-team] [Bug 1952003] [NEW] Revert resize (on different host) + ovs network backend with iptables security group firewall driver (aka hybrid plug) is broken

2021-11-23 Thread Artom Lifshitz
Public bug reported: $subject First noticed in internal Red Hat CI of OSP 17 (based on stable/wallaby), reproduced in upstream DNM patch [2]. tl;dr is - Nova waits for a "bind-time" external event from Neutron when it updates the port binding back to the original host during the resize revert,

[Yahoo-eng-team] [Bug 1969980] [NEW] Live migration rollback fails if no Neutron multiple port bindings extension

2022-04-22 Thread Artom Lifshitz
Public bug reported: This is really a continuation of bug 1888395. Steps to reproduce == Have Neutron without multiple port bindings extension. Boot an instance with network interfaces. Live migrate, and fail the live migration somehow (easy to do in func tests) Expected result

[Yahoo-eng-team] [Bug 1976545] [NEW] Nova deletes port when detaching auto-created interface

2022-06-01 Thread Artom Lifshitz
Public bug reported: Description === When detaching an interface that was auto-created (for example, with networks='auto' at server boot time), Nova deletes the port. Steps to reproduce == 1. Boot a server with an auto-created Neutron port. 2. Detach the port using the

[Yahoo-eng-team] [Bug 1978489] [NEW] libvirt / cgroups v2: cannot boot instance with more than 16 CPUs

2022-06-13 Thread Artom Lifshitz
Public bug reported: Description === Using the libvirt driver and a host OS that uses cgroups v2 (RHEL 9, Ubuntu Jammy), an instance with more than 16 CPUs cannot be booted. Steps to reproduce == 1. Boot an instance with 10 (or more) CPUs on RHEL 9 or Ubuntu Jammy using

[Yahoo-eng-team] [Bug 1931707] Re: "NeutronAdminCredentialConfigurationInvalid: Networking client is experiencing an unauthorized exception" error while instantiating instance in Train RDO

2022-05-09 Thread Artom Lifshitz
Hello, thanks for the bug report! I think given how inconsistently reproducible the issue was, we can blame this on an environmental issue as opposed to a Nova bug. I'm not sure there's anything further to be done at this time. ** Changed in: nova Status: New => Invalid

[Yahoo-eng-team] [Bug 1955411] Re: Ping loss when live migration

2022-05-09 Thread Artom Lifshitz
Hi Yusuf, Yes, this is expected. The exact quantity of ping loss will depend on the network backend (OVS in your case), how busy/loaded the VM is, the available network bandwidth for libvirt to copy the VM memory, as well as whether autoconverge and/or post-copy is in use. The following is an

[Yahoo-eng-team] [Bug 1987199] Re: Openstack cluster cannot create instances when 1 of 3 rabbitmq cluster node down

2022-08-23 Thread Artom Lifshitz
Hello, thanks for the report. This falls outside the scope of a Nova bug. It's either a support request, or a deployment question. You can try asking on the openstack-discuss mailing list [1]. [1] https://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-discuss ** Changed in: nova

[Yahoo-eng-team] [Bug 1899487] Re: cloud-init hard codes MTU configuration at initial deploy time

2022-08-23 Thread Artom Lifshitz
While Nova indeed exposes the MTU in our metadata, our source of truth for that information is Neutron, via the `mtu` field on the Neutron network. [As an aside, we've had a long standing issue wherein Neutron allows the MTU to be mutable, but there's no real support for changing the MTU within

[Yahoo-eng-team] [Bug 1986764] Re: when there are vms with numa config and vms without, OOM may occur

2022-08-23 Thread Artom Lifshitz
Hello, thanks for the bug report! IIRC the host OOM killer runs per NUMA node, so in this case this sounds like expected behaviour: node0 is out of memory (32 + 64 is bigger than 64). ** Changed in: nova Status: New => Invalid

[Yahoo-eng-team] [Bug 1995153] [NEW] `socket` PCI NUMA policy doesn't work if another instance is booted first on the same host

2022-10-28 Thread Artom Lifshitz
Public bug reported: Disclaimer: I haven't reproduced this in a functional test, but based on the traceback that I gathered from a real environment as well as the fact that the proposed fix actually fixes this, I think my theory is correct. Description === `socket` PCI NUMA policy doesn't

[Yahoo-eng-team] [Bug 2008341] Re: Lock, migrate, and unshelve server actions don't enforce request body schema for certain microversions

2023-03-13 Thread Artom Lifshitz
Well, it was still a _mistake_, but it's been out in the wild long enough that we can't retroactively fix it without breaking someone's scripts, so we have to leave it as is, unfortunately :( ** Changed in: nova Status: In Progress => Won't Fix

[Yahoo-eng-team] [Bug 2008341] [NEW] Lock, migrate, and unshelve server actions don't enforce request body schema for certain microversions

2023-02-23 Thread Artom Lifshitz
Public bug reported: Description === Basically $summary. For lock, migrate, and unshelve, we have decorators for validation schema that _start_ at a certain microversion (exact microversion varies), meaning anything below that is not checked. A client could send a request that is only

[Yahoo-eng-team] [Bug 2017829] Re: Live-resize feature

2023-05-02 Thread Artom Lifshitz
This has been attempted many times in the past, without success (see references below). At this point, I'd rather set honest expectations that this will never get done, and close this bug. References: Red Hat internal BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1095997 Initial upstream

[Yahoo-eng-team] [Bug 2006689] Re: Evacuation will lead to double instances in some situation

2023-02-13 Thread Artom Lifshitz
Heya, I think there's some confusion around expectations for evacuations. Evacuations _must_ be done with the source compute fenced, and brought back online by a human in a controlled manner to ensure evacuated instances are destroyed properly. Any monitoring software that initiates evacuations

[Yahoo-eng-team] [Bug 1995714] Re: cold migration shouldn't spin up the vm in a resize step

2023-02-13 Thread Artom Lifshitz
Cold migration assumes that the source compute host is up (and I think has checks in the code for that). A cold migration will fail if the source compute host is actually down and not disabled. The correct thing to do in that case is an evacuation. ** Changed in: nova Status: New =>

[Yahoo-eng-team] [Bug 1999803] Re: Libvirt fails to start VM with virtio related "unsupported configuration"

2023-02-13 Thread Artom Lifshitz
Sounds like you fixed your host to support virtio, and then your policy file syntax error. Is it OK if I move this to Invalid since this is not a Nova bug? ** Changed in: nova Status: New => Invalid

[Yahoo-eng-team] [Bug 2022093] [NEW] hw_disk_bus='sata' is replaced to scsi when it is attached to additional disk.

2023-06-01 Thread Artom Lifshitz
Public bug reported: Description === hw_disk_bus='sata' is replaced to scsi when it is attached to additional disk. Steps to reproduce == 1. set hw_disk_bus=sata in an image $openstack image set --property hw_disk_bus=sata $openstack image set --property hw_disk_bus=sata

[Yahoo-eng-team] [Bug 2045746] Re: openstack server add volume -device can not assign an internal device name for volume

2023-12-19 Thread Artom Lifshitz
I'm afraid this is expected behaviour, and we call it out in our API reference [1]: device (Optional) body string Name of the device such as, /dev/vdb. Omit or set this parameter to null for auto-assignment, if supported. If you specify this parameter, the device

[Yahoo-eng-team] [Bug 2033209] Re: changing openstack_domain does not change in nova DB

2023-12-19 Thread Artom Lifshitz
If I understand correctly, it looks like the hostnames of your compute hosts have changed because of a deployment error. I understand that this is not your fault, but renaming compute host names is essentially forbidden, as it causes the kind of breakage that you're reporting in this bug. We have

[Yahoo-eng-team] [Bug 2056613] [NEW] libvirt CPU power management does not support live migration

2024-03-08 Thread Artom Lifshitz
Public bug reported: Description === libvirt CPU power management does not support live migration Steps to reproduce == 1. Turn on libvirt CPU power management 2. Boot an instance with hw:cpu_policy=dedicated 3. Live migrate the instance Expected result ===

[Yahoo-eng-team] [Bug 2056612] [NEW] libvirt CPU power management does not handle `isolate` emulator thread policy

2024-03-08 Thread Artom Lifshitz
Public bug reported: Description === libvirt CPU power management does not handle `isolate` emulator thread policy. Steps to reproduce == 1. Turn on libvirt CPU power management 2. Boot an instance with hw:cpu_policy=dedicated and hw:emulator_threads_policy=isolate