[Yahoo-eng-team] [Bug 1913625] Re: Glance will leak staging data
** Changed in: glance
       Status: Invalid => Confirmed

-- 
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1913625

Title:
  Glance will leak staging data

Status in Glance:
  Confirmed

Bug description:
  In various situations, glance will leak (potentially very large) temporary files in the staging store.

  One example is doing a web-download import, where glance initially downloads the image to its staging store. If the worker doing that activity crashes, loses power, etc., the user may delete the image and try again on another worker. When the crashed worker resumes, the staging data will remain but nothing will ever clean it up.

  Another example would be a misconfigured glance that uses local staging directories, but glance-direct is used, where the user stages data, and then deletes the image from another worker.

  Even in a situation where shared staging is properly configured, a failure to access the staging location during the delete call will result in the image being deleted, but the staging file not being purged.

  IMHO, glance workers should clean their staging directories at startup, purging any data that is attributable to a previous image having been deleted.

  Another option is to add a store location for each staged image, and make sure the scrubber can clean those things from the staging directory periodically (this requires also running the scrubber on each node, which may not be common practice currently).

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1913625/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
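The startup cleanup proposed in the bug description could look roughly like the sketch below. It is not Glance code. It assumes each staging file name begins with the image UUID (optionally followed by a ".suffix"), and `image_exists` is a hypothetical callable standing in for whatever lookup a worker would use to ask whether an image is still present. Files that cannot be attributed to an image are left alone.

```python
import os
import uuid
from typing import Callable, List


def find_orphaned_staging_files(staging_dir: str,
                                image_exists: Callable[[str], bool]) -> List[str]:
    """Return paths of staging files whose image no longer exists."""
    orphans = []
    for name in sorted(os.listdir(staging_dir)):
        path = os.path.join(staging_dir, name)
        if not os.path.isfile(path):
            continue
        # Assumption: the file name starts with the image UUID.
        image_id = name.split(".", 1)[0]
        try:
            uuid.UUID(image_id)
        except ValueError:
            # Not attributable to an image, so do not touch it.
            continue
        if not image_exists(image_id):
            orphans.append(path)
    return orphans


def purge_orphans(staging_dir: str,
                  image_exists: Callable[[str], bool]) -> List[str]:
    """Delete orphaned staging files and return the paths that were removed."""
    removed = find_orphaned_staging_files(staging_dir, image_exists)
    for path in removed:
        os.remove(path)
    return removed
```

A worker would call `purge_orphans()` once at startup, before accepting requests. With local (unshared) staging directories this has to run on every node, which is the same constraint the description notes for the scrubber option.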
[Yahoo-eng-team] [Bug 1916540] [NEW] Release 21.1
Public bug reported:

This bug tracks cloud-init's upstream release of 21.1.

== Release Notes ==

Cloud-init release 21.1 is now available

The 21.1 release:
 * spanned about 3 months
 * had 24 contributors from 25 domains
 * fixed 10 Launchpad issues

Highlights:
 - New datasource for UpCloud
 - Introduced support for reading Openstack dynamic vendor0-data
 - Add support for VMWare's raw data feature
 - Add support for Azure VMs without ephemeral resource disks

== Changelog ==

 - Azure: Support for VMs without ephemeral resource disks. (#800) [Johnson Shi] (LP: #1901011)
 - cc_keys_to_console: add option to disable key emission (#811) [Michael Hudson-Doyle] (LP: #1915460)
 - integration_tests: introduce lxd_use_exec mark (#802)
 - azure: case-insensitive UUID to avoid new IID during kernel upgrade (#798) (LP: #1835584)
 - stale.yml: don't ask submitters to reopen PRs (#816)
 - integration_tests: fix use of SSH agent within tox (#815)
 - integration_tests: add UPGRADE CloudInitSource (#812)
 - integration_tests: use unique MAC addresses for tests (#813)
 - Update .gitignore (#814)
 - Port apt cloud_tests to integration tests (#808)
 - integration_tests: fix test_gh626 on LXD VMs (#809)
 - Fix attempting to decode binary data in test_seed_random_data test (#806)
 - Remove wait argument from tests with session_cloud calls (#805)
 - Datasource for UpCloud (#743) [Antti Myyrä]
 - test_gh668: fix failure on LXD VMs (#801)
 - openstack: read the dynamic metadata group vendor_data2.json (#777) [Andrew Bogott] (LP: #1841104)
 - includedir in suoders can be prefixed by "arroba" (#783) [Jordi Massaguer Pla]
 - [VMware] change default max wait time to 15s (#774) [xiaofengw-vmware]
 - Revert integration test associated with reverted #586 (#784)
 - Add jordimassaguerpla as contributor (#787) [Jordi Massaguer Pla]
 - Add Rick Harding to CLA signers (#792) [Rick Harding]
 - HACKING.rst: add clarifying note to LP CLA process section (#789)
 - Stop linting cloud_tests (#791)
 - cloud-tests: update cryptography requirement (#790) [Joshua Powers]
 - Remove 'remove-raise-on-failure' calls from integration_tests (#788)
 - Use more cloud defaults in integration tests (#757)
 - Adding self to cla signers (#776) [Andrew Bogott]
 - doc: avoid two warnings (#781) [Dan Kenigsberg]
 - Use proper spelling for Red Hat (#778) [Dan Kenigsberg]
 - Add antonyc to .github-cla-signers (#747) [Anton Chaporgin]
 - integration_tests: log image serial if available (#772)
 - Revert "ssh_util: handle non-default AuthorizedKeysFile config (#586)" (#775)
 - Release 20.4.1 (LP: #1911680)
 - Revert "ssh_util: handle non-default AuthorizedKeysFile config (#586)"
 - [VMware] Support cloudinit raw data feature (#691) [xiaofengw-vmware]
 - net: Fix static routes to host in eni renderer (#668) [Pavel Abalikhin]
 - .travis.yml: don't run cloud_tests in CI (#756)
 - test_upgrade: add some missing commas (#769)
 - cc_seed_random: update documentation and fix integration test (#771) (LP: #1911227)
 - Fix test gh-632 test to only run on NoCloud (#770) (LP: #1911230)
 - archlinux: fix package upgrade command handling (#768) [Bao Trinh]
 - integration_tests: add integration test for LP: #1910835 (#761)
 - Fix regression with handling of IMDS ssh keys (#760) [Thomas Stringer]
 - integration_tests: log cloud-init version in SUT (#758)
 - Add ajmyyra as contributor (#742) [Antti Myyrä]
 - net_convert: add some missing help text (#755)
 - Missing IPV6_AUTOCONF=no to render sysconfig dhcp6 stateful on RHEL (#753) [Eduardo Otubo]
 - doc: document missing IPv6 subnet types (#744) [Antti Myyrä]
 - Add example configuration for datasource `AliYun` (#751) [Xiaoyu Zhong]
 - integration_tests: add SSH key selection settings (#754)
 - fix a typo in man page cloud-init.1 (#752) [Amy Chen]
 - network-config-format-v2.rst: add Netplan Passthrough section (#750)
 - stale: re-enable post holidays (#749)
 - integration_tests: port ca_certs tests from cloud_tests (#732)
 - Azure: Add telemetry for poll IMDS (#741) [Johnson Shi]
 - doc: move testing section from HACKING to its own doc (#739)
 - No longer allow integration test failures on travis (#738)
 - stale: fix error in definition (#740)
 - integration_tests: set log-cli-level to INFO by default (#737)
 - PULL_REQUEST_TEMPLATE.md: use backticks around commit message (#736)
 - stale: disable check for holiday break (#735)
 - integration_tests: log the path we collect logs into (#733)
 - .travis.yml: add (most) supported Python versions to CI (#734)
 - integration_tests: fix IN_PLACE CLOUD_INIT_SOURCE (#731)
 - cc_ca_certs: add RHEL support (#633) [cawamata]
 - Azure: only generate config for NICs with addresses (#709) [Thomas Stringer]
 - doc: fix CloudStack configuration example (#707) [Olivier Lemasle]
 - integration_tests: restrict test_lxd_bridge appropriately (#730)
 - Add integration tests for CLI functionality (#729)
 - Integration test for gh-626 (#728)
 - Some test_upgrade fixes (#726)
 -
[Yahoo-eng-team] [Bug 1892405] Re: Removing router interface causes router to stop routing between all
This bug was fixed in the package neutron - 2:14.4.2-0ubuntu1~cloud1

---------------
 neutron (2:14.4.2-0ubuntu1~cloud1) bionic-stein; urgency=medium
 .
   * d/p/fix-dvr-source-mac-flows.patch: Fix DVR source mac flows when non-gateway port on router is deleted (LP: #1892405).

** Changed in: cloud-archive/stein
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1892405

Title:
  Removing router interface causes router to stop routing between all

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive stein series:
  Fix Released
Status in Ubuntu Cloud Archive train series:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Focal:
  Fix Released
Status in neutron source package in Groovy:
  Fix Released
Status in neutron source package in Hirsute:
  Fix Released

Bug description:
  [Impact]
  Stumbled upon an issue where removing a DVR HA router interface causes all other subnets connected to that router to stop routing. VMs can't reach the HA port (IP) of the router (ping).

  Worked around this by:
    openstack router set --disabled
    openstack router set --enable

  This has happened more than once in the current deployment:
  - cloud:bionic-stein
  - neutron 2:14.0.4-0ubuntu1~cloud1

  [Test Case]
  1. Reproducing the issue

  1a. Deploy openstack using stsstack-bundles
      https://launchpad.net/stsstack-bundles

  1b. Run the test script lp1892405_reproducer from comment #10
      The script does the following (detailed steps in comment #4):
      - Create 3 projects P1, P2, P3
      - Create a router and network in each project, say R1,R2,R3 and N1,N2,N3
      - Cross-connect networks by adding ports to router.
      - Launch VMs on N1, N2 (ensure the VMs land on 2 different compute nodes)
      - ping from VM1 -> VM2 should be successful
      - Detach leg from N1 -> N3
      - Check for any packet loss during ping from VM1 -> VM2
      The script output shows the ping output from VM1 -> VM2, and there will be packet loss.

  2. Install the package with fixed code

  3. Confirm the bug has been fixed

  3a. Clean up projects P1,P2,P3 and the associated resources created in 1b.
      Re-enable the hypervisor that was disabled as part of the 1b script.
      Commands for the cleanup:
        openstack server list --all-projects -c ID -f value | xargs openstack server delete
        openstack router remove port P2-router to-n2
        openstack router remove port P1-router from-n2
        openstack router remove port P1-router from-n3
        for i in P1 P2 P3; do openstack subnet list --project $i -c ID -f value | xargs openstack router remove subnet $i-router; done
        for i in P1 P2 P3; do openstack router delete $i-router; done
        for i in P1 P2 P3; do openstack network list --project $i -c ID -f value | xargs openstack network delete; done
        openstack floating ip list -c ID -f value | xargs openstack floating ip delete
        for i in P1 P2 P3; do openstack project delete $i; done
        openstack compute service list --service nova-compute | grep disabled | awk '{print $6}' | xargs -I {} openstack compute service set --enable {} nova-compute

  3b. Re-run the script from 1b
      The script output shows the ping output from VM1 -> VM2, and there should not be any packet loss.

  [Where problems could occur]
  Upstream CI ran all the functional and tempest test cases that involve deletion of a DVR port connected to a router, which should cover the scenarios affected by the code change.
  Installing the new package restarts the neutron-openvswitch service, which takes a few milliseconds to repopulate all the OVS flows.
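The pass/fail criterion in steps 1b and 3b is whether the ping from VM1 to VM2 loses packets. The reproducer script itself is not shown here, so the following is only a sketch of how that check could be automated, assuming the captured output ends with the usual summary line printed by iputils or BusyBox `ping`. The function name `packet_loss_percent` is hypothetical.

```python
import re
from typing import Optional

# Matches the summary line of iputils ping ("N received") and
# BusyBox ping ("N packets received").
_SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received.*?([\d.]+)% packet loss"
)


def packet_loss_percent(ping_output: str) -> Optional[float]:
    """Return the reported packet loss percentage, or None if no summary line is found."""
    match = _SUMMARY_RE.search(ping_output)
    if match is None:
        return None
    return float(match.group(3))
```

Before the fix, a non-zero value would be expected after the N1 -> N3 leg is detached; after installing the fixed package the value should stay at zero.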
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1892405/+subscriptions
[Yahoo-eng-team] [Bug 1916482] [NEW] rbd.IncompleteWriteError: RBD incomplete write (Wrote only 8388608 out of 8394566 bytes) since Victoria Upgrade, ceph v nautilus
Public bug reported:

full glance log: https://paste.ubuntu.com/p/fVrj5vfc7m/

Feb 20 15:18:35 xxx-glance-container-071516d6 glance-wsgi-api[55]: 2021-02-20 15:18:35.577 55 DEBUG glance_store._drivers.rbd [req-663e7646-b12e-4e4c-ad18-d4b23644a504 23fd1e8e87af40b88418b118085104cf 946f0e543169462596dbaaf7504f7a4a - default default] creating image fdae789d-10ce-40f5-9c50-c8b206b81531 with order 23 and size 0 add /openstack/venvs/glance-22.0.1/lib/python3.8/site-packages/glance_store/_drivers/rbd.py:555

Feb 20 15:18:35 xxx-glance-container-071516d6 glance-wsgi-api[55]: 2021-02-20 15:18:35.578 55 WARNING glance_store._drivers.rbd [req-663e7646-b12e-4e4c-ad18-d4b23644a504 23fd1e8e87af40b88418b118085104cf 946f0e543169462596dbaaf7504f7a4a - default default] Since image size is zero we will be doing resize-before-write which will be slower than normal

Feb 20 15:18:35 xxx-glance-container-071516d6 glance-wsgi-api[55]: 2021-02-20 15:18:35.642 55 DEBUG glance_store._drivers.rbd [req-663e7646-b12e-4e4c-ad18-d4b23644a504 23fd1e8e87af40b88418b118085104cf 946f0e543169462596dbaaf7504f7a4a - default default] resizing image to 8192.0 KiB _resize_on_write /openstack/venvs/glance-22.0.1/lib/python3.8/site-packages/glance_store/_drivers/rbd.py:505

Feb 20 15:18:35 xxx-glance-container-071516d6 glance-wsgi-api[55]: 2021-02-20 15:18:35.761 55 ERROR glance_store._drivers.rbd [req-663e7646-b12e-4e4c-ad18-d4b23644a504 23fd1e8e87af40b88418b118085104cf 946f0e543169462596dbaaf7504f7a4a - default default] Failed to store image fdae789d-10ce-40f5-9c50-c8b206b81531 Store Exception RBD incomplete write (Wrote only 8388608 out of 8394566 bytes): rbd.IncompleteWriteError: RBD incomplete write (Wrote only 8388608 out of 8394566 bytes)

Feb 20 15:18:35 xxx-glance-container-071516d6 glance-wsgi-api[55]: 2021-02-20 15:18:35.809 55 DEBUG glance_store._drivers.rbd [req-663e7646-b12e-4e4c-ad18-d4b23644a504 23fd1e8e87af40b88418b118085104cf 946f0e543169462596dbaaf7504f7a4a - default default] Snap Operating Exception [errno 2] RBD image not found (error unprotecting snapshot b'fdae789d-10ce-40f5-9c50-c8b206b81531'@b'snap') Snapshot does not exist. _delete_image /openstack/venvs/glance-22.0.1/lib/python3.8/site-packages/glance_store/_drivers/rbd.py:464

Feb 20 15:18:35 xxx-glance-container-071516d6 glance-wsgi-api[55]: 2021-02-20 15:18:35.984 55 ERROR glance.api.v2.image_data [req-663e7646-b12e-4e4c-ad18-d4b23644a504 23fd1e8e87af40b88418b118085104cf 946f0e543169462596dbaaf7504f7a4a - default default] Failed to upload image data due to internal error: rbd.IncompleteWriteError: RBD incomplete write (Wrote only 8388608 out of 8394566 bytes)

Feb 20 15:18:36 xxx-glance-container-071516d6 glance-wsgi-api[55]: 2021-02-20 15:18:36.037 55 ERROR glance.common.wsgi [req-663e7646-b12e-4e4c-ad18-d4b23644a504 23fd1e8e87af40b88418b118085104cf 946f0e543169462596dbaaf7504f7a4a - default default] Caught error: RBD incomplete write (Wrote only 8388608 out of 8394566 bytes): rbd.IncompleteWriteError: RBD incomplete write (Wrote only 8388608 out of 8394566 bytes)

** Affects: glance
     Importance: Undecided
         Status: New

** Attachment added: "glance debug log"
   https://bugs.launchpad.net/bugs/1916482/+attachment/5465930/+files/bug

** Summary changed:

- Snapshots fail to be written to ceph/rbd backend since upgrade to Victoria release. Ceph v Nautilus
+ rbd.IncompleteWriteError: RBD incomplete write (Wrote only 8388608 out of 8394566 bytes) since Victoria Upgrade, ceph v nautilus

-- 
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1916482

Title:
  rbd.IncompleteWriteError: RBD incomplete write (Wrote only 8388608 out of 8394566 bytes) since Victoria Upgrade, ceph v nautilus

Status in Glance:
  New
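The numbers in the log for bug 1916482 line up in a way that is easy to check by hand. The sketch below only reproduces that arithmetic from the logged values (order 23, a resize to 8192.0 KiB, and the two byte counts in the exception). It suggests the write stopped exactly at the resized image boundary, but that is a reading of the log, not a confirmed root cause. The function name `summarize_incomplete_write` is hypothetical.

```python
def summarize_incomplete_write(order: int, resized_kib: float,
                               wrote: int, wanted: int) -> dict:
    """Relate the logged RBD order and resize target to the incomplete write."""
    object_size = 1 << order                 # RBD object size is 2**order bytes
    resized_bytes = int(resized_kib * 1024)  # resize target from the DEBUG line
    return {
        "object_size": object_size,
        "resized_bytes": resized_bytes,
        "shortfall": wanted - wrote,         # bytes that were not written
        "stopped_at_image_end": wrote == resized_bytes,
    }


# Values copied from the log for bug 1916482.
summary = summarize_incomplete_write(order=23, resized_kib=8192.0,
                                     wrote=8388608, wanted=8394566)
print(summary)
```

With the logged values, the object size and the resize target are both 8388608 bytes (8 MiB), and the chunk being written was 5958 bytes larger than that.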
[Yahoo-eng-team] [Bug 1916470] [NEW] [OVN][QOS] OVN DB QoS rule is not removed when a FIP is dissasociated
Public bug reported:

When a FIP is disassociated, the QoS rules in the OVN DB are not deleted. The OVN client "disassociate_floatingip" does not call the QoS extension method to delete the related FIP QoS.

Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1930942

** Affects: neutron
     Importance: Undecided
     Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
         Status: New

** Changed in: neutron
     Assignee: (unassigned) => Rodolfo Alonso (rodolfo-alonso-hernandez)

-- 
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1916470

Title:
  [OVN][QOS] OVN DB QoS rule is not removed when a FIP is dissasociated

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1916470/+subscriptions