[Yahoo-eng-team] [Bug 1913625] Re: Glance will leak staging data

2021-02-22 Thread Dan Smith
** Changed in: glance
   Status: Invalid => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1913625

Title:
  Glance will leak staging data

Status in Glance:
  Confirmed

Bug description:
  In various situations, glance will leak (potentially very large)
  temporary files in the staging store.

  One example is doing a web-download import, where glance initially
  downloads the image to its staging store. If the worker doing that
  activity crashes, loses power, etc, the user may delete the image and
  try again on another worker. When the crashed worker resumes, the
  staging data will remain but nothing will ever clean it up.

  Another example is a misconfigured glance that uses local (unshared)
  staging directories while glance-direct is in use: the user stages
  data on one worker, and then deletes the image from another worker.

  Even in a situation where shared staging is properly configured, a
  failure to access the staging location during the delete call will
  result in the image being deleted, but the staging file not being
  purged.

  IMHO, glance workers should clean their staging directories at
  startup, purging any data that belongs to an image that has since
  been deleted.
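
  A minimal sketch of that startup cleanup, under stated assumptions:
  staged files are assumed to be named after the image UUID (optionally
  with a suffix), and image_exists is a hypothetical callback standing in
  for whatever database lookup glance would use. Neither is taken from
  the glance code base.

```python
import os
import re
from typing import Callable, List

# Assumption for this sketch: a staged file name starts with the image UUID.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def purge_orphaned_staging(staging_dir: str,
                           image_exists: Callable[[str], bool]) -> List[str]:
    """Delete staged files whose image no longer exists; return their paths.

    image_exists is a hypothetical lookup (image id -> bool) supplied by
    the caller; it is not a real glance API.
    """
    removed = []
    for name in sorted(os.listdir(staging_dir)):
        match = UUID_RE.match(name)
        if match is None:
            continue  # cannot be attributed to an image, so leave it alone
        path = os.path.join(staging_dir, name)
        if not os.path.isfile(path):
            continue
        if not image_exists(match.group(0)):
            os.remove(path)
            removed.append(path)
    return removed
```

  Files that cannot be attributed to an image are deliberately left in
  place, so the sketch only ever removes data tied to an image that the
  lookup reports as gone.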

  Another option is to add a store location for each staged image, and
  make sure the scrubber can clean those things from the staging
  directory periodically (this requires also running the scrubber on
  each node, which may not be common practice currently).

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1913625/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1916540] [NEW] Release 21.1

2021-02-22 Thread Dan Watkins
Public bug reported:

This bug tracks cloud-init's upstream release of 21.1.

== Release Notes ==

Cloud-init release 21.1 is now available

The 21.1 release:
 * spanned about 3 months
 * had 24 contributors from 25 domains
 * fixed 10 Launchpad issues

Highlights:
 - New datasource for UpCloud
 - Introduced support for reading OpenStack dynamic vendor data
 - Add support for VMWare's raw data feature
 - Add support for Azure VMs without ephemeral resource disks

== Changelog ==
 - Azure: Support for VMs without ephemeral resource disks. (#800)
   [Johnson Shi] (LP: #1901011)
 - cc_keys_to_console: add option to disable key emission (#811)
   [Michael Hudson-Doyle] (LP: #1915460)
 - integration_tests: introduce lxd_use_exec mark (#802)
 - azure: case-insensitive UUID to avoid new IID during kernel upgrade
   (#798) (LP: #1835584)
 - stale.yml: don't ask submitters to reopen PRs (#816)
 - integration_tests: fix use of SSH agent within tox (#815)
 - integration_tests: add UPGRADE CloudInitSource (#812)
 - integration_tests: use unique MAC addresses for tests (#813)
 - Update .gitignore (#814)
 - Port apt cloud_tests to integration tests (#808)
 - integration_tests: fix test_gh626 on LXD VMs (#809)
 - Fix attempting to decode binary data in test_seed_random_data test (#806)
 - Remove wait argument from tests with session_cloud calls (#805)
 - Datasource for UpCloud (#743) [Antti Myyrä]
 - test_gh668: fix failure on LXD VMs (#801)
 - openstack: read the dynamic metadata group vendor_data2.json (#777)
   [Andrew Bogott] (LP: #1841104)
 - includedir in sudoers can be prefixed by "arroba" (#783)
   [Jordi Massaguer Pla]
 - [VMware] change default max wait time to 15s (#774) [xiaofengw-vmware]
 - Revert integration test associated with reverted #586 (#784)
 - Add jordimassaguerpla as contributor (#787) [Jordi Massaguer Pla]
 - Add Rick Harding to CLA signers (#792) [Rick Harding]
 - HACKING.rst: add clarifying note to LP CLA process section (#789)
 - Stop linting cloud_tests (#791)
 - cloud-tests: update cryptography requirement (#790) [Joshua Powers]
 - Remove 'remove-raise-on-failure' calls from integration_tests (#788)
 - Use more cloud defaults in integration tests (#757)
 - Adding self to cla signers (#776) [Andrew Bogott]
 - doc: avoid two warnings (#781) [Dan Kenigsberg]
 - Use proper spelling for Red Hat (#778) [Dan Kenigsberg]
 - Add antonyc to .github-cla-signers (#747) [Anton Chaporgin]
 - integration_tests: log image serial if available (#772)
 - Revert "ssh_util: handle non-default AuthorizedKeysFile config (#586)"
   (#775)
 - Release 20.4.1 (LP: #1911680)
 - Revert "ssh_util: handle non-default AuthorizedKeysFile config (#586)"
 - [VMware] Support cloudinit raw data feature (#691) [xiaofengw-vmware]
 - net: Fix static routes to host in eni renderer (#668) [Pavel Abalikhin]
 - .travis.yml: don't run cloud_tests in CI (#756)
 - test_upgrade: add some missing commas (#769)
 - cc_seed_random: update documentation and fix integration test (#771)
   (LP: #1911227)
 - Fix test gh-632 test to only run on NoCloud (#770) (LP: #1911230)
 - archlinux: fix package upgrade command handling (#768) [Bao Trinh]
 - integration_tests: add integration test for LP: #1910835 (#761)
 - Fix regression with handling of IMDS ssh keys (#760) [Thomas Stringer]
 - integration_tests: log cloud-init version in SUT (#758)
 - Add ajmyyra as contributor (#742) [Antti Myyrä]
 - net_convert: add some missing help text (#755)
 - Missing IPV6_AUTOCONF=no to render sysconfig dhcp6 stateful on RHEL
   (#753) [Eduardo Otubo]
 - doc: document missing IPv6 subnet types (#744) [Antti Myyrä]
 - Add example configuration for datasource `AliYun` (#751) [Xiaoyu Zhong]
 - integration_tests: add SSH key selection settings (#754)
 - fix a typo in man page cloud-init.1 (#752) [Amy Chen]
 - network-config-format-v2.rst: add Netplan Passthrough section (#750)
 - stale: re-enable post holidays (#749)
 - integration_tests: port ca_certs tests from cloud_tests (#732)
 - Azure: Add telemetry for poll IMDS (#741) [Johnson Shi]
 - doc: move testing section from HACKING to its own doc (#739)
 - No longer allow integration test failures on travis (#738)
 - stale: fix error in definition (#740)
 - integration_tests: set log-cli-level to INFO by default (#737)
 - PULL_REQUEST_TEMPLATE.md: use backticks around commit message (#736)
 - stale: disable check for holiday break (#735)
 - integration_tests: log the path we collect logs into (#733)
 - .travis.yml: add (most) supported Python versions to CI (#734)
 - integration_tests: fix IN_PLACE CLOUD_INIT_SOURCE (#731)
 - cc_ca_certs: add RHEL support (#633) [cawamata]
 - Azure: only generate config for NICs with addresses (#709)
   [Thomas Stringer]
 - doc: fix CloudStack configuration example (#707) [Olivier Lemasle]
 - integration_tests: restrict test_lxd_bridge appropriately (#730)
 - Add integration tests for CLI functionality (#729)
 - Integration test for gh-626 (#728)
 - Some test_upgrade fixes (#726)
 - 

[Yahoo-eng-team] [Bug 1892405] Re: Removing router interface causes router to stop routing between all

2021-02-22 Thread Corey Bryant
This bug was fixed in the package neutron - 2:14.4.2-0ubuntu1~cloud1
---

 neutron (2:14.4.2-0ubuntu1~cloud1) bionic-stein; urgency=medium
 .
   * d/p/fix-dvr-source-mac-flows.patch: Fix DVR source mac flows when
     non-gateway port on router is deleted (LP: #1892405).


** Changed in: cloud-archive/stein
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1892405

Title:
  Removing router interface causes router to stop routing between all

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive stein series:
  Fix Released
Status in Ubuntu Cloud Archive train series:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Focal:
  Fix Released
Status in neutron source package in Groovy:
  Fix Released
Status in neutron source package in Hirsute:
  Fix Released

Bug description:
  [Impact]
  Stumbled upon an issue where removing a DVR HA router interface causes all
  other subnets connected to that router to stop routing. VMs can't reach the
  HA port (IP) of the router (ping).

  Worked around this by:
  openstack router set --disable <router>
  openstack router set --enable <router>

  This has happened more than once in the current deployment
   - cloud:bionic-stein
   - neutron 2:14.0.4-0ubuntu1~cloud1

  [Test Case]
  1. Reproducing the issue

  1a. Deploy openstack using stsstack-bundles
  https://launchpad.net/stsstack-bundles

  1b. Run the test script lp1892405_reproducer from comment #10

  The script does the following (Detailed steps in comment #4)
  - Create 3 projects P1, P2, P3
  - Create a router and network in each project, say R1,R2,R3 and
    N1,N2,N3
  - Cross-connect networks by adding ports to router.
  - Launch VMs on N1, N2 (ensure the VMs land on two different compute
    nodes)
  - ping from VM1 -> VM2 should be successful
  - Detach leg from N1 -> N3
  - Check for any packet loss during ping from VM1 -> VM2

  The script output shows the ping output from VM1 -> VM2 and there will
  be packet loss
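
  A small sketch of how the packet-loss check could be automated. It is
  not part of the lp1892405_reproducer script; the helper name and the
  regular expression are assumptions, written against the summary line
  that common ping implementations print.

```python
import re

# Matches a ping summary such as
# "10 packets transmitted, 7 received, 30% packet loss, time 9012ms"
# and the variant "3 packets transmitted, 3 packets received, ...".
SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received"
)


def packet_loss_percent(ping_output: str) -> float:
    """Return the packet loss percentage computed from a ping summary."""
    match = SUMMARY_RE.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100.0 * (sent - received) / sent
```

  With the bug present the value for the VM1 -> VM2 ping is expected to
  be above zero; after installing the fixed package it should be 0.0.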

  2. Install the package with fixed code

  3. Confirm the bug has been fixed

  3a. Clean up projects P1, P2, P3 and the associated resources created in 1b,
  and re-enable the hypervisor that the 1b script disabled.
  Commands for the cleanup:
  openstack server list --all-projects -c ID -f value | xargs openstack server delete
  openstack router remove port P2-router to-n2
  openstack router remove port P1-router from-n2
  openstack router remove port P1-router from-n3
  for i in P1 P2 P3; do openstack subnet list --project $i -c ID -f value | xargs openstack router remove subnet $i-router; done
  for i in P1 P2 P3; do openstack router delete $i-router; done
  for i in P1 P2 P3; do openstack network list --project $i -c ID -f value | xargs openstack network delete; done
  openstack floating ip list -c ID -f value | xargs openstack floating ip delete
  for i in P1 P2 P3; do openstack project delete $i; done
  openstack compute service list --service nova-compute | grep disabled | awk '{print $6}' | xargs -I {} openstack compute service set --enable {} nova-compute

  
  3b. Re-run the script 1b

  The script output shows the ping output from VM1 -> VM2 and there
  should not be any packet loss

  [Where problems could occur]

  Upstream CI ran all the functional and tempest test cases that involve
  deletion of a DVR port connected to a router, which should cover the
  scenarios affected by the code change.
  Installing the new package restarts the neutron-openvswitch service, which
  takes a few milliseconds to repopulate all the OVS flows.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1892405/+subscriptions



[Yahoo-eng-team] [Bug 1916482] [NEW] rbd.IncompleteWriteError: RBD incomplete write (Wrote only 8388608 out of 8394566 bytes) since Victoria Upgrade, ceph v nautilus

2021-02-22 Thread How about no
Public bug reported:

full glance log:
https://paste.ubuntu.com/p/fVrj5vfc7m/


Feb 20 15:18:35 xxx-glance-container-071516d6 glance-wsgi-api[55]: 2021-02-20 15:18:35.577 55 DEBUG glance_store._drivers.rbd [req-663e7646-b12e-4e4c-ad18-d4b23644a504 23fd1e8e87af40b88418b118085104cf 946f0e543169462596dbaaf7504f7a4a - default default] creating image fdae789d-10ce-40f5-9c50-c8b206b81531 with order 23 and size 0 add /openstack/venvs/glance-22.0.1/lib/python3.8/site-packages/glance_store/_drivers/rbd.py:555
Feb 20 15:18:35 xxx-glance-container-071516d6 glance-wsgi-api[55]: 2021-02-20 15:18:35.578 55 WARNING glance_store._drivers.rbd [req-663e7646-b12e-4e4c-ad18-d4b23644a504 23fd1e8e87af40b88418b118085104cf 946f0e543169462596dbaaf7504f7a4a - default default] Since image size is zero we will be doing resize-before-write which will be slower than normal
Feb 20 15:18:35 xxx-glance-container-071516d6 glance-wsgi-api[55]: 2021-02-20 15:18:35.642 55 DEBUG glance_store._drivers.rbd [req-663e7646-b12e-4e4c-ad18-d4b23644a504 23fd1e8e87af40b88418b118085104cf 946f0e543169462596dbaaf7504f7a4a - default default] resizing image to 8192.0 KiB _resize_on_write /openstack/venvs/glance-22.0.1/lib/python3.8/site-packages/glance_store/_drivers/rbd.py:505
Feb 20 15:18:35 xxx-glance-container-071516d6 glance-wsgi-api[55]: 2021-02-20 15:18:35.761 55 ERROR glance_store._drivers.rbd [req-663e7646-b12e-4e4c-ad18-d4b23644a504 23fd1e8e87af40b88418b118085104cf 946f0e543169462596dbaaf7504f7a4a - default default] Failed to store image fdae789d-10ce-40f5-9c50-c8b206b81531 Store Exception RBD incomplete write (Wrote only 8388608 out of 8394566 bytes): rbd.IncompleteWriteError: RBD incomplete write (Wrote only 8388608 out of 8394566 bytes)
Feb 20 15:18:35 xxx-glance-container-071516d6 glance-wsgi-api[55]: 2021-02-20 15:18:35.809 55 DEBUG glance_store._drivers.rbd [req-663e7646-b12e-4e4c-ad18-d4b23644a504 23fd1e8e87af40b88418b118085104cf 946f0e543169462596dbaaf7504f7a4a - default default] Snap Operating Exception [errno 2] RBD image not found (error unprotecting snapshot b'fdae789d-10ce-40f5-9c50-c8b206b81531'@b'snap') Snapshot does not exist. _delete_image /openstack/venvs/glance-22.0.1/lib/python3.8/site-packages/glance_store/_drivers/rbd.py:464
Feb 20 15:18:35 xxx-glance-container-071516d6 glance-wsgi-api[55]: 2021-02-20 15:18:35.984 55 ERROR glance.api.v2.image_data [req-663e7646-b12e-4e4c-ad18-d4b23644a504 23fd1e8e87af40b88418b118085104cf 946f0e543169462596dbaaf7504f7a4a - default default] Failed to upload image data due to internal error: rbd.IncompleteWriteError: RBD incomplete write (Wrote only 8388608 out of 8394566 bytes)
Feb 20 15:18:36 xxx-glance-container-071516d6 glance-wsgi-api[55]: 2021-02-20 15:18:36.037 55 ERROR glance.common.wsgi [req-663e7646-b12e-4e4c-ad18-d4b23644a504 23fd1e8e87af40b88418b118085104cf 946f0e543169462596dbaaf7504f7a4a - default default] Caught error: RBD incomplete write (Wrote only 8388608 out of 8394566 bytes): rbd.IncompleteWriteError: RBD incomplete write (Wrote only 8388608 out of 8394566 bytes)
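
The numbers in the log line up as follows: order 23 means 2^23 = 8388608 bytes, which is the 8192.0 KiB the image was resized to, and the failed write of 8394566 bytes exceeds that by 5958 bytes. The sketch below is not the glance_store driver and is not offered as a diagnosis of the root cause. It is a toy model, with hypothetical class and function names, that only illustrates how a write larger than a fixed-size image comes back short, and how growing the image to cover the whole chunk first avoids that.

```python
ORDER = 23                  # "order 23" from the log
OBJECT_SIZE = 2 ** ORDER    # 8388608 bytes = 8192 KiB
CHUNK_LEN = 8394566         # bytes the failed write attempted, from the log


class FixedSizeImage:
    """Toy stand-in for a block image that cannot be written past its end."""

    def __init__(self) -> None:
        self.data = bytearray()

    @property
    def size(self) -> int:
        return len(self.data)

    def resize(self, new_size: int) -> None:
        if new_size > self.size:
            self.data.extend(bytes(new_size - self.size))  # zero-fill growth
        else:
            del self.data[new_size:]

    def write(self, chunk: bytes, offset: int) -> int:
        """Write as much of chunk as fits; return the bytes written."""
        room = max(self.size - offset, 0)
        written = min(len(chunk), room)
        self.data[offset:offset + written] = chunk[:written]
        return written


def write_with_resize(image: FixedSizeImage, chunk: bytes, offset: int) -> int:
    """Grow the image so the whole chunk fits, then write it."""
    needed = offset + len(chunk)
    if needed > image.size:
        image.resize(needed)
    return image.write(chunk, offset)
```

In this model, resizing an empty image to OBJECT_SIZE and then writing CHUNK_LEN bytes at offset 0 writes 8388608 bytes, leaving the same 5958-byte shortfall the log reports, while write_with_resize on an empty image writes all 8394566 bytes.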

** Affects: glance
 Importance: Undecided
 Status: New

** Attachment added: "glance debug log"
   https://bugs.launchpad.net/bugs/1916482/+attachment/5465930/+files/bug

** Summary changed:

- Snapshots fail to be written to ceph/rbd backend since upgrade to Victoria
  release. Ceph v Nautilus
+ rbd.IncompleteWriteError: RBD incomplete write (Wrote only 8388608 out of
  8394566 bytes) since Victoria Upgrade, ceph v nautilus

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1916482

Title:
  rbd.IncompleteWriteError: RBD incomplete write (Wrote only 8388608 out
  of 8394566 bytes) since Victoria Upgrade, ceph v nautilus

Status in Glance:
  New


[Yahoo-eng-team] [Bug 1916470] [NEW] [OVN][QOS] OVN DB QoS rule is not removed when a FIP is disassociated

2021-02-22 Thread Rodolfo Alonso
Public bug reported:

When a FIP is disassociated, the QoS rules in the OVN DB are not
deleted. The OVN client "disassociate_floatingip" does not call the QoS
extension method to delete the related FIP QoS.

Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1930942
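
A toy model of the behaviour described above, to make the missing step concrete. Only the method name disassociate_floatingip comes from the report; the classes, the other method names, and the data layout are hypothetical and do not mirror the neutron OVN driver.

```python
class FakeQoSExtension:
    """Hypothetical stand-in for the OVN QoS extension."""

    def __init__(self):
        self.rules = {}  # floating IP id -> QoS policy id

    def create_floatingip(self, fip_id, policy_id):
        self.rules[fip_id] = policy_id

    def delete_floatingip(self, fip_id):
        self.rules.pop(fip_id, None)


class FakeOVNClient:
    """Hypothetical stand-in for the OVN client."""

    def __init__(self, qos):
        self.qos = qos
        self.nat = {}  # floating IP id -> port id

    def associate_floatingip(self, fip_id, port_id, policy_id=None):
        self.nat[fip_id] = port_id
        if policy_id is not None:
            self.qos.create_floatingip(fip_id, policy_id)

    def disassociate_floatingip(self, fip_id):
        self.nat.pop(fip_id, None)
        # The step the report says is missing: also drop the FIP's QoS rule.
        self.qos.delete_floatingip(fip_id)
```

Without the last call in disassociate_floatingip, the entry in rules would outlive the association, which is the leftover QoS rule the report describes.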

** Affects: neutron
 Importance: Undecided
 Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
 Status: New

** Changed in: neutron
 Assignee: (unassigned) => Rodolfo Alonso (rodolfo-alonso-hernandez)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1916470

Title:
  [OVN][QOS] OVN DB QoS rule is not removed when a FIP is disassociated

Status in neutron:
  New

Bug description:
  When a FIP is disassociated, the QoS rules in the OVN DB are not
  deleted. The OVN client "disassociate_floatingip" does not call the
  QoS extension method to delete the related FIP QoS.

  Related BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1930942

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1916470/+subscriptions
