[Yahoo-eng-team] [Bug 1750780] Re: Race with local file systems can make open-vm-tools fail to start

2018-02-21 Thread Christian Ehrhardt
cloud-init task was just for discussion, marking it invalid to make
clear there is no cloud-init action needed.

** Changed in: cloud-init
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1750780

Title:
  Race with local file systems can make open-vm-tools fail to start

Status in cloud-init:
  Invalid
Status in open-vm-tools package in Ubuntu:
  Triaged

Bug description:
  Since the change in [1], open-vm-tools.service starts very (very) early.
  Not so much due to
  Before=cloud-init-local.service
  but much more because of
  DefaultDependencies=no
  That can trigger an issue that looks like:
  root@ubuntuguest:~# systemctl status -l open-vm-tools.service
  ● open-vm-tools.service - Service for virtual machines hosted on VMware
     Loaded: loaded (/lib/systemd/system/open-vm-tools.service; enabled; vendor preset: enabled)
     Active: failed (Result: resources)

  
  As it is right now, open-vm-tools can race with the other early-start units
  and then fail.
  In detail one can find a message like:
    open-vm-tools.service: Failed to run 'start' task: Read-only file system

  This is due to PrivateTmp=yes, which is also set and needs a writable
  /var/tmp [2].

  To ensure this works, PrivateTmp would have to be removed (not good) or some
  After= dependencies added that make this work reliably.
  I added
  After=local-fs.target
  which made it work for me in 3/3 tests.

  I'd like to have an ack from the cloud-init team that this does not totally
  kill the originally intended Before=cloud-init-local.service.
  I think it does not, as local-fs can complete before cloud-init-local, then
  open-vm-tools can initialize, and finally cloud-init-local can pick up the
  data.

  To summarize:
  # cloud-init-local #
  DefaultDependencies=no
  Wants=network-pre.target
  After=systemd-remount-fs.service
  Before=NetworkManager.service
  Before=network-pre.target
  Before=shutdown.target
  Before=sysinit.target
  Conflicts=shutdown.target
  RequiresMountsFor=/var/lib/cloud

  # open-vm-tools #
  DefaultDependencies=no
  Before=cloud-init-local.service

  Proposed is to add to the latter:
  After=local-fs.target
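
  A minimal way to try this locally (a sketch assuming the stock unit name;
  the eventual packaging change may differ) is a drop-in override, e.g. via
  "systemctl edit open-vm-tools.service":

    # /etc/systemd/system/open-vm-tools.service.d/override.conf
    [Unit]
    After=local-fs.target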

  [1]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=859677
  [2]: https://github.com/systemd/systemd/issues/5610

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1750780/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1384379] Re: versions resource uses host_url which may be incorrect

2018-02-21 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/15
Committed: 
https://git.openstack.org/cgit/openstack/trove/commit/?id=1667ad5e80be7d0bf3ac8e02410a18ce3a0ea4cd
Submitter: Zuul
Branch: master

commit 1667ad5e80be7d0bf3ac8e02410a18ce3a0ea4cd
Author: Zhao Chao 
Date:   Wed Feb 7 11:07:02 2018 +0800

Allow host URL for versions to be configurable

The versions resource constructs the links by using application_url,
but it's possible that the API endpoint is behind a load balancer
or SSL terminator. This means that the application_url might be
incorrect. This fix provides a config option (similar to other
services) which lets us override the host URL when constructing
links for the versions API.

Co-Authored-By: Nikhil Manchanda 
Change-Id: I23f06c6c2d52ba46c74e0d097c4963d2de731d30
Closes-bug: 1384379


** Changed in: trove
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1384379

Title:
  versions resource uses host_url which may be incorrect

Status in Cinder:
  Fix Released
Status in Glance:
  Fix Released
Status in Glance icehouse series:
  Triaged
Status in Glance juno series:
  Triaged
Status in OpenStack Heat:
  Triaged
Status in Ironic:
  Fix Released
Status in Manila:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack DBaaS (Trove):
  Fix Released

Bug description:
  The versions resource constructs the links by using host_url, but the
  glance api endpoint may be behind a proxy or ssl terminator. This
  means that host_url may be incorrect. It should have a config option
  to override host_url like the other services do when constructing
  versions links.
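
  As a hedged sketch of that pattern with oslo.config (the option name
  'public_endpoint' mirrors what glance and some other services use; it is
  an assumption here, not necessarily Trove's exact name):

    from oslo_config import cfg

    opts = [
        cfg.StrOpt('public_endpoint',
                   help='Public URL to use in versions links when the API '
                        'sits behind a proxy or SSL terminator.'),
    ]
    cfg.CONF.register_opts(opts)

    def versions_base_url(request_host_url):
        # Prefer the configured endpoint over whatever the proxy rewrote.
        return cfg.CONF.public_endpoint or request_host_url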

To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/1384379/+subscriptions



[Yahoo-eng-team] [Bug 1727260] Re: Nova assumes that a volume is fully detached from the compute if the volume is not defined in the instance's libvirt definition

2018-02-21 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/515008
Committed: 
https://git.openstack.org/cgit/openstack/nova/commit/?id=ce531dd1b763704b9043ddde8e80ac99cd193660
Submitter: Zuul
Branch: master

commit ce531dd1b763704b9043ddde8e80ac99cd193660
Author: Sahid Orentino Ferdjaoui 
Date:   Wed Oct 25 05:57:11 2017 -0400

libvirt: disconnect volume from host during detach

Under certain failure scenarios it may be that, although the libvirt
definition for the volume has been removed for the instance, the
associated storage lun on the compute server has not been fully
cleaned up yet.

In case users make another attempt to detach the volume, we should not
stop the process when the device is not found in the domain definition,
but should still try to disconnect the logical device from the host.

This commit makes the process attempt to disconnect the volume even if
the device is not attached to the guest.

Closes-Bug: #1727260
Change-Id: I4182642aab3fd2ffb1c97d2de9bdca58982289d8
Signed-off-by: Sahid Orentino Ferdjaoui 


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1727260

Title:
   Nova assumes that a volume is fully detached from the compute if the
  volume is not defined in the instance's libvirt definition

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  Confirmed
Status in OpenStack Compute (nova) queens series:
  In Progress

Bug description:
  During a volume detach operation, Nova compute attempts to remove the
  volume from libvirt for the instance before proceeding to remove the
  storage lun from the underlying compute host. If Nova discovers that
  the volume was not found in the instance's libvirt definition then it
  ignores that error condition and returns (after issuing a warning
  message "Ignoring DiskNotFound exception while detaching").

  However, under certain failure scenarios it may be that, although the
  libvirt definition for the volume has been removed for the instance,
  the associated storage lun on the compute server has not been fully
  cleaned up yet.
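
  As a toy model of the fixed flow (invented names, not nova's actual API):
  even when the device is already absent from the guest definition, the
  host-side lun still gets disconnected.

    def detach_volume(guest_devices, device, disconnect_host_lun):
        try:
            guest_devices.remove(device)   # detach from the libvirt domain
        except ValueError:
            # Before the fix, detach stopped here and the host lun leaked.
            print('Ignoring missing device %s in guest definition' % device)
        disconnect_host_lun(device)        # always clean up the host side

    detach_volume(['vda'], 'vdf', lambda dev: print('disconnected %s' % dev))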

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1727260/+subscriptions



[Yahoo-eng-team] [Bug 1750666] Re: Deleting an instance before scheduling with BFV fails to detach volume

2018-02-21 Thread melanie witt
** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Tags added: queens-rc-potential volumes

** Also affects: nova/pike
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1750666

Title:
  Deleting an instance before scheduling with BFV fails to detach volume

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  New
Status in OpenStack Compute (nova) queens series:
  New

Bug description:
  If you try to boot an instance and delete it early, before scheduling,
  the '_delete_while_booting' codepath hits
  `_attempt_delete_of_buildrequest` which tries to remove the block
  device mappings.

  However, if the cloud contains compute nodes before Pike, no block
  device mappings will be present in the database (because they are only
  saved if using the new attachment flow), which means the attachment
  IDs are empty and the volume delete fails:

  2018-02-20 16:02:25,063 WARNING [nova.compute.api] Ignoring volume
  cleanup failure due to Object action obj_load_attr failed because:
  attribute attachment_id not lazy-loadable
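
  A toy sketch of tolerant cleanup (hypothetical names): the old
  reserve_volume flow leaves attachment_id unset, so cleanup must not assume
  it exists.

    class FakeCinder(object):
        def attachment_delete(self, attachment_id):
            print('deleted attachment %s' % attachment_id)

        def unreserve_volume(self, volume_id):
            print('unreserved volume %s' % volume_id)

    def cleanup_volumes(bdms, cinder):
        for bdm in bdms:
            if bdm.get('attachment_id'):   # new attach flow (Pike and later)
                cinder.attachment_delete(bdm['attachment_id'])
            else:                          # old flow: just drop the reservation
                cinder.unreserve_volume(bdm['volume_id'])

    cleanup_volumes([{'volume_id': 'v1'}], FakeCinder())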

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1750666/+subscriptions



[Yahoo-eng-team] [Bug 1404867] Re: Volume remains in-use status, if instance booted from volume is deleted in error state

2018-02-21 Thread Matt Riedemann
** Also affects: nova/ocata
   Importance: Undecided
   Status: New

** Changed in: nova/ocata
 Assignee: (unassigned) => Mohammed Naser (mnaser)

** Changed in: nova/ocata
   Status: New => In Progress

** Changed in: nova/ocata
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1404867

Title:
  Volume remains in-use status, if instance booted from volume is
  deleted in error state

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  In Progress
Status in OpenStack Compute (nova) pike series:
  In Progress
Status in OpenStack Compute (nova) queens series:
  In Progress

Bug description:
  If an instance is booted from a volume and goes into the error state for
  some reason, the volume from which the instance was booted remains in the
  in-use state even after the instance is deleted.
  IMO, the volume should be detached so that it can be used to boot another
  instance.

  Steps to reproduce:

  1. Log in to Horizon, create a new volume.
  2. Create an Instance using newly created volume.
  3. Verify instance is in active state.
  $ source devstack/openrc demo demo
  $ nova list
  +--------------------------------------+------+--------+------------+-------------+------------------+
  | ID                                   | Name | Status | Task State | Power State | Networks         |
  +--------------------------------------+------+--------+------------+-------------+------------------+
  | dae3a13b-6aa8-4794-93cd-5ab7bf90f604 | nova | ACTIVE | -          | Running     | private=10.0.0.3 |
  +--------------------------------------+------+--------+------------+-------------+------------------+

  Note:
  Use the shelve/unshelve API to see the instance go into the error state.
  Unshelving a volume-backed instance does not work and sets the instance to
  the error state (ref: https://bugs.launchpad.net/nova/+bug/1404801)

  4. Shelve the instance
  $ nova shelve 

  5. Verify the status is SHELVED_OFFLOADED.
  $ nova list
  +--------------------------------------+------+-------------------+------------+-------------+------------------+
  | ID                                   | Name | Status            | Task State | Power State | Networks         |
  +--------------------------------------+------+-------------------+------------+-------------+------------------+
  | dae3a13b-6aa8-4794-93cd-5ab7bf90f604 | nova | SHELVED_OFFLOADED | -          | Shutdown    | private=10.0.0.3 |
  +--------------------------------------+------+-------------------+------------+-------------+------------------+

  6. Unshelve the instance.
  $ nova unshelve 

  7. Verify the instance is in Error state.
  $ nova list
  +--------------------------------------+------+--------+------------+-------------+------------------+
  | ID                                   | Name | Status | Task State | Power State | Networks         |
  +--------------------------------------+------+--------+------------+-------------+------------------+
  | dae3a13b-6aa8-4794-93cd-5ab7bf90f604 | nova | Error  | unshelving | Spawning    | private=10.0.0.3 |
  +--------------------------------------+------+--------+------------+-------------+------------------+

  8. Delete the instance using Horizon.

  9. Verify that the volume is still in the in-use state
  $ cinder list
  +--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
  | ID                                   | Status | Name | Size | Volume Type | Bootable | Attached to                          |
  +--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
  | 4aeefd25-10aa-42c2-9a2d-1c89a95b4d4f | in-use | test | 1    | lvmdriver-1 | true     | 8f7bdc24-1891-4bbb-8f0c-732b9cbecae7 |
  +--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+

  10. In Horizon, the volume's "Attached To" information is displayed as
  "Attached to None on /dev/vda".

  11. The user is not able to delete this volume, or attach it to another
  instance, as it is still in use.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1404867/+subscriptions



[Yahoo-eng-team] [Bug 1408527] Re: Delete instance without block_device_mapping record in database after schedule error

2018-02-21 Thread Matt Riedemann
** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Also affects: nova/ocata
   Importance: Undecided
   Status: New

** Also affects: nova/pike
   Importance: Undecided
   Status: New

** Changed in: nova/ocata
   Status: New => In Progress

** Changed in: nova/pike
   Status: New => In Progress

** Changed in: nova/queens
   Status: New => In Progress

** Changed in: nova/ocata
 Assignee: (unassigned) => Mohammed Naser (mnaser)

** Changed in: nova
 Assignee: Ankit Agrawal (ankitagrawal) => melanie witt (melwitt)

** Changed in: nova/pike
 Assignee: (unassigned) => Mohammed Naser (mnaser)

** Changed in: nova/queens
 Assignee: (unassigned) => Mohammed Naser (mnaser)

** Changed in: nova/ocata
   Importance: Undecided => Medium

** Changed in: nova
   Importance: Low => Medium

** Changed in: nova/pike
   Importance: Undecided => Medium

** Changed in: nova/queens
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1408527

Title:
  Delete instance without block_device_mapping record in database after
  schedule error

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  In Progress
Status in OpenStack Compute (nova) pike series:
  In Progress
Status in OpenStack Compute (nova) queens series:
  In Progress

Bug description:
  When an instance with a cinder volume fails to be scheduled to a host, its
  status becomes error.
  Now I can delete it successfully, but in the block_device_mapping table of
  the nova database the volume information of the instance is still kept, and
  not deleted.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1408527/+subscriptions



[Yahoo-eng-team] [Bug 1750666] Related fix merged to nova (master)

2018-02-21 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/546315
Committed: 
https://git.openstack.org/cgit/openstack/nova/commit/?id=3120627d9802ceda46c2db387fec8fbc80700338
Submitter: Zuul
Branch: master

commit 3120627d9802ceda46c2db387fec8fbc80700338
Author: Mohammed Naser 
Date:   Tue Feb 20 16:47:06 2018 -0500

Add functional test for deleting BFV server with old attach flow

When creating a new instance and deleting it before it gets scheduled
with the old attachment flow (reserve_volume), the block device mappings
are not persisted to the database, which means that the cleanup fails
because it tries to look up attachment_id, which cannot be lazy-loaded.

This patch adds a (failing) functional test to check for this issue
which will be addressed in a follow-up patch.

Related-Bug: #1750666
Change-Id: I294c54e5a22dd6e5b226a4b00e7cd116813f0704


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1750666

Title:
  Deleting an instance before scheduling with BFV fails to detach volume

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  If you try to boot an instance and delete it early, before scheduling,
  the '_delete_while_booting' codepath hits
  `_attempt_delete_of_buildrequest` which tries to remove the block
  device mappings.

  However, if the cloud contains compute nodes before Pike, no block
  device mappings will be present in the database (because they are only
  saved if using the new attachment flow), which means the attachment
  IDs are empty and the volume delete fails:

  2018-02-20 16:02:25,063 WARNING [nova.compute.api] Ignoring volume
  cleanup failure due to Object action obj_load_attr failed because:
  attribute attachment_id not lazy-loadable

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1750666/+subscriptions



[Yahoo-eng-team] [Bug 1742963] Re: Cannot boot VM with Contrail SDN controller

2018-02-21 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/533212
Committed: 
https://git.openstack.org/cgit/openstack/nova/commit/?id=1f5fe3190bf2e0987945a6ef9ec430673c9fa736
Submitter: Zuul
Branch: master

commit 1f5fe3190bf2e0987945a6ef9ec430673c9fa736
Author: Édouard Thuleau 
Date:   Fri Jan 12 16:20:32 2018 +0100

Update plugs Contrail methods to work with privsep

As privsep uses msgpack to send method arguments to the privsep
daemon, we can no longer use custom data types like
nova.objects.instance.Instance.

Change-Id: I09f04d5b2f1cb39339ad7c4569186db5d361797a
Closes-Bug: #1742963
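
A minimal demonstration of why privsep arguments must be plain data (msgpack
refuses arbitrary objects; the Instance class here is only a stand-in):

    import msgpack

    class Instance(object):
        uuid = '8748627c-e31f-4c90-83e2-16abdf9c1e2c'

    msgpack.packb({'vif': 'tap0'}, use_bin_type=True)   # plain types: fine
    try:
        msgpack.packb(Instance(), use_bin_type=True)    # custom type: fails
    except TypeError as exc:
        print('cannot pack: %s' % exc)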


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1742963

Title:
  Cannot boot VM with Contrail SDN controller

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  In Progress

Bug description:
  On the master branch, nova-compute fails to create vif on the Contrail 
vrouter compute agent and the instance fails to spawn:
  ...
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] 
six.reraise(self.type_, self.value, self.tb)
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"/opt/stack/openstack/nova/nova/virt/libvirt/driver.py", line 5238, 
  in _create_domain_and_network
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] 
self.plug_vifs(instance, network_info)
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"/opt/stack/openstack/nova/nova/virt/libvirt/driver.py", line 755, in plug_vifs
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] 
self.vif_driver.plug(instance, vif)
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"/opt/stack/openstack/nova/nova/virt/libvirt/vif.py", line 769, in p
  lug
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] 
func(instance, vif)
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"/opt/stack/openstack/nova/nova/virt/libvirt/vif.py", line 727, in plug_vrouter
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] 
instance, vif, ip_addr, ip6_addr, ptype)
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"/usr/local/lib/python2.7/dist-packages/oslo_privsep/priv_context.py", line 
207, in _wrap
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] 
return self.channel.remote_call(name, args, kwargs)
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"/usr/local/lib/python2.7/dist-packages/oslo_privsep/daemon.py", line 192, in 
remote_call
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] 
result = self.send_recv((Message.CALL.value, name, args, kwargs))
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"/usr/local/lib/python2.7/dist-packages/oslo_privsep/comm.py", line 163, in 
send_recv
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] 
self.writer.send((myid, msg))
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"/usr/local/lib/python2.7/dist-packages/oslo_privsep/comm.py", line 
  54, in send
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] buf = 
msgpack.packb(msg, use_bin_type=True)
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"/usr/local/lib/python2.7/dist-packages/msgpack/__init__.py", line 47, in packb
  Jan 12 

[Yahoo-eng-team] [Bug 1750917] [NEW] Keystone returns a HTTP 500 error if xmlsec CLI is missing

2018-02-21 Thread Guang Yee
Public bug reported:

The keystone log is also unhelpful. When the xmlsec1 package is absent, all
we get is

"ERROR idp _sign_assertion Error when signing assertion, reason: [Errno
2] No such file or directory"

We may need to add a check here

https://github.com/openstack/keystone/blob/master/keystone/federation/idp.py#L421

to see if CONF.saml.xmlsec1_binary exists. If it is absent, we just need to
provide a more helpful log entry.
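
A minimal sketch of such a pre-check (an assumed shape, not the eventual
keystone patch; on Python 2 one would use distutils.spawn.find_executable
instead of shutil.which):

    import shutil

    def _verify_xmlsec1(binary='xmlsec1'):
        if shutil.which(binary) is None:
            raise RuntimeError(
                'SAML signing requires the %r binary; install the xmlsec1 '
                'package or point [saml]/xmlsec1_binary at it.' % binary)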

Steps to reproduce:

1. Install devstack and enable federation.
2. Uninstall the xmlsec1 package
3. Try to authenticate via federation and you'll get a HTTP 500 error and the 
corresponding log entry in keystone.log

** Affects: keystone
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1750917

Title:
  Keystone returns a HTTP 500 error if xmlsec CLI is missing

Status in OpenStack Identity (keystone):
  New

Bug description:
  The keystone log is also unhelpful. When the xmlsec1 package is absent,
  all we get is

  "ERROR idp _sign_assertion Error when signing assertion, reason:
  [Errno 2] No such file or directory"

  We may need to add a check here

  
https://github.com/openstack/keystone/blob/master/keystone/federation/idp.py#L421

  to see if CONF.saml.xmlsec1_binary exists. If it is absent, we just
  need to provide a more helpful log entry.

  Steps to reproduce:

  1. Install devstack and enable federation.
  2. Uninstall the xmlsec1 package
  3. Try to authenticate via federation and you'll get a HTTP 500 error and the 
corresponding log entry in keystone.log

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1750917/+subscriptions



[Yahoo-eng-team] [Bug 1700748] Re: Persistent tokens are not cleaned up when removing users from projects

2018-02-21 Thread Lance Bragstad
Now that we're in the Rocky development cycle, we've removed the uuid
token provider and the sql token storage driver, since they were slated for
removal this release [0].

[0] https://review.openstack.org/#/c/543060/

** Changed in: keystone
   Status: In Progress => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1700748

Title:
  Persistent tokens are not cleaned up when removing users from projects

Status in OpenStack Identity (keystone):
  Invalid

Bug description:
  If deleting a role, we should iterate over the assignments for this
  role and build the list of tokens we need to delete. In order to
  minimize the number of tokens to delete, remove any redundant
  user+project deletions.

  I think simplifying the list for the same user is improper: the same
  user with a different project targets different tokens. At the same
  time, the original processing actually doesn't work, because user_ids
  is never added to.

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1700748/+subscriptions



[Yahoo-eng-team] [Bug 1750064] Re: multiattach volume failures are masked in compute api

2018-02-21 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/545478
Committed: 
https://git.openstack.org/cgit/openstack/nova/commit/?id=5754ac0ab3670b47243b2d880aa153a7f42a3ac5
Submitter: Zuul
Branch: master

commit 5754ac0ab3670b47243b2d880aa153a7f42a3ac5
Author: Matt Riedemann 
Date:   Fri Feb 16 17:38:38 2018 -0500

Fix error handling in compute API for multiattach errors

Because of the big Exception block in _validate_bdm, the
multiattach-specific errors raised out of the
_check_attach_and_reserve_volume method were being lost
and the very generic InvalidBDMVolume was returned to the
user.

For example, I hit this when trying to create a server from
a multiattach volume but forgot to specify microversion 2.60
and it was just telling me it couldn't get the volume, which
I knew was bogus since I could get the volume details.

The fix is to handle the specific errors we want to re-raise.

The tests, which missed this because of their high-level mocking,
are updated so that we actually get to the problematic code and
only the things we don't care about along the way are mocked out.

Change-Id: I0b397e5bcdfd635fa562beb29819dd8c6b828e8a
Closes-Bug: #1750064


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1750064

Title:
  multiattach volume failures are masked in compute api

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  In Progress

Bug description:
  While trying to create a server from a multiattach volume, I kept
  getting this unhelpful error:

  $ openstack server create --flavor 1 --wait --volume 
cirros-multiattach-volume new-server
  Block Device Mapping is Invalid: failed to get volume 
fde36bc8-3104-43b7-902a-3b391d7f4e12. (HTTP 400) (Request-ID: 
req-4e2dc785-82b6-4e48-93e5-9b72bd62b3ba)

  By adding some debug code to nova-api, I figured out it was this:

  Feb 16 21:18:35 multiattach devstack@n-api.service[13811]: ERROR
  nova.compute.api [None req-88476e2e-f6ed-4b95-bed2-f6cf567b044d demo
  demo] Failed to get volume fde36bc8-3104-43b7-902a-3b391d7f4e12.
  Error: Multiattach volumes are only supported starting with compute
  API version 2.60.: MultiattachNotSupportedOldMicroversion: Multiattach
  volumes are only supported starting with compute API version 2.60.

  This is the problematic exception block:

  
https://github.com/openstack/nova/blob/9910ac95eb9ac822ef5d38b8af2d3aff5dc4d25a/nova/compute/api.py#L1354
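
  As a toy illustration of that masking and of the fix (invented names, not
  nova's exact code): re-raise the specific error before the broad handler
  swallows it.

    class MultiattachNotSupportedOldMicroversion(Exception):
        pass

    class InvalidBDMVolume(Exception):
        pass

    def validate_bdm(check_attach):
        try:
            check_attach()
        except MultiattachNotSupportedOldMicroversion:
            raise                       # keep the specific, actionable error
        except Exception:
            raise InvalidBDMVolume()    # generic fallback for everything else

    def check_attach():
        raise MultiattachNotSupportedOldMicroversion('need microversion 2.60')

    try:
        validate_bdm(check_attach)
    except MultiattachNotSupportedOldMicroversion as exc:
        print('user now sees the real reason: %s' % exc)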

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1750064/+subscriptions



[Yahoo-eng-team] [Bug 1750680] Re: Nova returns a traceback when it's unable to detach a volume still in use

2018-02-21 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/546423
Committed: 
https://git.openstack.org/cgit/openstack/nova/commit/?id=b16c0f10539a6c6b547a70a41c75ef723fc618ce
Submitter: Zuul
Branch: master

commit b16c0f10539a6c6b547a70a41c75ef723fc618ce
Author: Dan Smith 
Date:   Tue Feb 20 14:41:35 2018 -0800

Avoid exploding if guest refuses to detach a volume

When we run detach_volume(), the guest has to respond to the ACPI
eject request in order for us to proceed. It may not do this at all
if the volume is mounted or in-use, or may not have done so by the time we
time out if lots of dirty data needs flushing. Right now, we let the failure
exception bubble up to our caller and we log a nasty stack trace, which
doesn't really convey the reason (and that it's an expected and
reasonable thing to happen).

Thus, this patch catches that, logs the situation at warning level and
avoids the trace.

Change-Id: I3800b466a50b1e5f5d1e8c8a963d9a6258af67ee
Closes-Bug: #1750680
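
A minimal sketch of that pattern (invented names): treat the expected failure
as a warning instead of letting the exception escape with a full stack trace.

    class DeviceDetachFailed(Exception):
        pass

    def detach_device(dev):
        # Stand-in for the guest refusing the ACPI eject while the device
        # is busy.
        raise DeviceDetachFailed('Device detach failed for %s: guest busy'
                                 % dev)

    try:
        detach_device('vdf')
    except DeviceDetachFailed as exc:
        print('WARNING: %s' % exc)   # logged at warning level, no traceback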


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1750680

Title:
  Nova returns a traceback when it's unable to detach a volume still in
  use

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===
  If libvirt is unable to detach a volume because it's still in-use by the 
guest (either mounted and/or file opened), nova returns a traceback.

  Steps to reproduce
  ==

  * Create an instance with volume attached using heat
  * Make sure there's activity on the volume
  * Delete stack

  Expected result
  ===
  We would expect nova not to return a traceback but a clean log message
  about its inability to detach the volume. If possible, it would be great if
  that exception were raised back to either cinder or heat.

  Actual result
  =
  ```
  21495 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall [-] Dynamic 
interval looping call 'oslo_service.loopingcall._func' failed: 
DeviceDetachFailed: Device detach failed for vdf: Unable to detach from guest 
transient domain.
  21496 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall Traceback 
(most recent call last):
  21497 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall   File 
"/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 137, in 
_run_loop
  21498 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall result = 
func(*self.args, **self.kw)
  21499 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall   File 
"/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 415, in 
_func
  21500 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall return 
self._sleep_time
  21501 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall   File 
"/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
  21502 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall 
self.force_reraise()
  21503 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall   File 
"/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
  21504 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall 
six.reraise(self.type_, self.value, self.tb)
  21505 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall   File 
"/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 394, in 
_func
  21506 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall result = 
f(*args, **kwargs)
  21507 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall   File 
"/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 462, in 
_do_wait_and_retry_detach
  21508 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall 
device=alternative_device_name, reason=reason)
  21509 2018-02-14 20:31:09.735 1 ERROR oslo.service.loopingcall 
DeviceDetachFailed: Device detach failed for vdf: Unable to detach from guest 
transient domain.
  ```

  Environment
  ===
  * Red Hat Openstack 12
  ```
  libvirt-3.2.0-14.el7_4.7.x86_64 Fri Jan 26 
15:28:48 2018
  libvirt-client-3.2.0-14.el7_4.7.x86_64  Fri Jan 26 
15:26:07 2018
  libvirt-daemon-3.2.0-14.el7_4.7.x86_64  Fri Jan 26 
15:26:02 2018
  libvirt-daemon-config-network-3.2.0-14.el7_4.7.x86_64   Fri Jan 26 
15:26:06 2018
  libvirt-daemon-config-nwfilter-3.2.0-14.el7_4.7.x86_64  Fri Jan 26 
15:26:05 2018
  libvirt-daemon-driver-interface-3.2.0-14.el7_4.7.x86_64 Fri Jan 26 
15:26:05 2018
  libvirt-daemon-driver-lxc-3.2.0-14.el7_4.7.x86_64   Fri Jan 26 
15:26:06 2018
  libvirt-daemon-driver-network-3.2.0-14.el7_4.7.x86_64   Fri Jan 26 
15:26:02 2018
  libvirt-daemon-driver-nodedev-3.2.0-14.el7_4.7.x86_64   Fri Jan 26 
15:26:05 2018
  

[Yahoo-eng-team] [Bug 1750084] Re: Report client associations include non-sharing providers

2018-02-21 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/545494
Committed: 
https://git.openstack.org/cgit/openstack/nova/commit/?id=d2152f309439a9d9b054a481826242ca15f5c93e
Submitter: Zuul
Branch: master

commit d2152f309439a9d9b054a481826242ca15f5c93e
Author: Eric Fried 
Date:   Fri Feb 16 18:18:32 2018 -0600

Only pull associated *sharing* providers

It was discussed and decided [1] that we only want to be pulling down,
caching, and passing to update_provider_tree providers associated via
aggregate with the compute node's provider tree if they are sharing
providers. Otherwise we'll get e.g. all the *other* compute nodes which
are also associated with a sharing provider.

[1] 
https://review.openstack.org/#/c/540111/4/specs/rocky/approved/update-provider-tree.rst@48

Change-Id: Iab366da7623e5e31b8416e89fee7d418f7bf9b30
Closes-Bug: #1750084


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1750084

Title:
  Report client associations include non-sharing providers

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  In Progress

Bug description:
  It was discussed and decided [1] that we only want to be pulling down,
  caching, and passing to update_provider_tree providers associated via
  aggregate with the compute node's provider tree if they are sharing
  providers.  Otherwise we'll get e.g. all the *other* compute nodes
  which are also associated with a sharing provider.

  [1] https://review.openstack.org/#/c/540111/4/specs/rocky/approved
  /update-provider-tree.rst@48
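
  A sketch of that filtering (data shapes invented; MISC_SHARES_VIA_AGGREGATE
  is the placement trait that marks a sharing provider):

    def sharing_only(providers):
        return [p for p in providers
                if 'MISC_SHARES_VIA_AGGREGATE' in p.get('traits', ())]

    providers = [
        {'name': 'shared-storage', 'traits': ('MISC_SHARES_VIA_AGGREGATE',)},
        {'name': 'other-compute-node', 'traits': ()},
    ]
    print(sharing_only(providers))   # only the sharing provider remains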

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1750084/+subscriptions



[Yahoo-eng-team] [Bug 1750892] [NEW] Image remains in queued status after location set via PATCH

2018-02-21 Thread iain MacDonnell
Public bug reported:

Pike release, with show_image_direct_url and show_multiple_locations
enabled.

Attempting to create an image using the HTTP backend with the glance v2
API. I create a new/blank image (goes into "queued" status), then set
the location with:

curl -g -i -X PATCH -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*'
-H 'User-Agent: imacdonn-getting-dangerous' -H 'X-Auth-Token: xxx' -H
'Content-Type: application/openstack-images-v2.1-json-patch' -d
'[{"op":"replace", "path": "/locations", "value": [{"url":
"http://my_http_server/cirros.img;, "metadata": {}}]}]'
http://my_glance_api_endpoint:9292/v2/images/e5581f14-2d05-4ae7-8d78-9da42731a37e

This results in the direct_url getting set correctly, and the size of
the image is correctly determined, but the image remains in "queued"
status. It should become "active".
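
For reference, the same PATCH expressed with python-requests (the endpoint
and token are placeholders carried over from the report):

    import json
    import requests

    headers = {
        'X-Auth-Token': 'xxx',
        'Content-Type': 'application/openstack-images-v2.1-json-patch',
    }
    patch = [{'op': 'replace', 'path': '/locations',
              'value': [{'url': 'http://my_http_server/cirros.img',
                         'metadata': {}}]}]
    resp = requests.patch(
        'http://my_glance_api_endpoint:9292/v2/images/'
        'e5581f14-2d05-4ae7-8d78-9da42731a37e',
        headers=headers, data=json.dumps(patch))
    # Expected 'active' once the size is known; observed 'queued' instead.
    print(resp.status_code, resp.json().get('status'))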

** Affects: glance
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1750892

Title:
  Image remains in queued status after location set via PATCH

Status in Glance:
  New

Bug description:
  Pike release, with show_image_direct_url and show_multiple_locations
  enabled.

  Attempting to create an image using the HTTP backend with the glance
  v2 API. I create a new/blank image (goes into "queued" status), then
  set the location with:

  curl -g -i -X PATCH -H 'Accept-Encoding: gzip, deflate' -H 'Accept:
  */*' -H 'User-Agent: imacdonn-getting-dangerous' -H 'X-Auth-Token:
  xxx' -H 'Content-Type: application/openstack-images-v2.1-json-patch'
  -d '[{"op":"replace", "path": "/locations", "value": [{"url":
  "http://my_http_server/cirros.img;, "metadata": {}}]}]'
  
http://my_glance_api_endpoint:9292/v2/images/e5581f14-2d05-4ae7-8d78-9da42731a37e

  This results in the direct_url getting set correctly, and the size of
  the image is correctly determined, but the image remains in "queued"
  status. It should become "active".

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1750892/+subscriptions



[Yahoo-eng-team] [Bug 1750884] [NEW] [2.4, bionic] /etc/resolv.conf not configured correctly in Bionic, leads to no DNS resolution

2018-02-21 Thread Andres Rodriguez
Public bug reported:

When deploying Bionic, /etc/resolv.conf is not configured correctly,
which leads to no DNS resolution. In the output below, you will see that
the netplan config correctly points to the 10.90.90.1 nameserver, but in
resolv.conf there is a local address instead.

Resolv.conf should really be configured to use the provided DNS
server(s). That said, despite that fact, DNS resolution doesn't work
with the local address.

Bionic
--

ubuntu@node01:~$ cat /etc/netplan/50-cloud-init.yaml
# This file is generated from information provided by
# the datasource.  Changes to it will not persist across an instance.
# To disable cloud-init's network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    version: 2
    ethernets:
        enp0s25:
            match:
                macaddress: b8:ae:ed:7d:17:d2
            mtu: 1500
            nameservers:
                addresses:
                - 10.90.90.1
                search:
                - maaslab
                - maas
            set-name: enp0s25
    bridges:
        br0:
            addresses:
            - 10.90.90.3/24
            gateway4: 10.90.90.1
            interfaces:
            - enp0s25
            parameters:
                forward-delay: 15
                stp: false
ubuntu@node01:~$ cat /etc/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.
nameserver 127.0.0.53

search maaslab maas
ubuntu@node01:~$ ping google.com
ping: google.com: Temporary failure in name resolution

[...]

ubuntu@node01:~$ sudo vim /etc/resolv.conf
ubuntu@node01:~$ cat /etc/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.
nameserver 10.90.90.1

search maaslab maas
ubuntu@node01:~$ ping google.com
PING google.com (172.217.0.174) 56(84) bytes of data.
64 bytes from mia09s16-in-f14.1e100.net (172.217.0.174): icmp_seq=1 ttl=52 
time=4.46 ms
64 bytes from mia09s16-in-f14.1e100.net (172.217.0.174): icmp_seq=2 ttl=52 
time=4.38 ms

=
Xenial
==

ubuntu@node05:~$ cat /etc/network/interfaces.d/50-cloud-init.cfg
# This file is generated from information provided by
# the datasource.  Changes to it will not persist across an instance.
# To disable cloud-init's network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
auto lo
iface lo inet loopback
dns-nameservers 10.90.90.1
dns-search maaslab maas

auto enp0s25
iface enp0s25 inet static
address 10.90.90.162/24
gateway 10.90.90.1
mtu 1500
ubuntu@node05:~$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.90.90.1
search maaslab maas

** Affects: cloud-init
 Importance: Undecided
 Status: New

** Affects: maas
 Importance: Undecided
 Status: Invalid

** Affects: nplan (Ubuntu)
 Importance: Critical
 Status: New

** Affects: systemd (Ubuntu)
 Importance: Critical
 Status: New

** Also affects: cloud-init
   Importance: Undecided
   Status: New

** Also affects: nplan (Ubuntu)
   Importance: Undecided
   Status: New

** Also affects: systemd (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: nplan (Ubuntu)
   Importance: Undecided => Critical

** Changed in: systemd (Ubuntu)
   Importance: Undecided => Critical

** Changed in: maas
   Status: New => Incomplete

** Changed in: maas
   Status: Incomplete => Invalid

** Description changed:

  When deploying Bionic, /etc/resolv.conf is not configured correctly,
  which leads to no DNS resolution. In the output below, you will see that
  netplan config is correctly to the 10.90.90.1 nameserver, but in
  resolv.conf that's a local address.
  
+ Resolv.conf should really be configured to use the provided DNS
+ server(s)
  
  Bionic
  --
  
  ubuntu@node01:~$ cat /etc/netplan/50-cloud-init.yaml
  # This file is generated from information provided by
  # the datasource.  Changes to it will not persist across an instance.
  # To disable cloud-init's network configuration capabilities, write a file
  # /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
  # network: {config: disabled}
  network:
- version: 2
- ethernets:
- enp0s25:
- match:
- macaddress: b8:ae:ed:7d:17:d2
- mtu: 1500
- nameservers:
- addresses:
- - 10.90.90.1
- search:
- 

[Yahoo-eng-team] [Bug 1742963] Re: Cannot boot VM with Contrail SDN controller

2018-02-21 Thread melanie witt
** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Changed in: nova
   Importance: Undecided => High

** Changed in: nova/queens
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1742963

Title:
  Cannot boot VM with Contrail SDN controller

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) queens series:
  New

Bug description:
  On the master branch, nova-compute fails to create vif on the Contrail 
vrouter compute agent and the instance fails to spawn:
  ...
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] 
six.reraise(self.type_, self.value, self.tb)
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"/opt/stack/openstack/nova/nova/virt/libvirt/driver.py", line 5238, 
  in _create_domain_and_network
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] 
self.plug_vifs(instance, network_info)
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"/opt/stack/openstack/nova/nova/virt/libvirt/driver.py", line 755, in plug_vifs
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] 
self.vif_driver.plug(instance, vif)
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"/opt/stack/openstack/nova/nova/virt/libvirt/vif.py", line 769, in p
  lug
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] 
func(instance, vif)
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"/opt/stack/openstack/nova/nova/virt/libvirt/vif.py", line 727, in plug_vrouter
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] 
instance, vif, ip_addr, ip6_addr, ptype)
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"/usr/local/lib/python2.7/dist-packages/oslo_privsep/priv_context.py", line 
207, in _wrap
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] 
return self.channel.remote_call(name, args, kwargs)
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"/usr/local/lib/python2.7/dist-packages/oslo_privsep/daemon.py", line 192, in 
remote_call
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] 
result = self.send_recv((Message.CALL.value, name, args, kwargs))
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"/usr/local/lib/python2.7/dist-packages/oslo_privsep/comm.py", line 163, in 
send_recv
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] 
self.writer.send((myid, msg))
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"/usr/local/lib/python2.7/dist-packages/oslo_privsep/comm.py", line 
  54, in send
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] buf = 
msgpack.packb(msg, use_bin_type=True)
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"/usr/local/lib/python2.7/dist-packages/msgpack/__init__.py", line 47, in packb
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c] 
return Packer(**kwargs).pack(o)
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 8748627c-e31f-4c90-83e2-16abdf9c1e2c]   File 
"msgpack/_packer.pyx", line 231, in msgpack._packer.Packer.pack (msg
  pack/_packer.cpp:3661)
  Jan 12 15:05:59 ethuleau-contrail-master nova-compute[5512]: ERROR 
nova.compute.manager [instance: 

[Yahoo-eng-team] [Bug 1750084] Re: Report client associations include non-sharing providers

2018-02-21 Thread melanie witt
** Changed in: nova
   Importance: Undecided => High

** Also affects: nova/queens
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1750084

Title:
  Report client associations include non-sharing providers

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) queens series:
  New

Bug description:
  It was discussed and decided [1] that we only want to be pulling down,
  caching, and passing to update_provider_tree providers associated via
  aggregate with the compute node's provider tree if they are sharing
  providers.  Otherwise we'll get e.g. all the *other* compute nodes
  which are also associated with a sharing provider.

  [1] https://review.openstack.org/#/c/540111/4/specs/rocky/approved
  /update-provider-tree.rst@48

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1750084/+subscriptions



[Yahoo-eng-team] [Bug 1750890] [NEW] Neutron db performance at scale

2018-02-21 Thread Leon Zachery
Public bug reported:

OpenStack Neutron (like the rest of OpenStack) relies on SQLAlchemy and its
ORM for database support.  From our observations, Neutron is not utilizing
the ORM models directly, but rather inserting an additional model layer
above SQLAlchemy and manually building these models from a number of
underlying DB models.  We ran into significant performance issues due to
the increased number of queries at large scale.

For ports the problem starts here
https://github.com/openstack/neutron/blob/master/neutron/db/db_base_plugin_common.py#L202-L219.
The base dict is built from a single DB query row and then the
processing of all extensions (which is the default behaviour) leads to a
sequential series of additional queries per row to augment the dict. In
our opinion, this causes issues from a performance perspective: it leads
to the classic n+1 query anti-pattern and fundamentally does not scale
(an alternate option would be to do a “joined” query with active
extensions).  This illustrates the type of workarounds that result from
this approach
https://github.com/openstack/neutron/blob/master/neutron/db/_utils.py#L95-L107.
Instead of using native SQL to filter fields from the result, the whole
result set has to be iterated to filter out fields; again, surely this
is an anti-pattern when processing DB objects.
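
As a self-contained toy of the two access patterns (models invented for
illustration, not neutron's schema; echo=True prints each SELECT so the
n+1 behaviour is visible):

    from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import joinedload, relationship, sessionmaker

    Base = declarative_base()

    class Port(Base):
        __tablename__ = 'ports'
        id = Column(Integer, primary_key=True)
        name = Column(String)
        tags = relationship('Tag')

    class Tag(Base):
        __tablename__ = 'tags'
        id = Column(Integer, primary_key=True)
        port_id = Column(Integer, ForeignKey('ports.id'))
        value = Column(String)

    engine = create_engine('sqlite://', echo=True)
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    session.add_all([Port(name='p%d' % i, tags=[Tag(value='t')])
                     for i in range(3)])
    session.commit()

    # n+1: one SELECT for the ports, then one more per port for its tags
    for port in session.query(Port).all():
        _ = port.tags

    # joined: a single SELECT with a JOIN loads everything up front
    for port in session.query(Port).options(joinedload(Port.tags)).all():
        _ = port.tags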

With respect to LBaaS support, we removed the intermediate model layer
with this (and a couple of previous) commit(s):
https://github.com/sapcc/neutron-lbaas/commit/f71867fbf6c8a27df43aaff6046948dce60f3081.
This is just an interim change, but after implementing it we saw LBaaS API
request times going from over 1-5 minutes (degrading with the number of
objects) to a consistent sub-second response time.

Version:
This is/should be present in all versions, but our testing has been done in 
Mitaka and above.

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1750890

Title:
  Neutron db performance at scale

Status in neutron:
  New

Bug description:
  OpenStack Neutron (like the rest of OpenStack) relies on SQLAlchemy and
  its ORM for database support.  From our observations, Neutron is not
  utilizing the ORM models directly, but rather inserting an additional
  model layer above SQLAlchemy and manually building these models from a
  number of underlying DB models.  We ran into significant performance
  issues due to the increased number of queries at large scale.

  For ports the problem starts here
  
https://github.com/openstack/neutron/blob/master/neutron/db/db_base_plugin_common.py#L202-L219.
  The base dict is built from a single DB query row and then the
  processing of all extensions (which is the default behaviour) leads to
  a sequential series of additional queries per row to augment the dict.
  In our opinion, this causes issues from a performance perspective: it
  leads to the classic n+1 query anti-pattern and fundamentally does not
  scale (an alternate option would be to do a “joined” query with active
  extensions).  This illustrates the type of workarounds that result
  from this approach
  
https://github.com/openstack/neutron/blob/master/neutron/db/_utils.py#L95-L107.
  Instead of using native SQL to filter fields from the result, the whole
  result set has to be iterated to filter out fields; again, surely
  this is an anti-pattern when processing DB objects.

  With respect to LBaaS support, we removed the intermediate model layer
  with this (and a couple of previous) commit(s):
  https://github.com/sapcc/neutron-lbaas/commit/f71867fbf6c8a27df43aaff6046948dce60f3081.
  This is just an interim change, but after implementing it we saw LBaaS
  API request times going from over 1-5 minutes (degrading with the number
  of objects) to a consistent sub-second response time.

  Version:
  This is/should be present in all versions, but our testing has been done in 
Mitaka and above.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1750890/+subscriptions



[Yahoo-eng-team] [Bug 1750705] Re: glance db_sync requires mysql db to have log_bin_trust_function_creators = 1

2018-02-21 Thread James Page
** Also affects: charm-percona-cluster
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1750705

Title:
  glance db_sync requires mysql db to have
  log_bin_trust_function_creators = 1

Status in OpenStack percona-cluster charm:
  New
Status in Glance:
  New
Status in glance package in Ubuntu:
  New

Bug description:
  Upon deploying glance via cs:~openstack-charmers-next/xenial/glance,
  glance appears to throw a CRIT unhandled error; so far I have
  experienced this on arm64. Not sure about other archs at this point in
  time. Decided to file a bug and will investigate further.

  Cloud- xenial-queens/proposed

  This occurs when the shared-db-relation hook fires for
  mysql:shared-db.

  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed CRITI 
[glance] Unhandled error
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
Traceback (most recent call last):
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/bin/glance-manage", line 10, in 
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
sys.exit(main())
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 528, in main
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
return CONF.command.action_fn()
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 360, in sync
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
self.command_object.sync(CONF.command.version)
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 153, in sync
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
self.expand()
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 208, in expand
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
self._sync(version=expand_head)
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 168, in _sync
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
alembic_command.upgrade(a_config, version)
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/alembic/command.py", line 254, in upgrade
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
script.run_env()
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/alembic/script/base.py", line 425, in run_env
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
util.load_python_file(self.dir, 'env.py')
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/alembic/util/pyfiles.py", line 93, in 
load_python_file
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
module = load_module_py(module_id, path)
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/alembic/util/compat.py", line 75, in 
load_module_py
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
mod = imp.load_source(module_id, path, fp)
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/glance/db/sqlalchemy/alembic_migrations/env.py",
 line 88, in <module>
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
run_migrations_online()
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/glance/db/sqlalchemy/alembic_migrations/env.py",
 line 83, in run_migrations_online
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
context.run_migrations()
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"", line 8, in run_migrations
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/alembic/runtime/environment.py", line 836, in 
run_migrations
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
self.get_context().run_migrations(**kw)
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/alembic/runtime/migration.py", line 330, in 
run_migrations
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
step.migration_fn(**kw)
  

[Yahoo-eng-team] [Bug 1750705] Re: glance CRITI Unhandled error DBError Duplicate column name

2018-02-21 Thread James Page
Need to tie down the exact requirements but:

  set global log_bin_trust_function_creators = 1;

and then dropping and recreating the DB resolves this issue; apparently
creation of triggers which include 'unsafe' functions requires this
setting.
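
For reference, a minimal mysql session applying that workaround could look
like the following (the database name and the re-create step are assumptions
about a typical glance deployment, not commands taken from this report):

    mysql> SET GLOBAL log_bin_trust_function_creators = 1;
    mysql> DROP DATABASE glance;
    mysql> CREATE DATABASE glance;
    -- then re-run glance-manage db_sync (or re-trigger the shared-db hook)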


** Project changed: charm-glance => glance (Ubuntu)

** Summary changed:

- glance CRITI Unhandled error DBError Duplicate column name
+ glance db_sync requires mysql db to have log_bin_trust_function_creators = 1

** Also affects: glance
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1750705

Title:
  glance db_sync requires mysql db to have
  log_bin_trust_function_creators = 1

Status in Glance:
  New
Status in glance package in Ubuntu:
  New

Bug description:
  Upon deploying glance via cs:~openstack-charmers-next/xenial/glance
  glance appears to throw a CRIT unhandled error; so far I have
  experienced this on arm64. Not sure about other archs at this point in
  time. Decided to file a bug and will investigate further.

  Cloud: xenial-queens/proposed

  This occurs when the shared-db-relation hook fires for mysql
  :shared-db.

  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed CRITI 
[glance] Unhandled error
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
Traceback (most recent call last):
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/bin/glance-manage", line 10, in 
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
sys.exit(main())
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 528, in main
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
return CONF.command.action_fn()
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 360, in sync
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
self.command_object.sync(CONF.command.version)
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 153, in sync
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
self.expand()
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 208, in expand
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
self._sync(version=expand_head)
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/glance/cmd/manage.py", line 168, in _sync
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
alembic_command.upgrade(a_config, version)
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/alembic/command.py", line 254, in upgrade
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
script.run_env()
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/alembic/script/base.py", line 425, in run_env
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
util.load_python_file(self.dir, 'env.py')
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/alembic/util/pyfiles.py", line 93, in 
load_python_file
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
module = load_module_py(module_id, path)
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/alembic/util/compat.py", line 75, in 
load_module_py
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
mod = imp.load_source(module_id, path, fp)
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/glance/db/sqlalchemy/alembic_migrations/env.py",
 line 88, in <module>
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
run_migrations_online()
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/glance/db/sqlalchemy/alembic_migrations/env.py",
 line 83, in run_migrations_online
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed 
context.run_migrations()
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"", line 8, in run_migrations
  unit-glance-0: 01:28:22 DEBUG unit.glance/0.shared-db-relation-changed   File 
"/usr/lib/python2.7/dist-packages/alembic/runtime/environment.py", line 836, in 
run_migrations
  

[Yahoo-eng-team] [Bug 1749397] Re: In Verify operation of the Identity service 1st step is not required as the file /etc/keystone/keystone-paste.ini doesn't contain admin_auth_token

2018-02-21 Thread Gage Hugo
*** This bug is a duplicate of bug 1716797 ***
https://bugs.launchpad.net/bugs/1716797

Looks like this was reported right after
https://bugs.launchpad.net/keystone/+bug/1716797 was committed.

** This bug has been marked a duplicate of bug 1716797
   Verify operation in keystone: step 1 has already been done

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1749397

Title:
  In Verify operation of the Identity service 1st step is not required
  as the file /etc/keystone/keystone-paste.ini doesn't contain
  admin_auth_token

Status in OpenStack Identity (keystone):
  New

Bug description:
  In Verify operation of the Identity service 1st step is not required
  as the file /etc/keystone/keystone-paste.ini doesn't contain
  admin_auth_token in the sections [pipeline:public_api],
  [pipeline:admin_api], and [pipeline:api_v3].

  https://docs.openstack.org/keystone/pike/install/keystone-verify-
  ubuntu.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1749397/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1727260] Re: Nova assumes that a volume is fully detached from the compute if the volume is not defined in the instance's libvirt definition

2018-02-21 Thread Matt Riedemann
** Also affects: nova/pike
   Importance: Undecided
   Status: New

** Also affects: nova/queens
   Importance: Undecided
   Status: New

** Changed in: nova/pike
   Status: New => Confirmed

** Changed in: nova/queens
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1727260

Title:
   Nova assumes that a volume is fully detached from the compute if the
  volume is not defined in the instance's libvirt definition

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) pike series:
  Confirmed
Status in OpenStack Compute (nova) queens series:
  In Progress

Bug description:
  During a volume detach operation, Nova compute attempts to remove the
  volume from libvirt for the instance before proceeding to remove the
  storage lun from the underlying compute host. If Nova discovers that
  the volume was not found in the instance's libvirt definition then it
  ignores that error condition and returns (after issuing a warning
  message "Ignoring DiskNotFound exception while detaching").

  However, under certain failure scenarios, although the libvirt
  definition for the volume has been removed for the instance, the
  associated storage lun on the compute server may not have been
  fully cleaned up yet.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1727260/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1742505] Re: gre_sys set to default 1472 when using path_mtu > 1500 with ovs 2.8.x

2018-02-21 Thread Launchpad Bug Tracker
This bug was fixed in the package openvswitch - 2.8.1-0ubuntu0.17.10.2

---
openvswitch (2.8.1-0ubuntu0.17.10.2) artful; urgency=medium

  * d/p/dpif-kernel-gre-mtu-workaround.patch,
d/p/dpif-netlink-rtnl-Use-65000-instead-of-65535-as-tunnel-MTU.patch:
Cherry pick in-flight fixes for workaround to correctly set MTU
of GRE devices via netlink (LP: #1742505).

openvswitch (2.8.1-0ubuntu0.17.10.1) artful; urgency=medium

  * New upstream stable release (LP: #1724622).

 -- James Page   Sat, 20 Jan 2018 10:22:31 +

** Changed in: openvswitch (Ubuntu Artful)
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1742505

Title:
  gre_sys set to default 1472 when using path_mtu > 1500 with ovs 2.8.x

Status in Ubuntu Cloud Archive:
  Fix Committed
Status in Ubuntu Cloud Archive pike series:
  Fix Committed
Status in Ubuntu Cloud Archive queens series:
  Fix Committed
Status in neutron:
  Invalid
Status in linux package in Ubuntu:
  Confirmed
Status in openvswitch package in Ubuntu:
  Fix Released
Status in linux source package in Artful:
  Confirmed
Status in openvswitch source package in Artful:
  Fix Released
Status in linux source package in Bionic:
  Confirmed
Status in openvswitch source package in Bionic:
  Fix Released

Bug description:
  [Impact]
  OpenStack Clouds using GRE overlay tunnels with > 1500 MTU's will observe 
packet fragmentation/networking issues for traffic in overlay networks.

  [Test Case]
  Deploy OpenStack Pike (xenial + pike UCA or artful)
  Create tenant networks using GRE segmentation
  Boot instances
  Instance networking will be broken/slow

  gre_sys devices will be set to mtu=1472 on hypervisor hosts.

  [Regression Potential]
  Minimal; the fix to OVS works around an issue with GRE tunnel port setup via 
rtnetlink by performing a second request, once the gre device is set up, to set 
the MTU to a high value (65000).

  
  [Original Bug Report]
  Setup:
  Pike neutron 11.0.2-0ubuntu1.1~cloud0
  OVS 2.8.0
  Jumbo frames settings per: 
https://docs.openstack.org/mitaka/networking-guide/config-mtu.html
  global_physnet_mtu = 9000
  path_mtu = 9000

  Symptoms:
  gre_sys MTU is 1472
  Instances with MTUs > 1500 fail to communicate across GRE

  Temporary Workaround:
  ifconfig gre_sys MTU 9000
  Note: When ovs rebuilds tunnels, such as on a restart, gre_sys MTU is set 
back to default 1472.
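
  A possibly more persistent workaround (untested here, and an assumption
  based on OVS >= 2.7 exposing mtu_request in ovsdb, in line with the commits
  referenced below) would be:

  ovs-vsctl set Interface gre_sys mtu_request=9000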

  Note: downgrading from OVS 2.8.0 to 2.6.1 resolves the issue.

  Previous behavior:
  With Ocata or Pike and OVS 2.6.x
  gre_sys MTU defaults to 65490
  It remains at 65490 through restarts.

  This may be related to some combination of the following changes in OVS which 
seem to imply MTUs must be set in the ovs database for tunnel interfaces and 
patches:
  
https://github.com/openvswitch/ovs/commit/8c319e8b73032e06c7dd1832b3b31f8a1189dcd1
  
https://github.com/openvswitch/ovs/commit/3a414a0a4f1901ba015ec80b917b9fb206f3c74f
  
https://github.com/openvswitch/ovs/blob/6355db7f447c8e83efbd4971cca9265f5e0c8531/datapath/vport-internal_dev.c#L186

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1742505/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1750829] [NEW] RFE: libvirt: Add ability to configure extra CPU flags for named CPU models

2018-02-21 Thread Kashyap Chamarthy
Public bug reported:

Motivation
--

The recent "Meltdown" CVE fixes resulted in a critical performance
penalty. From here[*]:

[...] However, in examining both the various fixes rolled out in
actual Linux distros over the past few days and doing some very
informal surveying of environments I have access to, I discovered
that the PCID ["process-context identifiers"] processor feature,
which used to be a virtual no-op, is now a performance AND security
critical item.[...]

So if a Nova user has applied all the "Meltdown" CVE fixes, and is using
a named CPU model (like "IvyBridge", or "Westmere" — which specifically
lack the said obscure "PCID" feature) they will incur severe performance
degradation[*].

Note that some Intel *physical* CPUs themselves include the 'pcid'
CPU feature flag; but the named CPU models provided by libvirt & QEMU
lack that flag — hence we explicitly specify it for virtual CPUs via the
following proposed config attribute.

[*] https://groups.google.com/forum/m/#!topic/mechanical-
sympathy/L9mHTbeQLNU

Proposed change
---

Modify Nova's libvirt driver such that it will be possible to set
granular CPU feature flags for named CPU models.  E.g. to explicitly
specify the 'pcid' feature flag with Intel IvyBridge CPU model, set the
following in /etc/nova.conf:

...
[libvirt]
cpu_model=IvyBridge
cpu_model_extra_flags="pcid"
...

The list of known CPU feature flags ('vmx', 'xtpr', 'pcid', et cetera)
can be found in /usr/share/libvirt/cpu_map.xml.

Note that before specifying extra CPU feature flags, one should check if
the named CPU models (provided by libvirt) already include the said
flags.  E.g. the 'Broadwell', 'Haswell-noTSX' named CPU models provided
by libvirt already provide the 'pcid' CPU feature flag.
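
As a rough sketch of such a check (illustrative only; note it inspects only
features listed directly on a model in cpu_map.xml and does not follow
features inherited from a parent model):

    import xml.etree.ElementTree as ET

    def model_has_flag(model, flag,
                       cpu_map='/usr/share/libvirt/cpu_map.xml'):
        # Walk libvirt's CPU map and look for the feature on the named model.
        root = ET.parse(cpu_map).getroot()
        for m in root.iter('model'):
            if m.get('name') == model:
                return any(f.get('name') == flag
                           for f in m.findall('feature'))
        return False

    # IvyBridge does not list 'pcid' directly, hence the proposal above:
    print(model_has_flag('IvyBridge', 'pcid'))  # expected: False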

Other use cases
---

  - Nested Virtualization — an operator can specify the Intel 'vmx' or
AMD 'svm' flags in the level-1 guest (i.e. the guest hypervisor)

  - Ability to use 1GB huge pages with Haswell model as one use case for
extra flags (thanks: Daniel Berrangé, for mentioning this scenario):

cpu_model=Haswell
cpu_model_extra_flags="pdpe1gb"

** Affects: nova
 Importance: Undecided
 Assignee: Kashyap Chamarthy (kashyapc)
 Status: In Progress


** Tags: libvirt

** Tags added: libvirt

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1750829

Title:
  RFE: libvirt: Add ability to configure extra CPU flags for named CPU
  models

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  Motivation
  --

  The recent "Meltdown" CVE fixes resulted in a critical performance
  penalty. From here[*]:

  [...] However, in examining both the various fixes rolled out in
  actual Linux distros over the past few days and doing some very
  informal surveying of environments I have access to, I discovered
  that the PCID ["process-context identifiers"] processor feature,
  which used to be a virtual no-op, is now a performance AND security
  critical item.[...]

  So if a Nova user has applied all the "Meltdown" CVE fixes, and is using
  a named CPU model (like "IvyBridge", or "Westmere" — which specifically
  lack the said obscure "PCID" feature) they will incur severe performance
  degradation[*].

  Note that some Intel *physical* CPUs themselves include the 'pcid'
  CPU feature flag; but the named CPU models provided by libvirt & QEMU
  lack that flag — hence we explicitly specify it for virtual CPUs via the
  following proposed config attribute.

  [*] https://groups.google.com/forum/m/#!topic/mechanical-
  sympathy/L9mHTbeQLNU

  Proposed change
  ---

  Modify Nova's libvirt driver such that it will be possible to set
  granular CPU feature flags for named CPU models.  E.g. to explicitly
  specify the 'pcid' feature flag with Intel IvyBridge CPU model, set the
  following in /etc/nova.conf:

  ...
  [libvirt]
  cpu_model=IvyBridge
  cpu_model_extra_flags="pcid"
  ...

  The list of known CPU feature flags ('vmx', 'xtpr', 'pcid', et cetera)
  can be found in /usr/share/libvirt/cpu_map.xml.

  Note that before specifying extra CPU feature flags, one should check if
  the named CPU models (provided by libvirt) already include the said
  flags.  E.g. the 'Broadwell', 'Haswell-noTSX' named CPU models provided
  by libvirt already provide the 'pcid' CPU feature flag.

  Other use cases
  ---

- Nested Virtualization — an operator can specify the Intel 'vmx' or
  AMD 'svm' flags in the level-1 guest (i.e. the guest hypervisor)

- Ability to use 1GB huge pages with Haswell model as one use case for
  extra flags (thanks: Daniel Berrangé, for mentioning this scenario):

  cpu_model=Haswell
  

[Yahoo-eng-team] [Bug 1591971] Re: Glance task creates failed when setting work_dir local and qemu-img version is 1.5.3

2018-02-21 Thread Brian Rosmaita
Looks like this was fixed by configuration, closing.

** Changed in: glance
   Status: In Progress => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1591971

Title:
  Glance task creates failed when setting work_dir local and qemu-img
  version is 1.5.3

Status in Glance:
  Invalid

Bug description:
  The openstack version is mitaka.

  # rpm -qa |grep qemu-img
  qemu-img-1.5.3-105.el7_2.4.x86_64

  The glance-api.conf setting is:
  [task]
  work_dir = /home/work/
  [taskflow_executor]
  conversion_format = raw

  Then run the cli:
  glance  task-create --type import --input 
'{"import_from":"http://10.43.177.17/cirros-0.3.2-x86_64-disk.img","import_from_format":
 
"","image_properties":{"disk_format":"qcow2","container_format":"bare","name":"test1"}}'

  The log is :
  2016-06-14 04:08:29.032 DEBUG oslo_concurrency.processutils [-] CMD "qemu-img 
info --output=json file:///home/work/90ff2129-0079-487e-a7ec-79ef23bd1c0d" 
returned: 1 in 0.025s from (pid=5460) execute 
/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:374
  2016-06-14 04:08:29.033 DEBUG oslo_concurrency.processutils [-] None
  command: u'qemu-img info --output=json 
file:///home/work/90ff2129-0079-487e-a7ec-79ef23bd1c0d'
  exit code: 1
  stdout: u''
  stderr: u"qemu-img: Could not open 
'file:///home/work/90ff2129-0079-487e-a7ec-79ef23bd1c0d': Could not open 
'file:///home/work/90ff2129-0079-487e-a7ec-79ef23bd1c0d': No such file or 
directory\n" from (pid=5460) execute 
/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:413
  2016-06-14 04:08:29.034 DEBUG oslo_concurrency.processutils [-] u'qemu-img 
info --output=json file:///home/work/90ff2129-0079-487e-a7ec-79ef23bd1c0d' 
failed. Not Retrying. from (pid=5460) execute 
/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:422
  Command: qemu-img info --output=json 
file:///home/work/90ff2129-0079-487e-a7ec-79ef23bd1c0d
  Exit code: 1
  Stdout: u''
  Stderr: u"qemu-img: Could not open 
'file:///home/work/90ff2129-0079-487e-a7ec-79ef23bd1c0d': Could not open 
'file:///home/work/90ff2129-0079-487e-a7ec-79ef23bd1c0d': No such file or 
directory\n"
  2016-06-14 04:08:29.072 WARNING glance.async.taskflow_executor [-] Task 
'import-ImportToFS-42684807-86db-4ff5-a4a9-abf3b1998b63' 
(5ff9cf63-f257-48d2-9cc9-cfeffd905854) transitioned into state 'FAILURE' from 
state 'RUNNING'
  4 predecessors (most recent first):
    Flow 'import'
    |__Atom 'import-CreateImage-42684807-86db-4ff5-a4a9-abf3b1998b63' 
{'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {}, 'provides': 
'90ff2129-0079-487e-a7ec-79ef23bd1c0d'}
   |__Atom 'import_retry' {'intention': 'EXECUTE', 'state': 'SUCCESS', 
'requires': {}, 'provides': [(None, {})]}
  |__Flow 'import'
  2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor Traceback (most 
recent call last):
  2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor   File 
"/usr/lib/python2.7/site-packages/taskflow/engines/action_engine/executor.py", 
line 82, in _execute_task
  2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor result = 
task.execute(**arguments)
  2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor   File 
"/opt/stack/glance/glance/async/flows/base_import.py", line 175, in execute
  2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor metadata = 
json.loads(stdout)
  2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor   File 
"/usr/lib64/python2.7/json/__init__.py", line 338, in loads
  2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor return 
_default_decoder.decode(s)
  2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor   File 
"/usr/lib64/python2.7/json/decoder.py", line 365, in decode
  2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor obj, end = 
self.raw_decode(s, idx=_w(s, 0).end())
  2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor   File 
"/usr/lib64/python2.7/json/decoder.py", line 383, in raw_decode
  2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor raise 
ValueError("No JSON object could be decoded")
  2016-06-14 04:08:29.072 TRACE glance.async.taskflow_executor ValueError: No 
JSON object could be decoded

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1591971/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1620363] Re: report nginx or non-eventlet based strategies for deploying glance

2018-02-21 Thread Brian Rosmaita
This has been addressed by
https://docs.openstack.org/glance/latest/admin/apache-httpd.html , which
was first merged back in Pike.

Of course it's always possible to improve the docs, so annakoppad feel
free to put up improvement patches if you are still interested in this.


** Changed in: glance
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1620363

Title:
  report nginx or non-eventlet based strategies for deploying glance

Status in Glance:
  Fix Released

Bug description:
  Recently, more than a few people have asked about different ways a
  glance service can be deployed:

* eventlet based
* nginx based
* other ways (like say using repose)

  It would be good to document this in our developer docs or specs and
  create a FAQ page in our launchpad project page for people to refer
  and then discuss further.

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1620363/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1750555] Re: Revisit database rolling upgrade documentation

2018-02-21 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/546172
Committed: 
https://git.openstack.org/cgit/openstack/glance/commit/?id=d500b3f883e94a3b82f313bffe6dbeb08d7ee1e4
Submitter: Zuul
Branch:master

commit d500b3f883e94a3b82f313bffe6dbeb08d7ee1e4
Author: Abhishek Kekane 
Date:   Tue Feb 20 14:47:43 2018 +

Revise database rolling upgrade documentation

- mark zero-downtime-db-upgrade as EXPERIMENTAL for queens
- clarify the relation between the E-M-C strategy and
  zero-downtime db upgrades
- add note that for MySQL, using the glance-manage expand or
  glance-manage contract command requires that the glance user
  is granted SUPER privileges
- add note to contributor docs about checking the trigger
  flag in expand and contract scripts

Co-authored-by: Abhishek Kekane 
Co-authored-by: Brian Rosmaita 

Change-Id: I5af4a1428b89ecb05a1be9c420c5f0afc05b9a95
Closes-Bug: #1750555
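
For reference, the E-M-C strategy mentioned above maps onto the following
glance-manage subcommands (shown as a sketch of the documented sequence, not
output from this change):

    glance-manage db expand    # add new schema elements (and triggers, online)
    glance-manage db migrate   # move data from old columns to new ones
    glance-manage db contract  # drop old schema elements and triggers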


** Changed in: glance
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1750555

Title:
  Revisit database rolling upgrade documentation

Status in Glance:
  Fix Released

Bug description:
  Since db_sync is now internally using the EMC pattern, we need to revisit
  the entire database rolling upgrade documentation.

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1750555/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1748900] Re: api-ref: value of custom property not limited to 255 chars

2018-02-21 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/546021
Committed: 
https://git.openstack.org/cgit/openstack/glance/commit/?id=601f82ac24038a40dc48579fc3928b6e0f0373bf
Submitter: Zuul
Branch:master

commit 601f82ac24038a40dc48579fc3928b6e0f0373bf
Author: Brian Rosmaita 
Date:   Mon Feb 19 22:11:44 2018 -0500

Correct length limit for custom property value

The api-ref states that the both the key and value of a custom
property are limited to 255 chars.  This limit applies only to
the key.

Change-Id: I3bacca8b25f2a8339f6d8758e45c690da9968555
Closes-bug: #1748900


** Changed in: glance
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1748900

Title:
  api-ref: value of custom property not limited to 255 chars

Status in Glance:
  Fix Released

Bug description:
  - [x] This doc is inaccurate in this way: __

  https://developer.openstack.org/api-ref/image/v2/index.html#create-an-
  image

  Where it says:

  "Additionally, you may include additional properties specified as
  key:value pairs, where the value must be a string data type. Keys and
  values are limited to 255 chars in length. Available key names may be
  limited by the cloud’s property protection configuration."

  The 255 char length restriction is only for keys, not values:

  
https://github.com/openstack/glance/blob/265659e8c34865331568b069fdb27ea272df4eaa/glance/db/sqlalchemy/models.py#L158
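
  For context, the linked models.py defines the property columns roughly as
  follows (a paraphrase for illustration, not a verbatim copy):

      from sqlalchemy import Column, String, Text

      class ImageProperty(BASE, GlanceBase):  # base classes as in glance
          name = Column(String(255), nullable=False)  # key: capped at 255
          value = Column(Text)  # value: TEXT column, no 255-char cap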

  ---
  Release: 16.0.0.0rc2.dev10 on 'Sat Feb 10 21:15:25 2018, commit 262e61a'
  SHA: 
  Source: 
https://git.openstack.org/cgit/openstack/glance/tree/api-ref/source/v2/index.rst
  URL: https://developer.openstack.org/api-ref/image/v2/index.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1748900/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1749788] Re: image import: uri filtering conf opts help text needs revision

2018-02-21 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/546020
Committed: 
https://git.openstack.org/cgit/openstack/glance/commit/?id=d289d0d17b4e2ace80c74be80d768a3820a9da62
Submitter: Zuul
Branch:master

commit d289d0d17b4e2ace80c74be80d768a3820a9da62
Author: Brian Rosmaita 
Date:   Mon Feb 19 21:55:16 2018 -0500

Revise help text for uri filtering options

Clarify the help text and clean up some log messages.  Includes
the regenerated glance-image-import.conf.sample file.

Change-Id: I7f9087aaf9c6969e15f63029cc38fe5a0939ad40
Closes-bug: #1749788


** Changed in: glance
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1749788

Title:
  image import: uri filtering conf opts help text needs revision

Status in Glance:
  Fix Released

Bug description:
  The six whitelist/blacklist options are not easy to explain
  individually, so the help text could use a revision.  See the original
  patch for some questions people had, and see Sean's comments on the
  cherry-pick patch for some stylistic stuff that should also be
  corrected.

  
  https://review.openstack.org/#/q/Ide5ace8979bb12239c99a312747b3151c1e64ce8

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1749788/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1748229] Re: revise api-ref: add info about web-download import-method

2018-02-21 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/545629
Committed: 
https://git.openstack.org/cgit/openstack/glance/commit/?id=4cf65d57952cb5a85973822cf9d27fe97eba18b4
Submitter: Zuul
Branch:master

commit 4cf65d57952cb5a85973822cf9d27fe97eba18b4
Author: Brian Rosmaita 
Date:   Sat Feb 17 16:29:51 2018 -0500

api-ref: update interoperable image import info

Generalizes the discussion to include the new web-download import
method and includes a new sample import request.

Change-Id: Icb6cd920f31c6e8e4eecf17880dd3244e5d1a61b
Closes-bug: #1748229


** Changed in: glance
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1748229

Title:
  revise api-ref: add info about web-download import-method

Status in Glance:
  Fix Released

Bug description:
  Need to describe the web-download workflow.  See TODOs in api-
  ref/source/v2/images-import.inc
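
  (For context, the web-download method boils down to an import call along
  these lines; the image id and URI are illustrative:)

      POST /v2/images/{image_id}/import
      {"method": {"name": "web-download",
                  "uri": "https://example.com/ubuntu.qcow2"}}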

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1748229/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1750811] [NEW] resize command should work without ssh

2018-02-21 Thread Lars
Public bug reported:

Description
===
VM resizing should work without ssh connections between compute nodes if they 
have the same shared storage.

One option is to pass an additional argument if it's a resize on
shared storage. The same functionality is already implemented for vm
migrations.


Expected result
===
Resizing on shared storage works without ssh connections between compute nodes.


Actual result
=
Resizing fails because it requires an ssh connection between the source and 
target host.


Environment
===
1. Openstack version:
nova-api 2:16.0.3-0ubuntu1~cloud0
nova-common 2:16.0.3-0ubuntu1~cloud0
nova-conductor 2:16.0.3-0ubuntu1~cloud0
nova-consoleauth 2:16.0.3-0ubuntu1~cloud0
nova-novncproxy 2:16.0.3-0ubuntu1~cloud0
nova-placement-api 2:16.0.3-0ubuntu1~cloud0
nova-scheduler 2:16.0.3-0ubuntu1~cloud0
python-nova 2:16.0.3-0ubuntu1~cloud0
python-novaclient 2:9.1.0-0ubuntu1~cloud0

2. Hypervisor:
Libvirt + KVM

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1750811

Title:
  resize command should work without ssh

Status in OpenStack Compute (nova):
  New

Bug description:
  Description
  ===
  VM resizing should work without ssh connections between compute nodes if they 
have the same shared storage.

  One option is to pass an additional argument if it's a resize on
  shared storage. The same functionality is already implemented for vm
  migrations.

  
  Expected result
  ===
  Resizing on shared storage works without ssh connections between compute 
nodes.

  
  Actual result
  =
  Resizing fails because it requires an ssh connection between the source and 
target host.


  Environment
  ===
  1. Openstack version:
  nova-api 2:16.0.3-0ubuntu1~cloud0
  nova-common 2:16.0.3-0ubuntu1~cloud0
  nova-conductor 2:16.0.3-0ubuntu1~cloud0
  nova-consoleauth 2:16.0.3-0ubuntu1~cloud0
  nova-novncproxy 2:16.0.3-0ubuntu1~cloud0
  nova-placement-api 2:16.0.3-0ubuntu1~cloud0
  nova-scheduler 2:16.0.3-0ubuntu1~cloud0
  python-nova 2:16.0.3-0ubuntu1~cloud0
  python-novaclient 2:9.1.0-0ubuntu1~cloud0

  2. Hypervisor:
  Libvirt + KVM

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1750811/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1750790] [NEW] resources on target host aren't released if resize fails

2018-02-21 Thread Lars
Public bug reported:

Description
===
If we try to resize an instance to a new flavor and the resize fails due to 
missing ssh host keys, the resources aren't released on the target host.

Steps to reproduce
==
1. Check output from placement API (see the query example after these steps)

Output from placement API before resize command:

HOST A:
{
"resource_provider_generation": 997,
"usages": {
"DISK_GB": 0,
"MEMORY_MB": 495616,
"VCPU": 84
}
}

HOST B:
{
"resource_provider_generation": 33,
"usages": {
"DISK_GB": 0,
"MEMORY_MB": 221184,
"VCPU": 40
}
}

2. Try to resize host and check resources from placement API:

This is the output after the resize (to a flavor with 24GB and 12 CPUs) failed:
HOST B:
{
"resource_provider_generation": 33,
"usages": {
"DISK_GB": 0,
"MEMORY_MB": 245760,
"VCPU": 52
}
}

3. Delete VM and check resources again
After deleting the VM the resources have been released (on source and target 
host).
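
(For reference, the usage numbers in the steps above can be read from the
placement API with a query along these lines; the endpoint host/port and
token handling are assumptions about a typical deployment:)

    curl -s -H "X-Auth-Token: $TOKEN" \
        http://controller:8778/resource_providers/$RP_UUID/usages \
        | python -m json.tool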

Expected result
===
If resizing fails resources must be released on target host.

Actual result
=
Resources aren't released on target host.


Environment
===
1. Openstack version:
nova-api 2:16.0.3-0ubuntu1~cloud0
nova-common 2:16.0.3-0ubuntu1~cloud0
nova-conductor 2:16.0.3-0ubuntu1~cloud0
nova-consoleauth 2:16.0.3-0ubuntu1~cloud0
nova-novncproxy 2:16.0.3-0ubuntu1~cloud0
nova-placement-api 2:16.0.3-0ubuntu1~cloud0
nova-scheduler 2:16.0.3-0ubuntu1~cloud0
python-nova 2:16.0.3-0ubuntu1~cloud0
python-novaclient 2:9.1.0-0ubuntu1~cloud0


2. Hypervisor:
Libvirt + KVM

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1750790

Title:
  resources on target host aren't released if resize fails

Status in OpenStack Compute (nova):
  New

Bug description:
  Description
  ===
  If we try to resize an instance to a new flavor and the resize fails due to 
missing ssh host keys, the resources aren't released on the target host.

  Steps to reproduce
  ==
  1. Check output from placement API

  Output from placement API before resize command:

  HOST A:
  {
  "resource_provider_generation": 997,
  "usages": {
  "DISK_GB": 0,
  "MEMORY_MB": 495616,
  "VCPU": 84
  }
  }

  HOST B:
  {
  "resource_provider_generation": 33,
  "usages": {
  "DISK_GB": 0,
  "MEMORY_MB": 221184,
  "VCPU": 40
  }
  }

  2. Try to resize host and check resources from placement API:

  This is the output after the resize (to a flavor with 24GB and 12 CPUs) 
failed:
  HOST B:
  {
  "resource_provider_generation": 33,
  "usages": {
  "DISK_GB": 0,
  "MEMORY_MB": 245760,
  "VCPU": 52
  }
  }

  3. Delete VM and check resources again
  After deleting the VM the resources have been released (on source and target 
host).

  Expected result
  ===
  If resizing fails resources must be released on target host.

  Actual result
  =
  Resources aren't released on target host.


  Environment
  ===
  1. Openstack version:
  nova-api 2:16.0.3-0ubuntu1~cloud0
  nova-common 2:16.0.3-0ubuntu1~cloud0
  nova-conductor 2:16.0.3-0ubuntu1~cloud0
  nova-consoleauth 2:16.0.3-0ubuntu1~cloud0
  nova-novncproxy 2:16.0.3-0ubuntu1~cloud0
  nova-placement-api 2:16.0.3-0ubuntu1~cloud0
  nova-scheduler 2:16.0.3-0ubuntu1~cloud0
  python-nova 2:16.0.3-0ubuntu1~cloud0
  python-novaclient 2:9.1.0-0ubuntu1~cloud0

  
  2. Hypervisor:
  Libvirt + KVM

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1750790/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1750780] [NEW] Race with local file systems can make open-vm-tools fail to start

2018-02-21 Thread ChristianEhrhardt
Public bug reported:

Since the change in [1] open-vm-tools-service starts very (very) early.
Not so much due to the 
Before=cloud-init-local.service
But much more by
DefaultDependencies=no

That can trigger an issue that looks like
root@ubuntuguest:~# systemctl status -l open-vm-tools.service
● open-vm-tools.service - Service for virtual machines hosted on VMware
   Loaded: loaded (/lib/systemd/system/open-vm-tools.service; enabled; vendor 
preset: enabled)
   Active: failed (Result: resources)


As it is right now open-vm-tools can race with the other early start and then 
fail.
In detail one can find a message like:
  open-vm-tools.service: Failed to run 'start' task: Read-only file system"

This is due to PrivateTmp=yes, which is also set and needs a writable
/var/tmp [2]

To ensure this works PrivateTmp would have to be removed (not good) or some 
after dependencies added that make this work reliably.
I added
After=local-fs.target
which made it work for me in 3/3 tests.

I'd like to have an ack from the cloud-init team that this does not totally 
kill the originally intended Before=cloud-init-local.service
I think it does not as local-fs can complete before cloud-init-local, then 
open-vm-tools can initialize and finally cloud-init-local can pick up the data.

To summarize:
# cloud-init-local #
DefaultDependencies=no
Wants=network-pre.target
After=systemd-remount-fs.service
Before=NetworkManager.service
Before=network-pre.target
Before=shutdown.target
Before=sysinit.target
Conflicts=shutdown.target
RequiresMountsFor=/var/lib/cloud

# open-vm-tools #
DefaultDependencies=no
Before=cloud-init-local.service

Proposed is to add to the latter:
After=local-fs.target
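
For testing this locally, a systemd drop-in is probably the least invasive
way to add that dependency (this is the standard override mechanism, not
something shipped by the package):

# /etc/systemd/system/open-vm-tools.service.d/override.conf
[Unit]
After=local-fs.target

# then: systemctl daemon-reload && systemctl restart open-vm-tools.service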

[1]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=859677
[2]: https://github.com/systemd/systemd/issues/5610

** Affects: cloud-init
 Importance: Undecided
 Status: New

** Affects: open-vm-tools (Ubuntu)
 Importance: High
 Status: Triaged

** Also affects: cloud-init
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1750780

Title:
  Race with local file systems can make open-vm-tools fail to start

Status in cloud-init:
  New
Status in open-vm-tools package in Ubuntu:
  Triaged

Bug description:
  Since the change in [1] open-vm-tools-service starts very (very) early.
  Not so much due to the 
  Before=cloud-init-local.service
  But much more by
  DefaultDependencies=no

  That can trigger an issue that looks like
  root@ubuntuguest:~# systemctl status -l open-vm-tools.service
  ● open-vm-tools.service - Service for virtual machines hosted on VMware
 Loaded: loaded (/lib/systemd/system/open-vm-tools.service; enabled; vendor 
preset: enabled)
 Active: failed (Result: resources)

  
  As it is right now open-vm-tools can race with the other early start and then 
fail.
  In detail one can find a message like:
open-vm-tools.service: Failed to run 'start' task: Read-only file system"

  This is due to PrivateTmp=yes, which is also set and needs a writable
  /var/tmp [2]

  To ensure this works PrivateTmp would have to be removed (not good) or some 
after dependencies added that make this work reliably.
  I added
  After=local-fs.target
  which made it work for me in 3/3 tests.

  I'd like to have an ack from the cloud-init team that this does not totally 
kill the originally intended Before=cloud-init-local.service
  I think it does not as local-fs can complete before cloud-init-local, then 
open-vm-tools can initialize and finally cloud-init-local can pick up the data.

  To summarize:
  # cloud-init-local #
  DefaultDependencies=no
  Wants=network-pre.target
  After=systemd-remount-fs.service
  Before=NetworkManager.service
  Before=network-pre.target
  Before=shutdown.target
  Before=sysinit.target
  Conflicts=shutdown.target
  RequiresMountsFor=/var/lib/cloud

  # open-vm-tools #
  DefaultDependencies=no
  Before=cloud-init-local.service

  Proposed is to add to the latter:
  After=local-fs.target

  [1]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=859677
  [2]: https://github.com/systemd/systemd/issues/5610

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1750780/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1750777] [NEW] openvswitch agent eating CPU, time spent in ip_conntrack.py

2018-02-21 Thread Thomas Morin
Public bug reported:

We just ran into a case where the openvswitch agent (local dev destack,
current master branch) eats 100% of CPU time.

Pyflame profiling shows the time being largely spent in
neutron.agent.linux.ip_conntrack, line 95.

https://github.com/openstack/neutron/blob/master/neutron/agent/linux/ip_conntrack.py#L95

The code around this line is:

while True:
pool.spawn_n(self._process_queue)

The documentation of eventlet.spawn_n says: "The same as spawn(), but
it’s not possible to know how the function terminated (i.e. no return
value or exceptions). This makes execution faster. See spawn_n for more
details."  I suspect that GreenPool.spaw_n may behave similarly.

It seems plausible that spawn_n is returning very quickly because of
some error, and then all time is quickly spent in a short-circuited
while loop.
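
A minimal sketch of the suspected behaviour (assumptions: plain eventlet and
a worker that returns immediately; this is not the neutron code itself):

    from eventlet.greenpool import GreenPool

    def _process_queue():
        # Returns at once, e.g. nothing queued or an error swallowed
        # by spawn_n.
        return

    pool = GreenPool(size=10)
    while True:
        # spawn_n only blocks while the pool is full; with fast-exiting
        # workers it returns immediately and the loop spins at 100% CPU.
        pool.spawn_n(_process_queue)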

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1750777

Title:
  openvswitch agent eating CPU, time spent in ip_conntrack.py

Status in neutron:
  New

Bug description:
  We just ran into a case where the openvswitch agent (local dev
  destack, current master branch) eats 100% of CPU time.

  Pyflame profiling shows the time being largely spent in
  neutron.agent.linux.ip_conntrack, line 95.

  
https://github.com/openstack/neutron/blob/master/neutron/agent/linux/ip_conntrack.py#L95

  The code around this line is:

  while True:
  pool.spawn_n(self._process_queue)

  The documentation of eventlet.spawn_n says: "The same as spawn(), but
  it’s not possible to know how the function terminated (i.e. no return
  value or exceptions). This makes execution faster. See spawn_n for
  more details."  I suspect that GreenPool.spaw_n may behave similarly.

  It seems plausible that spawn_n is returning very quickly because of
  some error, and then all time is quickly spent in a short-circuited
  while loop.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1750777/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1750770] [NEW] installing cloud init in vmware breaks ubuntu user

2018-02-21 Thread ChristianEhrhardt
Public bug reported:

When installing cloud-init in vmware without any setup for user/vendor
data it breaks the ubuntu user.

Steps to reproduce:
1. take vmware (free 30 days is fine)
2. install xenial (maybe newer as well but my case was xenial)
3. set up your user to be ubuntu/ubuntu (through the vmware fast installer)
# you now have a working system
# no user/vendor data provider was set up (unless vmware did some internally)
4. install cloud-init
5. reboot
# on reboot I see the cloud init vmware data gatherer timing out (fine as 
expected)
# But after that I can't log in anymore, so it seems it changed the user

This came up in debugging another issue - so there is a chance I messed
the service dependencies up enough to trigger this :-/ (we need to check
that)

Sorry, this makes it hard to get logs, and since I can't log in anymore ...
I'll have to set up a new system with a second user to take a look.

** Affects: cloud-init
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1750770

Title:
  installing cloud init in vmware breaks ubuntu user

Status in cloud-init:
  New

Bug description:
  When installing cloud-init in vmware without any setup for user/vendor
  data it breaks the ubuntu user.

  Steps to reproduce:
  1. take vmware (free 30 days is fine)
  2. install xenial (maybe newer as well but my case was xenial)
  3. set up your user to be ubuntu/ubuntu (through the vmware fast installer)
  # you now have a working system
  # no user/vendor data provider was set up (unless vmware did some internally)
  4. install cloud-init
  5. reboot
  # on reboot I see the cloud init vmware data gatherer timing out (fine as 
expected)
  # But after that I can't log in anymore, so it seems it changed the user

  This came up in debugging another issue - so there is a chance I
  messed the service dependencies up enough to trigger this :-/ (we need
  to check that)

  Sorry, this makes it hard to get logs, and since I can't log in anymore ...
  I'll have to set up a new system with a second user to take a look.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1750770/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1750205] Re: image import: 500 for web-download import-method

2018-02-21 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/545649
Committed: 
https://git.openstack.org/cgit/openstack/glance/commit/?id=156ba81c2fad2844af1ad21b24c771cf66522932
Submitter: Zuul
Branch:master

commit 156ba81c2fad2844af1ad21b24c771cf66522932
Author: Brian Rosmaita 
Date:   Sat Feb 17 23:48:18 2018 -0500

Fix config group not found error

Two parts to this fix:

* add a call to oslo.config.cfg.import_group so that the function
  that checks a uri against the configured white/blacklists can
  access them
* move the location where these options are defined into the
  module's __init__ so that they can be imported without causing a
  circular import (which happens if you import them from their
  current location)

Change-Id: I6363faba0c4cbe75e6e4d0cbf0209a62c10474ef
Closes-bug: #1750205
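
The pattern looks roughly like this (the module path below is illustrative,
not necessarily the exact one from the patch):

    from oslo_config import cfg

    CONF = cfg.CONF
    # Importing the group ensures the module registering the options has
    # been loaded before the uri check reads them:
    CONF.import_group('import_filtering_opts',
                      'glance.async.flows._internal_plugins')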


** Changed in: glance
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1750205

Title:
  image import: 500 for web-download import-method

Status in Glance:
  Fix Released

Bug description:
  This is in the log:

  Feb 17 23:18:00 br-virtual-machine glance-api[22952]: ERROR
  glance.common.wsgi [None req-59d8c68b-8fc9-4b04-9215-6f64abd55532 demo
  demo] Caught error: no such option import_filtering_opts in group
  [DEFAULT]: NoSuchOptError: no such option import_filtering_opts in
  group [DEFAULT]

  Pretty sure the problem is that when the uri validating function was
  moved to common.utils, the import filtering options are no longer
  guaranteed to be registered at the point when the request hits the
  ImagesController.

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1750205/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1749640] Re: db sync fails for mysql while adding triggers

2018-02-21 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/544792
Committed: 
https://git.openstack.org/cgit/openstack/glance/commit/?id=14e8a7b53ba7ee6e6c3b9265c819bd9acc5274a1
Submitter: Zuul
Branch:master

commit 14e8a7b53ba7ee6e6c3b9265c819bd9acc5274a1
Author: Abhishek Kekane 
Date:   Tue Feb 20 15:32:00 2018 +

Triggers shouldn't be executed in offline migration

Recently this change [1] made glance-manage db_sync internally
use Expand, Migrate and Contract. EMC is explicitly used for
online migration, for which glance uses triggers to sync data
between old columns and new columns. DB Sync is used for
offline migration, for which adding triggers is not required.

Made provision to execute triggers explicitly in case of
online migration (EMC pattern) and skip the same in
case of offline migration (db sync).

[1] https://review.openstack.org/#/c/433934/

Closes-Bug: #1749640
Change-Id: I816c73405dd61d933182ad5efc24445a0add4eea


** Changed in: glance
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1749640

Title:
  db sync fails for mysql while adding triggers

Status in Glance:
  Fix Released

Bug description:
  glance-manage db sync fails while adding triggers to the database
  tables with the following error:

  Feb 15 03:20:31 upstream-centos-7-2-node-rdo-cloud-tripleo-30309-6332
  os-collect-config[2239]: "DBError: (pymysql.err.InternalError) (1419,
  u'You do not have the SUPER privilege and binary logging is enabled
  (you *might* want to use the less safe log_bin_trust_function_creators
  variable)') [SQL: u\"\\nCREATE TRIGGER insert_visibility BEFORE INSERT
  ON images\\nFOR EACH ROW\\nBEGIN\\n-- NOTE(abashmak):\\n-- The
  following IF/ELSE block implements a priority decision tree.\\n--
  Strict order MUST be followed to correctly cover all the edge
  cases.\\n\\n-- Edge case: neither is_public nor visibility
  specified\\n--(or both specified as NULL):\\nIF
  NEW.is_public <=> NULL AND NEW.visibility <=> NULL THEN\\n
  SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Invalid visibility
  value';\\n-- Edge case: both is_public and visibility
  specified:\\nELSEIF NOT(NEW.is_public <=> NULL OR NEW.visibility
  <=> NULL) THEN\\nSIGNAL SQLSTATE '45000' SET MESSAGE_TEXT =
  'Invalid visibility value';\\n-- Inserting with is_public, set
  visibility accordingly:\\nELSEIF NOT NEW.is_public <=> NULL
  THEN\\nIF NEW.is_public = 1 THEN\\nSET
  NEW.visibility = 'public';\\nELSE\\nSET
  NEW.visibility = 'shared';\\nEND IF;\\n-- Inserting with
  visibility, set is_public accordingly:\\nELSEIF NOT NEW.visibility
  <=> NULL THEN\\nIF NEW.visibility = 'public' THEN\\n
  SET NEW.is_public = 1;\\nELSE\\nSET NEW.is_public
  = 0;\\nEND IF;\\n-- Edge case: either one of: is_public or
  visibility,\\n--is explicitly set to NULL:\\n
  ELSE\\nSIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Invalid
  visibility value';\\nEND IF;\\nEND;\\n\"] (Background on this
  error at: http://sqlalche.me/e/2j85)",

  
  The reason is that, for MySQL, using the glance-manage db_sync or 
glance-manage expand command requires that you either grant your glance user 
SUPER privileges, or run set global log_bin_trust_function_creators=1; in mysql 
beforehand.

  
  Actual logs:
  Feb 15 03:20:31 upstream-centos-7-2-node-rdo-cloud-tripleo-30309-6332 
os-collect-config[2239]: "+++ [[ -n 0 ]]",
  Feb 15 03:20:31 upstream-centos-7-2-node-rdo-cloud-tripleo-30309-6332 
os-collect-config[2239]: "+++ glance-manage db_sync",
  Feb 15 03:20:31 upstream-centos-7-2-node-rdo-cloud-tripleo-30309-6332 
os-collect-config[2239]: 
"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:1334: 
OsloDBDeprecationWarning: EngineFacade is deprecated; please use 
oslo_db.sqlalchemy.enginefacade",
  Feb 15 03:20:31 upstream-centos-7-2-node-rdo-cloud-tripleo-30309-6332 
os-collect-config[2239]: "  expire_on_commit=expire_on_commit, _conf=conf)",
  Feb 15 03:20:31 upstream-centos-7-2-node-rdo-cloud-tripleo-30309-6332 
os-collect-config[2239]: "INFO  [alembic.runtime.migration] Context impl 
MySQLImpl.",
  Feb 15 03:20:31 upstream-centos-7-2-node-rdo-cloud-tripleo-30309-6332 
os-collect-config[2239]: "INFO  [alembic.runtime.migration] Will assume 
non-transactional DDL.",
  Feb 15 03:20:31 upstream-centos-7-2-node-rdo-cloud-tripleo-30309-6332 
os-collect-config[2239]: "INFO  [alembic.runtime.migration] Running upgrade  -> 
liberty, liberty initial",
  Feb 15 03:20:31 upstream-centos-7-2-node-rdo-cloud-tripleo-30309-6332 
os-collect-config[2239]: "INFO  [alembic.runtime.migration] Running upgrade 
liberty -> mitaka01, add index on created_at and updated_at columns of 'images'