[Yahoo-eng-team] [Bug 1847999] [NEW] vmware virt driver's report of VCPU can be inaccurate in some cases

2019-10-14 Thread Chris Dent
Public bug reported:

caveat lector: This is a placeholder bug to record an issue with the
vmware virtdriver so that if a reasonable solution is determined it can
be contributed upstream. The challenge is that no solution is going to
be perfect so it may be easier to just leave things as they are, but I
wanted to get this in place to remember it. If a patch does happen, I'll
be doing it.

In the downstream version of the vmware driver more features are
exposed, based on various settings made on the individual esxi hosts and
the vcenter cluster manager. Some of these features consume available
resources (cpu, disk, memory) that need to be accounted for as overhead,
per esxi host. However, because the vmware driver has chosen to expose
the vcenter cluster as the unit of hypervisor, per-esxi-host differences
are difficult to manage in nova and placement. In some cases
compensation can be done by tweaking max_unit of a resource class (see
update_provider_tree in nova/virt/vmwareapi/driver.py for existing
examples) to have a value of the maximum available slice on any host (or
datastore) and regularly updating this (in the periodic job or after a
workload lands).

For VCPU resources there is a mismatch between how the esxi host reports
overhead and how nova and placement think of it: vmware talks in Hz;
nova and placement in whole CPUs. For some NFV-related features,
reserving a "core" for network management (things which help a workload
but are not the workload itself) will lower the value of available Hz,
but not impact 'summary.hardware.numCpuThreads', the attribute currently
used to calculate total and max_unit for the VCPU resource class.

A more accurate picture of available resources can be created by doing
some math across several hardware summary attributes: numCpuThreads,
cpuMhz, and numCpuCores. Probably with some "what features are turned
on" magic for extra accuracy.

The correct math is being researched, I'll hang it on this bug when it
is figured out.
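
For illustration only, a minimal sketch of the kind of math involved,
assuming the host also reports an effective MHz figure after
reservations (effective_mhz, and the formula itself, are assumptions
rather than the researched answer):

    def estimate_vcpus(num_cpu_threads, num_cpu_cores, cpu_mhz, effective_mhz):
        # Raw capacity of the host in MHz: physical cores times clock speed.
        total_mhz = num_cpu_cores * cpu_mhz
        if total_mhz <= 0:
            return 0
        # Fraction of Hz left over after per-host reservations (for
        # example a "core" reserved for network management).
        usable = min(1.0, float(effective_mhz) / total_mhz)
        # Scale the thread count by that fraction and round down.
        return int(num_cpu_threads * usable)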

** Affects: nova
 Importance: Low
 Assignee: Chris Dent (cdent)
 Status: Triaged

** Changed in: nova
   Status: New => Triaged

** Changed in: nova
   Importance: Undecided => Low

** Changed in: nova
 Assignee: (unassigned) => Chris Dent (cdent)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1847999

Title:
  vmware virt driver's report of VCPU can be inaccurate in some cases

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  caveat lector: This is a placeholder bug to record an issue with the
  vmware virtdriver so that if a reasonable solution is determined it
  can be contributed upstream. The challenge is that no solution is
  going to be perfect so it may be easier to just leave things as they
  are, but I wanted to get this in place to remember it. If a patch does
  happen, I'll be doing it.

  In the downstream version of the vmware driver more features are
  exposed, based on various settings made on the individual esxi hosts
  and the vcenter cluster manager. Some of these features consume
  available resources (cpu, disk, memory) that need to be accounted for
  as overhead, per esxi host. However, because the vmware driver has
  chosen to expose the vcenter cluster as the unit of hypervisor,
  per-esxi-host differences are difficult to manage in nova and
  placement. In some cases compensation can be done by tweaking max_unit
  of a resource class (see update_provider_tree in
  nova/virt/vmwareapi/driver.py for existing examples) to have a value
  of the maximum available slice on any host (or datastore) and
  regularly updating this (in the periodic job or after a workload
  lands).

  For VCPU resources there is a mismatch between how the esxi host
  reports overhead and how nova and placement think of it: vmware talks
  in Hz; nova and placement in whole CPUs. For some NFV-related
  features, reserving a "core" for network management (things which help
  a workload but are not the workload itself) will lower the value of
  available Hz, but not impact 'summary.hardware.numCpuThreads', the
  attribute currently used to calculate total and max_unit for the VCPU
  resource class.

  A more accurate picture of available resources can be created by doing
  some math across several hardware summary attributes: numCpuThreads,
  cpuMhz, and numCpuCores. Probably with some "what features are turned
  on" magic for extra accuracy.

  The correct math is being researched, I'll hang it on this bug when it
  is figured out.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1847999/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1837995] Re: "Unexpected API Error" when use "openstack usage show" command

2019-08-02 Thread Chris Dent
** Changed in: nova
   Status: Incomplete => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1837995

Title:
  "Unexpected API Error" when use "openstack usage show" command

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Description
  ===========
  For a non-admin project with an instance launched, try to query the
  usage information on the GUI by clicking Overview, or on the CLI with:
  openstack usage show

  You will get "Error: Unable to retrieve usage information." on the
  GUI, and the following ERROR in the CLI output:

  $ openstack usage show
  Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ 
and attach the Nova API log if possible.
   (HTTP 500) (Request-ID: 
req-cbea9542-ecce-42fd-b660-fc5f996ea3c3)

  Steps to reproduce
  ==================
  Execute the "openstack usage show" command,
  or click the Project - Compute - Overview button on the GUI.

  
  Expected result
  ===============
  No Error report and the usage information shown

  
  Actual result
  =============
  Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ 
and attach the Nova API log if possible.
   (HTTP 500) (Request-ID: 
req-cbea9542-ecce-42fd-b660-fc5f996ea3c3)

  
  Environment
  ===========
  1. Exact version of OpenStack you are running. 
  Openstack Stein on CentOS7

  $ rpm -qa | grep nova
  openstack-nova-api-19.0.1-1.el7.noarch
  puppet-nova-14.4.0-1.el7.noarch
  python2-nova-19.0.1-1.el7.noarch
  openstack-nova-conductor-19.0.1-1.el7.noarch
  openstack-nova-novncproxy-19.0.1-1.el7.noarch
  openstack-nova-migration-19.0.1-1.el7.noarch
  openstack-nova-common-19.0.1-1.el7.noarch
  openstack-nova-scheduler-19.0.1-1.el7.noarch
  openstack-nova-console-19.0.1-1.el7.noarch
  python2-novaclient-13.0.1-1.el7.noarch
  openstack-nova-placement-api-19.0.1-1.el7.noarch
  openstack-nova-compute-19.0.1-1.el7.noarch

  2. Which hypervisor did you use?
 Libvirt + KVM
 $ rpm -qa | grep kvm
  qemu-kvm-ev-2.12.0-18.el7_6.5.1.x86_64
  libvirt-daemon-kvm-4.5.0-10.el7_6.12.x86_64
  qemu-kvm-common-ev-2.12.0-18.el7_6.5.1.x86_64
  $ rpm -qa | grep libvirt
  libvirt-gconfig-1.0.0-1.el7.x86_64
  libvirt-daemon-driver-nwfilter-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-interface-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-config-nwfilter-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-storage-mpath-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-storage-core-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-secret-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-lxc-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-storage-rbd-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-kvm-4.5.0-10.el7_6.12.x86_64
  libvirt-bash-completion-4.5.0-10.el7_6.12.x86_64
  libvirt-4.5.0-10.el7_6.12.x86_64
  libvirt-glib-1.0.0-1.el7.x86_64
  libvirt-daemon-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-qemu-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-config-network-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-storage-disk-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-storage-4.5.0-10.el7_6.12.x86_64
  libvirt-python-4.5.0-1.el7.x86_64
  libvirt-libs-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-storage-scsi-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-network-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-nodedev-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-storage-logical-4.5.0-10.el7_6.12.x86_64
  libvirt-daemon-driver-storage-iscsi-4.5.0-10.el7_6.12.x86_64
  libvirt-client-4.5.0-10.el7_6.12.x86_64
  libvirt-gobject-1.0.0-1.el7.x86_64

  
  Logs & Configs
  ==============

  nova-api.log

  
  2019-07-26 16:12:53.967 8673 INFO nova.osapi_compute.wsgi.server 
[req-69d7df76-7dd9-4d42-8eeb-347ef1c9d0a5 f887cc44f21043dca85438d74a47d68d 
0d47cfd5b9c94a5790fa4472e576cba6 - default default] c5f::e2 "GET 
/v2.1/0d47cfd5b9c94a5790fa4472e576cba6/servers/detail?all_tenants=True=2019-07-26T08%3A07%3A55.280119%2B00%3A00
 HTTP/1.1" status: 200 len: 413 time: 0.0639658
  2019-07-26 16:12:57.211 8682 ERROR nova.api.openstack.wsgi 
[req-cbea9542-ecce-42fd-b660-fc5f996ea3c3 1e45ea9a7d5647a6a938c2ac027822f2 
85dd8936d21b46a8878ed59678c7ad9a - default default] Unexpected exception in API 
method: OrphanedObjectError: Cannot call obj_load_attr on orphaned Instance 
object
  2019-07-26 16:12:57.211 8682 ERROR nova.api.openstack.wsgi Traceback (most 
recent call last):
  2019-07-26 16:12:57.211 8682 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py", line 671, in 
wrapped
  2019-07-26 16:12:57.211 8682 ERROR nova.api.openstack.wsgi return 
f(*args, **kwargs)
  2019-07-26 16:12:57.211 8682 ERROR nova.api.openstack.wsgi   File 
"/usr/lib/python2.7/site-packages/nova/api/openstack/compute/simple_tenant_usage.py",
 line 291, in show
  2019-07-26 

[Yahoo-eng-team] [Bug 1829472] Re: [placement] install and configure controller node in nova (STEIN)

2019-05-17 Thread Chris Dent
The [placement] section is still necessary for nova.conf to configure
how nova talks to the placement service. As it says in there you are
configuring "access to the Placement service".

So I'm pretty sure this is not a bug, and will mark it as such. However,
if I've missed your point, please respond with more info.
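
For illustration, a minimal [placement] section of the kind the install
guide describes (all values here are placeholders for a typical
deployment):

    [placement]
    region_name = RegionOne
    auth_type = password
    auth_url = http://controller:5000/v3
    project_name = service
    project_domain_name = Default
    username = placement
    user_domain_name = Default
    password = PLACEMENT_PASS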

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1829472

Title:
  [placement] install and configure controller node in nova (STEIN)

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Hi guys,

  I'm working with the doc:
  https://docs.openstack.org/nova/stein/install/controller-install-
  ubuntu.html#install-and-configure-components

  [x] This doc is inaccurate in this way:

  Obviously, the nova-placement service, and with it the creation of the
  endpoints, the placement user, and so on, has been eliminated.
  However, there is still the [placement] section in the document for
  configuring the /etc/nova/nova.conf file.
  I think this section is obsolete.
  Regards,
  Robert

  
  ---
  Release: STEIN on 2019-03-27 10:24:01
  SHA: d7d7f115430c7ffeb88ec9dcd155ac69b29d7513
  Source: 
https://git.openstack.org/cgit/openstack/nova/tree/doc/source/install/controller-install-ubuntu.rst
  URL: 
https://docs.openstack.org/nova/stein/install/controller-install-ubuntu.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1829472/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1782690] Re: Consider forbidden traits in early exit of AllocationCandidates

2019-04-12 Thread Chris Dent
** Changed in: nova
   Status: In Progress => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1782690

Title:
  Consider forbidden traits in early exit of AllocationCandidates

Status in OpenStack Compute (nova):
  Won't Fix

Bug description:
  Currently, if there aren't any providers that have any of the required
  traits, we just exit early in _get_by_one_request. But we should also
  consider the forbidden traits to optimize this attempt at a quick
  return.

  
https://github.com/openstack/nova/blob/fdd4253/nova/api/openstack/placement/objects/resource_provider.py#L3948-L3956
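
  For illustration, a sketch of the extended early-exit check, assuming
  a trait_map from trait name to the set of provider ids that have it
  (the names are assumptions, not the actual placement code):

      def no_candidates_possible(trait_map, required, forbidden, all_rp_ids):
          # Existing check: a required trait held by no provider means
          # no candidates are possible.
          if any(not trait_map.get(trait) for trait in required):
              return True
          # Extension: if every provider carries at least one forbidden
          # trait, nothing can match either.
          banned = set()
          for trait in forbidden:
              banned |= trait_map.get(trait, set())
          return banned >= set(all_rp_ids)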

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1782690/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1771325] Re: placement trait and inventory handler use nonstandard HTTP error message details

2019-04-04 Thread Chris Dent
** Changed in: nova/pike
   Status: Fix Committed => Fix Released

** Changed in: nova
   Status: Confirmed => Fix Committed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1771325

Title:
  placement trait and inventory handler use nonstandard HTTP error
  message details

Status in OpenStack Compute (nova):
  Fix Committed
Status in OpenStack Compute (nova) pike series:
  Fix Released
Status in OpenStack Compute (nova) queens series:
  Fix Released

Bug description:
  DELETE /traits/

  Actual
  --

  {"errors": [{"status": 400, "request_id": "req-b30e30ba-9fce-403f-
  9f24-b6e32cd0b8c9", "detail": "Cannot delete standard trait
  HW_GPU_API_DXVA.\n\n   ", "title": "Bad Request"}]}

  Expected
  

  {"errors": [{"status": 400, "request_id": "req-3caa15be-a726-41f2
  -a7cb-f4afb3c97a44", "detail": "The server could not comply with the
  request since it is either malformed or otherwise incorrect.\n\n
  Cannot delete standard trait HW_GPU_API_DXVA.  ", "title": "Bad
  Request"}]}

  Most of the placement wsgi code passes one positional argument to the
  constructor of the webob.exc.HTTPXXX exception classes, but the trait
  [1] and inventory handlers use the 'explanation' kwarg. As the above
  example shows, this leads to different behavior. This inconsistency
  leads to incorrect behavior in the osc placement client [2].

  [1] 
https://github.com/openstack/nova/blob/ae131868f71700d69053b65a0a37f9c2d65c3770/nova/api/openstack/placement/handlers/trait.py#L133
  [2] 
https://github.com/openstack/osc-placement/blob/2357807c95d74afc836852e1c54f0631c6fd2d60/osc_placement/http.py#L35
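
  For illustration, a sketch of the two calling styles in webob, using
  the message from the example above:

      import webob.exc

      # Positional argument sets 'detail'; the default explanation for
      # the status code is kept, so both strings appear in the body.
      webob.exc.HTTPBadRequest(
          'Cannot delete standard trait HW_GPU_API_DXVA.')

      # The 'explanation' kwarg replaces the default explanation, so
      # only the custom message appears.
      webob.exc.HTTPBadRequest(
          explanation='Cannot delete standard trait HW_GPU_API_DXVA.')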

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1771325/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1778591] Re: GET /allocations/{uuid} on a consumer with no allocations provides no generation

2019-04-04 Thread Chris Dent
Since there's been a lot of churn since this conversation started, and
because we're over on storyboard now, and other issues have taken
greater priority, I'm going to mark this as a wontfix. Not because we
won't fix the conceptual situation, but because the concept isn't really
captured by the bug.

We'll get to it when it matters.

** Changed in: nova
   Status: In Progress => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1778591

Title:
  GET /allocations/{uuid} on a consumer with no allocations provides no
  generation

Status in OpenStack Compute (nova):
  Won't Fix

Bug description:
  If we write some allocations with PUT /allocations/{uuid} at modern
  microversions, a consumer record is created for {uuid} and a
  generation is created for that consumer. Each subsequent attempt to
  PUT /allocations/{uuid} must include a matching consumer generation.

  If the allocations for a consumer are cleared (either DELETE, or PUT
  /allocations/{uuid} with an empty dict of allocations) two things go
  awry:

  * the consumer record, with a generation, stays around
  * GET /allocations/{uuid} returns the following:

 {u'allocations': {}}

  That is, no generation is provided, and we have no way to figure one
  out other than inspecting the details of the error response.

  Some options to address this:

  * Return the generation in that response (see the sketch below)
  * When the allocations for a consumer go empty, remove the consumer
  * Something else?
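
  For illustration, the first option might produce a response shaped
  like this (a sketch; the key name is an assumption):

      {"allocations": {}, "consumer_generation": 3}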

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1778591/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1823043] [NEW] Docs insufficiently clear on the intersection of availability zones, force, and cold and live migrations

2019-04-03 Thread Chris Dent
Public bug reported:


It's hard to find a single place in the nova docs where the impact of
availability zones (including the default) on the capabilities of live
or cold migrations is made clear.

In https://docs.openstack.org/nova/latest/user/aggregates.html
#availability-zones-azs is probably a good place to describe what's
going on. Some of the rules are discussed in IRC at

http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-
nova.2019-04-03.log.html#t2019-04-03T15:22:39

The gist is that when a server is created in an AZ (either an explicit
one, or the default 'nova' zone) it is required to stay in that AZ for
all move operations unless the move is forced, which can happen in a
live migrate or evacuate.

(Caveats abound in this area, see the IRC log for more discussion which
may help to flavor the docs being created.)

The reasoning for this, as far as I can tell, is that requesting an AZ
is a part of the boot constraints and we don't want the moved server to
be in violation of its own constraints.

** Affects: nova
 Importance: Medium
 Assignee: Matt Riedemann (mriedem)
 Status: Confirmed


** Tags: doc docs

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1823043

Title:
  Docs insufficiently clear on the intersection of availability zones,
  force, and cold and live migrations

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  
  It's hard to find a single place in the nova docs where the impact of
  availability zones (including the default) on the capabilities of live
  or cold migrations is made clear.

  In https://docs.openstack.org/nova/latest/user/aggregates.html
  #availability-zones-azs is probably a good place to describe what's
  going on. Some of the rules are discussed in IRC at

  http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-
  nova.2019-04-03.log.html#t2019-04-03T15:22:39

  The gist is that when a server is created in an AZ (either an explicit
  one, or the default 'nova' zone) it is required to stay in that AZ for
  all move operations unless the move is forced, which can happen in a
  live migrate or evacuate.

  (Caveats abound in this area, see the IRC log for more discussion
  which may help to flavor the docs being created.)

  The reasoning for this, as far as I can tell, is that requesting an AZ
  is a part of the boot constraints and we don't want the moved server
  to be in violation of its own constraints.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1823043/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1821737] [NEW] simple_cell_setup reports "exiting" when it is not

2019-03-26 Thread Chris Dent
Public bug reported:

When running 'nova-manage simple_cell_setup ...', if hosts are already
mapped, the _map_cell_and_hosts method prints a message of 'All hosts
are already mapped to cell(s), exiting.' and then proceeds to map
instances. It does not, in fact, exit.

This isn't the end of the world, but is somewhat confusing.

The easiest fix is probably to get rid of ', exiting'. Then in the
multiple paths to the method, the printed message still makes sense and
'exiting' can be implicit.
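
For illustration, the suggested tweak amounts to something like this (a
sketch, not the actual nova-manage code):

    # Before: misleading, since the method then goes on to map instances.
    print('All hosts are already mapped to cell(s), exiting.')
    # After: drop ', exiting' so the message stays true on every path.
    print('All hosts are already mapped to cell(s).')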

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: cells

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1821737

Title:
  simple_cell_setup reports "exiting" when it is not

Status in OpenStack Compute (nova):
  New

Bug description:
  When running 'nova-manage simple_cell_setup ...', if hosts are already
  mapped, the _map_cell_and_hosts method prints a message of 'All hosts
  are already mapped to cell(s), exiting.' and then proceeds to map
  instances. It does not, in fact, exit.

  This isn't the end of the world, but is somewhat confusing.

  The easiest fix is probably to get rid of ', exiting'. Then in the
  multiple paths to the method, the printed message still makes sense
  and 'exiting' can be implicit.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1821737/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1696830] Re: nova-placement-api default config files is too strict

2019-03-11 Thread Chris Dent
After speaking with coreycb we're going to drop nova/placement from this
as it no longer quite fits and the existing snap related bugs are
sufficient to provide an aide-mémoire.

** Changed in: nova
   Status: In Progress => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1696830

Title:
  nova-placement-api default config files is too strict

Status in OpenStack Compute (nova):
  Won't Fix
Status in oslo.config:
  Fix Released
Status in Glance Snap:
  Fix Released
Status in Keystone Snap:
  Fix Released
Status in Neutron Snap:
  Fix Released
Status in Nova Snap:
  Fix Released
Status in Nova Hypervisor Snap:
  Fix Released

Bug description:
  If nova.conf doesn't exist in the typical location of
  /etc/nova/nova.conf and OS_PLACEMENT_CONFIG_DIR isn't set, nova-
  placement-api's wsgi application will fail. In our case with the
  OpenStack snap, we have two possible paths we may pick nova.conf up
  from, based on what --config-file specifies. I think the right answer
  here is to be a bit more flexible and not set the default config file
  if its path doesn't exist.
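
  For illustration, a minimal sketch of that suggestion (the path is the
  typical location named above):

      import os

      CONF_PATH = '/etc/nova/nova.conf'
      # Only pass the default path when it actually exists; otherwise
      # let --config-file or OS_PLACEMENT_CONFIG_DIR take over.
      default_config_files = (
          [CONF_PATH] if os.path.exists(CONF_PATH) else None)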

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1696830/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1818560] [NEW] Nova's use of the placement database fixture from test_report_client doesn't register opts

2019-03-04 Thread Chris Dent
Public bug reported:

See: http://logs.openstack.org/98/538498/22/gate/nova-tox-functional-
py35/7673d3e/testr_results.html.gz

The failing tests there are failing because the Database fixture from
placement is used directly, and configuration opts are not being
registered properly. This was an oversight when adding a new
configuration setting.

The fix is to register the missing opt when requested to do so.

This is blocking the gate.
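
For illustration, the shape of such a fix with oslo.config (the option
name and group here are placeholders, not the actual missing opt):

    from oslo_config import cfg

    def register_opts(conf):
        # Register the option the fixture needs before any code reads it.
        conf.register_opts(
            [cfg.StrOpt('connection', help='DB connection URL.')],
            group='placement_database')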

** Affects: nova
 Importance: Critical
 Status: Confirmed


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1818560

Title:
  Nova's use of the placement database fixture from test_report_client
  doesn't register opts

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  See: http://logs.openstack.org/98/538498/22/gate/nova-tox-functional-
  py35/7673d3e/testr_results.html.gz

  The failing tests there are failing because the Database fixture from
  placement is used directly, and configuration opts are not being
  registered properly. This was an oversight when adding a new
  configuration setting.

  The fix is to register the missing opt when requested to do so.

  This is blocking the gate.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1818560/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1792503] Re: allocation candidates "?member_of=" doesn't work with nested providers

2019-03-04 Thread Chris Dent
** Changed in: nova/rocky
   Status: In Progress => Won't Fix

** Changed in: nova
   Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1792503

Title:
  allocation candidates "?member_of=" doesn't work with nested providers

Status in OpenStack Compute (nova):
  Fix Committed
Status in OpenStack Compute (nova) rocky series:
  Won't Fix

Bug description:
  "GET /allocation_candidates" now supports "member_of" parameter.
  With nested providers present, this should work with the following 
constraints.

  -
  (a)  With "member_of" qparam, aggregates on the root should span on the whole 
tree

  If a root provider is in the aggregate, which has been specified by 
"member_of" qparam,
  the resource providers under that root can be in allocation candidates even 
the root is absent.

  (b) Without "member_of" qparam, sharing resource provider should be
  shared with the whole tree

  If a sharing provider is in the same aggregate with one resource provider 
(rpA),
  and "member_of" hasn't been specified in qparam by user, the sharing provider 
can be in
  allocation candidates with any of the resource providers in the same tree 
with rpA.

  (c) With "member_of" qparam, the range of the share of sharing
  resource providers should shrink to the resource providers "under the
  specified aggregates" in a tree.

  Here, whether the rp is "under the specified aggregates" is determined with 
the constraints of (a). Namely, not only rps that belongs to the aggregates 
directly are "under the aggregates",
  but olso rps whose root is under the aggregates are also "under the 
aggregates".
  -

  So far at Stein PTG time, 2018 Sep. 13th, this constraint is broken in the 
point that
  when placement picks up allocation candidates, the aggregates of nested 
providers
  are assumed as the same as root providers. This means it ignores the 
aggregates of
  the nested provider itself. This could result in the lack of allocation 
candidates when
  an aggregate which on a nested provider but not on the root has been 
specified in
  the `member_of` query parameter.

  This bug is well described in a test case which is submitted shortly.
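
  For illustration, the kind of request affected (AGG_UUID stands in for
  an aggregate that is on a nested provider but not on its root):

      GET /allocation_candidates?resources=VCPU:1&member_of=AGG_UUID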

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1792503/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1809401] Re: os-resource-classes: Could not satisfy constraints for 'os-resource-classes': installation from path or url cannot be constrained to a version

2019-03-04 Thread Chris Dent
** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1809401

Title:
  os-resource-classes: Could not satisfy constraints for 'os-resource-
  classes': installation from path or url cannot be constrained to a
  version

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  http://logs.openstack.org/85/624885/2/check/openstack-tox-pep8/a18a925
  /job-output.txt.gz#_2018-12-21_01_42_38_612849

  
  2018-12-21 01:42:19.753292 | ubuntu-xenial | pep8 create: 
/home/zuul/src/git.openstack.org/openstack/os-resource-classes/.tox/pep8
  2018-12-21 01:42:22.895392 | ubuntu-xenial | pep8 installdeps: 
-r/home/zuul/src/git.openstack.org/openstack/os-resource-classes/test-requirements.txt
  2018-12-21 01:42:35.891695 | ubuntu-xenial | pep8 develop-inst: 
/home/zuul/src/git.openstack.org/openstack/os-resource-classes
  2018-12-21 01:42:38.606499 | ubuntu-xenial | ERROR: invocation failed (exit 
code 1), logfile: 
/home/zuul/src/git.openstack.org/openstack/os-resource-classes/.tox/pep8/log/pep8-2.log
  2018-12-21 01:42:38.606661 | ubuntu-xenial | ERROR: actionid: pep8
  2018-12-21 01:42:38.606722 | ubuntu-xenial | msg: developpkg
  2018-12-21 01:42:38.607197 | ubuntu-xenial | cmdargs: 
'/home/zuul/src/git.openstack.org/openstack/os-resource-classes/.tox/pep8/bin/pip
 install 
-c/home/zuul/src/git.openstack.org/openstack/requirements/upper-constraints.txt 
--exists-action w -e 
/home/zuul/src/git.openstack.org/openstack/os-resource-classes'
  2018-12-21 01:42:38.607230 | ubuntu-xenial |
  2018-12-21 01:42:38.607406 | ubuntu-xenial | Ignoring mypy-extensions: 
markers 'python_version == "3.4"' don't match your environment
  2018-12-21 01:42:38.607586 | ubuntu-xenial | Ignoring mypy-extensions: 
markers 'python_version == "3.5"' don't match your environment
  2018-12-21 01:42:38.607761 | ubuntu-xenial | Ignoring mypy-extensions: 
markers 'python_version == "3.6"' don't match your environment
  2018-12-21 01:42:38.607923 | ubuntu-xenial | Ignoring asyncio: markers 
'python_version == "3.4"' don't match your environment
  2018-12-21 01:42:38.608086 | ubuntu-xenial | Ignoring asyncio: markers 
'python_version == "3.5"' don't match your environment
  2018-12-21 01:42:38.608248 | ubuntu-xenial | Ignoring asyncio: markers 
'python_version == "3.6"' don't match your environment
  2018-12-21 01:42:38.608418 | ubuntu-xenial | Ignoring dnspython3: markers 
'python_version == "3.4"' don't match your environment
  2018-12-21 01:42:38.608585 | ubuntu-xenial | Ignoring dnspython3: markers 
'python_version == "3.5"' don't match your environment
  2018-12-21 01:42:38.608751 | ubuntu-xenial | Ignoring dnspython3: markers 
'python_version == "3.6"' don't match your environment
  2018-12-21 01:42:38.608908 | ubuntu-xenial | Ignoring mypy: markers 
'python_version == "3.4"' don't match your environment
  2018-12-21 01:42:38.609093 | ubuntu-xenial | Ignoring mypy: markers 
'python_version == "3.5"' don't match your environment
  2018-12-21 01:42:38.609255 | ubuntu-xenial | Ignoring mypy: markers 
'python_version == "3.6"' don't match your environment
  2018-12-21 01:42:38.609417 | ubuntu-xenial | Ignoring jeepney: markers 
'python_version == "3.4"' don't match your environment
  2018-12-21 01:42:38.609577 | ubuntu-xenial | Ignoring jeepney: markers 
'python_version == "3.5"' don't match your environment
  2018-12-21 01:42:38.609739 | ubuntu-xenial | Ignoring jeepney: markers 
'python_version == "3.6"' don't match your environment
  2018-12-21 01:42:38.609910 | ubuntu-xenial | Ignoring SecretStorage: markers 
'python_version == "3.4"' don't match your environment
  2018-12-21 01:42:38.610081 | ubuntu-xenial | Ignoring SecretStorage: markers 
'python_version == "3.5"' don't match your environment
  2018-12-21 01:42:38.610253 | ubuntu-xenial | Ignoring SecretStorage: markers 
'python_version == "3.6"' don't match your environment
  2018-12-21 01:42:38.610413 | ubuntu-xenial | Ignoring Django: markers 
'python_version == "3.4"' don't match your environment
  2018-12-21 01:42:38.610573 | ubuntu-xenial | Ignoring Django: markers 
'python_version == "3.5"' don't match your environment
  2018-12-21 01:42:38.610733 | ubuntu-xenial | Ignoring Django: markers 
'python_version == "3.6"' don't match your environment
  2018-12-21 01:42:38.610889 | ubuntu-xenial | Ignoring cmd2: markers 
'python_version == "3.4"' don't match your environment
  2018-12-21 01:42:38.611044 | ubuntu-xenial | Ignoring cmd2: markers 
'python_version == "3.5"' don't match your environment
  2018-12-21 01:42:38.611200 | ubuntu-xenial | Ignoring cmd2: markers 
'python_version == "3.6"' don't match your environment
  2018-12-21 01:42:38.611364 | ubuntu-xenial | Ignoring typed-ast: markers 
'python_version == "3.4"' don't match your environment
  2018-12-21 01:42:38.611528 | ubuntu-xenial 

[Yahoo-eng-team] [Bug 1818498] [NEW] Placement aggregate creation continues to be unstable under very high load

2019-03-04 Thread Chris Dent
Public bug reported:

See: http://logs.openstack.org/89/639889/3/check/placement-
perfload/e56f0a0/logs/placement-api.log (or any other recent perfload
run) where there are multiple errors when trying to create aggregates.

Various bits of work have been done to try to fix that up, but
apparently none of them have fully worked.

Tetsuro had some ideas on using better transaction defaults in mysql's
configs, but I was reluctant to do that because presumably a lot of
people install and use the defaults and ideally our solution would "just
work" with the defaults.

Perhaps I'm completely wrong about that. In a very high concurrency
situation (which is what's happening in the perfload job) tweaks of the
db may be required.

In any case, this probably needs more attention: whatever the solution,
we don't want it to be possible to generate 500s so easily. And the
solution is not simply to turn them into 4xx errors; we want the problem
not to happen at all.
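
For illustration, one possible mitigation (an option, not the author's
plan) is oslo.db's retry decorator around the conflicting write:

    from oslo_db import api as oslo_db_api

    # Retry the aggregate write a few times when concurrent requests
    # deadlock, instead of surfacing a 500 to the client.
    @oslo_db_api.wrap_db_retry(max_retries=5, retry_on_deadlock=True)
    def set_aggregates(context, rp_id, agg_uuids):
        ...  # hypothetical write path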

** Affects: nova
 Importance: Low
 Status: New


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1818498

Title:
  Placement aggregate creation continues to be unstable under very high
  load

Status in OpenStack Compute (nova):
  New

Bug description:
  See: http://logs.openstack.org/89/639889/3/check/placement-
  perfload/e56f0a0/logs/placement-api.log (or any other recent perfload
  run) where there are multiple errors when trying to create aggregates.

  Various bits of work have been done to try to fix that up, but
  apparently none of them have fully worked.

  Tetsuro had some ideas on using better transaction defaults in mysql's
  configs, but I was reluctant to do that because presumably a lot of
  people install and use the defaults and ideally our solution would
  "just work" with the defaults.

  Perhaps I'm completely wrong about that. In a very high concurrency
  situation (which is what's happening in the perfload job) tweaks of
  the db may be required.

  In any case, this probably needs more attention: whatever the
  solution, we don't want it to be possible to generate 500s so easily.
  And the solution is not simply to turn them into 4xx errors; we want
  the problem not to happen at all.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1818498/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1817633] [NEW] Listing placement usages causes a circular reference error

2019-02-25 Thread Chris Dent
Public bug reported:

With the removal of oslo versioned objects from the Usage and UsageList
classes, serializing usages can result in a "ValueError: Circular
reference detected" because there is a decimal.Decimal in the data to be
returned.

This is a result of a func.sum used in the query to get lists of
aggregates.

Note that the creation of a Decimal does not happen with sqlite, so the
problem is not revealed by the functional gabbi tests. It happened to
show up in the pending gabbi-based integration tests
https://review.openstack.org/#/c/601614/ . See:
http://logs.openstack.org/08/607508/14/check/placement-gabbi-
tempest/f7c3eca/controller/logs/screen-placement-
api.txt.gz#_Feb_25_22_37_31_423135 for an example.

This can be fixed by casting the 'used' value to an int when creating a
Usage.
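
For illustration, a minimal sketch of that fix (the surrounding names
are assumed, not the exact placement code):

    # func.sum() can come back as decimal.Decimal under MySQL, which the
    # JSON serializer rejects; casting keeps the value JSON-friendly.
    usage = Usage(resource_class=row.resource_class, usage=int(row.used))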

** Affects: nova
 Importance: High
 Assignee: Chris Dent (cdent)
 Status: Triaged


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1817633

Title:
  Listing placement usages causes a circular reference error

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  With the removal of oslo versioned objects from the Usage and
  UsageList classes, serializing usages can result in a "ValueError:
  Circular reference detected" because there is a decimal.Decimal in the
  data to be returned.

  This is a result of a func.sum used in the query to get lists of
  aggregates.

  Note that the creation of a Decimal does not happen with sqlite, so
  the problem is not revealed by the functional gabbi tests. It happened
  to show up in the pending gabbi-based integration tests
  https://review.openstack.org/#/c/601614/ . See:
  http://logs.openstack.org/08/607508/14/check/placement-gabbi-
  tempest/f7c3eca/controller/logs/screen-placement-
  api.txt.gz#_Feb_25_22_37_31_423135 for an example.

  This can be fixed by casting the 'used' value to an int when creating
  a Usage.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1817633/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1816230] [NEW] allocations by consumer and resource provider use wrong timestamp

2019-02-16 Thread Chris Dent
Public bug reported:

When listing allocations by resource provider or consumer uuid, the
updated_at and created_at fields in the database are not loaded into the
object, so they default to now when the times are used to generate
last-modified headers in HTTP responses.

This isn't a huge problem because we tend not to care about those times
(at the moment), but it would be a useful thing to clean up.

The issues are in AllocationList.get_all_by_resource_provider and
.get_all_by_consumer_id
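
For illustration, the shape of the likely fix (attribute and column
names are assumed):

    # When building each Allocation from its DB row, carry the real
    # timestamps across instead of leaving the fields unset.
    alloc.created_at = row.created_at
    alloc.updated_at = row.updated_at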

** Affects: nova
 Importance: Low
 Status: New


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1816230

Title:
  allocations by consumer and resource provider use wrong timestamp

Status in OpenStack Compute (nova):
  New

Bug description:
  When listing allocations by resource provider or consumer uuid, the
  updated_at and created_at fields in the database are not loaded into
  the object, so they default to now when the times are used to generate
  last-modified headers in HTTP responses.

  This isn't a huge problem because we tend not to care about those
  times (at the moment), but it would be a useful thing to clean up.

  The issues are in AllocationList.get_all_by_resource_provider and
  .get_all_by_consumer_id

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1816230/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1807976] [NEW] In python 3.7 the definition of a printable character is changed so test_flavors fails

2018-12-11 Thread Chris Dent
Public bug reported:

'test_name_with_non_printable_characters' in the 'test_flavors' unit
tests checks to see that a non-printable character cannot be allowed in
a flavor name. This fails in python 3.7.

The reason it fails is because in Python 3.7 the 'unicodedata' package
was updated [1] to Unicode 11 and what's printable has changed.

The fix to the problem is to use a _really_ unprintable unicode char,
according to Unicode 11.

[1] https://docs.python.org/3/whatsnew/3.7.html#unicodedata
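
For illustration (not nova's actual validator): str.isprintable()
consults the unicodedata tables shipped with the interpreter, so a
character's printability can change between Python versions, while
control characters stay unprintable everywhere:

    # A control character (category Cc) is unprintable under any Unicode
    # version, making it a safe choice for the test.
    assert not 'flavor\x00name'.isprintable()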

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: testing

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1807976

Title:
  In python 3.7 the definition of a printable character is changed so
  test_flavors fails

Status in OpenStack Compute (nova):
  New

Bug description:
  'test_name_with_non_printable_characters' in the 'test_flavors' unit
  tests checks to see that a non-printable character cannot be allowed
  in a flavor name. This fails in python 3.7.

  The reason it fails is because in Python 3.7 the 'unicodedata' package
  was updated [1] to Unicode 11 and what's printable has changed.

  The fix to the problem is to use a _really_ unprintable unicode char,
  according to Unicode 11.

  [1] https://docs.python.org/3/whatsnew/3.7.html#unicodedata

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1807976/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1807970] [NEW] test_multi_cell_list fails in python 3.7

2018-12-11 Thread Chris Dent
Public bug reported:

Generators used in multi cell list handling raise StopIteration, which
is not something python 3.7 likes. Efforts to add python 3.7 testing to
nova [1] revealed this (and similar for neighboring tests):

nova.tests.unit.compute.test_multi_cell_list.TestBaseClass.test_with_failing_cells
--

Captured traceback:
~~~
b'Traceback (most recent call last):'
b'  File "/mnt/share/cdentsrc/nova/nova/compute/multi_cell_list.py", line 
101, in query_wrapper'
b'for record in fn(ctx, *args, **kwargs):'
b'  File "/mnt/share/cdentsrc/nova/nova/compute/multi_cell_list.py", line 
348, in do_query'
b'**kwargs)'
b'  File 
"/mnt/share/cdentsrc/nova/nova/tests/unit/compute/test_multi_cell_list.py", 
line 356, in get_by_filters'
b'raise exception.CellTimeout'
b'nova.exception.CellTimeout: Timeout waiting for response from cell'
b''
b'During handling of the above exception, another exception occurred:'
b''
b'Traceback (most recent call last):'
b'  File "/mnt/share/cdentsrc/nova/nova/compute/multi_cell_list.py", line 
108, in query_wrapper'
b'raise StopIteration'
b'StopIteration'
b''
b'The above exception was the direct cause of the following exception:'
b''
b'Traceback (most recent call last):'
b'  File 
"/mnt/share/cdentsrc/nova/.tox/py37/lib/python3.7/site-packages/mock/mock.py", 
line 1305, in patched'
b'return func(*args, **keywargs)'
b'  File 
"/mnt/share/cdentsrc/nova/nova/tests/unit/compute/test_multi_cell_list.py", 
line 384, in test_with_failing_cells'
b'self.assertEqual(50, len(list(result)))'
b'  File "/mnt/share/cdentsrc/nova/nova/compute/multi_cell_list.py", line 
400, in get_records_sorted'
b'item = next(feeder)'
b'  File "/mnt/share/cdentsrc/nova/.tox/py37/lib/python3.7/heapq.py", line 
359, in merge'
b's[0] = next()   # raises StopIteration when exhausted'
b'RuntimeError: generator raised StopIteration'
b''

According to pep 479 [2] the fix for this is to 'return' instead of
'raise StopIteration'.


[1] https://review.openstack.org/#/c/624055/
[2] 
https://www.python.org/dev/peps/pep-0479/#writing-backwards-and-forwards-compatible-code
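
For illustration, a minimal sketch of the PEP 479-compatible pattern (a
simplification of the query_wrapper shown in the traceback, not the
exact nova code):

    def query_wrapper(ctx, fn, *args, **kwargs):
        try:
            for record in fn(ctx, *args, **kwargs):
                yield record
        except Exception:
            # Under PEP 479, 'raise StopIteration' inside a generator
            # becomes a RuntimeError; a bare 'return' ends iteration
            # cleanly instead.
            return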

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: testing

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1807970

Title:
  test_multi_cell_list fails in python 3.7

Status in OpenStack Compute (nova):
  New

Bug description:
  Generators used in multi cell list handling raise StopIteration, which
  is not something python 3.7 likes. Efforts to add python 3.7 testing
  to nova [1] revealed this (and similar for neighboring tests):

  
nova.tests.unit.compute.test_multi_cell_list.TestBaseClass.test_with_failing_cells
  
--

  Captured traceback:
  ~~~
  b'Traceback (most recent call last):'
  b'  File "/mnt/share/cdentsrc/nova/nova/compute/multi_cell_list.py", line 
101, in query_wrapper'
  b'for record in fn(ctx, *args, **kwargs):'
  b'  File "/mnt/share/cdentsrc/nova/nova/compute/multi_cell_list.py", line 
348, in do_query'
  b'**kwargs)'
  b'  File 
"/mnt/share/cdentsrc/nova/nova/tests/unit/compute/test_multi_cell_list.py", 
line 356, in get_by_filters'
  b'raise exception.CellTimeout'
  b'nova.exception.CellTimeout: Timeout waiting for response from cell'
  b''
  b'During handling of the above exception, another exception occurred:'
  b''
  b'Traceback (most recent call last):'
  b'  File "/mnt/share/cdentsrc/nova/nova/compute/multi_cell_list.py", line 
108, in query_wrapper'
  b'raise StopIteration'
  b'StopIteration'
  b''
  b'The above exception was the direct cause of the following exception:'
  b''
  b'Traceback (most recent call last):'
  b'  File 
"/mnt/share/cdentsrc/nova/.tox/py37/lib/python3.7/site-packages/mock/mock.py", 
line 1305, in patched'
  b'return func(*args, **keywargs)'
  b'  File 
"/mnt/share/cdentsrc/nova/nova/tests/unit/compute/test_multi_cell_list.py", 
line 384, in test_with_failing_cells'
  b'self.assertEqual(50, len(list(result)))'
  b'  File "/mnt/share/cdentsrc/nova/nova/compute/multi_cell_list.py", line 
400, in get_records_sorted'
  b'item = next(feeder)'
  b'  File "/mnt/share/cdentsrc/nova/.tox/py37/lib/python3.7/heapq.py", 
line 359, in merge'
  b's[0] = next()   # raises StopIteration when exhausted'
  b'RuntimeError: generator raised StopIteration'
  b''

  According to pep 479 [2] the fix for this is to 'return' instead of
  'raise StopIteration'.

  
  [1] 

[Yahoo-eng-team] [Bug 1805858] [NEW] placement/objects/resource_provider.py missing test coverage for several methods

2018-11-29 Thread Chris Dent
Public bug reported:

In the extracted placement, the coverage tests run both unit and
functional tests. This is because so much of the functionality is tested
by gabbi. In those runs we can see that a few methods in
placement/objects/resource_provider.py are not reached. They are:


def _provider_aggregates
  
http://logs.openstack.org/12/620412/3/check/openstack-tox-cover/b97d53f/cover/placement_objects_resource_provider_py.html#n3009

def _aggregates_associated_with_providers
  
http://logs.openstack.org/12/620412/3/check/openstack-tox-cover/b97d53f/cover/placement_objects_resource_provider_py.html#n3385

def _shared_allocation_request_resources
  
http://logs.openstack.org/12/620412/3/check/openstack-tox-cover/b97d53f/cover/placement_objects_resource_provider_py.html#n3398

** Affects: nova
 Importance: Medium
 Status: Triaged


** Tags: needs-functional-test placement testing

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1805858

Title:
  placement/objects/resource_provider.py missing test coverage for
  several methods

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  In the extracted placement, the coverage tests run both unit and
  functional tests. This is because so much of the functionality is
  tested by gabbi. In those runs we can see that a few methods in
  placement/objects/resource_provider.py are not reached. They are:

  
  def _provider_aggregates

http://logs.openstack.org/12/620412/3/check/openstack-tox-cover/b97d53f/cover/placement_objects_resource_provider_py.html#n3009

  def _aggregates_associated_with_providers

http://logs.openstack.org/12/620412/3/check/openstack-tox-cover/b97d53f/cover/placement_objects_resource_provider_py.html#n3385

  def _shared_allocation_request_resources

http://logs.openstack.org/12/620412/3/check/openstack-tox-cover/b97d53f/cover/placement_objects_resource_provider_py.html#n3398

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1805858/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1805408] [NEW] ServerGroupTestV21.test_boot_servers_with_anti_affinity and related tests can race

2018-11-27 Thread Chris Dent
Public bug reported:

The ServerGroup functional tests do not adequately manage the _SUPPORTS*
globals used at
https://github.com/openstack/nova/blob/1a1ea8e2aa66a2654e6cc141c735e47bbd8c4fef/nova/scheduler/utils.py#L805
leading to tests that can sometimes fail.

The easy fix is to reset the globals before and after the tests.

The longer term, less easy, fix is to not use globals.
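
For illustration, a minimal sketch of the easy fix, to run in each
test's setUp (the global names are read from nova/scheduler/utils.py and
are assumptions here):

    import fixtures
    from nova.scheduler import utils

    # Reset each cached _SUPPORTS* global so one test's scheduler
    # capability check cannot leak into the next test.
    for name in ('_SUPPORTS_AFFINITY', '_SUPPORTS_ANTI_AFFINITY',
                 '_SUPPORTS_SOFT_AFFINITY', '_SUPPORTS_SOFT_ANTI_AFFINITY'):
        self.useFixture(fixtures.MockPatchObject(utils, name, None))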

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: testing

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1805408

Title:
  ServerGroupTestV21.test_boot_servers_with_anti_affinity and related
  tests can race

Status in OpenStack Compute (nova):
  New

Bug description:
  The ServerGroup functional tests do not adequately manage the
  _SUPPORTS* globals used at
  
https://github.com/openstack/nova/blob/1a1ea8e2aa66a2654e6cc141c735e47bbd8c4fef/nova/scheduler/utils.py#L805
  leading to tests that can sometimes fail.

  The easy fix is to reset the globals before and after the tests.

  The longer term, less easy, fix is to not use globals.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1805408/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1804453] [NEW] maximum recursion possible while setting aggregates in placement

2018-11-21 Thread Chris Dent
Public bug reported:

It's possible for the _ensure_aggregate code in
objects/resource_provider.py to, under unusual circumstances, reach a
maximum recursion error, because it calls itself when there is a
DBDuplicateEntry error.

http://logs.openstack.org/84/602484/30/check/placement-
perfload/8a8642e/controller/logs/screen-placement-
api.txt.gz#_Nov_21_13_05_03_661629

http://logs.openstack.org/84/602484/30/check/placement-
perfload/8a8642e/controller/logs/screen-placement-
api.txt.gz#_Nov_21_13_05_03_654874

" ERROR placement.fault_wrap [None req-5fc62d1e-a1bd-47e3-a61e-
45e01281fed3 None None] Placement API unexpected error: maximum
recursion depth exceeded while getting the str of an object:
RuntimeError: maximum recursion depth exceeded while getting the str of
an object"

The "getting the str" part appears to be a coincidence based on reaching
a bad stack depth at that particular moment.

This happened while the placeload script was doing its thing of adding
aggregates to 1000 resource providers using asyncio, so concurrency
is high and weird. See https://review.openstack.org/#/c/602484/ for the
code that caused this.

It is unlikely that this is going to happen in the real world, but it is
the sort of thing it would be nice to be more robust about, perhaps by
counting attempts and bailing out?
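
For illustration, a minimal sketch of the "count attempts and bail out"
idea (the signature and the insert helper are assumptions):

    from oslo_db import exception as db_exc

    def _ensure_aggregate(ctx, agg_uuid, attempts=10):
        try:
            _insert_aggregate(ctx, agg_uuid)  # hypothetical insert helper
        except db_exc.DBDuplicateEntry:
            # A concurrent request inserted the same row; retry a
            # bounded number of times instead of recursing without limit.
            if attempts <= 1:
                raise
            return _ensure_aggregate(ctx, agg_uuid, attempts - 1)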

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1804453

Title:
  maximum recursion possible while setting aggregates in placement

Status in OpenStack Compute (nova):
  New

Bug description:
  It's possible for the _ensure_aggregate code in
  objects/resource_provider.py to, under unusual circumstances, reach a
  maximum recursion error, because it calls itself when there is a
  DBDuplicateEntry error.

  http://logs.openstack.org/84/602484/30/check/placement-
  perfload/8a8642e/controller/logs/screen-placement-
  api.txt.gz#_Nov_21_13_05_03_661629

  http://logs.openstack.org/84/602484/30/check/placement-
  perfload/8a8642e/controller/logs/screen-placement-
  api.txt.gz#_Nov_21_13_05_03_654874

  " ERROR placement.fault_wrap [None req-5fc62d1e-a1bd-47e3-a61e-
  45e01281fed3 None None] Placement API unexpected error: maximum
  recursion depth exceeded while getting the str of an object:
  RuntimeError: maximum recursion depth exceeded while getting the str
  of an object"

  The "getting the str" part appears to be a coincidence based on
  reaching a bad stack depth at that particular moment.

  This happened while the placeload script was doing its thing of adding
  aggregates to 1000 resource providers using asyncio, so concurrency
  is high and weird. See https://review.openstack.org/#/c/602484/ for
  the code that caused this.

  It is unlikely that this is going to happen in the real world, but it
  is the sort of thing it would be nice to be more robust about, perhaps
  by counting attempts and bailing out?

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1804453/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1804062] [NEW] test_hacking fails for python 3.6.7 and newer

2018-11-19 Thread Chris Dent
Public bug reported:

The check for double words in test_hacking is failing in python 3.6.7
(released in ubuntu 18.04 within the last few days) and in new versions
of 3.7.x. This is because of this change to python:
https://bugs.python.org/issue33899 .

This is causing failures in python 36 unit tests for nova.

The fix ought to be adding a newline to the code sample. Maybe.

** Affects: nova
 Importance: Undecided
 Status: Confirmed


** Tags: gate-failure testing

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1804062

Title:
  test_hacking fails for python 3.6.7 and newer

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  The check for double words in test_hacking is failing in python 3.6.7
  (released in ubuntu 18.04 within the last few days) and in new
  versions of 3.7.x. This is because of this change to python:
  https://bugs.python.org/issue33899 .

  This is causing failures in python 36 unit tests for nova.

  The fix ought to be adding a newline to the code sample. Maybe.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1804062/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1802925] [NEW] Unable to start placement wsgi app without conf file

2018-11-12 Thread Chris Dent
Public bug reported:

oslo_config allows configuration to come from the process environment
(e.g., OS_PLACEMENT_DATABASE__CONNECTION). This was developed to allow
services that use oslo_config to be hosted within immutable containers.

However, as currently written, the placement-wsgi app (in both nova and
placement) cannot start if there isn't at least an empty placement.conf,
either in /etc/placement or in the directory defined by
OS_PLACEMENT_CONFIG_DIR.

It's easy enough to work around this: set the empty file in the
container and forget about it.

In placement/wsgi.py:

conf.CONF(argv[1:], project='placement',
  version=version_info.version_string(),
  default_config_files=default_config_files)

is the call that causes the problem, because default_config_files is
always non-None, even when the original default is being used. We can
fix this by only setting the value when a non-default is in use.
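Roughly, something like this sketch against the call above (not the
final patch; the config-dir handling is simplified, and conf, argv and
version_info are the names from wsgi.py as quoted):

    import os

    # only pass default_config_files when the operator actually pointed
    # at a config dir; otherwise let oslo_config fall back to its own
    # search path and the environment variables.
    kwargs = {}
    config_dir = os.environ.get('OS_PLACEMENT_CONFIG_DIR')
    if config_dir:
        kwargs['default_config_files'] = [
            os.path.join(config_dir, 'placement.conf')]
    conf.CONF(argv[1:], project='placement',
              version=version_info.version_string(),
              **kwargs)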

(This is possible now that placement is placement, not nova).

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1802925

Title:
  Unable to start placement wsgi app without conf file

Status in OpenStack Compute (nova):
  New

Bug description:
  oslo_config allows configuration to come from the process environment
  (e.g., OS_PLACEMENT_DATABASE__CONNECTION). This was developed to allow
  services that use oslo_config to be hosted within immutable containers.

  However, as currently written, the placement-wsgi app (in both nova
  and placement) cannot start if there isn't at least an empty
  placement.conf, either in /etc/placement or in the directory defined
  by OS_PLACEMENT_CONFIG_DIR.

  It's easy enough to work around this: set the empty file in the
  container and forget about it.

  In placement/wsgi.py:

  conf.CONF(argv[1:], project='placement',
version=version_info.version_string(),
default_config_files=default_config_files)

  is the call that causes the problem, because default_config_files is
  always non-None, even when the original default is being used. We can
  fix this by only setting the value when a non-default is in use.

  (This is possible now that placement is placement, not nova).

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1802925/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1800124] [NEW] internal NotFound error can lead to 500 error in modern devstack setup

2018-10-26 Thread Chris Dent
Public bug reported:

I'm using a recent devstack in late October 2018, with no special
keystone configuration; it is running under uwsgi and apache2.

If I make a request of the service to a bogus URL:


curl -v http://localhost/identity/v3/narf

> GET /identity/v3/narf HTTP/1.1
> Host: localhost
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 500 INTERNAL SERVER ERROR
< Date: Fri, 26 Oct 2018 10:08:34 GMT
< Server: Apache/2.4.29 (Ubuntu)
< Content-Type: application/json
< Content-Length: 138
< Vary: X-Auth-Token
< x-openstack-request-id: req-cfafa26b-75a9-4573-9076-61ff9290c6a7
< Connection: close
< 
{"error":{"code":500,"message":"An unexpected error prevented the server from 
fulfilling your request.","title":"Internal Server Error"}}


I stumbled upon this because I was experimenting with pulling the
catalog and requested /v3/catalog instead of /v3/services.

Which doesn't seem ideal :)

** Affects: keystone
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1800124

Title:
  internal NotFound error can lead to 500 error in modern devstack setup

Status in OpenStack Identity (keystone):
  New

Bug description:
  I'm using a recent devstack in late October 2018, with no special
  keystone configuration; it is running under uwsgi and apache2.

  If I make a request of the service to a bogus URL:

  
  curl -v http://localhost/identity/v3/narf

  > GET /identity/v3/narf HTTP/1.1
  > Host: localhost
  > User-Agent: curl/7.58.0
  > Accept: */*
  > 
  < HTTP/1.1 500 INTERNAL SERVER ERROR
  < Date: Fri, 26 Oct 2018 10:08:34 GMT
  < Server: Apache/2.4.29 (Ubuntu)
  < Content-Type: application/json
  < Content-Length: 138
  < Vary: X-Auth-Token
  < x-openstack-request-id: req-cfafa26b-75a9-4573-9076-61ff9290c6a7
  < Connection: close
  < 
  {"error":{"code":500,"message":"An unexpected error prevented the server from 
fulfilling your request.","title":"Internal Server Error"}}
  

  I stumbled upon this because I was experimenting with pulling the
  catalog and requested /v3/catalog instead of /v3/services.

  Which doesn't seem ideal :)

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1800124/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1778227] Re: Docs needed for optional placement database

2018-10-04 Thread Chris Dent
This is out of date in a different way now that we have an extracted placement.

** Changed in: nova
   Status: In Progress => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1778227

Title:
  Docs needed for optional placement database

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Blueprint https://blueprints.launchpad.net/nova/+spec/optional-
  placement-database added support for configuring a separate database
  for the placement service so it's not just part of the nova_api
  database (even though it's the same schema for now).

  This is important for extracting placement from nova, and should be
  part of a new base install so people don't have to migrate later.

  There are at least two places we should update in the docs:

  1. The install guide for fresh installs should tell people to create a
  database for placement and configure nova.conf appropriately using the
  new placement database config options.

  Looking at the install guide, we have 3 different guides to update
  (ubuntu, rdo, suse), but looks like:

  a) https://docs.openstack.org/nova/latest/install/controller-install-
  ubuntu.html#prerequisites - create a placement database using the
  nova_api schema.

  b) configure the placement db

  https://docs.openstack.org/nova/latest/install/controller-install-
  ubuntu.html#install-and-configure-components

  I think that's it. "nova-manage api_db sync" will sync the placement
  database and the nova_api database, so we should be good there.

  2. Update the placement upgrade docs for Rocky to mention the new
  config option and, at a high level, the options people have for
  migrating from nova_api to placement db, e.g. stop api services, copy
  the nova_api db, deploy placement db using the nova_api copy, config
  and restart api services.

  https://docs.openstack.org/nova/latest/user/placement.html#rocky-18-0-0

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1778227/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1795502] [NEW] running nova-api-wsgi from the command line fails in python3.6

2018-10-01 Thread Chris Dent
Public bug reported:

The nova-api-wsgi script provides a quick and easy way to run the nova-
api from the command line. In at least python3.6 it fails with:

ERROR nova Traceback (most recent call last):
ERROR nova   File "/usr/local/bin/nova-api-wsgi", line 50, in <module>
ERROR nova server.serve_forever()
ERROR nova   File "/usr/lib/python3.6/socketserver.py", line 232, in 
serve_forever
ERROR nova with _ServerSelector() as selector:
ERROR nova   File "/usr/lib/python3.6/selectors.py", line 348, in __init__
ERROR nova self._poll = select.poll()
ERROR nova AttributeError: module 'select' has no attribute 'poll'
ERROR nova 

this is because eventlet is being monkey patched too late, see
https://stackoverflow.com/questions/51524589/attributeerror-module-
select-has-no-attribute-poll

importing eventlet at the top of the script fixes it.
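That is, roughly the following; note this is a sketch, and the report
above only confirms that the bare import is enough, so the explicit
monkey_patch() call is an extra assumption shown for clarity:

    # at the very top of the nova-api-wsgi script, before anything else
    # can import the stdlib select/socketserver modules:
    import eventlet
    eventlet.monkey_patch()  # the explicit call is an assumption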

It's not clear this is necessarily worth fixing, as running nova-api
this way is not really a thing. I only did so because I was trying to
confirm that a problem was not the result of uwsgi.

But report it for sake of discussion.

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: api

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1795502

Title:
  running nova-api-wsgi from the command line fails in python3.6

Status in OpenStack Compute (nova):
  New

Bug description:
  The nova-api-wsgi script provides a quick and easy way to run the
  nova-api from the command line. In at least python3.6 it fails with:

  ERROR nova Traceback (most recent call last):
  ERROR nova   File "/usr/local/bin/nova-api-wsgi", line 50, in <module>
  ERROR nova server.serve_forever()
  ERROR nova   File "/usr/lib/python3.6/socketserver.py", line 232, in 
serve_forever
  ERROR nova with _ServerSelector() as selector:
  ERROR nova   File "/usr/lib/python3.6/selectors.py", line 348, in __init__
  ERROR nova self._poll = select.poll()
  ERROR nova AttributeError: module 'select' has no attribute 'poll'
  ERROR nova 

  this is because eventlet is being monkey patched too late, see
  https://stackoverflow.com/questions/51524589/attributeerror-module-
  select-has-no-attribute-poll

  importing eventlet at the top of the script fixes it.

  It's not clear this is necessarily worth fixing, as running nova-api
  this way is not really a thing. I only did so because I was trying to
  confirm that a problem was not the result of uwsgi.

  But report it for sake of discussion.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1795502/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1795425] [NEW] create server api sends location header as bytestring in py3

2018-10-01 Thread Chris Dent
Public bug reported:

PEP 3333 points out that request and response headers, inside a WSGI
application, should be native strings. That is: whatever `str` is in the
version of Python being used:
https://www.python.org/dev/peps/pep-3333/#a-note-on-string-types

The create server api returns a location header which is encoded to
UTF-8 in python, making it a bytestring in python3. This violates the
spec but also leads to issues when testing nova under wsgi-intercept
(which removes whatever normalisation most WSGI servers helpfully do for
"bad" applications). The issues show up when concatenating the response
header with other values, such as base URLs.
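A sketch of the native-string rule, with illustrative values rather
than the nova code:

    import six

    location = u'http://cloud.example.com/v2.1/servers/abc123'
    if six.PY2:
        # on python 2 the native str is bytes, so encoding is correct
        location = location.encode('utf-8')
    # on python 3 the native str is already text; encoding here would
    # produce exactly the bytestring this bug describes.
    headers = [('Location', location)]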

** Affects: nova
 Importance: Undecided
 Assignee: Chris Dent (cdent)
 Status: In Progress


** Tags: api

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1795425

Title:
  create server api sends location header as bytestring in py3

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  PEP 3333 points out that request and response headers, inside a WSGI
  application, should be native strings. That is: whatever `str` is in
  the version of Python being used:
  https://www.python.org/dev/peps/pep-3333/#a-note-on-string-types

  The create server api returns a location header which is encoded to
  UTF-8 in python, making it a bytestring in python3. This violates the
  spec but also leads to issues when testing nova under wsgi-intercept
  (which removes whatever normalisation most WSGI servers helpfully do
  for "bad" applications). The issues show up when concatenating the
  response header with other values, such as base URLs.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1795425/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1789633] [NEW] placement aggregate handling has lock trouble with sqlite files

2018-08-29 Thread Chris Dent
Public bug reported:

Not sure if this is in itself a bug; it may instead indicate that there
are issues with aggregate handling that will show up in the real world
under some conditions.

The placecat tooling at https://github.com/cdent/placecat has a suite of
automated tests that run every now and again against a docker container
built from nova master. It uses a file-based sqlite database. With
recent updates those tests are now failing with 'database locked' errors
(the traceback below is from functional tests, but the one in the
placecat server is the same). Travis logs are at
https://travis-ci.org/cdent/placecat/builds/421991104 but the traceback
is not visible there.

In a nova checkout, I've changed the functional tests to use a
file-based database for sqlite and can intermittently replicate the
problem when PUTting aggregates for a resource provider. It seems this
came in with 2d7ed309ec4 ( https://review.openstack.org/#/c/592654/ ).
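Not a fix for whatever the aggregate handling is doing, but for
illustration, file-based sqlite can at least be told to wait on locks
rather than fail immediately; whether that is appropriate here is an
open question:

    import sqlite3

    # sketch: the stdlib driver waits up to `timeout` seconds for a
    # lock to clear before raising 'database is locked'.
    conn = sqlite3.connect('/tmp/placement.db', timeout=30)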


-=-=-
   ERROR [nova.api.openstack.placement.fault_wrap] Placement API unexpected 
error: (sqlite3.OperationalError) database is locked (Background on this error 
at: http://sqlalche.me/e/e3q8)
Traceback (most recent call last):
  File "nova/api/openstack/placement/fault_wrap.py", line 40, in __call__
return self.application(environ, start_response)
  File 
"/mnt/share/cdentsrc/nova/.tox/py27/local/lib/python2.7/site-packages/webob/dec.py",
 line 129, in __call__
resp = self.call_func(req, *args, **kw)
  File 
"/mnt/share/cdentsrc/nova/.tox/py27/local/lib/python2.7/site-packages/webob/dec.py",
 line 193, in call_func
return self.func(req, *args, **kwargs)
  File 
"/mnt/share/cdentsrc/nova/.tox/py27/local/lib/python2.7/site-packages/microversion_parse/middleware.py",
 line 80, in __call__
response = req.get_response(self.application)
  File 
"/mnt/share/cdentsrc/nova/.tox/py27/local/lib/python2.7/site-packages/webob/request.py",
 line 1313, in send
application, catch_exc_info=False)
  File 
"/mnt/share/cdentsrc/nova/.tox/py27/local/lib/python2.7/site-packages/webob/request.py",
 line 1277, in call_application
app_iter = application(self.environ, start_response)
  File "nova/api/openstack/placement/handler.py", line 213, in __call__
return dispatch(environ, start_response, self._map)
  File "nova/api/openstack/placement/handler.py", line 150, in dispatch
return handler(environ, start_response)
  File 
"/mnt/share/cdentsrc/nova/.tox/py27/local/lib/python2.7/site-packages/webob/dec.py",
 line 129, in __call__
resp = self.call_func(req, *args, **kw)
  File "nova/api/openstack/placement/wsgi_wrapper.py", line 29, in call_func
super(PlacementWsgify, self).call_func(req, *args, **kwargs)
  File 
"/mnt/share/cdentsrc/nova/.tox/py27/local/lib/python2.7/site-packages/webob/dec.py",
 line 193, in call_func
return self.func(req, *args, **kwargs)
  File "nova/api/openstack/placement/util.py", line 191, in 
decorated_function
return f(req)
  File "nova/api/openstack/placement/microversion.py", line 166, in 
decorated_func
return _find_method(f, version, status_code)(req, *args, **kwargs)
  File "nova/api/openstack/placement/handlers/aggregate.py", line 131, in 
set_aggregates
increment_generation=consider_generation)
  File "nova/api/openstack/placement/handlers/aggregate.py", line 72, in 
_set_aggregates
aggregate_uuids, increment_generation=increment_generation)
  File "nova/api/openstack/placement/objects/resource_provider.py", line 
991, in set_aggregates
increment_generation=increment_generation)
  File 
"/mnt/share/cdentsrc/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py",
 line 993, in wrapper
return fn(*args, **kwargs)
  File "nova/api/openstack/placement/objects/resource_provider.py", line 
557, in _set_aggregates
agg_id = _ensure_aggregate(context, agg_uuid)
  File 
"/mnt/share/cdentsrc/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py",
 line 993, in wrapper
return fn(*args, **kwargs)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
  File 
"/mnt/share/cdentsrc/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py",
 line 1043, in _transaction_scope
yield resource
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
  File 
"/mnt/share/cdentsrc/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py",
 line 653, in _session
self.session.rollback()
  File 
"/mnt/share/cdentsrc/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_utils/excutils.py",
 line 220, in __exit__
self.force_reraise()
  File 
"/mnt/share/cdentsrc/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_utils/excutils.py",
 line 196, 

[Yahoo-eng-team] [Bug 1788176] [NEW] placement functional tests can variably fail because of conf

2018-08-21 Thread Chris Dent
Public bug reported:

When running just the placement functional tests with `tox -efunctional
placement` on a host with sufficient cpus (16 in this case, on a host
with only 4 it wasn't happening) some tests can fail because of conf not
being properly set with "oslo_config.cfg.NoSuchOptError: no such option
oslo_policy in group [DEFAULT]". Example traceback at
http://paste.openstack.org/show/728513/

This appears to be because in some cases registering the policy opts is
happening elsewhere in the processing before the gabbi tests run, but
only sometimes. This is a symptom of long term global conf leakage that
we may wish to consider fixing, but the short term fix is to register
the opts in the gabbi fixture. Patch forthcoming.
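One illustrative shape for that fixture change (not the actual patch;
the fixture hook name is hypothetical, and the point is that
constructing an Enforcer registers the [oslo_policy] options on the
conf object as a side effect):

    from oslo_policy import policy as oslo_policy

    def start_fixture(conf):
        # make sure the [oslo_policy] opts exist on conf before any
        # gabbi test reads them, regardless of test ordering
        oslo_policy.Enforcer(conf)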

** Affects: nova
 Importance: Medium
 Assignee: Chris Dent (cdent)
 Status: In Progress


** Tags: placement testing

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1788176

Title:
  placement functional tests can variably fail because of conf

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  When running just the placement functional tests with `tox
  -efunctional placement` on a host with sufficient cpus (16 in this
  case, on a host with only 4 it wasn't happening) some tests can fail
  because of conf not being properly set with
  "oslo_config.cfg.NoSuchOptError: no such option oslo_policy in group
  [DEFAULT]". Example traceback at
  http://paste.openstack.org/show/728513/

  This appears to be because in some cases registering the policy opts
  is happening elsewhere in the processing before the gabbi tests run,
  but only sometimes. This is a symptom of long term global conf leakage
  that we may wish to consider fixing, but the short term fix is to
  register the opts in the gabbi fixture. Patch forthcoming.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1788176/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1786703] [NEW] Placement duplicate aggregate uuid handling during concurrent aggregate create insufficiently robust

2018-08-12 Thread Chris Dent
Public bug reported:

NOTE: This may be just a postgresql problem, not sure.

When doing some further experiments with load testing placement, my
resource provider create script, which uses asyncio was able to cause
several 500 errors from the placement service of the following form:

```
cdent-a01:~/src/placeload(master) $ docker logs zen_murdock |grep 
'req-d4dcbfed-b050-4a3b-ab0f-d2489a31c3f2'
2018-08-12 16:03:30.698 9 DEBUG nova.api.openstack.placement.requestlog 
[req-d4dcbfed-b050-4a3b-ab0f-d2489a31c3f2 admin admin - - -] Starting request: 
172.17.0.1 "PUT 
/resource_providers/13b09bc9-164f-4d03-8a61-5e78c05a73ad/aggregates" __call__ 
/usr/lib/python3.6/site-packages/nova/api/openstack/placement/requestlog.py:38
2018-08-12 16:03:30.903 9 ERROR nova.api.openstack.placement.fault_wrap 
[req-d4dcbfed-b050-4a3b-ab0f-d2489a31c3f2 admin admin - - -] Placement API 
unexpected error: This Session's transaction has been rolled back due to a 
previous exception during flush. To begin a new transaction with this Session, 
first issue Session.rollback(). Original exception was: 
(psycopg2.IntegrityError) duplicate key value violates unique constraint 
"uniq_placement_aggregates0uuid"
2018-08-12 16:03:30.914 9 INFO nova.api.openstack.placement.requestlog 
[req-d4dcbfed-b050-4a3b-ab0f-d2489a31c3f2 admin admin - - -] 172.17.0.1 "PUT 
/resource_providers/13b09bc9-164f-4d03-8a61-5e78c05a73ad/aggregates" status: 
500 len: 997 microversion: 1.29
```

"DETAIL:  Key (uuid)=(14a5c8a3-5a99-4e8f-88be-00d85fcb1c17) already
exists."


This is because the code at 
https://github.com/openstack/nova/blob/a29ace1d48b5473b9e7b5decdf3d5d19f3d262f3/nova/api/openstack/placement/objects/resource_provider.py#L519-L529
 is not trapping the right error when the server thinks it needs to create a 
new aggregate at the same time that it is already creating it.

It's not clear to me whether this is because oslo_db is not transforming
the postgresql error properly or because the generic error trapped there
is the wrong one and we've never noticed before because we don't hit the
concurrency situation hard enough.
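Whatever the root cause, the handling presumably needs to end up shaped
like this sketch; the helpers are hypothetical, and which exception
actually surfaces is exactly the open question above:

    from oslo_db import exception as db_exc

    def _ensure_aggregate(ctx, agg_uuid):
        try:
            return _insert_aggregate(ctx, agg_uuid)  # hypothetical helper
        except db_exc.DBDuplicateEntry:
            # another request created the aggregate concurrently; read
            # the existing row instead of failing with a 500
            return _get_aggregate_id(ctx, agg_uuid)  # hypothetical helper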

** Affects: nova
 Importance: Medium
 Status: New


** Tags: db placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1786703

Title:
  Placement duplicate aggregate uuid handling during concurrent
  aggregate create insufficiently robust

Status in OpenStack Compute (nova):
  New

Bug description:
  NOTE: This may be just a postgresql problem, not sure.

  When doing some further experiments with load testing placement, my
  resource provider create script, which uses asyncio was able to cause
  several 500 errors from the placement service of the following form:

  ```
  cdent-a01:~/src/placeload(master) $ docker logs zen_murdock |grep 
'req-d4dcbfed-b050-4a3b-ab0f-d2489a31c3f2'
  2018-08-12 16:03:30.698 9 DEBUG nova.api.openstack.placement.requestlog 
[req-d4dcbfed-b050-4a3b-ab0f-d2489a31c3f2 admin admin - - -] Starting request: 
172.17.0.1 "PUT 
/resource_providers/13b09bc9-164f-4d03-8a61-5e78c05a73ad/aggregates" __call__ 
/usr/lib/python3.6/site-packages/nova/api/openstack/placement/requestlog.py:38
  2018-08-12 16:03:30.903 9 ERROR nova.api.openstack.placement.fault_wrap 
[req-d4dcbfed-b050-4a3b-ab0f-d2489a31c3f2 admin admin - - -] Placement API 
unexpected error: This Session's transaction has been rolled back due to a 
previous exception during flush. To begin a new transaction with this Session, 
first issue Session.rollback(). Original exception was: 
(psycopg2.IntegrityError) duplicate key value violates unique constraint 
"uniq_placement_aggregates0uuid"
  2018-08-12 16:03:30.914 9 INFO nova.api.openstack.placement.requestlog 
[req-d4dcbfed-b050-4a3b-ab0f-d2489a31c3f2 admin admin - - -] 172.17.0.1 "PUT 
/resource_providers/13b09bc9-164f-4d03-8a61-5e78c05a73ad/aggregates" status: 
500 len: 997 microversion: 1.29
  ```

  "DETAIL:  Key (uuid)=(14a5c8a3-5a99-4e8f-88be-00d85fcb1c17) already
  exists."

  
  This is because the code at 
https://github.com/openstack/nova/blob/a29ace1d48b5473b9e7b5decdf3d5d19f3d262f3/nova/api/openstack/placement/objects/resource_provider.py#L519-L529
 is not trapping the right error when the server thinks it needs to create a 
new aggregate at the same time that it is already creating it.

  It's not clear to me whether this is because oslo_db is not
  transforming the postgresql error properly or because the generic
  error trapped there is the wrong one and we've never noticed before
  because we don't hit the concurrency situation hard enough.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1786703/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1786519] [NEW] debugging why NoValidHost with placement challenging

2018-08-10 Thread Chris Dent
Public bug reported:

With the advent of placement, the FilterScheduler no longer provides
granular information about which class of resource (disk, VCPU, RAM) is
not available in sufficient quantities to allow a host to be found.

This is because placement is now making those choices and does not (yet)
break down the results of its queries into easy to understand chunks. If
it returns zero results all you know is "we didn't have enough
resources". Nothing about which resources.

This can be fixed by changing the way the queries are made so that there
is a series of queries. After each one, a report of how many results are
left can be made, as in the sketch below.

While this is relatively straightforward to do for the (currently-)common
simple non-nested and non-sharing providers situation, it will be more
difficult for the non-simple cases. Therefore, it makes sense to have
different code paths for simple and non-simple allocation candidate
queries. This will also result in performance gains for the common case.

See this email thread for additional discussion and reports of problems
in the wild: http://lists.openstack.org/pipermail/openstack-
dev/2018-August/132735.html

** Affects: nova
 Importance: High
 Assignee: Jay Pipes (jaypipes)
 Status: Confirmed


** Tags: placement rocky-rc-potential

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1786519

Title:
  debugging why NoValidHost with placement challenging

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  With the advent of placement, the FilterScheduler no longer provides
  granular information about which class of resource (disk, VCPU, RAM)
  is not available in sufficient quantities to allow a host to be found.

  This is because placement is now making those choices and does not
  (yet) break down the results of its queries into easy to understand
  chunks. If it returns zero results all you know is "we didn't have
  enough resources". Nothing about which resources.

  This can be fixed by changing the way the queries are made so that
  there is a series of queries. After each one, a report of how many
  results are left can be made.

  While this is relatively straightforward to do for the
  (currently-)common simple non-nested and non-sharing providers
  situation, it will be more difficult for the non-simple cases.
  Therefore, it makes sense to have different code paths for simple and
  non-simple allocation candidate queries. This will also result in
  performance gains for the common case.

  See this email thread for additional discussion and reports of
  problems in the wild: http://lists.openstack.org/pipermail/openstack-
  dev/2018-August/132735.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1786519/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1786498] [NEW] placement api produces many warnings about policy scope check failures

2018-08-10 Thread Chris Dent
Public bug reported:

When oslo policy checks were added to placement, fixtures and functional
tests were updated to hide warnings related to scope checks that cannot
(yet) work in the way placement is managing policy.

Those same warnings happen with every request on an actually running
service. The warnings need to be stifled there too.

** Affects: nova
 Importance: Medium
 Assignee: Matt Riedemann (mriedem)
 Status: In Progress

** Affects: nova/rocky
 Importance: Medium
 Assignee: Matt Riedemann (mriedem)
 Status: In Progress


** Tags: placement rocky-rc-potential

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1786498

Title:
  placement api produces many warnings about policy scope check failures

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) rocky series:
  In Progress

Bug description:
  When oslo policy checks were added to placement, fixtures and
  functional tests were updated to hide warnings related to scope checks
  that cannot (yet) work in the way placement is managing policy.

  Those same warnings happen with every request on an actually running
  service. The warnings need to be stifled there too.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1786498/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1786055] [NEW] performance degradation in placement with large number of resource providers

2018-08-08 Thread Chris Dent
Public bug reported:

Using today's master, there is a big performance degradation in GET
/allocation_candidates when there is a large number of resource
providers (in my tests 1000, each with the same inventory as described
in [1]): 17s when querying all three resource classes with
http://127.0.0.1:8081/allocation_candidates?resources=VCPU:1,MEMORY_MB:256,DISK_GB:10

Using a limit does not make any difference; the cost is in generating
the original data.

I did some advanced LOG.debug based benchmarking to determine three
places where things are a problem, and maybe even fixed the worst one.
See the diff below. The two main culprits are
ResourceProvider.get_by_uuid calls looping over the full set. These can
be replaced either by using data we already have from earlier queries,
or by changing things so we make single queries.

In the diff I've already changed one of them (the second chunk) to use
the data that _build_provider_summaries is already getting. (functional
tests still pass with this change)

The third chunk is because we have a big loop, but I suspect there is
some duplication that can be avoided. I have not investigated that
closely (yet).

-=-=-
diff --git a/nova/api/openstack/placement/objects/resource_provider.py 
b/nova/api/openstack/placement/objects/resource_provider.py
index 851f9719e4..e6c894b8fe 100644
--- a/nova/api/openstack/placement/objects/resource_provider.py
+++ b/nova/api/openstack/placement/objects/resource_provider.py
@@ -3233,6 +3233,8 @@ def _build_provider_summaries(context, usages, 
prov_traits):
 if not summary:
 summary = ProviderSummary(
 context,
+# This is _expensive_ when there are a large number of rps.
+# Building the objects differently may be better.
 resource_provider=ResourceProvider.get_by_uuid(context,
uuid=rp_uuid),
 resources=[],
@@ -3519,8 +3521,7 @@ def _alloc_candidates_multiple_providers(ctx, 
requested_resources,
 rp_uuid = rp_summary.resource_provider.uuid
 tree_dict[root_id][rc_id].append(
 AllocationRequestResource(
-ctx, resource_provider=ResourceProvider.get_by_uuid(ctx,
-rp_uuid),
+ctx, resource_provider=rp_summary.resource_provider,
 resource_class=_RC_CACHE.string_from_id(rc_id),
 amount=requested_resources[rc_id]))
 
@@ -3535,6 +3536,8 @@ def _alloc_candidates_multiple_providers(ctx, 
requested_resources,
 alloc_prov_ids = []
 
 # Let's look into each tree
+# With many resource providers this takes a long time, but each trip
+# through the loop is not too bad.
 for root_id, alloc_dict in tree_dict.items():
 # Get request_groups, which is a list of lists of
 # AllocationRequestResource(ARR) per requested resource class(rc).
-=-=-


[1]
https://github.com/cdent/placeload/blob/master/placeload/__init__.py#L23

** Affects: nova
 Importance: High
 Status: Confirmed


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1786055

Title:
  performance degradation in placement with large number of resource
  providers

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  Using today's master, there is a big performance degradation in GET
  /allocation_candidates when there is a large number of resource
  providers (in my tests 1000, each with the same inventory as described
  in [1]): 17s when querying all three resource classes with
  
http://127.0.0.1:8081/allocation_candidates?resources=VCPU:1,MEMORY_MB:256,DISK_GB:10

  Using a limit does not make any difference; the cost is in generating
  the original data.

  I did some advanced LOG.debug based benchmarking to determine three
  places where things are a problem, and maybe even fixed the worst one.
  See the diff below. The two main culprits are
  ResourceProvider.get_by_uuid calls looping over the full set. These
  can be replaced either by using data we already have from earlier
  queries, or by changing things so we make single queries.

  In the diff I've already changed one of them (the second chunk) to use
  the data that _build_provider_summaries is already getting.
  (functional tests still pass with this change)

  The third chunk is because we have a big loop, but I suspect there is
  some duplication that can be avoided. I have not investigated that
  closely (yet).

  -=-=-
  diff --git a/nova/api/openstack/placement/objects/resource_provider.py 
b/nova/api/openstack/placement/objects/resource_provider.py
  index 851f9719e4..e6c894b8fe 100644
  --- a/nova/api/openstack/placement/objects/resource_provider.py
  +++ b/nova/api/openstack/placement/objects/resource_provider.py
  

[Yahoo-eng-team] [Bug 1784664] [NEW] cors.set_defaults does not have real test coverage in placement and probably not nova either

2018-07-31 Thread Chris Dent
Public bug reported:

Both the placement and nova apis allow oslo_middleware.cors in their
WSGI middleware stacks.

Placement has some gabbi functional tests which test that the middleware
is present and does the right thing when using the middleware's own
configuration defaults, both when it is enabled (cors.yaml) and when it
is not (non-cors.yaml).

However, the WSGI application that is actually used in a deployment, the
one created by nova/api/openstack/placement/wsgi.py is not used in those
functional tests. That code calls the set_defaults method on the cors
middleware to change and define those HTTP headers and request methods
which will be allowed without further configuration.

As far as I know, nothing (such as a tempest test) confirms those
headers in either placement or nova, and it's relatively certain they
are incomplete with regard to microversions (as OpenStack-API-Version
and X-OpenStack-Nova-API-Version).

This bug is the result of discussion on
https://review.openstack.org/#/c/587183/2/nova/api/openstack/placement/wsgi.py

The gabbi tests show the kinds of requests that can be done to confirm
the right headers are generated:

https://github.com/openstack/nova/blob/master/nova/tests/functional/api/openstack/placement/gabbits/cors.yaml

https://github.com/openstack/nova/blob/master/nova/tests/functional/api/openstack/placement/gabbits/non-cors.yaml
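For example, the kind of preflight request a tempest-style test could
assert on against a live service (a sketch; the endpoint and origin are
illustrative, not confirmed values):

    import requests

    resp = requests.options(
        'http://localhost:8778/',  # placement endpoint, illustrative
        headers={'Origin': 'http://valid.example.com',
                 'Access-Control-Request-Method': 'GET'})
    # with the middleware configured, the origin should be echoed back
    print(resp.headers.get('Access-Control-Allow-Origin'))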

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: api placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1784664

Title:
  cors.set_defaults does not have real test coverage in placement and
  probably not nova either

Status in OpenStack Compute (nova):
  New

Bug description:
  Both the placement and nova apis allow oslo_middleware.cors in their
  WSGI middleware stacks.

  Placement has some gabbi functional tests which test that the
  middleware is present and does the right thing when using the
  middleware's own configuration defaults, both when it is enabled
  (cors.yaml) and when it is not (non-cors.yaml).

  However, the WSGI application that is actually used in a deployment,
  the one created by nova/api/openstack/placement/wsgi.py is not used in
  those functional tests. That code calls the set_defaults method on the
  cors middleware to change and define those HTTP headers and request
  methods which will be allowed without further configuration.

  As far as I know, nothing (such as a tempest test) confirms those
  headers in either placement or nova, and it's relatively certain they
  are incomplete with regard to microversions (as OpenStack-API-Version
  and X-OpenStack-Nova-API-Version).

  This bug is the result of discussion on
  https://review.openstack.org/#/c/587183/2/nova/api/openstack/placement/wsgi.py

  The gabbi tests show the kinds of requests that can be done to confirm
  the right headers are generated:

  
https://github.com/openstack/nova/blob/master/nova/tests/functional/api/openstack/placement/gabbits/cors.yaml

  
  https://github.com/openstack/nova/blob/master/nova/tests/functional/api/openstack/placement/gabbits/non-cors.yaml

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1784664/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1784663] [NEW] cors.set_defaults does not have real test coverage in placement and probably not nova either

2018-07-31 Thread Chris Dent
Public bug reported:

Both the placement and nova apis allow oslo_middleware.cors in their
WSGI middleware stacks.

Placement has some gabbi functional tests which test that the middleware
is present and does the right thing when using the middleware's own
configuration defaults, both when it is enabled (cors.yaml) and when it
is not (non-cors.yaml).

However, the WSGI application that is actually used in a deployment, the
one created by nova/api/openstack/placement/wsgi.py is not used in those
functional tests. That code calls the set_defaults method on the cors
middleware to change and define those HTTP headers and request methods
which will be allowed without further configuration.

As far as I know, nothing (such as a tempest test) confirms those
headers in either placement or nova, and it's relatively certain they
are incomplete with regard to microversions (as OpenStack-API-Version
and X-OpenStack-Nova-API-Version).

This bug is the result of discussion on
https://review.openstack.org/#/c/587183/2/nova/api/openstack/placement/wsgi.py

The gabbi tests show the kinds of requests that can be done to confirm
the right headers are generated:

https://github.com/openstack/nova/blob/master/nova/tests/functional/api/openstack/placement/gabbits/cors.yaml

https://github.com/openstack/nova/blob/master/nova/tests/functional/api/openstack/placement/gabbits/non-cors.yaml

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: api placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1784663

Title:
  cors.set_defaults does not have real test coverage in placement and
  probably not nova either

Status in OpenStack Compute (nova):
  New

Bug description:
  Both the placement and nova apis allow oslo_middleware.cors in their
  WSGI middleware stacks.

  Placement has some gabbi functional tests which test that the
  middleware is present and does the right thing when using the
  middleware's own configuration defaults, both when it is enabled
  (cors.yaml) and when it is not (non-cors.yaml).

  However, the WSGI application that is actually used in a deployment,
  the one created by nova/api/openstack/placement/wsgi.py is not used in
  those functional tests. That code calls the set_defaults method on the
  cors middleware to change and define those HTTP headers and request
  methods which will be allowed without further configuration.

  As far as I know, nothing (such as a tempest test) confirms those
  headers in either placement or nova, and it's relatively certain they
  are incomplete with regard to microversions (as OpenStack-API-Version
  and X-OpenStack-Nova-API-Version).

  This bug is the result of discussion on
  https://review.openstack.org/#/c/587183/2/nova/api/openstack/placement/wsgi.py

  The gabbi tests show the kinds of requests that can be done to confirm
  the right headers are generated:

  
https://github.com/openstack/nova/blob/master/nova/tests/functional/api/openstack/placement/gabbits/cors.yaml

  
  https://github.com/openstack/nova/blob/master/nova/tests/functional/api/openstack/placement/gabbits/non-cors.yaml

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1784663/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1784577] [NEW] Some allocation candidate tests for sharing providers fail in python 3.6 (and work in python 3.5)

2018-07-31 Thread Chris Dent
Public bug reported:

When running the nova functional tests under python 3.6 the
nova.tests.functional.api.openstack.placement.db.test_allocation_candidates.AllocationCandidatesTestCase.test_all_sharing_providers.*
tests (there are 3) all fail because incorrect results are produced on
the call to rp_obj.AllocationCandidates.get_by_requests:

b"reference = [[('ss1', 'DISK_GB', 1500),"
b"  ('ss1', 'IPV4_ADDRESS', 2),"
b"  ('ss1', 'SRIOV_NET_VF', 1)],"
b" [('ss1', 'DISK_GB', 1500),"
b"  ('ss1', 'IPV4_ADDRESS', 2),"
b"  ('ss2', 'SRIOV_NET_VF', 1)],"
b" [('ss1', 'IPV4_ADDRESS', 2),"
b"  ('ss1', 'SRIOV_NET_VF', 1),"
b"  ('ss2', 'DISK_GB', 1500)],"
b" [('ss1', 'IPV4_ADDRESS', 2),"
b"  ('ss2', 'DISK_GB', 1500),"
b"  ('ss2', 'SRIOV_NET_VF', 1)]]"
b"actual= [[('ss1', 'DISK_GB', 1500),"
b"  ('ss1', 'IPV4_ADDRESS', 2),"
b"  ('ss1', 'SRIOV_NET_VF', 1)],"
b" [('ss1', 'DISK_GB', 1500),"
b"  ('ss1', 'IPV4_ADDRESS', 2),"
b"  ('ss2', 'SRIOV_NET_VF', 1)],"
b" [('ss1', 'DISK_GB', 1500),"
b"  ('ss1', 'IPV4_ADDRESS', 2),"
b"  ('ss2', 'SRIOV_NET_VF', 1)],"
b" [('ss1', 'IPV4_ADDRESS', 2),"
b"  ('ss1', 'SRIOV_NET_VF', 1),"
b"  ('ss2', 'DISK_GB', 1500)],"
b" [('ss1', 'IPV4_ADDRESS', 2),"
b"  ('ss2', 'DISK_GB', 1500),"
b"  ('ss2', 'SRIOV_NET_VF', 1)],"
b" [('ss1', 'IPV4_ADDRESS', 2),"
b"  ('ss2', 'DISK_GB', 1500),"
b"  ('ss2', 'SRIOV_NET_VF', 1)]]"

** Affects: nova
 Importance: Medium
 Status: Confirmed


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1784577

Title:
  Some allocation candidate tests for sharing providers fail in python
  3.6 (and work in python 3.5)

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  When running the nova functional tests under python 3.6 the
  
nova.tests.functional.api.openstack.placement.db.test_allocation_candidates.AllocationCandidatesTestCase.test_all_sharing_providers.*
  tests (there are 3) all fail because incorrect results are produced on
  the call to rp_obj.AllocationCandidates.get_by_requests:

  b"reference = [[('ss1', 'DISK_GB', 1500),"
  b"  ('ss1', 'IPV4_ADDRESS', 2),"
  b"  ('ss1', 'SRIOV_NET_VF', 1)],"
  b" [('ss1', 'DISK_GB', 1500),"
  b"  ('ss1', 'IPV4_ADDRESS', 2),"
  b"  ('ss2', 'SRIOV_NET_VF', 1)],"
  b" [('ss1', 'IPV4_ADDRESS', 2),"
  b"  ('ss1', 'SRIOV_NET_VF', 1),"
  b"  ('ss2', 'DISK_GB', 1500)],"
  b" [('ss1', 'IPV4_ADDRESS', 2),"
  b"  ('ss2', 'DISK_GB', 1500),"
  b"  ('ss2', 'SRIOV_NET_VF', 1)]]"
  b"actual= [[('ss1', 'DISK_GB', 1500),"
  b"  ('ss1', 'IPV4_ADDRESS', 2),"
  b"  ('ss1', 'SRIOV_NET_VF', 1)],"
  b" [('ss1', 'DISK_GB', 1500),"
  b"  ('ss1', 'IPV4_ADDRESS', 2),"
  b"  ('ss2', 'SRIOV_NET_VF', 1)],"
  b" [('ss1', 'DISK_GB', 1500),"
  b"  ('ss1', 'IPV4_ADDRESS', 2),"
  b"  ('ss2', 'SRIOV_NET_VF', 1)],"
  b" [('ss1', 'IPV4_ADDRESS', 2),"
  b"  ('ss1', 'SRIOV_NET_VF', 1),"
  b"  ('ss2', 'DISK_GB', 1500)],"
  b" [('ss1', 'IPV4_ADDRESS', 2),"
  b"  ('ss2', 'DISK_GB', 1500),"
  b"  ('ss2', 'SRIOV_NET_VF', 1)],"
  b" [('ss1', 'IPV4_ADDRESS', 2),"
  b"  ('ss2', 'DISK_GB', 1500),"
  b"  ('ss2', 'SRIOV_NET_VF', 1)]]"

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1784577/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1736101] Re: nova placement resource_providers DBDuplicateEntry when moving host between cells

2018-07-25 Thread Chris Dent
While the behavior on this is as described (you can't move a resource
provider between cells), that's how things are designed. You no longer
get the DB error; instead the 409 happens.

So I think this is invalid, working as designed.

That the design is imperfect is a different problem...

** Changed in: nova
   Status: Confirmed => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1736101

Title:
  nova placement resource_providers DBDuplicateEntry when moving host
  between cells

Status in OpenStack Compute (nova):
  Won't Fix

Bug description:
  OpenStack Version: Pike

  I have two compute nodes with the same name, but only one record can
  be successfully created in the resource_providers table. When
  resource_providers.name repeats, the record cannot be inserted, and
  this error message is produced:
Uncaught exception: DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, 
u"Duplicate entry 'cvk17(CVM172.25.19.80)'for key 
'uniq_resource_providers0name'") [SQL: u'INSERT INTO resource_providers 
(created_at, updated_at, uuid, name, generation) VALUES...

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1736101/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1783130] [NEW] placement reshaper doesn't clear all inventories for a resource provider

2018-07-23 Thread Chris Dent
Public bug reported:

The /reshaper API is willing to accept an empty dictionary for the
inventories attribute of a resource provider. This is intended to mean
"clear all the inventory".

However, the backend transformer code is not prepared to handle this:

  File "nova/api/openstack/placement/handlers/reshaper.py", line 103, in 
reshape
rp_obj.reshape(context, inventory_by_rp, allocation_objects)
  File 
"/mnt/share/cdentsrc/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py",
 line 993, in wrapper
return fn(*args, **kwargs)
  File "nova/api/openstack/placement/objects/resource_provider.py", line 
4087, in reshape
rp = new_inv_list[0].resource_provider
  File 
"/mnt/share/cdentsrc/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_versionedobjects/base.py",
 line 829, in __getitem__
return self.objects[index]
IndexError: list index out of range

This is happening because 'new_inv_list' can be empty at

    for rp_uuid, new_inv_list in inventories.items():
        LOG.debug("reshaping: *interim* inventory replacement for provider %s",
                  rp_uuid)
        rp = new_inv_list[0].resource_provider

If the length of new_inv_list is zero we need to do nothing for this
iteration through the loop.
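That is, something like this sketch of the guard, following the quoted
loop:

    for rp_uuid, new_inv_list in inventories.items():
        LOG.debug("reshaping: *interim* inventory replacement for provider %s",
                  rp_uuid)
        if not new_inv_list:
            # an empty inventories dict means "remove all inventory";
            # there is nothing to replace in this pass
            continue
        rp = new_inv_list[0].resource_provider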

Then a few lines later at

    for rp_uuid, new_inv_list in inventories.items():
        LOG.debug("reshaping: *final* inventory replacement for provider %s",
                  rp_uuid)
        # TODO(efried): If we wanted this to be more efficient, we could keep
        # track of providers for which all inventories are being deleted in the
        # above loop and just do those and skip the rest, since they're already
        # in their final form.
        new_inv_list[0].resource_provider.set_inventory(new_inv_list)

We have the same IndexError problem and need to behave differently.

One thing we might do, instead of using the resource_provider object on
the (maybe absent) inventory objects, is create a new object: we already
have the rp_uuid.

** Affects: nova
 Importance: Medium
 Status: Confirmed


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1783130

Title:
  placement reshaper doesn't clear all inventories for a resource
  provider

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  The /reshaper API is willing to accept an empty dictionary for the
  inventories attribute of a resource provider. This is intended to mean
  "clear all the inventory".

  However, the backend transformer code is not prepared to handle this:

File "nova/api/openstack/placement/handlers/reshaper.py", line 103, in 
reshape
  rp_obj.reshape(context, inventory_by_rp, allocation_objects)
File 
"/mnt/share/cdentsrc/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py",
 line 993, in wrapper
  return fn(*args, **kwargs)
File "nova/api/openstack/placement/objects/resource_provider.py", line 
4087, in reshape
  rp = new_inv_list[0].resource_provider
File 
"/mnt/share/cdentsrc/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_versionedobjects/base.py",
 line 829, in __getitem__
  return self.objects[index]
  IndexError: list index out of range

  This is happening because 'new_inv_list' can be empty at

      for rp_uuid, new_inv_list in inventories.items():
          LOG.debug("reshaping: *interim* inventory replacement for provider %s",
                    rp_uuid)
          rp = new_inv_list[0].resource_provider

  If the length of new_inv_list is zero we need to do nothing for this
  iteration through the loop.

  Then a few lines later at

      for rp_uuid, new_inv_list in inventories.items():
          LOG.debug("reshaping: *final* inventory replacement for provider %s",
                    rp_uuid)
          # TODO(efried): If we wanted this to be more efficient, we could keep
          # track of providers for which all inventories are being deleted in the
          # above loop and just do those and skip the rest, since they're already
          # in their final form.
          new_inv_list[0].resource_provider.set_inventory(new_inv_list)

  We have the same IndexError problem and need to behave differently.

  A thing we 

[Yahoo-eng-team] [Bug 1782340] [NEW] allocation schema does not set additionalProperties False in all the right places

2018-07-18 Thread Chris Dent
Public bug reported:

In microversion 1.12 of placement, a schema for allocations was
introduced that required the allocations, project_id and user_id fields.
This schema is used for subsequent microversions, copied and manipulated
as required.

However, it has a flaw. It does not set additionalProperties: False for
the object at which the required fields are set. This means you can add
a field and it glides through. This flaw cascades all the way to the
reshaper (where I had a test that refused to fail in the expected way).

The diff below demonstrates the problem and a potential fix, but this
fix may not be right as it is in the 1.12 microversion and we might want
it only on microversion 1.30 and beyond (which is a pain).

I think we should just fix it, as below, but I'll let others chime in
too.


diff --git a/nova/api/openstack/placement/schemas/allocation.py 
b/nova/api/openstack/placement/schemas/allocation.py
index e149ae3beb..796b7c5d01 100644
--- a/nova/api/openstack/placement/schemas/allocation.py
+++ b/nova/api/openstack/placement/schemas/allocation.py
@@ -113,8 +113,9 @@ ALLOCATION_SCHEMA_V1_12 = {
 "type": "string",
 "minLength": 1,
 "maxLength": 255
-}
+}
 },
+"additionalProperties": False,
 "required": [
 "allocations",
 "project_id",
diff --git 
a/nova/tests/functional/api/openstack/placement/gabbits/allocations-1.28.yaml 
b/nova/tests/functional/api/openstack/placement/gabbits/allocations-1.28.yaml
index df8fadd66b..04107a996b 100644
--- 
a/nova/tests/functional/api/openstack/placement/gabbits/allocations-1.28.yaml
+++ 
b/nova/tests/functional/api/openstack/placement/gabbits/allocations-1.28.yaml
@@ -238,7 +238,9 @@ tests:
   project_id: $ENVIRON['PROJECT_ID']
   user_id: $ENVIRON['USER_ID']
   consumer_generation: null
-  status: 204
+  bad_field: moo
+  #status: 204
+  status: 400
 
 - name: put that allocation to existing consumer
   PUT: /allocations/----

** Affects: nova
 Importance: Medium
 Status: Confirmed


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1782340

Title:
  allocation schema does not set additionalProperties False in all the
  right places

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  In microversion 1.12 of placement, a schema for allocations was
  introduced that required the allocations, project_id and user_id
  fields. This schema is used for subsequent microversions, copied and
  manipulated as required.

  However, it has a flaw. It does not set additionalProperties: False
  for the object at which the required fields are set. This means you
  can add a field and it glides through. This flaw cascades all the way
  to the reshaper (where I had a test that refused to fail in the
  expected way).

  The diff below demonstrates the problem and a potential fix, but this
  fix may not be right as it is in the 1.12 microversion and we might
  want it only on microversion 1.30 and beyond (which is a pain).

  I think we should just fix it, as below, but I'll let others chime in
  too.


  diff --git a/nova/api/openstack/placement/schemas/allocation.py 
b/nova/api/openstack/placement/schemas/allocation.py
  index e149ae3beb..796b7c5d01 100644
  --- a/nova/api/openstack/placement/schemas/allocation.py
  +++ b/nova/api/openstack/placement/schemas/allocation.py
  @@ -113,8 +113,9 @@ ALLOCATION_SCHEMA_V1_12 = {
   "type": "string",
   "minLength": 1,
   "maxLength": 255
  -}
  +}
   },
  +"additionalProperties": False,
   "required": [
   "allocations",
   "project_id",
  diff --git 
a/nova/tests/functional/api/openstack/placement/gabbits/allocations-1.28.yaml 
b/nova/tests/functional/api/openstack/placement/gabbits/allocations-1.28.yaml
  index df8fadd66b..04107a996b 100644
  --- 
a/nova/tests/functional/api/openstack/placement/gabbits/allocations-1.28.yaml
  +++ 
b/nova/tests/functional/api/openstack/placement/gabbits/allocations-1.28.yaml
  @@ -238,7 +238,9 @@ tests:
 project_id: $ENVIRON['PROJECT_ID']
 user_id: $ENVIRON['USER_ID']
 consumer_generation: null
  -  status: 204
  +  bad_field: moo
  +  #status: 204
  +  status: 400
   
   - name: put that allocation to existing consumer
 PUT: /allocations/----

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1782340/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1731072] Re: AllocationCandidates.get_by_filters returns garbage with multiple aggregates

2018-07-06 Thread Chris Dent
As the main issue here has been sort of addressed and there are many
other related issues, it's better to close this and deal with the more
granular issues as they come up.

** Changed in: nova
   Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1731072

Title:
  AllocationCandidates.get_by_filters returns garbage with multiple
  aggregates

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  I set up a test scenario with multiple providers (sharing and non),
  across multiple aggregates.  Requesting allocation candidates gives
  some candidates as expected, but some are garbled.  Bad behaviors
  include:

  (1) When inventory in a given RC is provided both by a non-sharing and a 
sharing RP in an aggregate, the sharing RP is ignored in the results (this is 
tracked via https://bugs.launchpad.net/nova/+bug/1724613)
  (2) When inventory in a given RC is provided solely by a sharing RP, I don't 
get the expected candidate where that sharing RP provides that inventory and 
the rest comes from the non-sharing RP.
  (3) The above applies when there are multiple sharing RPs in the same 
aggregate providing that same shared resource.
  (4) ...and also when the sharing RPs provide different resources.

  And we get a couple of unexpected candidates that are really garbled:

  (5) Where there are multiple sharing RPs with different resources, one 
candidate has the expected resources from the non-sharing RP and one of the 
sharing RPs, but is missing the third requested resource entirely.
  (6) With that same setup, we get another candidate that has the expected 
resource from the non-sharing RP; but duplicated resources spread across 
multiple sharing RPs from *different* *aggregates*.  This one is also missing 
one of the requested resources entirely.

  I will post a commit shortly that demonstrates this behavior.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1731072/+subscriptions



[Yahoo-eng-team] [Bug 1765376] Re: nova scheduler log contains html

2018-07-06 Thread Chris Dent
Marking as fix released. If it becomes an issue we can consider
backporting it.

** Changed in: nova
   Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1765376

Title:
  nova scheduler log contains html

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===
The nova scheduler log contains some HTML (see actual result). Log files 
should not contain any HTML. I suspect some incorrect error message parsing here.

  
  Actual result
  =
  Log Message:
  
  2018-04-19 13:12:53.109 12125 WARNING nova.scheduler.client.report 
[req-a35b4d7e-9914-48f1-b912-27cedb6eebdd cd9715e9b4714bc6b4d77f15f12ba5a9 
fa976f761aad4d378706dfc26ddf6004 - default default] Unable to submit allocation 
for instance 6dc8e703-1174-499d-aa9b-4d05f83b7784 (409 
   
    409 Conflict
    409 Conflict
    There was a conflict when trying to complete your request.
  Unable to allocate inventory: Unable to create allocation for 'VCPU' on 
resource provider '322b4b21-f3ff-4d59-b6c8-8c1a9fe2b530'. The requested amount 
would violate inventory constraints.
  )
  

  Steps to reproduce
  ==
  This log line occurred after trying to live-migrate a VM from one node to 
another. Obviously the request failed to do so.

  Expected result
  ===
  The log line should be better formatted. for example like this:

  
  2018-04-19 13:12:53.109 12125 WARNING nova.scheduler.client.report 
[req-a35b4d7e-9914-48f1-b912-27cedb6eebdd cd9715e9b4714bc6b4d77f15f12ba5a9 
fa976f761aad4d378706dfc26ddf6004 - default default] Unable to submit allocation 
for instance 6dc8e703-1174-499d-aa9b-4d05f83b7784 - HTTP Error 409 - There was 
a conflict when trying to complete your request. Unable to allocate inventory: 
Unable to create allocation for 'VCPU' on resource provider 
'322b4b21-f3ff-4d59-b6c8-8c1a9fe2b530'. The requested amount would violate 
inventory constraints.
  

  Environment
  ===
  Ubuntu 16.04 with the following packages from Ubuntu Cloud Archive:

  nova-api   2:16.0.4-0ubuntu1~cloud0
  nova-common2:16.0.4-0ubuntu1~cloud0
  nova-conductor 2:16.0.4-0ubuntu1~cloud0
  nova-consoleauth   2:16.0.4-0ubuntu1~cloud0
  nova-novncproxy2:16.0.4-0ubuntu1~cloud0
  nova-placement-api 2:16.0.4-0ubuntu1~cloud0
  nova-scheduler 2:16.0.4-0ubuntu1~cloud0
  python-nova2:16.0.4-0ubuntu1~cloud0
  python-novaclient  2:9.1.0-0ubuntu1~cloud0

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1765376/+subscriptions



[Yahoo-eng-team] [Bug 1683858] Re: Allocation records do not contain overhead information

2018-06-27 Thread Chris Dent
Gonna kill this one. We seem to have reached the consensus that
overhead is something an operator may manage however they like; it is
not something we will generically manage.

In the future it might make sense for the virt drivers to handle
overhead via 'reserved' when they are working with update_provider_tree.
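
A rough sketch of that idea, assuming the usual update_provider_tree
interface (the overhead figure is illustrative):

    def update_provider_tree(self, provider_tree, nodename, allocations=None):
        overhead_mb = 512  # illustrative per-host overhead
        inv = provider_tree.data(nodename).inventory
        inv['MEMORY_MB']['reserved'] = overhead_mb
        provider_tree.update_inventory(nodename, inv)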

** Changed in: nova
   Status: Triaged => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1683858

Title:
  Allocation records do not contain overhead information

Status in OpenStack Compute (nova):
  Won't Fix

Bug description:
  Some virt drivers report additional overhead per instance for memory
  and disk usage on a compute node. That is not reported in the
  allocations records for a given instance on a resource provider
  (compute node), however:

  
https://github.com/openstack/nova/blob/15.0.0/nova/scheduler/client/report.py#L157

  It is used as part of the claim test on the compute when creating an
  instance or moving an instance. For creating an instance, that's done
  here:

  
https://github.com/openstack/nova/blob/15.0.0/nova/compute/resource_tracker.py#L144-L156

  https://github.com/openstack/nova/blob/15.0.0/nova/compute/claims.py#L165

  Where Claim.memory_mb is the instance.flavor.memory_mb + overhead:

  https://github.com/openstack/nova/blob/15.0.0/nova/compute/claims.py#L106

  So ultimately what we claim on the compute node is not what we report
  to placement for allocations for that instance. This matters because
  when the filter scheduler is asking placement for a list of resource
  providers that can fit a given request memory_mb and disk_gb it relies
  on the inventory for the compute node resource provider and the
  existing usage (allocations) for that provider, and we aren't
  reporting the full story to placement.

  This could lead to placement telling the filter scheduler there is
  room to place an instance on a given compute node when in fact that
  could fail the claim once we get to the host, which would results in a
  retry of the build on another host (which can be expensive).

  Also, when we start having multi-cell support with a top-level
  conductor that the computes can't reach, we won't have build retries
  anymore, so you'd just fail the claim and the build would be done and
  the instance would go to ERROR state. So it's critical that the
  placement service has the proper information for making the correct
  decision on the first try.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1683858/+subscriptions



[Yahoo-eng-team] [Bug 1708958] Re: disabling a compute service does not disable the resource provider

2018-06-27 Thread Chris Dent
** Changed in: nova
   Status: Confirmed => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1708958

Title:
  disabling a compute service does not disable the resource provider

Status in OpenStack Compute (nova):
  Won't Fix

Bug description:
  If you make a multi node devstack (nova master as of August 6th,
  2017), or otherwise have multiple compute nodes, all of those compute
  nodes will create resource providers and relevant inventory.

  Later if you disable one of the compute nodes with a nova service-
  disable {service id}, that nova-compute will be disabled, but the
  associated resource provider will still exist with legit inventory in
  the placement service.

  This will mean that /allocation_candidates or /resource_providers
  will return results including the disabled compute node, but they will
  be bogus.

  It's not clear what the right behaviour is here. Should the rp of the
  disabled service be deleted? Have its inventory truncated? If there
  are other hosts available that satisfy the request, things go forward
  as desired, so there's not a functional bug here, but the data in
  placement is incorrect, which is undesirable.

  (On a related note, if you delete a compute node's resource provider
  from the placement service and don't restart the associated nova-
  compute, the _ensure_resource_provider method does _not_ create the
  resource provider anew because the _resource_providers dict still
  contains the uuid. This might be expected behaviour but it surprised
  me while I was messing around.)

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1708958/+subscriptions



[Yahoo-eng-team] [Bug 1778743] [NEW] When POSTing to /allocations with multiple consumers it is possible to violate inventory capacity constraints

2018-06-26 Thread Chris Dent
Public bug reported:

This is using microversion 1.28 of the placement API. I will start the
process of finding when this went wrong after submitting this bug. I'm
guessing at the start of POST to /allocations, but we'll see.

When a POST to /allocations contains multiple consumers each writing
some of their allocations against the same resource provider, it is
possible to "beat" the inventory constraints.

In the gabbi test at http://paste.openstack.org/show/724317/ a resource
provider is created with an inventory of 2 VCPU. A POST is then made to
/allocations with three different consumers, each asking for 1 VCPU.
This works, and it should not because we then end up with a usage of 3
VCPU.


I found this while trying to chase some issues with consumer generations and 
allocations and fell into this hole.
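
For reference, the constraint each write is supposed to satisfy, as a
minimal sketch (the real check is _check_capacity_exceeded in
resource_provider.py):

    def fits(total, reserved, allocation_ratio, used, requested):
        capacity = (total - reserved) * allocation_ratio
        return used + requested <= capacity

    fits(2, 0, 1.0, 0, 1)  # True: first consumer
    fits(2, 0, 1.0, 1, 1)  # True: second consumer
    fits(2, 0, 1.0, 2, 1)  # False: the third 1 VCPU should be rejected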

** Affects: nova
 Importance: Medium
 Status: Triaged


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1778743

Title:
  When POSTing to /allocations with multiple consumers it is possible
  to violate inventory capacity constraints

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  This is using microversion 1.28 of the placement API. I will start the
  process of finding when this went wrong after submitting this bug. I'm
  guessing at the start of POST to /allocations, but we'll see.

  When a POST to /allocations contains multiple consumers each writing
  some of their allocations against the same resource provider, it is
  possible to "beat" the inventory constraints.

  In the gabbi test at http://paste.openstack.org/show/724317/ a
  resource provider is created with an inventory of 2 VCPU. A POST is
  then made to /allocations with three different consumers, each asking
  for 1 VCPU. This works, and it should not because we then end up with
  a usage of 3 VCPU.

  
  I found this while trying to chase some issues with consumer generations and 
allocations and fell into this hole.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1778743/+subscriptions



[Yahoo-eng-team] [Bug 1778591] [NEW] GET /allocations/{uuid} on a consumer with no allocations provides no generation

2018-06-25 Thread Chris Dent
Public bug reported:

If we write some allocations with PUT /allocations/{uuid} at modern
microversions, a consumer record is created for {uuid} and a generation
is created for that consumer. Each subsequent attempt to PUT
/allocations/{uuid} must include a matching consumer generation.

If the allocations for a consumer are cleared (either DELETE, or PUT
/allocations/{uuid} with an empty dict of allocations) two things go
awry:

* the consumer record, with a generation, stays around
* GET /allocations/{uuid} returns the following:

   {u'allocations': {}}

That is, no generation is provided, and we have no way to figure one out
other than inspecting the details of the error response.

Some options to address this:

* Return the generation in that response
* When the allocations for a consumer go empty, remove the consumer
* Something else?
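
For example, the first option might produce a response shaped like this
(a sketch; the generation value is illustrative):

    {"allocations": {}, "consumer_generation": 5}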

** Affects: nova
 Importance: Medium
 Status: Triaged


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1778591

Title:
  GET /allocations/{uuid} on a consumer with no allocations provides no
  generation

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  If we write some allocations with PUT /allocations/{uuid} at modern
  microversions, a consumer record is created for {uuid} and a
  generation is created for that consumer. Each subsequent attempt to
  PUT /allocations/{uuid} must include a matching consumer generation.

  If the allocations for a consumer are cleared (either DELETE, or PUT
  /allocations/{uuid} with an empty dict of allocations) two things go
  awry:

  * the consumer record, with a generation, stays around
  * GET /allocations/{uuid} returns the following:

 {u'allocations': {}}

  That is, no generation is provided, and we have no way to figure one out
  other than inspecting the details of the error response.

  Some options to address this:

  * Return the generation in that response
  * When the allocations for a consumer go empty, remove the consumer
  * Something else?

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1778591/+subscriptions



[Yahoo-eng-team] [Bug 1778576] [NEW] making new allocations for one consumer against multiple resource providers fails with 409

2018-06-25 Thread Chris Dent
Public bug reported:

If you PUT some allocations for a new consumer (thus no generation), and
those allocations are against more than one resource provider, a 409
failure will happen with:

consumer generation conflict - expected 0 but got None

This is because in _new_allocations in handlers/allocation.py we always use
the generation provided in the incoming data when we call
util.ensure_consumer. This works for the first resource provider but
then on the second one the consumer exists, so our generation has to be
different now.

One possible fix (already in progress) is to use the generation from
new_allocations[0].consumer.generation in subsequent trips round the
loop calling _new_allocations.
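
Roughly, the idea is (a sketch, names approximate, not the final patch):

    consumer_generation = data.get('consumer_generation')
    new_allocations = []
    for rp_uuid, resources in allocations_by_provider.items():
        allocs = _new_allocations(context, rp_uuid, consumer_uuid,
                                  resources, consumer_generation)
        new_allocations.extend(allocs)
        # After the first provider the consumer exists, so later
        # iterations must use its current generation, not None.
        consumer_generation = new_allocations[0].consumer.generation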

I guess we must have missed some test cases. I'll make sure to add some
when working on this. I found the problem with my placecat stuff.

** Affects: nova
 Importance: High
 Assignee: Chris Dent (cdent)
 Status: New


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1778576

Title:
  making new allocations for one consumer against multiple resource
  providers fails with 409

Status in OpenStack Compute (nova):
  New

Bug description:
  If you PUT some allocations for a new consumer (thus no generation),
  and those allocations are against more than one resource provider, a
  409 failure will happen with:

  consumer generation conflict - expected 0 but got None

  This is because in _new_allocations in handlers/allocation.py we always
  use the generation provided in the incoming data when we call
  util.ensure_consumer. This works for the first resource provider but
  then on the second one the consumer exists, so our generation has to
  be different now.

  One possible fix (already in progress) is to use the generation from
  new_allocations[0].consumer.generation in subsequent trips round the
  loop calling _new_allocations.

  I guess we must have missed some test cases. I'll make sure to add
  some when working on this. I found the problem with my placecat stuff.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1778576/+subscriptions



[Yahoo-eng-team] [Bug 1778071] [NEW] The placement consumer generation conflict error message can be misleading

2018-06-21 Thread Chris Dent
Public bug reported:

When using consumer generations to create new allocations, the value of
the generation is expected to be None on the Python side, and 'null' in
JSON. The error response sent over the API says "expected None but got
1", which doesn't help much since the API speaks JSON.

** Affects: nova
 Importance: Low
 Assignee: Eric Fried (efried)
 Status: In Progress


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1778071

Title:
  The placement consumer generation conflict error message can be
  misleading

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  When using consumer generations to create new allocations, the value
  of the generation is expected to be None on the Python side, and
  'null' in JSON. The error response sent over the API says "expected
  None but got 1", which doesn't help much since the API speaks JSON.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1778071/+subscriptions



[Yahoo-eng-team] [Bug 1776668] [NEW] the placement version discovery doc at / doesn't have a status field, it should

2018-06-13 Thread Chris Dent
Public bug reported:

Version discovery docs are supposed to have a status field:
http://specs.openstack.org/openstack/api-
wg/guidelines/microversion_specification.html#version-discovery

Placement's does not. This was probably an oversight resulting from
casual attention to detail since placement only has one version.

This is easily fixable and easily backportable, so I'll get on that.

This is causing problems for at least mnaser when trying to write his
own client code.
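
Per the guideline, the discovery doc ought to look something like this
(a sketch; the version values are illustrative):

    {"versions": [{"id": "v1.0",
                   "status": "CURRENT",
                   "min_version": "1.0",
                   "max_version": "1.29"}]}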

** Affects: nova
 Importance: Medium
 Assignee: Chris Dent (cdent)
 Status: Triaged


** Tags: placement queens-backport-potential

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1776668

Title:
  the placement version discovery doc at / doesn't have a status field,
  it should

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  Version discovery docs are supposed to have a status field:
  http://specs.openstack.org/openstack/api-
  wg/guidelines/microversion_specification.html#version-discovery

  Placement's does not. This was probably an oversight resulting from
  casual attention to detail since placement only has one version.

  This is easily fixable and easily backportable, so I'll get on that.

  This is causing problems for at least mnaser when trying to write his
  own client code.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1776668/+subscriptions



[Yahoo-eng-team] [Bug 1775308] [NEW] Listing placement usages (total or per resource provider) in a new process can result in a 500

2018-06-05 Thread Chris Dent
Public bug reported:

When requesting /usages or /resource_providers/{uuid}/usages it is
possible to cause a 500 error if placement is running in a multi-process
scenario and the usages query is the first request a process has
received. This is because the methods which provide UsageLists do not
_ensure_rc_cache, resulting in:

  File 
"/usr/lib/python3.6/site-packages/nova/api/openstack/placement/objects/resource_provider.py",
 line 2374, in _from_db_object
   rc_str = _RC_CACHE.string_from_id(source['resource_class_id'])
   AttributeError: 'NoneType' object has no attribute 'string_from_id'

We presumably don't see this in our usual testing because any process
has already had other requests happen, setting the cache.

For now, the fix is to add the _ensure_rc_cache call in the right
places, but long term if/when we switch to the os-resource-class model
we can do the caching or syncing a bit differently (see
https://review.openstack.org/#/c/553857/ for an example).
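
The short-term fix is roughly (a sketch; method and helper names as I
recall them from resource_provider.py):

    @classmethod
    def get_all_by_resource_provider_uuid(cls, context, rp_uuid):
        _ensure_rc_cache(context)  # populate _RC_CACHE on first use
        usages = cls._get_all_by_resource_provider_uuid(context, rp_uuid)
        return base.obj_make_list(context, cls(context), Usage, usages)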

** Affects: nova
 Importance: Medium
 Assignee: Chris Dent (cdent)
 Status: Triaged


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1775308

Title:
  Listing placement usages (total or per resource provider) in a new
  process can result in a 500

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  When requesting /usages or /resource_providers/{uuid}/usages it is
  possible to cause a 500 error if placement is running in a multi-
  process scenario and the usages query is the first request a process
  has received. This is because the methods which provide UsageLists do
  not _ensure_rc_cache, resulting in:

File 
"/usr/lib/python3.6/site-packages/nova/api/openstack/placement/objects/resource_provider.py",
 line 2374, in _from_db_object
 rc_str = _RC_CACHE.string_from_id(source['resource_class_id'])
 AttributeError: 'NoneType' object has no attribute 'string_from_id'

  We presumably don't see this in our usual testing because any process
  has already had other requests happen, setting the cache.

  For now, the fix is to add the _ensure_rc_cache call in the right
  places, but long term if/when we switch to the os-resource-class model
  we can do the caching or syncing a bit differently (see
  https://review.openstack.org/#/c/553857/ for an example).

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1775308/+subscriptions



[Yahoo-eng-team] [Bug 1773225] [NEW] placement needs to stop using accept.best_match from webob it is deprecated

2018-05-24 Thread Chris Dent
Public bug reported:

Modern webob has improved its management of accept headers to be more in
alignment with the HTTP RFCs (see bug
https://bugs.launchpad.net/nova/+bug/1765748 ), deprecating their old
handling:

DeprecationWarning: The behavior of AcceptValidHeader.best_match is
currently being maintained for backward compatibility, but it will be
deprecated in the future, as it does not conform to the RFC.

Eventually placement (in nova.api.openstack.placement.util:check_accept)
should be updated to use the new way.

Creating a separate bug to be task oriented.
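
With webob >= 1.8 the replacement looks roughly like this (a sketch of
check_accept, not a finished patch):

    import webob.exc

    def check_accept(req):
        # acceptable_offers returns (offer, quality) pairs, best first.
        acceptable = req.accept.acceptable_offers(['application/json'])
        if not acceptable:
            raise webob.exc.HTTPNotAcceptable(
                'Only application/json is provided')
        return acceptable[0][0]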

** Affects: nova
 Importance: Low
 Status: Triaged


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1773225

Title:
  placement needs to stop using accept.best_match from webob it is
  deprecated

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  Modern webob has improved its management of accept headers to be more
  in alignment with the HTTP RFCs (see bug
  https://bugs.launchpad.net/nova/+bug/1765748 ), deprecating their old
  handling:

  DeprecationWarning: The behavior of AcceptValidHeader.best_match is
  currently being maintained for backward compatibility, but it will be
  deprecated in the future, as it does not conform to the RFC.

  Eventually placement (in
  nova.api.openstack.placement.util:check_accept) should be updated to
  use the new way.

  Creating a separate bug to be task oriented.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1773225/+subscriptions



[Yahoo-eng-team] [Bug 1771384] [NEW] placement api send text/html error responses when accept headers is '*/*'

2018-05-15 Thread Chris Dent
Public bug reported:

In change https://review.openstack.org/#/c/518223/ the placement service
was adjusted to default to application/json as the accept header when no
accept header is present. This was done to control the automatic content
negotiation that webob does for formatting error responses. By default,
it wants to send text/html error responses, which we do not want in
placement.

Unfortunately, the fix in that change is incomplete. If a client sends
'*/*' as the accept header, what they are saying is "we'll take whatever
you want to send". Since the above fix is trying to say "the default is
JSON" we should consider '*/*' as 'application/json'.

We can fix this relatively easily and without impact on the error
processing that inspects error strings, as the strings will remain the
same. In some cases, they will have a different format.
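
The fix amounts to something like this where the default is currently
applied (a sketch):

    # Treat a missing Accept header and a bare '*/*' the same way.
    if not req.accept or str(req.accept) == '*/*':
        req.accept = 'application/json'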

** Affects: nova
 Importance: Undecided
 Assignee: Chris Dent (cdent)
 Status: In Progress


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1771384

Title:
  placement api send text/html error responses when accept headers is
  '*/*'

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  In change https://review.openstack.org/#/c/518223/ the placement
  service was adjusted to default to application/json as the accept
  header when no accept header is present. This was done to control the
  automatic content negotiation that webob does for formatting error
  responses. By default, it wants to send text/html error responses,
  which we do not want in placement.

  Unfortunately, the fix in that change is incomplete. If a client sends
  '*/*' as the accept header, what they are saying is "we'll take
  whatever you want to send". Since the above fix is trying to say "the
  default is JSON" we should consider '*/*' as 'application/json'.

  We can fix this relatively easily and without impact on the error
  processing that inspects error strings, as the strings will remain the
  same. In some cases, they will have a different format.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1771384/+subscriptions



[Yahoo-eng-team] [Bug 1770220] [NEW] report client allocation retry handling insufficient

2018-05-09 Thread Chris Dent
Public bug reported:

In stress testing of a nova+placement scenario where there is only one
nova-compute process (and thus only one resource provider) but more than
one thread worth of nova-scheduler it is fairly easy to trigger the
"Failed scheduler client operation claim_resources: out of retries:
Retry" error found near
https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L110

(In a quick test on a devstack with a fake compute driver, 100 separate
requests to boot one server, 13 failed for this reason.)

If we imagine 4 threads:

* A is one nova-scheduler
* B is one placement request/response
* C is another nova-scheduler
* D is a different placement request/response

A starts a PUT to /allocations, request B, at the start of which it
reads the resource provider and gets a generation, and then for whatever
reason waits for a while. Then C starts a PUT to /allocations, request
D, reads the same resource provider, same generation, but actually
completes, getting to increment generation before B.

When B gets to increment generation, it fails because now the generation
it has is no good for the increment procedure.

This is all working as expected but apparently is not ideal for high
concurrency with low numbers of compute nodes.

The current retry loop has no sleep() and only counts up to 3 retries.
It might make sense for it to do a random sleep before retrying (so as
to introduce a bit of jitter in the system), and perhaps retry more
times.
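
For instance, a sketch of the jitter idea (not nova code):

    import random
    import time

    def claim_with_retries(do_claim, max_retries=5, base_delay=0.1):
        # do_claim returns True on success, False on a retryable
        # generation conflict.
        for attempt in range(max_retries):
            if do_claim():
                return True
            # Sleep a random slice of an exponentially growing window
            # so competing schedulers stop colliding in lockstep.
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
        return False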

Input desired. Thoughts?

Another option, of course, is "don't run with so few compute nodes", but
as we can likely expect this kind of stress testing (it was a real life
stress test that worked fine in older (pre-claims-in-the-scheduler)
versions that exposed this) we may wish to make it happier.

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1770220

Title:
  report client allocation retry handling insufficient

Status in OpenStack Compute (nova):
  New

Bug description:
  In stress testing of a nova+placement scenario where there is only one
  nova-compute process (and thus only one resource provider) but more
  than one thread worth of nova-scheduler it is fairly easy to trigger
  the "Failed scheduler client operation claim_resources: out of
  retries: Retry" error found near
  
https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L110

  (In a quick test on a devstack with a fake compute driver, 100
  separate requests to boot one server, 13 failed for this reason.)

  If we imagine 4 threads:

  * A is one nova-scheduler
  * B is one placement request/response
  * C is another nova-scheduler
  * D is a different placement request/response

  A starts a PUT to /allocations, request B, at the start of which it
  reads the resource provider and gets a generation, and then for whatever
  reason waits for a while. Then C starts a PUT to /allocations, request
  D, reads the same resource provider, same generation, but actually
  completes, getting to increment generation before B.

  When B gets to increment generation, it fails because now the
  generation it has is no good for the increment procedure.

  This is all working as expected but apparently is not ideal for high
  concurrency with low numbers of compute nodes.

  The current retry loop has no sleep() and only counts up to 3
  retries. It might make sense for it to do a random sleep before
  retrying (so as to introduce a bit of jitter in the system), and
  perhaps retry more times.

  Input desired. Thoughts?

  Another option, of course, is "don't run with so few compute nodes",
  but as we can likely expect this kind of stress testing (it was a real
  life stress test that worked fine in older (pre-claims-in-the-
  scheduler) versions that exposed this) we may wish to make it happier.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1770220/+subscriptions



[Yahoo-eng-team] [Bug 1765204] [NEW] Listing resource providers in placement with a postgresql backend gets a group by error

2018-04-18 Thread Chris Dent
Public bug reported:

When listing resource providers in placement with a postgresql database,
this error recently started showing up:

column "root_rp.uuid" must appear in the GROUP BY clause or be used
in an aggregate function

And then once you fix that it says the same for parent_rp.uuid.

It's not clear when this problem came on the scene, since we don't test
regularly with postgresql, but I do use it in my experiments with
placement in a container [1] and it wasn't there as of April 6th, and
probably a bit later.

Work is in progress to fix this as well as turn on an experimental job
for checking with postgres every now and again.


[1] https://hub.docker.com/r/cdent/placedock/
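
The shape of the fix, in SQLAlchemy terms (a sketch; the alias names
are illustrative):

    # PostgreSQL requires every selected, non-aggregated column to be
    # named in GROUP BY; MySQL is more forgiving, which hid this.
    query = query.group_by(rp.c.id, root_provider.c.uuid,
                           parent_provider.c.uuid)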

** Affects: nova
 Importance: Low
 Assignee: Chris Dent (cdent)
 Status: In Progress


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1765204

Title:
  Listing resource providers in placement with a postgresql backend gets
  a group by error

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  When listing resource providers in placement with a postgresql
  database, this error recently started showing up:

  column "root_rp.uuid" must appear in the GROUP BY clause or be
  used in an aggregate function

  And then once you fix that it says the same for parent_rp.uuid.

  It's not clear when this problem came on the scene, since we don't
  test regularly with postgresql, but I do use it in my experiments with
  placement in a container [1] and it wasn't there as of April 6th, and
  probably a bit later.

  Work is in progress to fix this as well as turn on an experimental job
  for checking with postgres every now and again.

  
  [1] https://hub.docker.com/r/cdent/placedock/

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1765204/+subscriptions



[Yahoo-eng-team] [Bug 1761295] [NEW] placement aggregate handler imports wrong exception module

2018-04-04 Thread Chris Dent
Public bug reported:

The placement/handlers/aggregate.py handler imports nova.exception and
then tries to use exception.ConcurrentUpdateDetected. Since the move of
exceptions to the placement namespace, that exception is no longer in
the module.

However, we're not seeing an error because this piece of code is very
very rarely called and we don't have tests that cover it.

I just happened to notice while doing a readthrough.
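
The fix is just the import (a sketch; the call site is illustrative):

    import webob.exc

    from nova.api.openstack.placement import exception

    try:
        resource_provider.set_aggregates(aggregate_uuids)
    except exception.ConcurrentUpdateDetected as exc:
        raise webob.exc.HTTPConflict(str(exc))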

** Affects: nova
 Importance: Medium
 Assignee: Chris Dent (cdent)
 Status: In Progress


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1761295

Title:
  placement aggregate handler imports wrong exception module

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  The placement/handlers/aggregate.py handler imports nova.exception and
  then tries to use exception.ConcurrentUpdateDetected. Since the move
  of exceptions to the placement namespace, that exception is no longer
  in the module.

  However, we're not seeing an error because this piece of code is very
  very rarely called and we don't have tests that cover it.

  I just happened to notice while doing a readthrough.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1761295/+subscriptions



[Yahoo-eng-team] [Bug 1759863] [NEW] placement functional tests can collide when synchronising the traits table

2018-03-29 Thread Chris Dent
Public bug reported:

The placement functional tests make use of the traits table. At the
start of most requests to the objects in resource_provider.py some code
is run to make sure that traits in the os-traits library are
synchronised to the table.

A global flag is present which says "I've already synchronised".
Functional tests are responsible for making sure this is in the right
state.

It turns out that this management was not complete, and after a recent
move of db/test_resource_provider.py and
db/test_allocation_candidates.py within the placement hierarchy the
delicate balance of how tests are split among processes by stestr was
upset. This leads to test failures where no traits are in the traits
table.

The fix is to ensure that functional tests manage the related db flags
both during setup and teardown and not rely solely on one or the other
(as people can easily get it wrong).
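
Concretely, something like this in the database fixture (a sketch; the
flag name matches resource_provider.py, the fixture shape is
approximate):

    import fixtures

    class Database(fixtures.Fixture):
        def _setUp(self):
            # Reset at setup so this test starts clean, and again at
            # cleanup so later tests in the process do too.
            self._reset_traits_synced()
            self.addCleanup(self._reset_traits_synced)

        @staticmethod
        def _reset_traits_synced():
            resource_provider._TRAITS_SYNCED = False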

** Affects: nova
 Importance: Medium
 Assignee: Chris Dent (cdent)
 Status: Triaged


** Tags: placement testing

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1759863

Title:
  placement functional tests can collide when synchronising the traits
  table

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  The placement functional tests make use of the traits table. At the
  start of most requests to the objects in resource_provider.py some
  code is run to make sure that traits in the os-traits library are
  synchronised to the table.

  A global flag is present which says "I've already synchronised".
  Functional tests are responsible for making sure this is in the right
  state.

  It turns out that this management was not complete, and after a recent
  move of db/test_resource_provider.py and
  db/test_allocation_candidates.py within the placement hierarchy the
  delicate balance of how tests are split among processes by stestr was
  upset. This leads to test failures where no traits are in the traits
  table.

  The fix is to ensure that functional tests manage the related db flags
  both during setup and teardown and not rely solely on one or the other
  (as people can easily get it wrong).

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1759863/+subscriptions



[Yahoo-eng-team] [Bug 1758057] [NEW] When creating uuid-based entities we can duplicate UUIDs

2018-03-22 Thread Chris Dent
Public bug reported:

It is possible to create two different resource providers (and probably
other entities) with the same UUID by creating one with '-' and the
other without. This is because both the JSON schema and ovo validate
UUIDs via the same route (different code but same concept): with or
without '-' is okay.

Then we save these strings into a database column which is not a uuid
type; instead it is varchar(36).

Thus we can make this happen (gabbi format):

-=-=-
# Some tests to see if different representations of the
# same uuid result in different resource providers.

fixtures:
    - APIFixture

defaults:
    request_headers:
        x-auth-token: admin
        accept: application/json
        content-type: application/json
        openstack-api-version: placement latest

tests:
- name: create dashed
  POST: /resource_providers
  data:
      name: dashed
      uuid: b7c31381-0cd6-421c-a2d2-009d645615dc

- name: create not dashed
  POST: /resource_providers
  data:
      name: not dashed
      uuid: b7c313810cd6421ca2d2009d645615dc

- name: check length
  GET: /resource_providers
  response_json_paths:
      # This maybe should be 1, but on current master it is 2
      $.resource_providers.`len`: 1
-=-=-

We might be able to get away with this not being a problem except that
there is one place where we expected a dashed uuid for resource
providers: in the JSON schema for PUTting allocations in the dict
format:
https://github.com/openstack/nova/blob/master/nova/api/openstack/placement/schemas/allocation.py#L80

This happened because I couldn't figure out how to use a format checker
for patternProperties and wrote a pattern accepting only a 36-character
UUID.

This means we've got at least two potential problems:

* we can create a resource provider for which we can't write allocations 
(unless we use the older list style)
* clients have the potential to think they are using the same UUID when the 
placement server thinks they are not

We can solve this in a few different ways, this list is not mutually
exclusive:

* do nothing, expect people to do the right thing
* change the PatternProperty on allocation put to make dash optional
* continue accepting non-dashed input, but always dash them early in processing
* reject non-dashed input everywhere

And I haven't looked into consumer uuids, but I suspect there's some
ambiguity there too.

The root issue here, in case it is not clear, is that code in the wild
that we don't control may be creating and stringifying UUIDs in the
non-dashed format.
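
The ambiguity in a few lines of plain Python (not nova code):

    import uuid

    a = 'b7c31381-0cd6-421c-a2d2-009d645615dc'
    b = 'b7c313810cd6421ca2d2009d645615dc'
    uuid.UUID(a) == uuid.UUID(b)  # True: the same UUID
    a == b                        # False: two distinct varchar(36) values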

** Affects: nova
 Importance: Undecided
 Status: Triaged


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1758057

Title:
  When creating uuid-based entities we can duplicate UUIDs

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  It is possible to create two different resource providers (and
  probably other entities) with the same UUID by creating one with '-'
  and the other without. This is because both the JSON schema and ovo
  validate UUIDs via the same route (different code but same concept):
  with or without '-' is okay.

  Then we save these strings into a database column which is not a uuid
  type; instead it is varchar(36).

  Thus we can make this happen (gabbi format):

  -=-=-
  # Some tests to see if different representations of the
  # same uuid result in different resource providers.

  fixtures:
      - APIFixture

  defaults:
      request_headers:
          x-auth-token: admin
          accept: application/json
          content-type: application/json
          openstack-api-version: placement latest

  tests:
  - name: create dashed
    POST: /resource_providers
    data:
        name: dashed
        uuid: b7c31381-0cd6-421c-a2d2-009d645615dc

  - name: create not dashed
    POST: /resource_providers
    data:
        name: not dashed
        uuid: b7c313810cd6421ca2d2009d645615dc

  - name: check length
    GET: /resource_providers
    response_json_paths:
        # This maybe should be 1, but on current master it is 2
        $.resource_providers.`len`: 1
  -=-=-

  We might be able to get away with this not being a problem except that
  there is one place where we expected a dashed uuid for resource
  providers: in the JSON schema for PUTting allocations in the dict
  format:
  
https://github.com/openstack/nova/blob/master/nova/api/openstack/placement/schemas/allocation.py#L80

  This happened because I couldn't figure out how to use a format
  checker for patternProperties and wrote a pattern accepting only a
  36-character UUID.

  This means we've got at least two potential problems:

  * we can create a resource provider for which we can't write allocations 
(unless we use the older list style)
  * clients have the potential to think they are using the same UUID when the 
placement server thinks they are not.

[Yahoo-eng-team] [Bug 1756151] [NEW] placement os-traits sync checked every request

2018-03-15 Thread Chris Dent
Public bug reported:

Most requests to the placement service eventually reach the database
code in the resource_provider.py file and as a result eventually run
_ensure_trait_sync to make sure that this process has synced the os-
traits library to its traits database table.

While there is a flag to make sure that the syncing doesn't happen if it
already happened before, we lock around checking that flag for nearly
every request. This isn't the end of the world, but it is wasted
activity and it can make a lot of noise in the logs if you're running
DEBUG:

2018-03-15 17:35:37.653 7 DEBUG oslo_concurrency.lockutils 
[req-7904bbbe-52f7-420f-b058-13779dc018d1 admin admin - - -] Acquired semaphore 
"trait_sync" lock 
/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:212
2018-03-15 17:35:37.653 7 DEBUG oslo_concurrency.lockutils 
[req-7904bbbe-52f7-420f-b058-13779dc018d1 admin admin - - -] Releasing 
semaphore "trait_sync" lock 
/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:228

This is redundant. I'm pretty sure I wrote this code, so I'm not sure
what I was thinking, probably cargo culting off ensuring the resource
classes. We only need to sync the traits to the database once and we
only need to check if it has been done once per process. The traits are
read from a python module that is only imported once per process.

So, what we could do is move the calling of ensure_trait_sync to the
part of placement that establishes the database connection facade
thingie. If we merge https://review.openstack.org/#/c/541435/ (which
moves placement database connection establishment to its own file) or
something like it, we have an easy place to do it, at server boot time.

We find the session, make a context, do the sync, set the global and
never worry about it again. All the _ensure_trait_sync calls can be
removed.
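
In other words, something like this at boot (a sketch; the placement_db
helpers here are hypothetical):

    def configure(conf):
        placement_db.configure(conf)                 # hypothetical
        context = placement_db.get_admin_context()   # hypothetical
        resource_provider._ensure_trait_sync(context)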

** Affects: nova
 Importance: Low
 Status: Triaged


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1756151

Title:
  placement os-traits sync checked every request

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  Most requests to the placement service eventually reach the database
  code in the resource_provider.py file and as a result eventually run
  _ensure_trait_sync to make sure that this process has synced the os-
  traits library to its traits database table.

  While there is a flag to make sure that the syncing doesn't happen if
  it already happened before, we lock around checking that flag for
  nearly every request. This isn't the end of the world, but it is
  wasted activity and it can make a lot of noise in the logs if you're
  running DEBUG:

  2018-03-15 17:35:37.653 7 DEBUG oslo_concurrency.lockutils 
[req-7904bbbe-52f7-420f-b058-13779dc018d1 admin admin - - -] Acquired semaphore 
"trait_sync" lock 
/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:212
  2018-03-15 17:35:37.653 7 DEBUG oslo_concurrency.lockutils 
[req-7904bbbe-52f7-420f-b058-13779dc018d1 admin admin - - -] Releasing 
semaphore "trait_sync" lock 
/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:228

  This is redundant. I'm pretty sure I wrote this code, so I'm not sure
  what I was thinking, probably cargo culting off ensuring the resource
  classes. We only need to sync the traits to the database once and we
  only need to check if it has been done once per process. The traits
  are read from a python module that is only imported once per process.

  So, what we could do is move the calling of ensure_trait_sync to the
  part of placement that establishes the database connection facade
  thingie. If we merge https://review.openstack.org/#/c/541435/ (which
  moves placement database connection establishment to its own file) or
  something like it, we have an easy place to do it, at server boot
  time.

  We find the session, make a context, do the sync, set the global and
  never worry about it again. All the _ensure_trait_sync calls can be
  removed.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1756151/+subscriptions



[Yahoo-eng-team] [Bug 1749797] Re: placement returns 503 when keystone is down

2018-03-09 Thread Chris Dent
** Changed in: nova
   Status: Confirmed => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1749797

Title:
  placement returns 503 when keystone is down

Status in keystonemiddleware:
  Fix Released
Status in OpenStack Compute (nova):
  Invalid

Bug description:
  See the logs here: http://logs.openstack.org/50/544750/8/check/ironic-
  grenade-dsvm-multinode-multitenant/5713fb8/logs/screen-placement-
  api.txt.gz#_Feb_15_17_58_22_463228

  This is during an upgrade while Keystone is down. Placement returns a
  503 because it cannot reach keystone.

  I'm not sure what the expected behavior should be, but a 503 feels
  wrong.

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystonemiddleware/+bug/1749797/+subscriptions



[Yahoo-eng-team] [Bug 1632852] Re: placement api responses should not be cacheable

2018-02-19 Thread Chris Dent
This is actually done now, but my use of partial-bug on both changes
made the automation not happen.

** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1632852

Title:
  placement api responses should not be cacheable

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  In version 1.0 of the placement API, responses are sent without any
  cache-busting headers. This means that the responses may be cached by
  the user-agent. It's not predictable.

  Caching of resource providers is not desired so it would be good to
  send cache headers to enforce that responses are not cached.

  This old document remains the bizness for learning how to do such
  things: https://www.mnot.net/cache_docs/

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1632852/+subscriptions



[Yahoo-eng-team] [Bug 1750355] [NEW] nova.tests.unit.test_api_validation.PatternPropertiesTestCase.test_validate_patternProperties_fails fails in 3.6 because py3 check is limited to 3.5

2018-02-19 Thread Chris Dent
Public bug reported:


nova.tests.unit.test_api_validation.PatternPropertiesTestCase.test_validate_patternProperties_fails
 fails in 3.6 because py3 check is limited to 3.5 with:

```
Captured traceback:
~~~
b'Traceback (most recent call last):'
b'  File "/Users/cdent/src/nova/nova/api/validation/validators.py", line 
300, in validate'
b'self.validator.validate(*args, **kwargs)'
b'  File 
"/Users/cdent/src/nova/.tox/py36/lib/python3.6/site-packages/jsonschema/validators.py",
 line 129, in validate'
b'for error in self.iter_errors(*args, **kwargs):'
b'  File 
"/Users/cdent/src/nova/.tox/py36/lib/python3.6/site-packages/jsonschema/validators.py",
 line 105, in iter_errors'
b'for error in errors:'
b'  File 
"/Users/cdent/src/nova/.tox/py36/lib/python3.6/site-packages/jsonschema/_validators.py",
 line 14, in patternProperties'
b'if re.search(pattern, k):'
b'  File "/Users/cdent/src/nova/.tox/py36/lib/python3.6/re.py", line 182, 
in search'
b'return _compile(pattern, flags).search(string)'
b'TypeError: expected string or bytes-like object'
```

** Affects: nova
     Importance: Undecided
 Assignee: Chris Dent (cdent)
 Status: In Progress


** Tags: low-hanging-fruit testing

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1750355

Title:
  
nova.tests.unit.test_api_validation.PatternPropertiesTestCase.test_validate_patternProperties_fails
  fails in 3.6 because py3 check is limited to 3.5

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  
  
nova.tests.unit.test_api_validation.PatternPropertiesTestCase.test_validate_patternProperties_fails
 fails in 3.6 because py3 check is limited to 3.5 with:

  ```
  Captured traceback:
  ~~~
  b'Traceback (most recent call last):'
  b'  File "/Users/cdent/src/nova/nova/api/validation/validators.py", line 
300, in validate'
  b'self.validator.validate(*args, **kwargs)'
  b'  File 
"/Users/cdent/src/nova/.tox/py36/lib/python3.6/site-packages/jsonschema/validators.py",
 line 129, in validate'
  b'for error in self.iter_errors(*args, **kwargs):'
  b'  File 
"/Users/cdent/src/nova/.tox/py36/lib/python3.6/site-packages/jsonschema/validators.py",
 line 105, in iter_errors'
  b'for error in errors:'
  b'  File 
"/Users/cdent/src/nova/.tox/py36/lib/python3.6/site-packages/jsonschema/_validators.py",
 line 14, in patternProperties'
  b'if re.search(pattern, k):'
  b'  File "/Users/cdent/src/nova/.tox/py36/lib/python3.6/re.py", line 182, 
in search'
  b'return _compile(pattern, flags).search(string)'
  b'TypeError: expected string or bytes-like object'
  ```

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1750355/+subscriptions



[Yahoo-eng-team] [Bug 1749797] Re: placement returns 503 when keystone is down

2018-02-16 Thread Chris Dent
** Also affects: keystonemiddleware
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1749797

Title:
  placement returns 503 when keystone is down

Status in keystonemiddleware:
  New
Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  See the logs here: http://logs.openstack.org/50/544750/8/check/ironic-
  grenade-dsvm-multinode-multitenant/5713fb8/logs/screen-placement-
  api.txt.gz#_Feb_15_17_58_22_463228

  This is during an upgrade while Keystone is down. Placement returns a
  503 because it cannot reach keystone.

  I'm not sure what the expected behavior should be, but a 503 feels
  wrong.

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystonemiddleware/+bug/1749797/+subscriptions



[Yahoo-eng-team] [Bug 1749404] Re: nova-compute resource tracker ignores 'reserved' while reporting 'max_unit'

2018-02-14 Thread Chris Dent
I disagree with Sylvain on this one so I'm going to re-open, but it is
low-ish priority because the impact isn't significant: if max_unit is
greater than total - reserved and allocation_ratio is 1, requesting a
single max_unit-sized resource will fail in an expected way that does
not involve max_unit: total - reserved < requested resource.

Ideally we shouldn't have that conflict, so I think this is worth fixing
(setting max_unit to total-reserved). It's not that much of an upgrade
problem because compute nodes are constantly verifying and updating
their inventories anyway. If a bunch of compute nodes are restarted at
the same time there will be a lot of inventory writes, but this probably
won't be an issue[1].

[1] https://anticdent.org/placement-scale-fun.html
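
With the MEMORY_MB numbers from the report below, the fix is a
one-liner (sketch):

    total, reserved = 5967, 512
    max_unit = total - reserved  # 5455, rather than reporting 5967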

** Changed in: nova
   Status: Won't Fix => Triaged

** Changed in: nova
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1749404

Title:
  nova-compute resource tracker ignores 'reserved' while reporting
  'max_unit'

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  The following inventory was reported after a fresh devstack build:

  curl --silent \
  --header "Accept: application/json" \
  --header "Content-Type: application/json" \
  --header "OpenStack-API-Version: placement latest" \
  --header "X-Auth-Token: ${TOKEN:?}" \
  -X GET http://127.0.0.1/placement/resource_providers/8d4d7926-df76-42e5-b5da-67893468f5cb/inventories | json_pp
  {
     "resource_provider_generation" : 1,
     "inventories" : {
        "DISK_GB" : {
           "max_unit" : 19,
           "min_unit" : 1,
           "allocation_ratio" : 1,
           "step_size" : 1,
           "reserved" : 0,
           "total" : 19
        },
        "MEMORY_MB" : {
           "allocation_ratio" : 1.5,
           "max_unit" : 5967,
           "min_unit" : 1,
           "reserved" : 512,
           "step_size" : 1,
           "total" : 5967
        },
        "VCPU" : {
           "allocation_ratio" : 16,
           "min_unit" : 1,
           "max_unit" : 2,
           "reserved" : 0,
           "step_size" : 1,
           "total" : 2
        }
     }
  }

  IMO the correct max_unit value of the MEMORY_MB resource would be
  (total - reserved). But today it equals the total value.

  nova commit: 9e9b3e1
  devstack commit: fbdefac
  devstack config: ENABLED_SERVICES+=,placement-api,placement-client

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1749404/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1747000] Re: Use of parse.urlencode with dict in nova/tests/unit/scheduler/client/test_report.py can result in unpredictable query strings and thus unreliable tests

2018-02-12 Thread Chris Dent
It is, yes. I think the duplicate was created during one of those times
when launchpad was doing timeouts and I didn't notice that I created it
twice.

** Changed in: nova
   Status: Triaged => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1747000

Title:
  Use of parse.urlencode with dict in
  nova/tests/unit/scheduler/client/test_report.py can result in
  unpredictable query strings and thus unreliable tests

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  In nova/tests/unit/scheduler/client/test_report.py there are several
  tests which confirm the URLs that get passed to the placement service.
  These create query strings by using code like:

  expected_url = '/allocation_candidates?%s' % parse.urlencode(
      {'resources': 'MEMORY_MB:1024,VCPU:1',
       'required': 'CUSTOM_TRAIT1',
       'limit': 1000})

  This results in a query string that will have an unpredictable order.
  Similarly, the code which is doing the actual query string creation is
  using the same form.

  Most of the time the results are the same, and the tests pass, but
  sometimes they do not.

  There are at least two potential ways to work around this:

  * build the query strings using a sequence of tuples and set the 'doseq' 
param to urlencode to True. This will preserve order.
  * Parse the expected_url's query params in the tests back to a dict and 
compare dicts
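
  A sketch of the first workaround, assuming Python 3's urllib (nova's
  tests import the equivalent via six.moves):

  ```python
  from urllib.parse import urlencode

  # A sequence of 2-tuples fixes the parameter order; a dict does not.
  expected_url = '/allocation_candidates?%s' % urlencode(
      (('limit', 1000),
       ('required', 'CUSTOM_TRAIT1'),
       ('resources', 'MEMORY_MB:1024,VCPU:1')))
  print(expected_url)
  # /allocation_candidates?limit=1000&required=CUSTOM_TRAIT1&resources=MEMORY_MB%3A1024%2CVCPU%3A1
  ```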

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1747000/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1739042] Re: _move_operation_alloc_request fails with TypeError when using 1.12 version allocation request

2018-02-09 Thread Chris Dent
The strings in the log don't seem to be showing up in recent logstash,
so I'm going to mark this one dead, so it's not lingering.

** Changed in: nova
   Status: Triaged => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1739042

Title:
  _move_operation_alloc_request fails with TypeError when using 1.12
  version allocation request

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Seen here in the alternate hosts series:

  http://logs.openstack.org/58/511358/43/check/openstack-tox-
  functional/e642310/job-output.txt.gz#_2017-12-19_00_18_34_585930

  Traceback (most recent call last):
    File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
      res = self.dispatcher.dispatch(message)
    File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
      return self._do_dispatch(endpoint, method, ctxt, args)
    File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
      result = func(ctxt, **new_args)
    File "nova/conductor/manager.py", line 603, in build_instances
      host.allocation_request_version)
    File "nova/scheduler/utils.py", line 800, in claim_resources
      user_id, allocation_request_version=allocation_request_version)
    File "nova/scheduler/client/__init__.py", line 37, in __run_method
      return getattr(self.instance, __name)(*args, **kwargs)
    File "nova/scheduler/client/report.py", line 61, in wrapper
      return f(self, *a, **k)
    File "nova/scheduler/client/report.py", line 110, in wrapper
      return f(self, *a, **k)
    File "nova/scheduler/client/report.py", line 1126, in claim_resources
      payload = _move_operation_alloc_request(current_allocs, ar)
    File "nova/scheduler/client/report.py", line 199, in _move_operation_alloc_request
      for a in dest_alloc_req['allocations']) - cur_rp_uuids
  TypeError: string indices must be integers

  This is due to using a 1.12 version allocation candidate request, but
  _move_operation_alloc_request is expecting the <1.12 format, where
  allocations is a list instead of a dict.

  I don't know if we should change the calling code to format the
  allocation request to the <1.12 format, or make
  _move_operation_alloc_request handle both styles (probably better to
  do the latter).
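
  A hedged sketch of the second option, normalizing both payload shapes
  (illustrative only, not the patch that eventually landed):

  ```python
  def _normalize_alloc_request(alloc_req):
      # Microversion 1.12 keys 'allocations' by resource provider uuid;
      # earlier versions return a list with embedded 'resource_provider'
      # objects. Normalize both to {rp_uuid: resources}.
      allocations = alloc_req['allocations']
      if isinstance(allocations, dict):
          return {rp_uuid: alloc['resources']
                  for rp_uuid, alloc in allocations.items()}
      return {alloc['resource_provider']['uuid']: alloc['resources']
              for alloc in allocations}
  ```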

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1739042/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1747935] Re: Openstack APIs and RFC 7234 HTTP caching

2018-02-07 Thread Chris Dent
I've added api-sig to this because the fact that this issue has shown up
in the wild should be good motivation for us to make a guideline about
how to address it. The code for adding the requisite headers to
placement may be a useful starting point:
https://review.openstack.org/#/c/521640/
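
As a rough illustration of the sort of thing a guideline might
recommend (a sketch assuming webob, not the placement patch above):

```python
import webob
import webob.dec

@webob.dec.wsgify
def state_view(req):
    resp = webob.Response(body=b'{"status": "still creating"}',
                          content_type='application/json')
    # Send explicit freshness information so proxies cannot apply the
    # heuristic caching that RFC 7234 permits in its absence.
    resp.headers['Cache-Control'] = 'no-cache'
    return resp
```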


** Also affects: openstack-api-wg
   Importance: Undecided
   Status: New

** Changed in: openstack-api-wg
   Status: New => Confirmed

** Changed in: openstack-api-wg
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1747935

Title:
  Openstack APIs and RFC 7234 HTTP caching

Status in OpenStack Compute (nova):
  New
Status in openstack-api-sig:
  Confirmed

Bug description:
  Description
  ===
  I recently hit an issue where I was using Terraform through an HTTP proxy 
(enforced by my company IT) to provision some resources in an Openstack cloud. 
Since creating the resources took some time, the initial response from 
openstack was "still creating...". Further polling of the resource status 
resulted in receiving *cached* copies of "still creating..." from the proxy 
until time-out.

  RFC7234, which describes HTTP caching, states that in the absence of
  all headers describing the lifetime/validity of the response,
  heuristic algorithms may be applied by caches to guesstimate an
  appropriate value for the validity of the response... (Who knows what
  is implemented out there...) See the HTTP caching RFC, section 4.2.2.

  The API responses describe the current state of an object which isn't
  permanent, but has a limited validity. In fact very limited as the
  state of an object might change any moment.

  Therefore it is my opinion that the Openstack API (Nova in this case,
  but equally valid for all other APIs) should be responsible for
  including proper HTTP headers in their responses to either disallow
  caching of the response or at least limit its validity.

  See the HTTP caching RFC, section 5, for headers that could be used to
  accomplish that.

  For sake of completeness; also see
  https://github.com/gophercloud/gophercloud/issues/727 for my initial
  client-side fix and related discussion with client-side project
  owners...

  Expected result
  ===
  Openstack APIs to include header(s) from RFC7234 section 5 to either
  disallow caching, to specify a meaningful lifetime, or to
  force/implement revalidation options.

  Actual result
  =
  No headers controlling caching present whatsoever.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1747935/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1747000] [NEW] Use of parse.urlencode with dict in nova/tests/unit/scheduler/client/test_report.py can result in unpredictable query strings and thus unreliable tests

2018-02-02 Thread Chris Dent
Public bug reported:

In nova/tests/unit/scheduler/client/test_report.py there are several
tests which confirm the URLs that get passed to the placement service.
These create query strings by using code like:

expected_url = '/allocation_candidates?%s' % parse.urlencode(
    {'resources': 'MEMORY_MB:1024,VCPU:1',
     'required': 'CUSTOM_TRAIT1',
     'limit': 1000})

This results in a query string that will have an unpredictable order.
Similarly, the code which is doing the actual query string creation is
using the same form.

Most of the time the results are the same, and the tests pass, but
sometimes they do not.

There are at least two potential ways to work around this:

* build the query strings using a sequence of tuples and set the 'doseq' param 
to urlencode to True. This will preserve order.
* Parse the expected_url's query params in the tests back to a dict and compare 
dicts

** Affects: nova
 Importance: Low
 Status: Triaged


** Tags: placement scheduler

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1747000

Title:
  Use of parse.urlencode with dict in
  nova/tests/unit/scheduler/client/test_report.py can result in
  unpredictable query strings and thus unreliable tests

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  In nova/tests/unit/scheduler/client/test_report.py there are several
  tests which confirm the URLs that get passed to the placement service.
  These create query strings by using code like:

  expected_url = '/allocation_candidates?%s' % parse.urlencode(
      {'resources': 'MEMORY_MB:1024,VCPU:1',
       'required': 'CUSTOM_TRAIT1',
       'limit': 1000})

  This results in a query string that will have an unpredictable order.
  Similarly, the code which is doing the actual query string creation is
  using the same form.

  Most of the time the results are the same, and the tests pass, but
  sometimes they do not.

  There are at least two potential ways to work around this:

  * build the query strings using a sequence of tuples and set the 'doseq' 
param to urlencode to True. This will preserve order.
  * Parse the expected_url's query params in the tests back to a dict and 
compare dicts

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1747000/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1747001] [NEW] Use of parse.urlencode with dict in nova/tests/unit/scheduler/client/test_report.py can result in unpredictable query strings and thus unreliable tests

2018-02-02 Thread Chris Dent
Public bug reported:

In nova/tests/unit/scheduler/client/test_report.py there are several
tests which confirm the URLs that get passed to the placement service.
These create query strings by using code like:

expected_url = '/allocation_candidates?%s' % parse.urlencode(
    {'resources': 'MEMORY_MB:1024,VCPU:1',
     'required': 'CUSTOM_TRAIT1',
     'limit': 1000})

This results in a query string that will have an unpredictable order.
Similarly, the code which is doing the actual query string creation is
using the same form.

Most of the time the results are the same, and the tests pass, but
sometimes they do not.

There are at least two potential ways to work around this:

* build the query strings using a sequence of tuples and set the 'doseq' param 
to urlencode to True. This will preserve order.
* Parse the expected_url's query params in the tests back to a dict and compare 
dicts

** Affects: nova
 Importance: Low
 Status: Triaged


** Tags: placement scheduler

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1747001

Title:
  Use of parse.urlencode with dict in
  nova/tests/unit/scheduler/client/test_report.py can result in
  unpredictable query strings and thus unreliable tests

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  In nova/tests/unit/scheduler/client/test_report.py there are several
  tests which confirm the URLs that get passed to the placement service.
  These create query strings by using code like:

  expected_url = '/allocation_candidates?%s' % parse.urlencode(
      {'resources': 'MEMORY_MB:1024,VCPU:1',
       'required': 'CUSTOM_TRAIT1',
       'limit': 1000})

  This results in a query string that will have an unpredictable order.
  Similarly, the code which is doing the actual query string creation is
  using the same form.

  Most of the time the results are the same, and the tests pass, but
  sometimes they do not.

  There are at least two potential ways to work around this:

  * build the query strings using a sequence of tuples and set the 'doseq' 
param to urlencode to True. This will preserve order.
  * Parse the expected_url's query params in the tests back to a dict and 
compare dicts

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1747001/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1747003] [NEW] A bad _RC_CACHE can rarely cause unit tests to fail

2018-02-02 Thread Chris Dent
Public bug reported:

Very rarely (so rarely in fact that it only seems to happen when test
order is much different from the norm) some unit tests which encounter
the resource_class_cache can fail as follows:

http://logs.openstack.org/49/540049/2/check/openstack-tox-
py27/176a6b3/testr_results.html.gz

-=-=-
ft1.1: nova.tests.unit.cmd.test_status.TestUpgradeCheckResourceProviders.test_check_resource_providers_no_compute_rps_one_compute_StringException:
pythonlogging:'': {{{2018-02-02 11:30:00,443 WARNING [oslo_config.cfg]
Config option key_manager.api_class is deprecated. Use option
key_manager.backend instead.}}}

Traceback (most recent call last):
  File "nova/tests/unit/cmd/test_status.py", line 588, in test_check_resource_providers_no_compute_rps_one_compute
    self._create_resource_provider(FAKE_IP_POOL_INVENTORY)
  File "nova/tests/unit/cmd/test_status.py", line 561, in _create_resource_provider
    rp.set_inventory(inv_list)
  File "nova/api/openstack/placement/objects/resource_provider.py", line 737, in set_inventory
    exceeded = _set_inventory(self._context, self, inv_list)
  File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 986, in wrapper
    return fn(*args, **kwargs)
  File "nova/api/openstack/placement/objects/resource_provider.py", line 372, in _set_inventory
    _add_inventory_to_provider(context, rp, inv_list, to_add)
  File "nova/api/openstack/placement/objects/resource_provider.py", line 201, in _add_inventory_to_provider
    if inv_record.capacity <= 0:
AttributeError: 'NoneType' object has no attribute 'capacity'
-=-=-

The find() method on InventoryList can return None if that cache is bad.

This can be resolved (apparently) by resetting the _RC_CACHE between
test runs in the same way that _TRAITS_SYNCED is reset, in nova/test.py:

-# Reset the traits sync flag
-objects.resource_provider._TRAITS_SYNCED = False
+# Reset the traits sync flag and rc cache
+resource_provider._TRAITS_SYNCED = False
+resource_provider._RC_CACHE = None

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: placement scheduler

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1747003

Title:
  A bad _RC_CACHE can rarely cause unit tests to fail

Status in OpenStack Compute (nova):
  New

Bug description:
  Very rarely (so rarely in fact that it only seems to happen when test
  order is much different from the norm) some unit tests which encounter
  the resource_class_cache can fail as follows:

  http://logs.openstack.org/49/540049/2/check/openstack-tox-
  py27/176a6b3/testr_results.html.gz

  -=-=-
  ft1.1: nova.tests.unit.cmd.test_status.TestUpgradeCheckResourceProviders.test_check_resource_providers_no_compute_rps_one_compute_StringException:
  pythonlogging:'': {{{2018-02-02 11:30:00,443 WARNING [oslo_config.cfg]
  Config option key_manager.api_class is deprecated. Use option
  key_manager.backend instead.}}}

  Traceback (most recent call last):
    File "nova/tests/unit/cmd/test_status.py", line 588, in test_check_resource_providers_no_compute_rps_one_compute
      self._create_resource_provider(FAKE_IP_POOL_INVENTORY)
    File "nova/tests/unit/cmd/test_status.py", line 561, in _create_resource_provider
      rp.set_inventory(inv_list)
    File "nova/api/openstack/placement/objects/resource_provider.py", line 737, in set_inventory
      exceeded = _set_inventory(self._context, self, inv_list)
    File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 986, in wrapper
      return fn(*args, **kwargs)
    File "nova/api/openstack/placement/objects/resource_provider.py", line 372, in _set_inventory
      _add_inventory_to_provider(context, rp, inv_list, to_add)
    File "nova/api/openstack/placement/objects/resource_provider.py", line 201, in _add_inventory_to_provider
      if inv_record.capacity <= 0:
  AttributeError: 'NoneType' object has no attribute 'capacity'
  -=-=-

  The find() method on InventoryList can return None if that cache is
  bad.

  This can be resolved (apparently) by resetting the _RC_CACHE between
  test runs in the same way that _TRAITS_SYNCED is reset, in
  nova/test.py:

  -# Reset the traits sync flag
  -objects.resource_provider._TRAITS_SYNCED = False
  +# Reset the traits sync flag and rc cache
  +resource_provider._TRAITS_SYNCED = False
  +resource_provider._RC_CACHE = None

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1747003/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

[Yahoo-eng-team] [Bug 1743860] Re: allocation candidates db backend can handle traits but the HTTP front end is not turned on

2018-01-22 Thread Chris Dent
Apparently this was mostly expected, but not fully documented, so going
to invalidate the bug.

https://review.openstack.org/#/c/535642/

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1743860

Title:
  allocation candidates db backend can handle traits but the HTTP front
  end is not turned on

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  
  The code for /allocation_candidates is set up to be able to process a 
'required' parameter alongside the 'resources' parameter. This results in a 
collection of RequestGroups which are used by the AllocationCandidates code in 
nova/objects/resource_provider.py

  But we can't request traits on /allocation_candidates because the json
  schema does not allow the 'required' parameter.

  Adding that results in a system that appears to work modulo at least
  one issue:

  * AllocationCandidates._get_by_requests raised ValueError on an
  unknown trait. It should probably raise TraitNotFound, which should be
  caught in the allocation_candidates handler. (There's also a raw
  ValueError related to sharing providers).

  Of course this doesn't address the nested situation, but that's in
  progress elsewhere.

  I have some wip code that demonstrates making it work, others should
  feel free to leap on this though as I will be out of touch for the
  next 4 days.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1743860/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1743860] [NEW] allocation candidates db backend can handle non-nested traits but the HTTP front end is not turned on

2018-01-17 Thread Chris Dent
Public bug reported:


The code for /allocation_candidates is set up to be able to process a 
'required' parameter alongside the 'resources' parameter. This results in a 
collection of RequestGroups which are used by the AllocationCandidates code in 
nova/objects/resource_provider.py

But we can't request traits on /allocation_candidates because the json
schema does not allow the 'required' parameter.

Adding that results in a system that appears to work modulo at least one
issue:

* AllocationCandidates._get_by_requests raised ValueError on an unknown
trait. It should probably raise TraitNotFound, which should be caught in
the allocation_candidates handler. (There's also a raw ValueError
related to sharing providers).

Of course this doesn't address the nested situation, but that's in
progress elsewhere.

I have some wip code that demonstrates making it work, others should
feel free to leap on this though as I will be out of touch for the next
4 days.
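
For illustration, the kind of request the schema change would permit;
the endpoint host, token, and trait name here are assumptions:

```python
import requests

resp = requests.get(
    'http://placement/allocation_candidates',
    params={'resources': 'VCPU:1,MEMORY_MB:512',
            'required': 'HW_CPU_X86_AVX2'},
    headers={'OpenStack-API-Version': 'placement latest',
             'X-Auth-Token': 'ADMIN_TOKEN'})
print(resp.status_code)  # 200 once 'required' passes the schema
```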

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1743860

Title:
  allocation candidates db backend can handle non-nested traits but the
  HTTP front end is not turned on

Status in OpenStack Compute (nova):
  New

Bug description:
  
  The code for /allocation_candidates is set up to be able to process a 
'required' parameter alongside the 'resources' parameter. This results in a 
collection of RequestGroups which are used by the AllocationCandidates code in 
nova/objects/resource_provider.py

  But we can't request traits on /allocation_candidates because the json
  schema does not allow the 'required' parameter.

  Adding that results in a system that appears to work modulo at least
  one issue:

  * AllocationCandidates._get_by_requests raised ValueError on an
  unknown trait. It should probably raise TraitNotFound, which should be
  caught in the allocation_candidates handler. (There's also a raw
  ValueError related to sharing providers).

  Of course this doesn't address the nested situation, but that's in
  progress elsewhere.

  I have some wip code that demonstrates making it work, others should
  feel free to leap on this though as I will be out of touch for the
  next 4 days.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1743860/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1743120] [NEW] placement inadvertently imports many python modules it does not need

2018-01-13 Thread Chris Dent
Public bug reported:


The placement WSGI app has its own WSGI framework and is the only script/binary 
that placement uses. From the outset, it has been intended that a deployment 
could run multiple placement servers, distributed across multiple 
hosts/pods/whatever, with some form of load balancing proxy in front.

In that model, it would be ideal for the app to require as few python
modules as possible, and to have as small a footprint as possible, at
least at startup.

Currently, that's not the case, for two reasons:

* Placement makes use of the FaultWrapper WSGI middleware to catch
unexpected exceptions and turn them into status 500 HTTP responses.
FaultWrapper imports nova.utils and nova.utils imports a lot of stuff.

* Placement is within the nova/api/openstack package hierarchy and
nova/api/openstack/__init__.py contains active code and imports which
placement does not need.

These problems can be addressed:

* FaultWrapper is overkill for the placement WSGI app. FaultWrapper is
capable of decoding a variety of exceptions into a variety of status
responses. This is not required by placement. The only exceptions that
can make it to FaultWrapper in placement are those that would result in
a 500. A simpler middleware can handle this (see the sketch below).

* nova/api/openstack/__init__.py should be emptied, the contents moved
to a file which is explicitly imported by those modules that actually
want it. Presumably, this could also provide an avenue for lightening
the metadata API.

Note that this is not the only case of inadvertent imports, but is one
of the main vectors. Others will be identified and additional bugs
created.
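
A minimal sketch of that simpler middleware, assuming webob (the class
name is illustrative):

```python
import webob.dec
import webob.exc

class PlacementFaultWrapper(object):
    """Turn any unexpected exception into a bare 500.

    Unlike nova's FaultWrapper this needs none of nova.utils'
    heavyweight imports.
    """

    def __init__(self, application):
        self.application = application

    @webob.dec.wsgify
    def __call__(self, req):
        try:
            return req.get_response(self.application)
        except Exception:
            return webob.exc.HTTPInternalServerError()
```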

** Affects: nova
 Importance: Low
 Assignee: Chris Dent (cdent)
 Status: Triaged


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1743120

Title:
  placement inadvertently imports many python modules it does not need

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  
  The placement WSGI app has its own WSGI framework and is the only 
script/binary that placement uses. From the outset, it has been intended that a 
deployment could run multiple placement servers, distributed across multiple 
hosts/pods/whatever, with some form of load balancing proxy in front.

  In that model, it would be ideal for the app to require as few python
  modules as possible, and to have as small a footprint as possible, at
  least at startup.

  Currently, that's not the case, for two reasons:

  * Placement makes use of the FaultWrapper WSGI middleware to catch
  unexpected exceptions and turn them into status 500 HTTP responses.
  FaultWrapper imports nova.utils and nova.utils imports a lot of stuff.

  * Placement is within the nova/api/openstack package hierarchy and
  nova/api/openstack/__init__.py contains active code and imports which
  placement does not need.

  These problems can be addressed:

  * FaultWrapper is overkill for the placement WSGI app. FaultWrapper is
  capable of decoding a variety of exceptions into a variety of status
  responses. This is not required by placement. The only exceptions that
  can make it to FaultWrapper in placement are those that would result
  in a 500. A simpler middleware can handle this.

  * nova/api/openstack/__init__.py should be emptied, the contents moved
  to a file which is explicitly imported by those modules that actually
  want it. Presumably, this could also provide an avenue for lightening
  the metadata API.

  Note that this is not the only case of inadvertent imports, but is one
  of the main vectors. Others will be identified and additional bugs
  created.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1743120/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1736385] [NEW] placement is not being properly restarted in grenade (pike to master)

2017-12-05 Thread Chris Dent
Public bug reported:

When the placement service is supposed to restart in grenade (pike to
master) it doesn't actually restart:

http://logs.openstack.org/93/385693/84/check/legacy-grenade-dsvm-
neutron-multinode-live-
migration/9fa93e0/logs/grenade.sh.txt.gz#_2017-12-05_00_08_01_111

This leads to issues with new microversions not being available:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Unacceptable%20version%20header%3A%201.14%5C%22

This is a latent bug that was revealed, at least in part, by efried's
(correct) changes in https://review.openstack.org/#/c/524263/

It looks like a bad assumption is being made somewhere in the handling
of the systemd unit files: a 'start' when it is already started is
success, but does not restart (thus new code is not loaded).

We can probably fix this by using the 'restart' command instead of
'start':

 restart PATTERN...
   Restart one or more units specified on the command line. If the
   units are not running yet, they will be started.


Adding grenade and devstack as related projects as the fix is presumably
in devstack itself.

** Affects: devstack
 Importance: Undecided
 Status: New

** Affects: grenade
 Importance: Undecided
 Status: New

** Affects: nova
 Importance: Undecided
 Status: New

** Also affects: grenade
   Importance: Undecided
   Status: New

** Also affects: devstack
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1736385

Title:
  placement is not being properly restarted in grenade (pike to master)

Status in devstack:
  New
Status in grenade:
  New
Status in OpenStack Compute (nova):
  New

Bug description:
  When the placement service is supposed to restart in grenade (pike to
  master) it doesn't actually restart:

  http://logs.openstack.org/93/385693/84/check/legacy-grenade-dsvm-
  neutron-multinode-live-
  migration/9fa93e0/logs/grenade.sh.txt.gz#_2017-12-05_00_08_01_111

  This leads to issues with new microversions not being available:

  
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Unacceptable%20version%20header%3A%201.14%5C%22

  This is a latent bug that was revealed, at least in part, by efried's
  (correct) changes in https://review.openstack.org/#/c/524263/

  It looks like a bad assumption is being made somewhere in the handling
  of the systemd unit files: a 'start' when it is already started is
  success, but does not restart (thus new code is not loaded).

  We can probably fix this by using the 'restart' command instead of
  'start':

   restart PATTERN...
     Restart one or more units specified on the command line. If the
     units are not running yet, they will be started.

  
  Adding grenade and devstack as related projects as the fix is
  presumably in devstack itself.

To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack/+bug/1736385/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1735405] [NEW] Error message from placement when creating resource provider uses ambiguous identifier

2017-11-30 Thread Chris Dent
Public bug reported:

Nova master, late november

When a resource provider fails to create after a POST
/resource_providers for some reason, the error message identifies the
provider by uuid. However, the uuid may not have been supplied by the
client, it may be generated server side. So the name should be included
at:

https://github.com/openstack/nova/blob/daa1cd6d7660a0fb41b501c44db307c3e43f7600/nova/api/openstack/placement/handlers/resource_provider.py#L145

However, because of JSONSchema, it's unlikely (impossible) that
ObjectActionError will ever be raised so another option may be to just
get rid of the handling.

To resolve this:

* figure out if the exception can happen
* if not, remove the handling
* if so, change the message to include the 'name' in the output

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1735405

Title:
  Error message from placement when creating resource provider uses
  ambiguous identifier

Status in OpenStack Compute (nova):
  New

Bug description:
  Nova master, late november

  When a resource provider fails to create after a POST
  /resource_providers for some reason, the error message identifies the
  provider by uuid. However, the uuid may not have been supplied by the
  client; it may be generated server side. So the name should be
  included at:

  
https://github.com/openstack/nova/blob/daa1cd6d7660a0fb41b501c44db307c3e43f7600/nova/api/openstack/placement/handlers/resource_provider.py#L145

  However, because of JSONSchema, it's unlikely (impossible) that
  ObjectActionError will ever be raised so another option may be to just
  get rid of the handling.

  To resolve this:

  * figure out if the exception can happen
  * if not, remove the handling
  * if so, change the message to include the 'name' in the output

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1735405/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1734491] [NEW] placement keystonemiddleware_authtoken ignores OS_PLACEMENT_CONFIG_DIR

2017-11-25 Thread Chris Dent
Public bug reported:

In placement master, late November 2017, the deploy.py file is
responsible for building the WSGI middleware stack, including loading
the keystonemiddleware.

    auth_middleware = auth_token.filter_factory(
        {}, oslo_config_project=project_name)

This mode of loading means that the middleware itself is responsible for
finding and reading the project conf file (nova.conf for now).

Whereas the general conf loading done in wsgi.py is aware of an
OS_PLACEMENT_CONFIG_DIR env, the filter_factory loading above is not.

This can lead to some pretty confusing situations where custom config,
for things like the database, from custom locations are visible in the
placement service, but authtoken config is not.

Only after pulling all hair from head does it become clear what's
happening.

So we should fix that. It _may_ be as simple as adding similar file
handling as used in wsgi.py, but it's not clear if the arg will get
passed through. Further experimentation required.
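
A sketch of the direction that experimentation might take; whether
filter_factory will accept a pre-loaded config this way is exactly the
open question, so the 'oslo_config_config' key below is an assumption:

```python
import os

from keystonemiddleware import auth_token
from oslo_config import cfg

def _load_auth_middleware(project_name):
    conf_args = {'oslo_config_project': project_name}
    config_dir = os.environ.get('OS_PLACEMENT_CONFIG_DIR')
    if config_dir:
        # Hand the middleware a ConfigOpts loaded from the same custom
        # directory that wsgi.py already honours.
        conf = cfg.ConfigOpts()
        conf([], project=project_name, default_config_dirs=[config_dir])
        conf_args['oslo_config_config'] = conf
    return auth_token.filter_factory({}, **conf_args)
```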

** Affects: nova
 Importance: Undecided
 Assignee: Chris Dent (cdent)
 Status: New


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1734491

Title:
  placement keystonemiddleware_authtoken ignores OS_PLACEMENT_CONFIG_DIR

Status in OpenStack Compute (nova):
  New

Bug description:
  In placement master, late November 2017, the deploy.py file is
  responsible for building the WSGI middleware stack, including loading
  the keystonemiddleware.

      auth_middleware = auth_token.filter_factory(
          {}, oslo_config_project=project_name)

  This mode of loading means that the middleware itself is responsible
  for finding and reading the project conf file (nova.conf for now).

  Whereas the general conf loading done in wsgi.py is aware of an
  OS_PLACEMENT_CONFIG_DIR env, the filter_factory loading above is not.

  This can lead to some pretty confusing situations where custom config,
  for things like the database, from custom locations are visible in the
  placement service, but authtoken config is not.

  Only after pulling all hair from head does it become clear what's
  happening.

  So we should fix that. It _may_ be as simple as adding similar file
  handling as used in wsgi.py, but it's not clear if the arg will get
  passed through. Further experimentation required.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1734491/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1728934] [NEW] placement gabbi tests that manipulate microversion can intermittently fail

2017-10-31 Thread Chris Dent
Public bug reported:

If a gabbi file sets a default microversion by setting a header
'OpenStack-API-Version' with a value like 'placement latest' and then
later overrides that in an individual test with a header of
'openstack-api-version', the difference in case can lead to failure.

In the best case the failure is consistent.

In the worst case it can sometimes work, because the header shows up
twice in the request, and the last header wins.

The solution is to always use the same case through the file.

Note that gabbi is case sensitive here in part because of the
implementation, but also because it provides control and the
possibility to test exactly this problem.

** Affects: nova
 Importance: Medium
 Assignee: Chris Dent (cdent)
 Status: Triaged


** Tags: placement testing

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1728934

Title:
  placement gabbi tests that manipulate microversion can intermittently
  fail

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  If a gabbi file sets a default microversion by setting a header
  'OpenStack-API-Version' with a value like 'placement latest' and then
  later overrides that in an individual test with a header of
  'openstack-api-version', the difference in case can lead to failure.

  In the best case the failure is consistent.

  In the worst case it can sometimes work, because the header shows up
  twice in the request, and the last header wins.

  The solution is to always use the same case through the file.

  Note that gabbi is case sensitive here in part because of the
  implementation, but also because it provides control and the
  possibility to test exactly this problem.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1728934/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1724065] [NEW] Requesting an out of range microversion without an accept header in placement results in a KeyError

2017-10-16 Thread Chris Dent
Public bug reported:

In the placement service (as of microversion 1.10), if you request a
microversion that is outside the acceptable range of VERSIONS and do
_not_ include an 'Accept' header in the request, there is a 500 and a
KeyError while webob tries to look up the Accept header.

The issue is in FaultWrapper, so the problem can be dealt with elsewhere
in placement WSGI stack.

** Affects: nova
 Importance: Medium
 Status: Triaged


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1724065

Title:
  Requesting an out of range microversion without an accept header in
  placement results in a KeyError

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  In the placement service (as of microversion 1.10), if you request a
  microversion that is outside the acceptable range of VERSIONS and do
  _not_ include an 'Accept' header in the request, there is a 500 and a
  KeyError while webob tries to look up the Accept header.

  The issue is in FaultWrapper, so the problem can be dealt with
  elsewhere in placement WSGI stack.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1724065/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1723122] [NEW] missing gabbi coverage for empty resources query string

2017-10-12 Thread Chris Dent
Public bug reported:

This is a note to self for sake of bookkeeping.

Coverage reports show that there's zero gabbi coverage for a request
with an empty `resources` query string. As this is relatively simple to
add, may as well have it.

** Affects: nova
 Importance: Low
 Assignee: Chris Dent (cdent)
 Status: Triaged


** Tags: note-to-self placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1723122

Title:
  missing gabbi coverage for empty resources query string

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  This is a note to self for sake of bookkeeping.

  Coverage reports show that there's zero gabbi coverage for a request
  with an empty `resources` query string. As this is relatively simple
  to add, may as well have it.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1723122/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1723123] [NEW] missing coverage for trying to update a standard resource class in placement api

2017-10-12 Thread Chris Dent
Public bug reported:

This is a note to self for sake of bookkeeping.

Coverage reports show that there's zero gabbi coverage for a request
that attempts to PUT to update a standard resource class. This is easy
to fix, so may as well.

** Affects: nova
 Importance: Low
 Assignee: Chris Dent (cdent)
 Status: Triaged


** Tags: note-to-self placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1723123

Title:
  missing coverage for trying to update a standard resource class in
  placement api

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  This is a note to self for sake of bookkeeping.

  Coverage reports show that there's zero gabbi coverage for a request
  that attempts to PUT to update a standard resource class. This is easy
  to fix, so may as well.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1723123/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1692375] Re: Placement install process isn't documented

2017-10-06 Thread Chris Dent
I think the above patch got us as far as we're going to get on this
issue for now, so gonna mark it done.

** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1692375

Title:
  Placement install process isn't documented

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  The placement api is exposed only as a wsgi script; however, there is
  no documentation in nova or example configuration on how to use it. While
  this tends to fit into a standard mold providing users with this
  guidance is very useful. Right now the docs just say deploy it without
  saying how:

  https://docs.openstack.org/developer/nova/placement.html

  The only place to see the example config is in the devstack code:

  https://github.com/openstack-dev/devstack/blob/master/files/apache-
  placement-api.template

  which isn't really discoverable.

  The keystone docs for example provide a very useful guide on how to do
  this with in tree example configs:

  https://docs.openstack.org/developer/keystone/apache-httpd.html

  https://github.com/openstack/keystone/tree/master/httpd

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1692375/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1719933] [NEW] placement server needs to retry allocations, server-side

2017-09-27 Thread Chris Dent
Public bug reported:

A long time ago a TODO was made in placement:
https://github.com/openstack/nova/blob/faede889d3620f8ff0131a7a4c6b9c1bc844cd06/nova/objects/resource_provider.py#L1837-L1839

We need to implement that TODO; this is a note to self.

This is related to what may be a different bug: when heavily loaded with
many single requests, the placement server is unexpectedly receiving
409's about generation problems. Discussion of that led to remembering
that this TODO needs to be fixed. Fixing the TODO needs to be done
regardless of that problem.
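
A sketch of the shape that server-side retry might take; the attempt
count, and reusing the existing _set_allocations writer, are assumptions
rather than the eventual implementation:

```python
from nova import exception

def set_allocations_with_retry(context, allocs, attempts=3):
    # On a generation conflict, re-read and retry instead of bubbling a
    # 409 up to every client.
    for attempt in range(attempts):
        try:
            return _set_allocations(context, allocs)  # existing writer
        except exception.ConcurrentUpdateDetected:
            if attempt == attempts - 1:
                raise
```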

** Affects: nova
 Importance: Medium
 Status: Triaged


** Tags: note-to-self placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1719933

Title:
  placement server needs to retry allocations, server-side

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  A long time ago a TODO was made in placement:
  
https://github.com/openstack/nova/blob/faede889d3620f8ff0131a7a4c6b9c1bc844cd06/nova/objects/resource_provider.py#L1837-L1839

  We need to implement that TODO; this is a note to self.

  This is related to what may be a different bug: when heavily loaded
  with many single requests, the placement server is unexpectedly
  receiving 409's about generation problems. Discussion of that led to
  remembering that this TODO needs to be fixed. Fixing the TODO needs to
  be done regardless of that problem.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1719933/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1716247] [NEW] api_sample_tests.test_aggregates.AggregatesSampleJsonTest failing due to change in aggregates

2017-09-10 Thread Chris Dent
Public bug reported:

Change-Id: I811d0f219142ca435b2b206e9b11ccd5ac611997 changed the api
samples to use different availability zones, breaking lots of tests.

http://logs.openstack.org/67/502167/2/check/gate-nova-tox-functional-
py35-ubuntu-xenial/2cfb158/testr_results.html.gz

** Affects: nova
 Importance: Undecided
 Assignee: Chris Dent (cdent)
 Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1716247

Title:
  api_sample_tests.test_aggregates.AggregatesSampleJsonTest failing due
  to change in aggregates

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  Change-Id: I811d0f219142ca435b2b206e9b11ccd5ac611997 changed the api
  samples to use different availability zones, breaking lots of tests.

  http://logs.openstack.org/67/502167/2/check/gate-nova-tox-functional-
  py35-ubuntu-xenial/2cfb158/testr_results.html.gz

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1716247/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1714402] [NEW] When setting an allocation with multiple resource providers and one of them does not exist the error message can be wrong

2017-08-31 Thread Chris Dent
Public bug reported:


nova master as of 20170831

The _set_allocations method used to write allocations to the placement
API will raise a 400 when a resource class results in a NotFound
exception. We want that 400. The problem is that the message associated
with the error uses the resource provider uuid from whatever resource
provider was the last one in the loop, not the one that caused the error.
See:

https://github.com/openstack/nova/blob/master/nova/api/openstack/placement/handlers/allocation.py#L231-L234

and the loop prior.

This is not a huge deal because it's unlikely that people are inspecting
error responses all that much, but it would be nice to fix.
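
The underlying Python behaviour, for reference: a for-loop's variable
keeps its final binding after the loop ends, so a message built
afterwards names the wrong provider.

```python
providers = ['rp-a', 'rp-b', 'rp-c']
failed = None
for rp in providers:
    if rp == 'rp-a':  # imagine this iteration raised NotFound
        failed = rp
# rp is still bound to the *last* item, not the one that failed.
print(rp)      # rp-c
print(failed)  # rp-a, which is what the message should report
```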

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1714402

Title:
  When setting an allocation with multiple resource providers and one of
  them does not exist the error message can be wrong

Status in OpenStack Compute (nova):
  New

Bug description:
  
  nova master as of 20170831

  The _set_allocations method used to write allocations to the placement
  API will raise a 400 when a resource class results in a NotFound
  exception. We want that 400. The problem is that the message
  associated with the error users the resource provider uuid from
  whatever resource provider was the last one in a loop, not the one
  that creates the error. See:

  
https://github.com/openstack/nova/blob/master/nova/api/openstack/placement/handlers/allocation.py#L231-L234

  and the loop prior.

  This is not a huge deal because it's unlikely that people are
  inspecting error responses all that much, but it would be nice to fix.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1714402/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1708205] Re: placement allocation representation asymmetric on PUT and GET

2017-08-25 Thread Chris Dent
** Changed in: nova
   Status: Incomplete => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1708205

Title:
  placement allocation representation asymmetric on PUT and GET

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  GET /allocations/{consumer_uuid} is a dict keyed by resource provider
  uuid.

  PUT /allocations/{consumer_uuid} is an array of anonymous json objects
  with 'resource_provider' and 'resources' objects

  This asymmetry is undesirable and confusing. It's probably the result
  of failing to update one side when changing the other, earlier in the
  development of placement.

  Changing it is likely a bit of a bear, would require a microversion of
  course, but might be worth considering to make things more clear for
  the future.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1708205/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1621709] Re: There is no allocation record for migration action

2017-08-18 Thread Chris Dent
This is effectively a duplicate of #1707071 , which has been released,
so I'm going to mark this as such.

** Changed in: nova
   Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1621709

Title:
  There is no allocation record for migration action

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  In the current RT, the migration case was considered as consuming
  resources. But we didn't update any allocation record in the Placement
  API.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1621709/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1653122] Re: Placement API should support DELETE /resource-providers/{uuid}/inventories

2017-08-18 Thread Chris Dent
** Changed in: nova
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1653122

Title:
  Placement API should support DELETE /resource-
  providers/{uuid}/inventories

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  This is a small feature request.

  Currently (version 1.3 or before of the placement API), in order to
  delete all inventory for a resource provider, one must call PUT
  /resource_providers/{uuid}/inventories and pass in the following
  request payload:

  {
      'generation': <current generation>,
      'resources': {}
  }

  it would be easier and more intuitive to support DELETE
  /resource_providers/{uuid}/inventories with no request payload and
  returning a 204 No Content on success.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1653122/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1661312] Re: Evacuation will corrupt instance allocations

2017-08-18 Thread Chris Dent
https://bugs.launchpad.net/nova/+bug/1709902 duplicates this, and that
one has code, so I'm invalidating this one.

** Changed in: nova
   Status: Incomplete => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1661312

Title:
  Evacuation will corrupt instance allocations

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  The following sequence of events will result in a corrupted instance
  allocation in placement:

  1. Instance running on host A, placement has allocations for instance on host A
  2. Host A goes down
  3. Instance is evacuated to host B, host B creates duplicated allocations in placement for instance
  4. Host A comes up, notices that instance is gone, deletes all allocations for instance on both hosts A and B
  5. Instance now has no allocations for a period
  6. Eventually, host B will re-create the allocations for the instance

  The period between #4 and #6 will have the scheduler making bad
  decisions because it thinks host B is less loaded than it is.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1661312/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1710908] [NEW] scheduler.utils.merge_resources allows zero value resources

2017-08-15 Thread Chris Dent
Public bug reported:

(master as of 2017-08-15)

If merging two resources of the same class results in a sum of zero, or
one of the provided keys has a value of zero in the first place and
appears in only one of the provided resource dicts, the resulting
resources dict will have a zero-value entry. If this is then used
directly to produce an allocations entry, the allocation will fail.

I discovered this while manually testing resizes of servers using
flavors with no disk.

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: resource-tracker scheduler

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1710908

Title:
  scheduler.utils.merge_resources allows zero value resources

Status in OpenStack Compute (nova):
  New

Bug description:
  (master as of 2017-08-15)

  If merging two resources of the same class results in a sum of zero,
  or one of the provided keys has a value of zero in the first place and
  appears in only one of the provided resource dicts, the resulting
  resources dict will have a zero-value entry. If this is then used
  directly to produce an allocations entry, the allocation will fail.

  I discovered this while manually testing resizes of servers using
  flavors with no disk.
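
  A minimal sketch of the failure mode; this is an illustration, not the
  actual scheduler.utils code:

      def merge_resources(original, new, sign=1):
          # Sum per-class amounts; zero sums are kept, which is the bug.
          for rclass, amount in new.items():
              original[rclass] = original.get(rclass, 0) + sign * amount

      resources = {'VCPU': 1, 'DISK_GB': 10}
      # e.g. a resize that removes the disk entirely:
      merge_resources(resources, {'DISK_GB': 10}, sign=-1)
      print(resources)  # {'VCPU': 1, 'DISK_GB': 0} -- breaks allocations

      # One possible fix: strip zero-value entries before building the
      # allocations payload.
      resources = {rc: amt for rc, amt in resources.items() if amt}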

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1710908/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1710509] [NEW] ServerMovingTests.test_evacuate sometimes fails but not always

2017-08-13 Thread Chris Dent
Public bug reported:

The newly added test_evacuate test in ServerMovingTests is slightly
racy. It seems to fail about 1 in 10 times. A recent failure is at
http://logs.openstack.org/72/489772/2/gate/gate-nova-tox-functional-py35-ubuntu-xenial/07f4a29/console.html#_2017-08-12_12_51_52_867765

Will look into this more closely tomorrow when I've got time, and add an
elastic-recheck entry etc., but I wanted to get it written down.

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: resource-tracker scheduler

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1710509

Title:
  ServerMovingTests.test_evacuate sometimes fails but not always

Status in OpenStack Compute (nova):
  New

Bug description:
  The newly added test_evacuate test in ServerMovingTests is slightly
  racy. It seems to fail about 1 in 10 times. A recent failure is at
  http://logs.openstack.org/72/489772/2/gate/gate-nova-tox-functional-py35-ubuntu-xenial/07f4a29/console.html#_2017-08-12_12_51_52_867765

  Will look into this more closely tomorrow when I've got time, and add
  an elastic-recheck entry etc., but I wanted to get it written down.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1710509/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1708961] [NEW] migration of single instance from multi-instance request spec fails with IndexError

2017-08-06 Thread Chris Dent
Public bug reported:

Nova master, as of August 6th, 2017 (head is
5971dde5d945bcbe1e81b87d342887abd5d2eece).

If you make multiple instances from one request:

   openstack server create --flavor c1 --image $IMAGE --nic net-id=$NET_ID --min 5 --max 10 x2

and then try to migrate just one of those instances:

   nova migrate --poll x2-1

The API generates a 500 because there's an IndexError in the
filter_scheduler, at line 190. `num_instances` is 9 and the loop
continues after the first allocations are claimed. On the second pass
`num` is 1, but the list of instance_uuids is only one item long, so an
IndexError is raised.

At line 162, where num_instances is assigned, we probably need to take
the length of instance_uuids (if it is not None) instead of the
num_instances from the spec_obj.

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: placement scheduler

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1708961

Title:
  migration of single instance from multi-instance request spec fails
  with IndexError

Status in OpenStack Compute (nova):
  New

Bug description:
  Nova master, as of August 6th, 2017 (head is
  5971dde5d945bcbe1e81b87d342887abd5d2eece).

  If you make multiple instances from one request:

   openstack server create --flavor c1 --image $IMAGE --nic net-id=$NET_ID --min 5 --max 10 x2

  and then try to migrate just one of those instances:

 nova migrate --poll x2-1

  The API generates a 500 because there's an IndexError in the
  filter_scheduler, at line 190. `num_instances` is 9 and the loop
  continues after the first allocations are claimed. On the second pass
  `num` is 1, but the list of instance_uuids is only one item long, so
  an IndexError is raised.

  At line 162, where num_instances is assigned, we probably need to take
  the length of instance_uuids (if it is not None) instead of the
  num_instances from the spec_obj.
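
  A hedged sketch of the suggested change, with names following the
  description above (this is not the merged patch):

      def get_num_instances(spec_obj, instance_uuids):
          # Prefer the uuid list actually passed in: on a single-instance
          # migration it has one entry even when the original request
          # spec asked for many instances.
          if instance_uuids is not None:
              return len(instance_uuids)
          return spec_obj.num_instances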

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1708961/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1708958] [NEW] disabling a compute service does not disable the resource provider

2017-08-06 Thread Chris Dent
Public bug reported:

If you make a multi node devstack (nova master as of August 6th, 2017),
or otherwise have multiple compute nodes, all of those compute nodes
will create resource providers and relevant inventory.

Later if you disable one of the compute nodes with a nova
service-disable {service id}, that nova-compute will be disabled, but
the associated resource provider will still exist with legit inventory
in the placement service.

This will mean that /allocation_candidates or /resource_providers will
return results including the disabled compute node, but they will be
bogus.

It's not clear what the right behaviour is here. Should the rp of the
disabled service be deleted? Have its inventory truncated? If there are
other hosts available that satisfy the request, things go forward as
desired, so there's not a functional bug here, but the data in placement
is incorrect, which is undesirable.

(On a related note, if you delete a compute node's resource provider
from the placement service and don't restart the associated nova-
compute, the _ensure_resource_provider method does _not_ create the
resource provider anew because the _resource_providers dict still
contains the uuid. This might be expected behaviour but it surprised me
while I was messing around.)

** Affects: nova
 Importance: Low
 Status: New


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1708958

Title:
  disabling a compute service does not disable the resource provider

Status in OpenStack Compute (nova):
  New

Bug description:
  If you make a multi node devstack (nova master as of August 6th,
  2017), or otherwise have multiple compute nodes, all of those compute
  nodes will create resource providers and relevant inventory.

  Later if you disable one of the compute nodes with a nova
  service-disable {service id}, that nova-compute will be disabled, but
  the associated resource provider will still exist with legit inventory
  in the placement service.

  This will mean that /allocation_candidates or /resource_providers will
  return results including the disabled compute node, but they will be
  bogus.

  It's not clear what the right behaviour is here. Should the rp of the
  disabled service be deleted? Have its inventory truncated? If there
  are other hosts available that satisfy the request, things go forward
  as desired, so there's not a functional bug here, but the data in
  placement is incorrect, which is undesirable.

  (On a related note, if you delete a compute node's resource provider
  from the placement service and don't restart the associated nova-
  compute, the _ensure_resource_provider method does _not_ create the
  resource provider anew because the _resource_providers dict still
  contains the uuid. This might be expected behaviour but it surprised
  me while I was messing around.)

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1708958/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1708260] [NEW] Sending empty allocations list on a PUT /allocations/{consumer_uuid} results in 500

2017-08-02 Thread Chris Dent
Public bug reported:

If you send an empty allocation list to the placement server:

- name: put an allocation empty list
  PUT: /allocations/599ffd2d-526a-4b2e-8683-f13ad25f9958
  request_headers:
      content-type: application/json
  data:
      allocations: []

You'll get a 500 response because of an IndexError when `consumer_id =
allocs[0].consumer_id` is evaluated.

Instead we should never reach this code. There should either be a schema
violation, because we should have at least one allocation, or, if we're
willing to accept an empty list and do nothing, we should skip the call
to the database.

** Affects: nova
 Importance: Medium
 Assignee: Chris Dent (cdent)
 Status: Confirmed


** Tags: pike-rc-potential placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1708260

Title:
  Sending empty allocations list on a PUT /allocations/{consumer_uuid}
  results in 500

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  If you send an empty allocation list to the placement server:

  - name: put an allocation empty list
    PUT: /allocations/599ffd2d-526a-4b2e-8683-f13ad25f9958
    request_headers:
        content-type: application/json
    data:
        allocations: []

  You'll get a 500 response because of an IndexError when `consumer_id =
  allocs[0].consumer_id` is evaluated.

  Instead we should never reach this code. There should either be a
  schema violation, because we should have at least one allocation, or,
  if we're willing to accept an empty list and do nothing, we should
  skip the call to the database.
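
  One way to express the schema-violation option, sketched as a tweak to
  the allocations JSONschema; the key names are assumed from placement's
  schema conventions, not copied from the real schema:

      ALLOCATION_SCHEMA = {
          'type': 'object',
          'properties': {
              'allocations': {
                  'type': 'array',
                  'minItems': 1,  # an empty list now fails with 400
              },
          },
          'required': ['allocations'],
      }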

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1708260/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1708252] [NEW] resource tracker updates its map of aggregates too often

2017-08-02 Thread Chris Dent
Public bug reported:

As of late in the Pike cycle, the resource tracker updates its map of
aggregates associated with the resource provider it knows about every
time it calls `_ensure_resource_provider`. This method is called quite
often, increasingly so as we do more with resource providers from both
the resource tracker and the scheduler (both of which use the report
client). This results in a lot of useless work that can create undue
load on both client and server.

There is a long standing TODO to have some kind of cache or timeout so
that we update the aggregate map less often, as updates of those on the
placement server side are relatively infrequent.

We need to balance between doing the updates too often and there being a
gap between when an aggregate change does happen and the map getting
updated.

** Affects: nova
 Importance: Low
 Assignee: Chris Dent (cdent)
 Status: Confirmed


** Tags: placement

** Changed in: nova
 Assignee: (unassigned) => Chris Dent (cdent)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1708252

Title:
  resource tracker updates its map of aggregates too often

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  As of late in the Pike cycle, the resource tracker updates its map of
  aggregates associated with the resource provider it knows about every
  time it calls `_ensure_resource_provider`. This method is called quite
  often, increasingly so as we do more with resource providers from both
  the resource tracker and the scheduler (both of which use the report
  client). This results in a lot of useless work that can create undue
  load on both client and server.

  There is a long standing TODO to have some kind of cache or timeout so
  that we update the aggregate map less often, as updates of those on
  the placement server side are relatively infrequent.

  We need to balance between doing the updates too often and there being
  a gap between when an aggregate change does happen and the map getting
  updated.
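
  A minimal sketch of a time-based refresh guard, assuming a refresh
  interval constant; the class, attribute, and method names here are
  illustrative, not the actual report client's:

      import time

      AGGREGATE_REFRESH_SECONDS = 300  # assumption; needs tuning

      class ReportClientSketch(object):
          def __init__(self):
              self._aggregate_refreshed = {}

          def _aggregates_stale(self, rp_uuid):
              last = self._aggregate_refreshed.get(rp_uuid, 0)
              return (time.time() - last) > AGGREGATE_REFRESH_SECONDS

          def _ensure_resource_provider(self, rp_uuid):
              # Only refetch the aggregate map when the cached copy has
              # aged out, rather than on every call.
              if self._aggregates_stale(rp_uuid):
                  # ... GET /resource_providers/{uuid}/aggregates ...
                  self._aggregate_refreshed[rp_uuid] = time.time()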

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1708252/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1708204] [NEW] placement allocation representation asymmetric on PUT and GET

2017-08-02 Thread Chris Dent
Public bug reported:

GET /allocations/{consumer_uuid} is a dict keyed by resource provider
uuid.

PUT /allocations/{consumer_uuid} is an array of anonymous json objects
with 'resource_provider' and 'resources' objects

This asymmetry is undesirable and confusing. It's probably the result of
failing to update one side when changing the other, earlier in the
development of placement.

Changing it is likely a bit of a bear, and would of course require a
microversion, but it might be worth considering to make things clearer
in the future.

** Affects: nova
 Importance: Low
 Status: Confirmed


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1708204

Title:
  placement allocation representation asymmetric on PUT and GET

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  GET /allocations/{consumer_uuid} is a dict keyed by resource provider
  uuid.

  PUT /allocations/{consumer_uuid} is an array of anonymous json objects
  with 'resource_provider' and 'resources' objects

  This asymmetry is undesirable and confusing. It's probably the result
  of failing to update one side when changing the other, earlier in the
  development of placement.

  Changing it is likely a bit of a bear, and would of course require a
  microversion, but it might be worth considering to make things clearer
  in the future.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1708204/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1708205] [NEW] placement allocation representation asymmetric on PUT and GET

2017-08-02 Thread Chris Dent
Public bug reported:

GET /allocations/{consumer_uuid} is a dict keyed by resource provider
uuid.

PUT /allocations/{consumer_uuid} is an array of anonymous json objects
with 'resource_provider' and 'resources' objects

This asymmetry is undesirable and confusing. It's probably the result of
failing to update one side when changing the other, earlier in the
development of placement.

Changing it is likely a bit of a bear, and would of course require a
microversion, but it might be worth considering to make things clearer
in the future.

** Affects: nova
 Importance: Low
 Status: Confirmed


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1708205

Title:
  placement allocation representation asymmetric on PUT and GET

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  GET /allocations/{consumer_uuid} is a dict keyed by resource provider
  uuid.

  PUT /allocations/{consumer_uuid} is an array of anonymous json objects
  with 'resource_provider' and 'resources' objects

  This asymmetry is undesirable and confusing. It's probably the result
  of failing to update one side when changing the other, earlier in the
  development of placement.

  Changing it is likely a bit of a bear, and would of course require a
  microversion, but it might be worth considering to make things clearer
  in the future.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1708205/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1708167] [NEW] placement services logs 405 response as untrapped error

2017-08-02 Thread Chris Dent
Public bug reported:

When the placement service gets a bad method for an existing URL it
raises an HTTPMethodNotAllowed exception. It does this outside of the
WebOb wsgify context, meaning the exception is caught by the
FaultWrapper middleware, perceived to be an uncaught exception, and
logged as such, muddying the log files with something that is normal.

We don't see these log messages in CI because we don't accidentally
cause 405s. Where we intentionally cause them (in gabbi tests) the log
message is recorded in the subunit data but not in the test output
because the tests pass (passing tests do not display those messages).[1]

The fix is to treat the HTTPMethodNotAllowed as a wsgi app instead of an
exception and call it. When doing that, things work as desired. Fix
forthcoming.


[1] I discovered this because the subunit files on
https://review.openstack.org/362766 were cresting the 50M limit: in that
change the api sample tests were passing but having all kinds of errors
with the placement fixture (I've since fixed the patch), generating vast
amounts of log messages on successful tests. Digging in there also
revealed the error message that this bug wants to deal with.

** Affects: nova
 Importance: Low
 Assignee: Chris Dent (cdent)
 Status: In Progress


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1708167

Title:
  placement services logs 405 response as untrapped error

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  When the placement service gets a bad method for an existing URL it
  raises an HTTPMethodNotAllowed exception. It does this outside of the
  WebOb wsgify context, meaning the exception is caught by the
  FaultWrapper middleware, perceived to be an uncaught exception, and
  logged as such, muddying the log files with something that is normal.

  We don't see these log messages in CI because we don't accidentally
  cause 405s. Where we intentionally cause them (in gabbi tests) the log
  message is recorded in the subunit data but not in the test output
  because the tests pass (passing tests do not display those
  messages).[1]

  The fix is to treat the HTTPMethodNotAllowed as a wsgi app instead of
  an exception and call it. When doing that, things work as desired. Fix
  forthcoming.

  
  [1] I discovered this because the subunit files on
  https://review.openstack.org/362766 were cresting the 50M limit: in
  that change the api sample tests were passing but having all kinds of
  errors with the placement fixture (I've since fixed the patch),
  generating vast amounts of log messages on successful tests. Digging
  in there also revealed the error message that this bug wants to deal
  with.
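
  A hedged sketch of the fix described above: webob's HTTP exceptions
  are themselves WSGI applications, so the 405 can be called rather than
  raised. The dispatch function here is simplified, not the actual
  placement handler:

      import webob.exc

      def dispatch(environ, start_response, allowed_methods):
          if environ['REQUEST_METHOD'] not in allowed_methods:
              # Calling the exception produces an ordinary 405 response
              # instead of an "uncaught exception" log from FaultWrapper.
              app = webob.exc.HTTPMethodNotAllowed(
                  headers=[('Allow', ', '.join(allowed_methods))])
              return app(environ, start_response)
          # ... normal routing continues here ...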

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1708167/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1708037] [NEW] scheduler/resource tracker report client should send accept header on all requests

2017-08-01 Thread Chris Dent
Public bug reported:

Currently the report client doesn't consistently use an accept header of
'application/json' when making requests of the placement API. This means
that sometimes the bodies of error responses are in HTML, which makes
processing and inspection of the error response unpredictable and
unstructured. If/when placement starts including error codes in
responses (as described in
http://specs.openstack.org/openstack/api-wg/guidelines/errors.html) this
will be a problem.

This misfeature is an artifact of WebOb, which defaults to constructing a
response body based on what it perceives to be the client's desired
response media type, falling back to text/html. Strictly speaking this is
the correct behavior.

Because of the way placement is designed, it is safe to always send both
content-type and accept headers of 'application/json', so we could make
this most correct by changing the report client to always send those
headers.

** Affects: nova
 Importance: Low
 Status: Confirmed


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1708037

Title:
  scheduler/resource tracker report client should send accept header on
  all requests

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  Currently the report client doesn't consistently use an accept header
  of 'application/json' when making requests of the placement API. This
  means that sometimes the bodies of error responses are in HTML, which
  makes processing and inspection of the error response unpredictable
  and unstructured. If/when placement starts including error codes in
  responses (as described in
  http://specs.openstack.org/openstack/api-wg/guidelines/errors.html)
  this will be a problem.

  This misfeature is an artifact of WebOb, which defaults to
  constructing a response body based on what it perceives to be the
  client's desired response media type, falling back to text/html.
  Strictly speaking this is the correct behavior.

  Because of the way placement is designed, it is safe to always send
  both content-type and accept headers of 'application/json', so we
  could make this most correct by changing the report client to always
  send those headers.
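
  A hedged sketch of the desired client behaviour, using a plain
  requests session rather than nova's actual report client plumbing;
  the endpoint is a placeholder:

      import requests

      session = requests.Session()
      session.headers.update({
          'accept': 'application/json',        # structured error bodies
          'content-type': 'application/json',  # placement bodies are JSON
      })
      resp = session.get(
          'http://placement.example.com/resource_providers')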

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1708037/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1707669] [NEW] [placement] put allocations does not do a full overwrite of existing allocations

2017-07-31 Thread Chris Dent
Public bug reported:

The presumption all along has been that when doing a PUT
/allocations/{consumer_uuid}, the body of that request fully replaces
any allocations associated with that consumer.

This has turned out not to be the case. The code [1] that cleans up the
current allocations was only deleting allocations where the consumer
_and_ resource provider matched the incoming allocations. We want to
wipe the slate.

Huge hat tip to gibi for doing the necessary digging to get this going.

[1]
https://github.com/openstack/nova/blob/master/nova/objects/resource_provider.py#L1509-L1520

** Affects: nova
 Importance: High
 Status: Triaged


** Tags: pike-rc-potential placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1707669

Title:
  [placement] put allocations does not do a full overwrite of existing
  allocations

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  The presumption all along has been that when doing a PUT
  /allocations/{consumer_uuid}, the body of that request fully replaces
  any allocations associated with that consumer.

  This has turned out not to be the case. The code [1] that cleans up
  the current allocations was only deleting allocations where the
  consumer _and_ resource provider matched the incoming allocations. We
  want to wipe the slate.

  Huge hat tip to gibi for doing the necessary digging to get this
  going.

  [1]
  
https://github.com/openstack/nova/blob/master/nova/objects/resource_provider.py#L1509-L1520
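
  A minimal sketch of the intended cleanup, with a stand-in sqlalchemy
  table (columns trimmed for brevity); the real code lives in
  resource_provider.py at the link above:

      import sqlalchemy as sa

      metadata = sa.MetaData()
      alloc_tbl = sa.Table(
          'allocations', metadata,
          sa.Column('id', sa.Integer, primary_key=True),
          sa.Column('resource_provider_id', sa.Integer),
          sa.Column('consumer_id', sa.String(36)),
          sa.Column('used', sa.Integer),
      )

      def delete_consumer_allocations(conn, consumer_id):
          # Wipe the slate: drop every allocation for this consumer
          # regardless of resource provider, so the incoming PUT fully
          # replaces what was there.
          conn.execute(alloc_tbl.delete().where(
              alloc_tbl.c.consumer_id == consumer_id))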

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1707669/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1707168] [NEW] [placement] resource provider trait-related query creates unicode warning

2017-07-28 Thread Chris Dent
Public bug reported:

Running queries for shared providers creates the following warning:


/home/cdent/src/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:340: OsloDBDeprecationWarning: EngineFacade is deprecated; please use oslo_db.sqlalchemy.enginefacade
  self._legacy_facade = LegacyEngineFacade(None, _factory=self)

/home/cdent/src/nova/.tox/functional/local/lib/python2.7/site-packages/sqlalchemy/sql/sqltypes.py:219: SAWarning: Unicode type received non-unicode bind param value 'MISC_SHARES_VIA_AGGREGATE'. (this warning may be suppressed after 10 occurrences)
  (util.ellipses_string(value),))

This is annoying when trying to evaluate test logs. It's noise.

** Affects: nova
 Importance: Low
 Assignee: Chris Dent (cdent)
 Status: In Progress


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1707168

Title:
  [placement] resource provider trait-related query creates unicode
  warning

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  Running queries for shared providers creates the following warning:

  
  /home/cdent/src/nova/.tox/functional/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:340: OsloDBDeprecationWarning: EngineFacade is deprecated; please use oslo_db.sqlalchemy.enginefacade
    self._legacy_facade = LegacyEngineFacade(None, _factory=self)

  /home/cdent/src/nova/.tox/functional/local/lib/python2.7/site-packages/sqlalchemy/sql/sqltypes.py:219: SAWarning: Unicode type received non-unicode bind param value 'MISC_SHARES_VIA_AGGREGATE'. (this warning may be suppressed after 10 occurrences)
    (util.ellipses_string(value),))

  This is annoying when trying to evaluate test logs. It's noise.
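
  On Python 2 the warning goes away if the bind parameter is already
  unicode. A hedged sketch of the kind of one-line fix involved; the
  commented query fragment is illustrative, not the actual code:

      import six

      MISC_SHARES = six.text_type('MISC_SHARES_VIA_AGGREGATE')
      # ... later, when building the traits query, compare against the
      # unicode value, e.g.:
      # sel = sel.where(_TRAIT_TBL.c.name == MISC_SHARES)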

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1707168/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1705161] Re: Instance is not able to launch

2017-07-20 Thread Chris Dent
Looks like your database connection setup is not correct, either in
nova.conf or with the username and password on the database:

2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions OperationalError: (pymysql.err.OperationalError) (1045, u"Access denied for user 'nova'@'controller' (using password: YES)")


If this is not the case, please follow up with more information.
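
For reference, the install guide you followed expects the connection
settings in nova.conf to look roughly like this (NOVA_DBPASS is a
placeholder for your actual database password):

    [api_database]
    connection = mysql+pymysql://nova:NOVA_DBPASS@controller/nova_api

    [database]
    connection = mysql+pymysql://nova:NOVA_DBPASS@controller/nova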

** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1705161

Title:
  Instance is not able to launch

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  I followed the procedure from https://docs.openstack.org/ocata
  /install-guide-ubuntu/InstallGuide.pdf to build OpenStack environment.

  But while launching instance I got the error as follows;

  Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
   (HTTP 500) (Request-ID: req-d6a9eec9-0570-4844-9434-d85d04c52883)

  After getting this error, I made a new database for Nova but it is
  using the old database and giving the same error.

  This is the nova-api.log file:
  

  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions Traceback (most recent call last):
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/openstack/extensions.py", line 338, in wrapped
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions     return f(*args, **kwargs)
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 181, in wrapper
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions     return func(*args, **kwargs)
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 181, in wrapper
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions     return func(*args, **kwargs)
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/openstack/compute/servers.py", line 214, in detail
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions     servers = self._get_servers(req, is_detail=True)
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/openstack/compute/servers.py", line 357, in _get_servers
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions     sort_keys=sort_keys, sort_dirs=sort_dirs)
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 2466, in get_all
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions     sort_dirs=sort_dirs)
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 2606, in _get_instances_by_filters
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions     expected_attrs=fields, sort_keys=sort_keys, sort_dirs=sort_dirs)
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions     result = fn(cls, context, *args, **kwargs)
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/objects/instance.py", line 1220, in get_by_filters
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions     use_slave=use_slave, sort_keys=sort_keys, sort_dirs=sort_dirs)
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 235, in wrapper
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions     with reader_mode.using(context):
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions     return self.gen.next()
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 944, in _transaction_scope
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions     allow_async=self._allow_async) as resource:
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions     return self.gen.next()
  2017-07-19 02:14:14.901 24784 ERROR nova.api.openstack.extensions

[Yahoo-eng-team] [Bug 1704574] [NEW] [placement] attempt to put allocation to resource provider that does not host class of resource causes 500

2017-07-15 Thread Chris Dent
Public bug reported:

I made a typo while writing some gabbi tests and uncovered a 500 in the
placement service. If you try to allocate to a resource provider that
does not host that class of resource, you can get a KeyError during
capacity checking. Given the following gabbi test at microversion 1.10:

- name: create a resource provider
  POST: /resource_providers
  data:
      name: an rp
  status: 201

- name: get resource provider
  GET: $LOCATION
  status: 200

- name: create a resource class
  PUT: /resource_classes/CUSTOM_GOLD
  status: 201

- name: add inventory to an rp
  PUT: /resource_providers/$HISTORY['get resource provider'].$RESPONSE['$.uuid']/inventories
  data:
      resource_provider_generation: 0
      inventories:
          VCPU:
              total: 24
          CUSTOM_GOLD:
              total: 5
  status: 200

- name: allocate some of it
  PUT: /allocations/6d9f83db-6eb5-49f6-84b0-5d03c6aa9fc8
  data:
      allocations:
          - resource_provider:
                uuid: $HISTORY['get resource provider'].$RESPONSE['$.uuid']
            resources:
                DISK_GB: 5
                CUSTOM_GOLD: 1
      project_id: 42a32c07-3eeb-4401-9373-68a8cdca6784
      user_id: 66cb2f29-c86d-47c3-8af5-69ae7b778c70
  status: 204

When DISK_GB is checked for capacity, we get:

2017-07-15 17:41:47,224 ERROR [nova.api.openstack.placement.handler] Uncaught exception
Traceback (most recent call last):
  File "nova/api/openstack/placement/handler.py", line 215, in __call__
    return dispatch(environ, start_response, self._map)
  File "nova/api/openstack/placement/handler.py", line 144, in dispatch
    return handler(environ, start_response)
  File "/home/cdent/src/nova/.tox/cover/local/lib/python2.7/site-packages/webob/dec.py", line 131, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "nova/api/openstack/placement/wsgi_wrapper.py", line 29, in call_func
    super(PlacementWsgify, self).call_func(req, *args, **kwargs)
  File "/home/cdent/src/nova/.tox/cover/local/lib/python2.7/site-packages/webob/dec.py", line 196, in call_func
    return self.func(req, *args, **kwargs)
  File "nova/api/openstack/placement/microversion.py", line 268, in decorated_func
    return _find_method(f, version)(req, *args, **kwargs)
  File "nova/api/openstack/placement/util.py", line 138, in decorated_function
    return f(req)
  File "nova/api/openstack/placement/handlers/allocation.py", line 285, in set_allocations
    return _set_allocations(req, ALLOCATION_SCHEMA_V1_8)
  File "nova/api/openstack/placement/handlers/allocation.py", line 249, in _set_allocations
    allocations.create_all()
  File "nova/objects/resource_provider.py", line 1851, in create_all
    self._set_allocations(self._context, self.objects)
  File "/home/cdent/src/nova/.tox/cover/local/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 979, in wrapper
    return fn(*args, **kwargs)
  File "nova/objects/resource_provider.py", line 1811, in _set_allocations
    before_gens = _check_capacity_exceeded(conn, allocs)
  File "nova/objects/resource_provider.py", line 1615, in _check_capacity_exceeded
    usage = usage_map[key]
KeyError: ('14930a42-78df-4038-aafa-c959e18111e5', 2)
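
A hedged sketch of one possible guard for the failure above; ValueError
stands in for whatever placement exception the real fix would raise:

    def get_usage(usage_map, rp_id, rc_id):
        # usage_map is keyed by (resource provider, resource class); a
        # missing key means the provider has no inventory of that class,
        # which should be a client error rather than a KeyError and 500.
        key = (rp_id, rc_id)
        if key not in usage_map:
            raise ValueError(
                'provider %s has no inventory of resource class %s' % key)
        return usage_map[key]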

** Affects: nova
 Importance: Medium
 Status: Triaged


** Tags: placement

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1704574

Title:
  [placement] attempt to put allocation to resource provider that does
  not host class of resource causes 500

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  I made a typo while writing some gabbi tests and uncovered a 500 in
  the placement service. If you try to allocate to a resource provider
  that does not host that class of resource it can have a KeyError
  during capacity checking. given the following gabbi in microversion
  1.10:

  - name: create a resource provider
    POST: /resource_providers
    data:
        name: an rp
    status: 201

  - name: get resource provider
    GET: $LOCATION

[Yahoo-eng-team] [Bug 1619723] Re: in placement api an allocation reporter sometimes needs to be able to report an allocation even if it violates capacity constraints

2017-06-24 Thread Chris Dent
** Changed in: nova
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1619723

Title:
  in placement api an allocation reporter sometimes needs to be able to
  report an allocation even if it violates capacity constraints

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  If a compute node has been reconfigured such that its allocations are
  above its available capacity, the resource tracker still needs to be
  able to report existing allocations without failure so that it doesn't
  get in a stuck state.

  To that end, we will make it so that when allocations sent via a PUT
  are already present in the data store, the server will respond with
  success but neither write the database nor update the resource
  provider generation. This allows the resource tracker to know "yeah,
  you've got my data" and feel at peace with the state of the world.
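
  A minimal sketch of the idempotent-PUT idea, with a plain dict
  standing in for the data store:

      def set_allocations(store, consumer_id, incoming):
          # If the incoming allocations already match what is stored,
          # report success without writing or bumping the provider
          # generation.
          if store.get(consumer_id) == incoming:
              return False  # no-op: "yeah, you've got my data"
          store[consumer_id] = incoming
          return True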

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1619723/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

