On Mon, 5 Jun 2017, Ed Leafe wrote:

> One proposal is to essentially use the same logic in placement
> that was used to include that host in those matching the
> requirements. In other words, when it tries to allocate the amount
> of disk, it would determine that that host is in a shared storage
> aggregate, and be smart enough to allocate against that provider.
> This was referred to in our discussion as "Plan A".

What would help me is a clearer explanation of whether, and if so how
and why, "Plan A" doesn't work for nested resource providers.

We can say that allocating for shared disk is fairly deterministic,
provided we assume that any given compute node is associated with at
most one shared disk provider.
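
For concreteness, here's roughly the kind of allocation split I have
in mind (a sketch only: the uuids are made up and the payload shape is
approximate, not the actual placement API):

    # Sketch with made-up names, assuming a compute node that is a
    # member of an aggregate containing one shared storage provider.
    compute_node_uuid = "CN_UUID"     # placeholder for the compute node
    shared_storage_uuid = "SS_UUID"   # placeholder for the shared provider

    allocations = [
        # cpu and ram are consumed from the compute node itself...
        {"resource_provider": {"uuid": compute_node_uuid},
         "resources": {"VCPU": 2, "MEMORY_MB": 2048}},
        # ...while disk is consumed from the shared provider, which is
        # unambiguous so long as there is only one such provider.
        {"resource_provider": {"uuid": shared_storage_uuid},
         "resources": {"DISK_GB": 20}},
    ]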

My understanding is that this determinism does not hold with nested
resource providers because the choice of which pci device or which
numa cell gets used is made fairly late in the game. The existing
resource tracking doesn't have this problem because the claim of
those resources is made very late in the game. <- Is this correct?

The problem comes into play when we want to claim from the scheduler
(or conductor). Additional information is required to choose which
child providers to use. <- Is this correct?

Plan B overcomes the information deficit by including more
information in the response from placement (as straw-manned in the
etherpad [1]) allowing code in the filter scheduler to make accurate
claims. <- Is this correct?
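
As I understand it, the extra information would be something along
these lines (my own illustration of the idea, with made-up uuids; not
necessarily the exact shape the etherpad proposes):

    # Illustration only: for each candidate compute node, placement
    # would spell out which providers (parent, children, shared) each
    # resource would come from, so the filter scheduler can turn its
    # choice directly into an accurate claim.
    candidate = {
        "compute_node": "CN_UUID",
        "allocation_request": [
            {"resource_provider": {"uuid": "CN_UUID"},
             "resources": {"VCPU": 2, "MEMORY_MB": 2048}},
            {"resource_provider": {"uuid": "PF1_UUID"},  # a child provider
             "resources": {"SRIOV_NET_VF": 1}},
            {"resource_provider": {"uuid": "SS_UUID"},   # a shared provider
             "resources": {"DISK_GB": 20}},
        ],
    }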

For clarity and completeness in the discussion, explicit answers to
some questions would be useful. Some of these may appear ignorant or
obtuse, and are mostly things we've been over before. The goal is to
draw out some clear present-day statements, to be sure we are all
talking about the same thing (or to get us there if not), updated for
what we know now compared with what we knew a week or a month ago.

* We already have the information the filter scheduler needs now by
  some other means, right?  What are the reasons we don't want to
  use that anymore?

* Part of the reason for having nested resource providers is that they
  can allow affinity/anti-affinity below the compute node (e.g.,
  workloads on the same host but in different numa cells). If I
  remember correctly, modelling and tracking this kind of information
  in this way comes out of the time when we imagined the placement
  service would be doing considerably more filtering than is planned
  now. Plan B appears to be an acknowledgement of "on some of this
  stuff we can't actually do anything but provide you some info, you
  need to decide". If that's the case, is the topological modelling
  on the placement DB side of things solely a convenient place to
  store information? If there were some other way to model that
  topology, could the things currently being considered for modelling
  as nested providers instead simply be modelled as inventories of a
  particular class of resource? (I'm not suggesting we do this,
  rather that the answer that says why we don't want to do this is
  useful for understanding the picture. There's a small sketch of the
  two shapes I mean after this list.)

* Does a claim made in the scheduler need to be complete? Is there
  value in making a partial claim from the scheduler that consumes a
  vcpu and some ram, and is then corrected in the resource tracker to
  consume a specific pci device, numa cell, gpu and/or fpga (see the
  second sketch after this list)? Would this be better or worse than
  what we have now? Why?

* What is lacking in placement's representation of resource providers
  that makes it difficult or impossible for an allocation against a
  parent provider to be able to determine the correct child
  providers to which to cascade some of the allocation? (And by
  extension make the earlier scheduling decision.)
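
To illustrate the alternative raised in the second bullet (sketch
only, with made-up uuids; not a proposal), the same virtual functions
could show up in placement in two different shapes:

    # (a) nested: each physical function is a child provider with its
    #     own inventory, so placement knows the topology
    nested_view = {
        "CN_UUID":  {"VCPU": 16, "MEMORY_MB": 32768},
        "PF1_UUID": {"SRIOV_NET_VF": 4},   # child of CN
        "PF2_UUID": {"SRIOV_NET_VF": 4},   # child of CN
    }

    # (b) flat: a single inventory of the resource class on the compute
    #     node; placement knows only the total and the host picks the
    #     actual device
    flat_view = {
        "CN_UUID": {"VCPU": 16, "MEMORY_MB": 32768, "SRIOV_NET_VF": 8},
    }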
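
And to illustrate the partial claim question in the third bullet
(again a sketch with made-up uuids, assuming the allocations for a
consumer can simply be rewritten by the resource tracker):

    # step 1: the scheduler claims only what it can decide on its own
    initial_claim = [
        {"resource_provider": {"uuid": "CN_UUID"},
         "resources": {"VCPU": 2, "MEMORY_MB": 2048}},
    ]

    # step 2: the resource tracker on the chosen host replaces that
    # with a complete claim once it has picked the specific device
    corrected_claim = [
        {"resource_provider": {"uuid": "CN_UUID"},
         "resources": {"VCPU": 2, "MEMORY_MB": 2048}},
        {"resource_provider": {"uuid": "PF1_UUID"},
         "resources": {"SRIOV_NET_VF": 1}},   # the specific pci device
    ]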

That's a start. With answers to at least some of these questions I
think the straw man in the etherpad can be more effectively
evaluated. As things stand right now it is a proposed solution
without a clear problem statement. I feel we could do with a clearer
problem statement.

Thanks.

[1] https://etherpad.openstack.org/p/placement-allocations-straw-man

--
Chris Dent                  ┬──┬◡ノ(° -°ノ)       https://anticdent.org/
freenode: cdent                                         tw: @anticdent