Sorry for the top-post, but it seems that nobody has responded to this, and there are a lot of important questions here that need answers. So I'm simply re-posting it so that we don't get ahead of ourselves by planning implementations before we fully understand the problem and the implications of any proposed solution.
-- Ed Leafe

> On Jun 6, 2017, at 9:56 AM, Chris Dent <[email protected]> wrote:
>
> On Mon, 5 Jun 2017, Ed Leafe wrote:
>
>> One proposal is to essentially use the same logic in placement
>> that was used to include that host in those matching the
>> requirements. In other words, when it tries to allocate the amount
>> of disk, it would determine that that host is in a shared storage
>> aggregate, and be smart enough to allocate against that provider.
>> This was referred to in our discussion as "Plan A".
>
> What would help for me is a greater explanation of whether, and if so,
> how and why, "Plan A" doesn't work for nested resource providers.
>
> We can declare that allocating for shared disk is fairly deterministic
> if we assume that any given compute node is only associated with one
> shared disk provider.
>
> My understanding is that this determinism is not the case with nested
> resource providers, because there's some fairly late-in-the-game
> choosing of which PCI device or which NUMA cell is getting used.
> The existing resource tracking doesn't have this problem because the
> claim of those resources is made very late in the game. <- Is this
> correct?
>
> The problem comes into play when we want to claim from the scheduler
> (or conductor). Additional information is required to choose which
> child providers to use. <- Is this correct?
>
> Plan B overcomes the information deficit by including more
> information in the response from placement (as straw-manned in the
> etherpad [1]), allowing code in the filter scheduler to make accurate
> claims. <- Is this correct?
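[Interjecting a quick illustration here for anyone trying to picture the "Plan A" vs. "Plan B" distinction. The end state in both plans is a single allocation whose resources are split across two providers; the sketch below is only a rough Python rendering of that idea. The UUIDs are made up and the exact placement API payload format may differ from this.]

    # Illustrative only: what the final allocation for one instance might
    # look like when a compute node gets its disk from a shared-storage
    # provider. UUIDs and the exact wire format are assumptions.
    COMPUTE_NODE_RP = "c0ffee00-0000-0000-0000-000000000001"  # hypothetical
    SHARED_DISK_RP = "c0ffee00-0000-0000-0000-000000000002"   # hypothetical

    requested = {"VCPU": 2, "MEMORY_MB": 4096, "DISK_GB": 100}

    allocation = {
        "allocations": [
            {   # CPU and RAM come from the compute node itself...
                "resource_provider": {"uuid": COMPUTE_NODE_RP},
                "resources": {"VCPU": 2, "MEMORY_MB": 4096},
            },
            {   # ...but the disk is claimed against the shared storage
                # provider associated with the compute node's aggregate.
                "resource_provider": {"uuid": SHARED_DISK_RP},
                "resources": {"DISK_GB": 100},
            },
        ],
    }

    # Sanity check: the split covers exactly what was requested.
    combined = {}
    for alloc in allocation["allocations"]:
        for rc, amount in alloc["resources"].items():
            combined[rc] = combined.get(rc, 0) + amount
    assert combined == requested

[The difference between the plans is who builds that split: under Plan A, placement works it out server-side from the aggregate relationship; under Plan B, the filter scheduler builds it from extra information returned by placement.]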
> For clarity and completeness in the discussion, some questions for
> which we have explicit answers would be useful. Some of these may
> appear ignorant or obtuse, and are mostly things we've been over
> before. The goal is to draw out some clear statements in the present
> day to be sure we are all talking about the same thing (or get us
> there if not), modified for what we know now compared to what we
> knew a week or a month ago.
>
> * We already have the information the filter scheduler needs now by
>   some other means, right? What are the reasons we don't want to
>   use that anymore?
>
> * Part of the reason for having nested resource providers is that
>   it can allow affinity/anti-affinity below the compute node (e.g.,
>   workloads on the same host but different NUMA cells). If I
>   remember correctly, the modelling and tracking of this kind of
>   information in this way comes out of the time when we imagined the
>   placement service would be doing considerably more filtering than
>   is planned now. Plan B appears to be an acknowledgement of "on
>   some of this stuff, we can't actually do anything but provide you
>   some info; you need to decide". If that's the case, is the
>   topological modelling on the placement DB side of things solely a
>   convenient place to store information? If there were some other
>   way to model that topology, could things currently being considered
>   for modelling as nested providers instead simply be modelled as
>   inventories of a particular class of resource?
>   (I'm not suggesting we do this, rather that the answer that says
>   why we don't want to do this is useful for understanding the
>   picture.)
>
> * Does a claim made in the scheduler need to be complete? Is there
>   value in making a partial claim from the scheduler that consumes a
>   vcpu and some ram, and is then corrected in the resource tracker
>   to consume a specific PCI device, NUMA cell, GPU and/or FPGA?
>   Would this be better or worse than what we have now? Why?
>
> * What is lacking in placement's representation of resource providers
>   that makes it difficult or impossible for an allocation against a
>   parent provider to be able to determine the correct child
>   providers to which to cascade some of the allocation? (And, by
>   extension, make the earlier scheduling decision.)
>
> That's a start. With answers to at least some of these questions I
> think the straw man in the etherpad can be more effectively
> evaluated. As things stand right now it is a proposed solution
> without a clear problem statement. I feel like we could do with a
> clearer problem statement.
>
> Thanks.
>
> [1] https://etherpad.openstack.org/p/placement-allocations-straw-man
>
> --
> Chris Dent                 ┬──┬◡ノ(° -°ノ)          https://anticdent.org/
> freenode: cdent                                     tw: @anticdent
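[One more note, for anyone who hasn't read the etherpad: the kind of "more information in the response from placement" that Plan B and the questions above turn on could, very roughly, look like the sketch below. This is only an illustration to make the discussion easier to follow; the field names, UUIDs, and structure are invented here, and the actual straw man in [1] may differ.]

    # Hypothetical shape of a richer, "Plan B"-style placement response:
    # for each candidate host placement returns a ready-made, claimable
    # allocation plus a summary of the providers involved, so the filter
    # scheduler has enough information to weigh hosts and then claim the
    # exact child/shared providers itself. All names are illustrative.
    candidate_response = {
        "allocation_requests": [
            {   # One complete allocation per viable candidate.
                "allocations": [
                    {"resource_provider": {"uuid": "compute-node-1"},
                     "resources": {"VCPU": 2, "MEMORY_MB": 4096}},
                    {"resource_provider": {"uuid": "shared-disk-pool"},
                     "resources": {"DISK_GB": 100}},
                ],
            },
        ],
        "provider_summaries": {
            # Per-provider capacity/usage the scheduler could use for
            # weighing, and for affinity decisions below the compute
            # node (NUMA cells, PCI devices).
            "compute-node-1": {
                "resources": {
                    "VCPU": {"capacity": 64, "used": 10},
                    "MEMORY_MB": {"capacity": 131072, "used": 8192},
                },
            },
            "shared-disk-pool": {
                "resources": {"DISK_GB": {"capacity": 10000, "used": 1200}},
            },
        },
    }

    # The scheduler might then filter and weigh the candidates and submit
    # its chosen allocation back to placement as the claim, rather than
    # re-deriving which providers to hit.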
