I have a feeling that we really need to make sure that whatever this
selection process becomes has clearly defined API boundaries, so that
various 'implementation experiments' can be tried out (and researched).

Those API boundaries would define what scheduling entities must provide,
but the implementations behind them could be many things. I have a
feeling that this is really an ongoing area of research and that no
solution is likely to be optimal 'yet' (maybe someday...).

Without even well-defined API boundaries, I start to wonder if all this
exploring will end up just burning people out (when said people find a
possible solution but the code won't be accepted due to the lack of API
boundaries in the first place); I believe Gantt was trying to fix this
(but I'm not sure of the status of that)?
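
For concreteness, a minimal sketch of the kind of boundary I mean (the
names and signatures here are made up for illustration, not an existing
Nova interface): the call everything else depends on stays fixed, and
whatever sits behind it is fair game for experiments.

import abc


class SelectionDriver(abc.ABC):
    """The hypothetical API boundary: callers only ever see this."""

    @abc.abstractmethod
    def select_destinations(self, request_spec, candidates):
        """Given a request and candidate hosts, return the chosen hosts.

        How the choice is made (filters/weights, queues per flavor, a
        constraint solver, ...) is entirely up to the implementation.
        """


class PickFirstDriver(SelectionDriver):
    """A throwaway 'implementation experiment'."""

    def select_destinations(self, request_spec, candidates):
        # Trivial strategy: take the first candidate that exists.
        return list(candidates)[:1]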

-Josh

Chris Friesen wrote:
On 07/20/2015 02:04 PM, Clint Byrum wrote:
Excerpts from Chris Friesen's message of 2015-07-20 12:17:29 -0700:

Some questions:

1) Could you elaborate a bit on how this would work? I don't quite
understand how you would handle a request for booting an instance with
a certain set of resources--would you queue up a message for each
resource?


Please be concrete about what you mean by 'resource'.

I'm suggesting that if you only have flavors, which have cpu, ram, disk,
and rx/tx ratios, then each flavor is a queue. That's the easiest
problem to solve. Then if you have a single special thing that can only
have one VM per host (let's say, a PCI pass-through device), then that's
another iteration of each flavor. So assuming 3 flavors:

1=tiny cpu=1,ram=1024m,disk=5gb,rxtx=1
2=medium cpu=2,ram=4096m,disk=100gb,rxtx=2
3=large cpu=8,ram=16384m,disk=200gb,rxtx=2

This means you have these queues:

reserve
release
compute,cpu=1,ram=1024m,disk=5gb,rxtx=1,pci=1
compute,cpu=1,ram=1024m,disk=5gb,rxtx=1
compute,cpu=2,ram=4096m,disk=100gb,rxtx=2,pci=1
compute,cpu=2,ram=4096m,disk=100gb,rxtx=2
compute,cpu=8,ram=16384m,disk=200gb,rxtx=2,pci=1
compute,cpu=8,ram=16384m,disk=200gb,rxtx=2
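
Roughly, the queue names are just the flavor dimensions plus the
optional per-host-exclusive things. A quick sketch of generating them
from the example flavors above (the naming scheme is only illustrative):

from itertools import product

# The three example flavors above, as dicts.
flavors = [
    {'cpu': 1, 'ram': '1024m', 'disk': '5gb', 'rxtx': 1},
    {'cpu': 2, 'ram': '4096m', 'disk': '100gb', 'rxtx': 2},
    {'cpu': 8, 'ram': '16384m', 'disk': '200gb', 'rxtx': 2},
]
keys = ('cpu', 'ram', 'disk', 'rxtx')

queues = ['reserve', 'release']
for flavor, pci in product(flavors, (1, 0)):
    parts = ['compute'] + ['%s=%s' % (k, flavor[k]) for k in keys]
    if pci:
        parts.append('pci=1')
    queues.append(','.join(parts))

# Compute hosts would subscribe only to the queues they can currently
# satisfy, and drop subscriptions as capacity is consumed.
for q in queues:
    print(q)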

<snip>

Now, I've made this argument in the past, and people have pointed out
that the permutations can get into the tens of thousands very easily
if you start adding lots of dimensions and/or flavors. I suggest that
is no big deal, but maybe I'm biased because I have done something like
that in Gearman and it was, in fact, no big deal.

Yeah, that's what I was worried about. We have things that can be
specified per flavor, and things that can be specified per image, and
things that can be specified per instance, and they all multiply
together.
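
To put a rough number on "they all multiply together": each independent
dimension multiplies the queue count, so even modest per-flavor,
per-image and per-instance options explode quickly (the sizes below are
invented just to show the arithmetic):

# Invented counts of placement-affecting options at each level.
flavors = 20          # cpu/ram/disk/rxtx combinations
image_options = 8     # image properties that affect placement
instance_options = 6  # per-boot hints that affect placement
pci_variants = 2      # with / without a pass-through device

queue_count = flavors * image_options * instance_options * pci_variants
print(queue_count)    # 1920 queues, before adding any more dimensions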

2) How would it handle stuff like weight functions, where you could have
multiple compute nodes that *could* satisfy the requirement but some of
them would be "better" than others by some arbitrary criteria?


Can you provide a concrete example? Feels like I'm asking for a straw
man to be built. ;)

Well, as an example, we have a cluster that is aimed at high-performance
network processing, so all else being equal we want to choose the
compute node with the least network traffic. You might also try to pack
instances together for power efficiency (allowing you to turn off unused
compute nodes), or choose the compute node that results in the tightest
packing (to minimize unused resources).
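
In other words, once the hard filters have passed, it's just an ordering
over the surviving hosts. A sketch of the "least network traffic" case
(the network-load attribute is invented for illustration):

def weigh_hosts(hosts):
    """Order acceptable hosts so the least-loaded network wins.

    'hosts' is any iterable of objects with a hypothetical
    network_bytes_per_sec attribute; all of them already satisfy the
    hard resource requirements.
    """
    return sorted(hosts, key=lambda h: h.network_bytes_per_sec)

# Power-efficient packing is the same shape of problem with a different
# key, e.g. key=lambda h: -h.used_vcpus to fill busy hosts first.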

3) The biggest improvement I'd like to see is in group scheduling.
Suppose I want to schedule multiple instances, each with their own
resource requirements, but also with interdependencies between them
(these ones on the same node, these ones not on the same node, these
ones with this provider network, etc.). The scheduler could then look at
the whole request all at once and optimize it rather than looking at
each piece separately. That could also allow relocating multiple
instances that want to be co-located onto the same compute node.
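
For concreteness, the sort of request I have in mind would carry the
per-instance requirements and the relationships between them in one
shot (a purely hypothetical structure, nothing the scheduler accepts
today):

group_request = {
    'instances': {
        'db1':  {'flavor': 'large',  'network': 'provider-net-1'},
        'db2':  {'flavor': 'large',  'network': 'provider-net-1'},
        'app1': {'flavor': 'medium', 'network': 'provider-net-2'},
    },
    'constraints': [
        ('anti-affinity', ['db1', 'db2']),   # never share a host
        ('affinity', ['db2', 'app1']),       # keep these together
    ],
}
# A scheduler that sees all of this at once can place (or later
# relocate) the instances as a unit instead of deciding one at a time.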


So, if the grouping is arbitrary, then there's no way to pre-calculate
the group size, I agree. I am loath to pursue something like this,
though, as I don't really think this is the kind of optimization that
cloud workloads should be built on top of. If you need two processes to
have low latency, why not just boot a bigger machine and do it all in
one VM? There are a few reasons I can think of, but I wonder how many
apply in the general case?

It's a fair question. :) I honestly don't know...I was just thinking
that we allow the expression of affinity/anti-affinity policies via
server groups, but the scheduler doesn't really do a good job of
actually scheduling those groups.

Chris
