Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

Sylvain Bauza Wed, 13 Aug 2014 09:06:58 -0700


Le 13/08/2014 12:21, Sylvain Bauza a écrit :

Le 12/08/2014 22:06, Sylvain Bauza a écrit :
Le 12/08/2014 18:54, Nikola Đipanov a écrit :
On 08/12/2014 04:49 PM, Sylvain Bauza wrote:
(sorry for reposting, missed 2 links...)

Hi Nikola,

Le 12/08/2014 12:21, Nikola Đipanov a écrit :
Hey Nova-istas,

While I was hacking on [1] I was considering how to approach the fact
that we now need to track one more thing (NUMA node utilization)in ourresources. I went with - "I'll add it to compute nodes table"thinkingit's a fundamental enough property of a compute host that itdeserves tobe there, although I was considering Extensible Resource Trackerat onepoint (ERT from now on - see [2]) but looking at the code - it didnotseem to provide anything I desperately needed, so I went withkeeping it
simple.
So fast-forward a few days, and I caught myself solving a problemthat I
kept thinking ERT should have solved - but apparently hasn't, and I
think it is fundamentally a broken design without it - so I'd really
like to see it re-visited.
The problem can be described by the following lemma (if you take'lemma'
to mean 'a sentence I came up with just now' :)):

"""
Due to the way scheduling works in Nova (roughly: pick a hostbased onstale(ish) data, rely on claims to trigger a re-schedule), _sameexact_
information that scheduling service used when making a placement
decision, needs to be available to the compute service whentesting the
placement.
"""
This is not the case right now, and the ERT does not propose anyway to
solve it - (see how I hacked around needing to be able to get
extra_specs when making claims in [3], without hammering the DB). The
result will be that any resource that we add and needs user supplied
info for scheduling an instance against it, will need a buggy
re-implementation of gathering all the bits from the request that
scheduler sees, to be able to work properly.
Well, ERT does provide a plugin mechanism for testing resources at the
claim level. This is the plugin responsibility to implement a test()
method [2.1] which will be called when test_claim() [2.2]
So, provided this method is implemented, a local host check can bedone
based on the host's view of resources.
Yes - the problem is there is no clear API to get all the neededbits to
do so - especially the user supplied one from image and flavors.
On top of that, in current implementation we only pass a hand-wavy
'usage' blob in. This makes anyone wanting to use this in conjunction
with some of the user supplied bits roll their own
'extract_data_from_instance_metadata_flavor_image' or similar which is
horrible and also likely bad for performance.
I see your concern where there is no interface for user-facingresources like flavor or image metadata.I also think indeed that the big 'usage' blob is not a good choicefor long-term vision.
That said, I don't think as we say in French to throw the bathwater... ie. the problem is with the RT, not the ERT (apart themention of third-party API that you noted - I'll go to it later below)
This is obviously a bigger concern when we want to allow users topassdata (through image or flavor) that can affect scheduling, butstill a
huge concern IMHO.
And here is where I agree with you : at the moment, ResourceTracker(andconsequently Extensible RT) only provides the view of the resourcesthe
host is knowing (see my point above) and possibly some other resources
are missing.
So, whatever your choice of going with or without ERT, your patch [3]
still deserves it if we want not to lookup DB each time a claim goes.
As I see that there are already BPs proposing to use this IMHO broken
ERT ([4] for example), which will surely add to the proliferation of
code that hacks around these design shortcomings in what is already a
messy, but also crucial (for perf as well as features) bit of Novacode.
Two distinct implementations of that spec (ie. instances and flavors)
have been proposed [2.3] [2.4] so reviews are welcome. If you see the
test() method, it's no-op thing for both plugins. I'm open to comments
because I have the stated problem : how can we define a limit onjust a
counter of instances and flavors ?
Will look at these - but none of them seem to hit the issue I am
complaining about, and that is that it will need to consider other
request data for claims, not only data available for on instances.

Also - the fact that you don't implement test() in flavor ones tells me
that the implementation is indeed racy (but it is racy atm as well) and
two requests can indeed race for the same host, and since no claims are
done, both can succeed. This is I believe (at least in case of single
flavor hosts) unlikely to happen in practice, but you get the idea.
Agreed, these 2 patches probably require another iteration, inparticular how we make sure that it won't be racy. So I need anotherrun to think about what to test() for these 2 examples.Another patch has to be done for aggregates, but it's still WIP sonot mentioned here.
Anyway, as discussed during today's meeting, these 2 patches will notbe based on ERT because of the risk it goes for Juno so let's scopethem out of this thread.
I propose to revert [2] ASAP since it is still fresh, and see howwe can
come up with a cleaner design.

Would like to hear opinions on this, before I propose the patch tho!
IMHO, I think the problem is more likely that the regular RT missessomeinformation for each host so it requires to handle it on acase-by-case
basis, but I don't think ERT either increases complexity or creates
another issue.
RT does not miss info about the host, but about the particular request
which we have to fish out of different places like image_metadata
extra_specs etc, yet - it can't really work without them. This is
definitely a RT issue that is not specific to ERT.
+1, I agree with you, that's an API issue for RT : how do we pass outuser-defined metrics ?I still need to figure out which kind of usecases are requiring suchexamples, albeit the NUMA usecase of course.
After a night and morning of thinking about this problem, I'm writingdown some ideas that could help fixing the problem (although I don'tthink it's a long-term scenario, rather a workaround) :
Considering you need the metadata of the flavor asked by user whenclaiming a resource :1. Compute Manager knows request_spec when building an instance, so itcould pass out request_spec as paramater (optional or not) tort.instance_claim() in the claim context2. As ERT does, when claiming, it calls the ERT plugin .test() methodso with request_spec as extra parameter3. On the ERT side, a plugin is responsible for calling (and caching)all flavors metadata, or even only the key you need to compare4. Scheduler knows flavors metadata thanks to ERT plugin and blueprintisolate-scheduler-db so it makes decisions on the same subset as the RT5. When claiming in ERT plugin test(), it extracts the flavor_id,looks at the internal in-memory representation of flavors to get themetadata, and calls the NUMA _test_numa_topology() method for checkingwrt this flavor metadata
The idea is that so filters and claims are doing separate decisions onthe same source of knowledge, here being ERT.
-Sylvain

Just made a draft over my thoughts here :https://review.openstack.org/113936

-S

However, I still see several issues with the current ERTimplementation,
but the one I am getting at here, and why I think we should revert it,
is that we are building a 3rd party plugin API that is tightly coupled
to an obviously flawed internal API (RT and Claims).

We have no policy AFIK about what guarantees we provide to 3rd party
plugins, but if we are going to be breaking them all the time, or in
this case providing very limited usefulness - I see little value in the
current implementation of ERT, and see issues with it staying, which
means future work will be based on it.

N.
*This* is to me IMHO the biggest problem with ERT : if we say that weexternalize an API for 3rd-party plugins, we need to make sure thateverything is handled.
That said, instead of reverting the whole patch, could we justconsider to only accept changes that wouldn't require user-facingmetrics ?The existing VCPU implementation still sounds good to me, so we canjust consider a clear communication on what is acceptable and whatnot (ie. a docstring in the plugin API or so., plus -2/-1 reviews ifneeded)
-Sylvain
Thanks,
-Sylvain
Thanks all,

Nikola

[1]
https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement
[2] https://review.openstack.org/#/c/109643/
[3] https://review.openstack.org/#/c/111782/
[4] https://review.openstack.org/#/c/89893

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[2.1]
https://github.com/openstack/nova/blob/master/nova/compute/resources/__init__.py#L75
[2.2]
https://github.com/openstack/nova/blob/master/nova/compute/claims.py#L134
[2.3] https://review.openstack.org/112578
[2.4] https://review.openstack.org/113373



_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

Reply via email to