Re: [openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node

Murray, Paul (HP Cloud) Fri, 09 Jan 2015 06:10:28 -0800

>There is a bug when running nova with ironic
>https://bugs.launchpad.net/nova/+bug/1402658


I filed this bug – it has been a problem for us.

>The problem is that on the scheduler side the IronicHostManager consumes all
>the resources of that node, regardless of how much the instance actually
>uses. On the compute node side, however, the ResourceTracker does not consume
>resources that way; it consumes them as it would for a normal virtual
>instance. The ResourceTracker updates the resource usage once the instance's
>resources are claimed, so the scheduler then sees free resources on that node
>and tries to schedule another new instance to it.

You have summed up the problem nicely – i.e.: the resource availability is 
calculated incorrectly for ironic nodes.

>I took a look, and there is the NumInstancesFilter, which limits how many
>instances can be scheduled to one host. Can we just use this filter to
>achieve the goal? The maximum is configured by the option
>'max_instances_per_host'; we could make the virt driver report how many
>instances it supports. The ironic driver can just report
>max_instances_per_host=1, and the libvirt driver can report
>max_instances_per_host=-1, meaning no limit. Then we can remove the
>IronicHostManager, which makes the scheduler side simpler. Does that make
>sense, or are there more traps?


Makes sense, but solves the wrong problem. The problem is what you said above –
i.e.: the resource availability is calculated incorrectly for ironic nodes.

The right solution would be to fix the resource tracker. The RAM resource on an
ironic node has different allocation behavior from that on a regular node. The
test to see if a new instance fits is the same, but instead of deducting the
requested amount to get the remaining availability, it should simply return 0.
This should be dealt with in the new resource objects ([2] below), either by
having a different version of the resource object for ironic nodes (certainly
doable and the most sensible option – resources should be presented according
to the resources on the host) or by having the RAM resource object cater for
the difference in its calculations.
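
As a rough illustration of the first option, here is a minimal sketch, not
the resource objects from the review in [2]. The class names RamResource and
IronicRamResource and their methods are hypothetical, chosen only to show the
same fit test with a different deduction.

```python
from dataclasses import dataclass


@dataclass
class RamResource:
    """Hypothetical RAM resource for a regular (virtualized) compute node."""
    total_mb: int
    used_mb: int = 0

    @property
    def free_mb(self) -> int:
        return self.total_mb - self.used_mb

    def fits(self, requested_mb: int) -> bool:
        # The fit test is the same for both kinds of node.
        return requested_mb <= self.free_mb

    def claim(self, requested_mb: int) -> None:
        if not self.fits(requested_mb):
            raise ValueError("requested RAM does not fit")
        # Regular node: deduct only what was requested.
        self.used_mb += requested_mb


@dataclass
class IronicRamResource(RamResource):
    """Hypothetical variant for an ironic node: one claim takes the whole node."""

    def claim(self, requested_mb: int) -> None:
        if not self.fits(requested_mb):
            raise ValueError("requested RAM does not fit")
        # Ironic node: remaining availability becomes 0 after any claim.
        self.used_mb = self.total_mb
```

With a 1024MB node and a 512MB claim, the regular resource still has 512MB
free, while the ironic variant has none, so a second 512MB request no longer
fits.
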
I have a local fix for this that I was too shy to propose upstream because it’s
a bit hacky and will hopefully be obsolete soon. I could share it if you like.

Paul

[2] https://review.openstack.org/#/c/127609/


From: Sylvain Bauza <sba...@redhat.com>
Date: 9 January 2015 at 09:17
Subject: Re: [openstack-dev] [Nova][Ironic] Question about scheduling two
instances to same baremetal node
To: "OpenStack Development Mailing List (not for usage questions)"
<openstack-dev@lists.openstack.org>


On 09/01/2015 09:01, Alex Xu wrote:
Hi, All

There is a bug when running nova with ironic
https://bugs.launchpad.net/nova/+bug/1402658

The case is simple: take one baremetal node with 1024MB of RAM, then boot two
instances with a 512MB RAM flavor.
Both instances will be scheduled to the same baremetal node.

The problem is that on the scheduler side the IronicHostManager consumes all
the resources of that node, regardless of how much the instance actually
uses. On the compute node side, however, the ResourceTracker does not consume
resources that way; it consumes them as it would for a normal virtual
instance. The ResourceTracker updates the resource usage once the instance's
resources are claimed, so the scheduler then sees free resources on that node
and tries to schedule another new instance to it.
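
The mismatch can be reproduced with a toy model. This is only a sketch under
simplifying assumptions: the function placements and its parameters are
hypothetical, there is a single node, and the tracker's figure is assumed to
overwrite the scheduler's view right after each claim.

```python
def placements(requests_mb, node_ram_mb, tracker_consumes_whole_node):
    """Count how many requests end up on one baremetal node."""
    scheduler_view_mb = node_ram_mb  # free RAM as the scheduler sees it
    tracker_free_mb = node_ram_mb    # free RAM as the compute node tracks it
    placed = 0
    for requested in requests_mb:
        if requested > scheduler_view_mb:
            continue  # the scheduler rejects the node for this request
        placed += 1
        # Scheduler side: the whole node is treated as consumed.
        scheduler_view_mb = 0
        # Compute side: the claim updates the tracked usage.
        if tracker_consumes_whole_node:
            tracker_free_mb = 0
        else:
            tracker_free_mb -= requested
        # The tracker's update then replaces the scheduler's view.
        scheduler_view_mb = tracker_free_mb
    return placed


# Current behavior: the tracker deducts only the flavor's RAM.
print(placements([512, 512], 1024, tracker_consumes_whole_node=False))
# Desired behavior: the tracker also treats the node as fully used.
print(placements([512, 512], 1024, tracker_consumes_whole_node=True))
```

In the first call both 512MB instances land on the 1024MB node; in the second
only one does.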

I took a look, and there is the NumInstancesFilter, which limits how many
instances can be scheduled to one host. Can we just use this filter to
achieve the goal? The maximum is configured by the option
'max_instances_per_host'; we could make the virt driver report how many
instances it supports. The ironic driver can just report
max_instances_per_host=1, and the libvirt driver can report
max_instances_per_host=-1, meaning no limit. Then we can remove the
IronicHostManager, which makes the scheduler side simpler. Does that make
sense, or are there more traps?
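
To make the idea concrete, here is a minimal sketch of the proposed check. It
is not Nova's actual filter code: HostState and host_passes below are
simplified, hypothetical stand-ins, and a per-host limit reported by the virt
driver is exactly the assumption being proposed.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Hypothetical, simplified view of a host as seen by the scheduler."""
    name: str
    num_instances: int
    # Assumed to be reported by the virt driver; -1 means no limit.
    max_instances_per_host: int


def host_passes(host: HostState) -> bool:
    """Accept the host only if it can take one more instance."""
    if host.max_instances_per_host < 0:
        return True
    return host.num_instances < host.max_instances_per_host


empty_ironic = HostState("bm-1", num_instances=0, max_instances_per_host=1)
busy_ironic = HostState("bm-2", num_instances=1, max_instances_per_host=1)
libvirt_host = HostState("kvm-1", num_instances=50, max_instances_per_host=-1)

print([h.name for h in (empty_ironic, busy_ironic, libvirt_host)
       if host_passes(h)])
```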

Thanks in advance for any feedback and suggestion.


Mmm, I think I disagree with your proposal. Let me explain as best I can why:

tl;dr: Any proposal other than claiming at the scheduler level tends to be
wrong

The ResourceTracker should only be a module for providing stats about compute
nodes to the Scheduler.
How the Scheduler consumes these resources to make a decision should be a
Scheduler thing only.

Here, the problem is that the decision making is also shared with the 
ResourceTracker because of the claiming system managed by the context manager 
when booting an instance. It means that we have 2 distinct decision makers for 
validating a resource.

Let's stop being realistic for a moment and discuss what a decision could mean
for something other than a compute node. OK, let's say a volume.
Provided that *something* reported the volume statistics to the Scheduler, it
would be the Scheduler that decides whether a volume manager can accept a
volume request. There is no sense in validating the Scheduler's decision on
the volume manager, beyond maybe doing some error management.

We know that the current model is kinda racy with Ironic because there is a
2-stage validation (see [1]). I'm not in favor of making the model more
complex; I would rather put all the claiming logic in the scheduler, which is
a longer path to a win, but a safer one.

-Sylvain

[1]  https://bugs.launchpad.net/nova/+bug/1341420


Thanks
Alex


_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

