Re: [openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node

2015-01-09 Thread Alex Xu
2015-01-09 22:22 GMT+08:00 Sylvain Bauza sba...@redhat.com:



 I think that if you don't want to wait for the claiming system to happen
 in the Scheduler, then at least you need to fix the current way of using
 the ResourceTracker, like what Jay Pipes is working on in his spec.


I'm on the same page as you guys now :)






[openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node

2015-01-09 Thread Alex Xu
Hi, All

There is a bug when running nova with ironic:
https://bugs.launchpad.net/nova/+bug/1402658

The case is simple: take one baremetal node with 1024MB of RAM, then boot two
instances with a 512MB RAM flavor. Both instances will be scheduled to the
same baremetal node.

The problem is that on the scheduler side the IronicHostManager consumes all
of the node's resources, no matter how much the instance actually uses. But
on the compute node side the ResourceTracker doesn't consume resources that
way; it consumes them as it would for a normal virtual instance. The
ResourceTracker updates the resource usage once the instance's resources are
claimed, so the scheduler then sees free resources on that node and will try
to schedule another new instance to it.

Looking into this, there is NumInstancesFilter, which limits how many
instances can be scheduled to one host. So can we just use this filter to
achieve the goal? The maximum is configured by the option
'max_instances_per_host', and we could make the virt driver report how many
instances it supports: the ironic driver would report
max_instances_per_host=1, and the libvirt driver would report
max_instances_per_host=-1, meaning no limit. Then we could just remove the
IronicHostManager and make the scheduler side simpler. Does that make sense,
or are there more traps?
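
For illustration, a rough sketch of what such a filter could look like in
plain Python. This is hypothetical code, not the existing NumInstancesFilter;
the max_instances_per_host attribute on the host state is an assumed field
that the virt driver would have to report:

    # Hypothetical sketch: a host filter honoring a per-driver instance
    # limit reported in the host state (assumed field, not existing Nova
    # code). ironic would report 1, libvirt would report -1 (no limit).
    class DriverNumInstancesFilter(object):
        def host_passes(self, host_state, filter_properties):
            limit = getattr(host_state, 'max_instances_per_host', -1)
            if limit < 0:
                return True  # negative value means unlimited
            return host_state.num_instances < limit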

Thanks in advance for any feedback and suggestions.

Thanks
Alex


Re: [openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node

2015-01-09 Thread Sylvain Bauza


On 09/01/2015 09:01, Alex Xu wrote:

Hi, All

There is a bug when running nova with ironic:
https://bugs.launchpad.net/nova/+bug/1402658

The case is simple: take one baremetal node with 1024MB of RAM, then boot two
instances with a 512MB RAM flavor. Both instances will be scheduled to the
same baremetal node.

The problem is that on the scheduler side the IronicHostManager consumes all
of the node's resources, no matter how much the instance actually uses. But
on the compute node side the ResourceTracker doesn't consume resources that
way; it consumes them as it would for a normal virtual instance. The
ResourceTracker updates the resource usage once the instance's resources are
claimed, so the scheduler then sees free resources on that node and will try
to schedule another new instance to it.

Looking into this, there is NumInstancesFilter, which limits how many
instances can be scheduled to one host. So can we just use this filter to
achieve the goal? The maximum is configured by the option
'max_instances_per_host', and we could make the virt driver report how many
instances it supports: the ironic driver would report
max_instances_per_host=1, and the libvirt driver would report
max_instances_per_host=-1, meaning no limit. Then we could just remove the
IronicHostManager and make the scheduler side simpler. Does that make sense,
or are there more traps?


Thanks in advance for any feedback and suggestions.




Mmm, I think I disagree with your proposal. Let me explain as best I can why.

tl;dr: any proposal that doesn't do the claiming at the scheduler level tends
to be wrong.

The ResourceTracker should only be a module for providing stats about compute
nodes to the Scheduler. How the Scheduler consumes these resources to make a
decision should be a Scheduler-only concern.


Here, the problem is that the decision making is also shared with the
ResourceTracker because of the claiming system, managed by a context manager
when booting an instance. That means we have two distinct decision makers
validating a resource.


Let's set realism aside for a moment and discuss what a decision could mean
for something other than a compute node. OK, let's say a volume. Provided
that *something* reported volume statistics to the Scheduler, it would be the
Scheduler that decides whether a volume manager can accept a volume request.
It makes no sense to re-validate the Scheduler's decision on the volume
manager, beyond perhaps some error handling.


We know that the current model is kinda racy with Ironic because there is a
2-stage validation (see [1]). I'm not in favor of complicating the model, but
rather of putting all the claiming logic in the scheduler, which is a longer
path to win, but a safer one.
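
To make the race concrete, here is a minimal, self-contained illustration
(a simplified model, not actual Nova code) of how two decision makers let
two 512MB instances land on a 1024MB baremetal node: the scheduler filters
on cached stats, and the compute-side claim only deducts the flavor's RAM
before reporting back:

    # Simplified model of the 2-stage validation described above.
    scheduler_view_free_mb = 1024   # scheduler's cached stats for the node
    compute_free_mb = 1024          # RAM tracked by the compute-side claim

    def scheduler_accepts(request_mb):
        # Decision maker #1: filter on the (possibly stale) cached view.
        return request_mb <= scheduler_view_free_mb

    def compute_claims(request_mb):
        # Decision maker #2: the claim deducts only the flavor's RAM,
        # as for a normal virtual instance, then updates the stats.
        global compute_free_mb, scheduler_view_free_mb
        compute_free_mb -= request_mb
        scheduler_view_free_mb = compute_free_mb

    assert scheduler_accepts(512)   # first 512MB instance is accepted
    compute_claims(512)
    assert scheduler_accepts(512)   # second one is accepted too: bug 1402658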


-Sylvain

[1]  https://bugs.launchpad.net/nova/+bug/1341420


Thanks
Alex




Re: [openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node

2015-01-09 Thread Murray, Paul (HP Cloud)
There is a bug when running nova with ironic:
https://bugs.launchpad.net/nova/+bug/1402658

I filed this bug – it has been a problem for us.

The problem is that on the scheduler side the IronicHostManager consumes all
of the node's resources, no matter how much the instance actually uses. But
on the compute node side the ResourceTracker doesn't consume resources that
way; it consumes them as it would for a normal virtual instance. The
ResourceTracker updates the resource usage once the instance's resources are
claimed, so the scheduler then sees free resources on that node and will try
to schedule another new instance to it.

You have summed up the problem nicely – i.e.: the resource availability is 
calculated incorrectly for ironic nodes.

Looking into this, there is NumInstancesFilter, which limits how many
instances can be scheduled to one host. So can we just use this filter to
achieve the goal? The maximum is configured by the option
'max_instances_per_host', and we could make the virt driver report how many
instances it supports: the ironic driver would report
max_instances_per_host=1, and the libvirt driver would report
max_instances_per_host=-1, meaning no limit. Then we could just remove the
IronicHostManager and make the scheduler side simpler. Does that make sense,
or are there more traps?


Makes sense, but it solves the wrong problem. The problem is what you said
above – i.e.: the resource availability is calculated incorrectly for ironic
nodes.

The right solution would be to fix the resource tracker. The RAM resource on
an ironic node has different allocation behavior to a regular node: the test
to see whether a new instance fits is the same, but instead of deducting the
requested amount to get the remaining availability, it should simply return
0. This should be dealt with in the new resource objects ([2] below), either
by having a different version of the resource object for ironic nodes
(certainly doable, and the most sensible option – resources should be
presented according to the resources on the host), or by having the RAM
resource object cater for the difference in its calculations.
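
A minimal sketch of that idea, assuming the resource-object model from [2]
(the class and method names here are made up for illustration):

    # Hypothetical sketch of the two resource-object variants described
    # above; names are illustrative, not from the actual spec.
    class RamResource(object):
        def __init__(self, total_mb):
            self.total_mb = total_mb
            self.free_mb = total_mb

        def fits(self, requested_mb):
            # The fit test is the same for both node types.
            return requested_mb <= self.free_mb

        def consume(self, requested_mb):
            # Regular node: deduct only what the instance requested.
            self.free_mb -= requested_mb

    class IronicRamResource(RamResource):
        def consume(self, requested_mb):
            # Ironic node: the instance takes the whole box, so the
            # remaining availability simply becomes 0.
            self.free_mb = 0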
I have a local fix for this that I was too shy to propose upstream because it’s 
a bit hacky and will hopefully be obsolete soon. I could share it if you like.
Paul
[2] https://review.openstack.org/#/c/127609/



Re: [openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node

2015-01-09 Thread Sylvain Bauza


On 09/01/2015 15:07, Murray, Paul (HP Cloud) wrote:


Looking into this, there is NumInstancesFilter, which limits how many
instances can be scheduled to one host. So can we just use this filter to
achieve the goal? The maximum is configured by the option
'max_instances_per_host', and we could make the virt driver report how many
instances it supports: the ironic driver would report
max_instances_per_host=1, and the libvirt driver would report
max_instances_per_host=-1, meaning no limit. Then we could just remove the
IronicHostManager and make the scheduler side simpler. Does that make sense,
or are there more traps?


Makes sense, but it solves the wrong problem. The problem is what you said
above – i.e.: the resource availability is calculated incorrectly for ironic
nodes.

The right solution would be to fix the resource tracker. The RAM resource on
an ironic node has different allocation behavior to a regular node: the test
to see whether a new instance fits is the same, but instead of deducting the
requested amount to get the remaining availability, it should simply return
0. This should be dealt with in the new resource objects ([2] below), either
by having a different version of the resource object for ironic nodes
(certainly doable, and the most sensible option – resources should be
presented according to the resources on the host), or by having the RAM
resource object cater for the difference in its calculations.


I have a local fix for this that I was too shy to propose upstream 
because it’s a bit hacky and will hopefully be obsolete soon. I could 
share it if you like.


Paul

[2] https://review.openstack.org/#/c/127609/



Agreed, I think that [2] will help a lot. Until it's done, are we really
sure we want to fix the bug? It can be worked around by creating flavors
that take at least half of a compute node's resources (e.g. on a 1024MB
node, a 513MB RAM flavor guarantees only one instance fits per node), and I
really wouldn't like adding more tech debt.


-Sylvain



Re: [openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node

2015-01-09 Thread Sylvain Bauza


On 09/01/2015 14:58, Alex Xu wrote:



2015-01-09 17:17 GMT+08:00 Sylvain Bauza sba...@redhat.com:



On 09/01/2015 09:01, Alex Xu wrote:

Hi, All

There is a bug when running nova with ironic:
https://bugs.launchpad.net/nova/+bug/1402658

The case is simple: take one baremetal node with 1024MB of RAM, then boot two
instances with a 512MB RAM flavor. Both instances will be scheduled to the
same baremetal node.

The problem is that on the scheduler side the IronicHostManager consumes all
of the node's resources, no matter how much the instance actually uses. But
on the compute node side the ResourceTracker doesn't consume resources that
way; it consumes them as it would for a normal virtual instance. The
ResourceTracker updates the resource usage once the instance's resources are
claimed, so the scheduler then sees free resources on that node and will try
to schedule another new instance to it.

Looking into this, there is NumInstancesFilter, which limits how many
instances can be scheduled to one host. So can we just use this filter to
achieve the goal? The maximum is configured by the option
'max_instances_per_host', and we could make the virt driver report how many
instances it supports: the ironic driver would report
max_instances_per_host=1, and the libvirt driver would report
max_instances_per_host=-1, meaning no limit. Then we could just remove the
IronicHostManager and make the scheduler side simpler. Does that make sense,
or are there more traps?

Thanks in advance for any feedback and suggestions.




Mmm, I think I disagree with your proposal. Let me explain as best I can why.

tl;dr: any proposal that doesn't do the claiming at the scheduler level
tends to be wrong.

The ResourceTracker should only be a module for providing stats about
compute nodes to the Scheduler. How the Scheduler consumes these resources
to make a decision should be a Scheduler-only concern.


Agreed, but we can't implement this for now, for the reason you describe
below.



Here, the problem is that the decision making is also shared with the
ResourceTracker because of the claiming system, managed by a context
manager when booting an instance. That means we have two distinct decision
makers validating a resource.


Totally agreed! This is the root cause.

Let's set realism aside for a moment and discuss what a decision could mean
for something other than a compute node. OK, let's say a volume. Provided
that *something* reported volume statistics to the Scheduler, it would be
the Scheduler that decides whether a volume manager can accept a volume
request. It makes no sense to re-validate the Scheduler's decision on the
volume manager, beyond perhaps some error handling.

We know that the current model is kinda racy with Ironic because there is a
2-stage validation (see [1]). I'm not in favor of complicating the model,
but rather of putting all the claiming logic in the scheduler, which is a
longer path to win, but a safer one.


Yeah, I have thought about adding the same resource consumption on the
compute manager side, but it's ugly because we would implement ironic's
resource consuming method in two places. If we move the claiming into the
scheduler, things become easy: we can just provide an extension point for
different consuming methods (if I understood the IRC discussion correctly).
And since gantt will be a standalone service, validating a resource
shouldn't be spread across different services. So I agree with you.


But for now, as you said, that is a long-term plan. We can't provide
different resource consumption on the compute manager side now, and we also
can't move the claiming into the scheduler now. So the method I proposed is
easier for now; at least we won't have different resource consuming
behavior between the scheduler (IronicHostManager) and the compute node
(ResourceTracker) for ironic. And ironic can work fine.


The method I propose has a little problem: when the node is fully
allocated, we can still see some free resources if the flavor's resources
are less than the baremetal node's. But that can be addressed by exposing
max_instances through the hypervisor API (running instances are already
exposed), so users will know why no more instances can be allocated. And
being able to configure max_instances per node sounds useful for operators
too :)


I think that if you don't want to wait for the claiming system to happen 
in the Scheduler, then at least you need to fix the current way of using 
the ResourceTracker, like what Jay Pipes is working on in his spec.



-Sylvain



-Sylvain

[1] https://bugs.launchpad.net/nova/+bug/1341420


Thanks
Alex



Re: [openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node

2015-01-09 Thread Alex Xu
2015-01-09 17:17 GMT+08:00 Sylvain Bauza sba...@redhat.com:


 On 09/01/2015 09:01, Alex Xu wrote:

 Hi, All

 There is a bug when running nova with ironic:
 https://bugs.launchpad.net/nova/+bug/1402658

 The case is simple: take one baremetal node with 1024MB of RAM, then boot
 two instances with a 512MB RAM flavor. Both instances will be scheduled to
 the same baremetal node.

 The problem is that on the scheduler side the IronicHostManager consumes
 all of the node's resources, no matter how much the instance actually
 uses. But on the compute node side the ResourceTracker doesn't consume
 resources that way; it consumes them as it would for a normal virtual
 instance. The ResourceTracker updates the resource usage once the
 instance's resources are claimed, so the scheduler then sees free
 resources on that node and will try to schedule another new instance to it.

 Looking into this, there is NumInstancesFilter, which limits how many
 instances can be scheduled to one host. So can we just use this filter to
 achieve the goal? The maximum is configured by the option
 'max_instances_per_host', and we could make the virt driver report how
 many instances it supports: the ironic driver would report
 max_instances_per_host=1, and the libvirt driver would report
 max_instances_per_host=-1, meaning no limit. Then we could just remove the
 IronicHostManager and make the scheduler side simpler. Does that make
 sense, or are there more traps?

 Thanks in advance for any feedback and suggestions.



 Mmm, I think I disagree with your proposal. Let me explain as best I can
 why.

 tl;dr: any proposal that doesn't do the claiming at the scheduler level
 tends to be wrong.

 The ResourceTracker should only be a module for providing stats about
 compute nodes to the Scheduler. How the Scheduler consumes these resources
 to make a decision should be a Scheduler-only concern.


Agreed, but we can't implement this for now, for the reason you describe
below.



 Here, the problem is that the decision making is also shared with the
 ResourceTracker because of the claiming system, managed by a context
 manager when booting an instance. That means we have two distinct decision
 makers validating a resource.


Totally agreed! This is the root cause.


 Let's set realism aside for a moment and discuss what a decision could
 mean for something other than a compute node. OK, let's say a volume.
 Provided that *something* reported volume statistics to the Scheduler, it
 would be the Scheduler that decides whether a volume manager can accept a
 volume request. It makes no sense to re-validate the Scheduler's decision
 on the volume manager, beyond perhaps some error handling.

 We know that the current model is kinda racy with Ironic because there is
 a 2-stage validation (see [1]). I'm not in favor of complicating the
 model, but rather of putting all the claiming logic in the scheduler,
 which is a longer path to win, but a safer one.


Yeah, I have thought about adding the same resource consumption on the
compute manager side, but it's ugly because we would implement ironic's
resource consuming method in two places. If we move the claiming into the
scheduler, things become easy: we can just provide an extension point for
different consuming methods (if I understood the IRC discussion correctly).
And since gantt will be a standalone service, validating a resource
shouldn't be spread across different services. So I agree with you.

But for now, as you said, that is a long-term plan. We can't provide
different resource consumption on the compute manager side now, and we also
can't move the claiming into the scheduler now. So the method I proposed is
easier for now; at least we won't have different resource consuming behavior
between the scheduler (IronicHostManager) and the compute node
(ResourceTracker) for ironic. And ironic can work fine.

The method I propose has a little problem: when the node is fully allocated,
we can still see some free resources if the flavor's resources are less than
the baremetal node's. But that can be addressed by exposing max_instances
through the hypervisor API (running instances are already exposed), so users
will know why no more instances can be allocated. And being able to configure
max_instances per node sounds useful for operators too :)
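
For illustration, an extended hypervisor detail could look something like
this; running_vms is already exposed by the os-hypervisors API today, while
max_instances is purely a hypothetical new field for this proposal:

    # Hypothetical response fragment; "max_instances" does not exist today,
    # it is the field this proposal would add.
    {
        "hypervisor": {
            "hypervisor_hostname": "ironic-node-1",
            "running_vms": 1,
            "max_instances": 1
        }
    }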



 -Sylvain

 [1]  https://bugs.launchpad.net/nova/+bug/1341420

  Thanks
 Alex




Re: [openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node

2015-01-09 Thread Alex Xu
2015-01-09 22:07 GMT+08:00 Murray, Paul (HP Cloud) pmur...@hp.com:

 There is a bug when running nova with ironic:
 https://bugs.launchpad.net/nova/+bug/1402658



 I filed this bug – it has been a problem for us.



 The problem is that on the scheduler side the IronicHostManager consumes
 all of the node's resources, no matter how much the instance actually
 uses. But on the compute node side the ResourceTracker doesn't consume
 resources that way; it consumes them as it would for a normal virtual
 instance. The ResourceTracker updates the resource usage once the
 instance's resources are claimed, so the scheduler then sees free
 resources on that node and will try to schedule another new instance to it.



 You have summed up the problem nicely – i.e.: the resource availability is
 calculated incorrectly for ironic nodes.



 Looking into this, there is NumInstancesFilter, which limits how many
 instances can be scheduled to one host. So can we just use this filter to
 achieve the goal? The maximum is configured by the option
 'max_instances_per_host', and we could make the virt driver report how
 many instances it supports: the ironic driver would report
 max_instances_per_host=1, and the libvirt driver would report
 max_instances_per_host=-1, meaning no limit. Then we could just remove the
 IronicHostManager and make the scheduler side simpler. Does that make
 sense, or are there more traps?





 Makes sense, but it solves the wrong problem. The problem is what you said
 above – i.e.: the resource availability is calculated incorrectly for
 ironic nodes.

 The right solution would be to fix the resource tracker. The RAM resource
 on an ironic node has different allocation behavior to a regular node: the
 test to see whether a new instance fits is the same, but instead of
 deducting the requested amount to get the remaining availability, it
 should simply return 0. This should be dealt with in the new resource
 objects ([2] below), either by having a different version of the resource
 object for ironic nodes (certainly doable, and the most sensible option –
 resources should be presented according to the resources on the host), or
 by having the RAM resource object cater for the difference in its
 calculations.

Dang it, I reviewed that spec... why didn't I find that? :( Totally beat
me!

  I have a local fix for this that I was too shy to propose upstream
 because it’s a bit hacky and will hopefully be obsolete soon. I could share
 it if you like.

 Paul

 [2] https://review.openstack.org/#/c/127609/




