Re: [openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node
2015-01-09 22:22 GMT+08:00 Sylvain Bauza sba...@redhat.com: On 09/01/2015 14:58, Alex Xu wrote: 2015-01-09 17:17 GMT+08:00 Sylvain Bauza sba...@redhat.com: On 09/01/2015 09:01, Alex Xu wrote: Hi all, There is a bug when running Nova with Ironic: https://bugs.launchpad.net/nova/+bug/1402658 The case is simple: one baremetal node with 1024MB RAM, then boot two instances with a 512MB RAM flavor. Both instances will be scheduled to the same baremetal node. The problem is that on the scheduler side the IronicHostManager consumes all the resources of that node, regardless of how much resource the instance actually uses. But on the compute node side, the ResourceTracker doesn't consume resources that way; it consumes them as it would for a normal virtual instance. The ResourceTracker then updates the resource usage once the instance's resources are claimed, so the scheduler sees free resources on that node and will try to schedule another new instance to it. Looking at this, there is the NumInstancesFilter, which limits how many instances can be scheduled to one host. So can we just use this filter to achieve the goal? The maximum is configured by the option 'max_instances_per_host'; we could make the virt driver report how many instances it supports. The Ironic driver would just report max_instances_per_host=1, and the libvirt driver would report max_instances_per_host=-1, meaning no limit. Then we could remove the IronicHostManager and make the scheduler side simpler. Does that make sense, or are there more traps? Thanks in advance for any feedback and suggestions. Mmm, I think I disagree with your proposal. Let me explain why as best I can. tl;dr: any proposal that doesn't do the claiming at the scheduler level tends to be wrong. The ResourceTracker should only be a module that provides stats about compute nodes to the Scheduler. How the Scheduler consumes these resources to make a decision should be purely a Scheduler concern. Agreed, but we can't implement this for now, for the reason you describe below. Here, the problem is that the decision making is also shared with the ResourceTracker because of the claiming system managed by the context manager when booting an instance. It means that we have 2 distinct decision makers for validating a resource. Totally agreed! This is the root cause. Let's step away from being realistic for a moment and discuss what a decision could mean for something other than a compute node. OK, let's say a volume. Provided that *something* reported the volume statistics to the Scheduler, it would be the Scheduler that decides whether a volume manager can accept a volume request. There is no sense in re-validating the Scheduler's decision on the volume manager, beyond maybe some error handling. We know that the current model is somewhat racy with Ironic because there is a 2-stage validation (see [1]). I'm not in favor of making the model more complex, but rather of putting all the claiming logic in the scheduler, which is a longer path to win, but a safer one. Yeah, I thought about adding the same resource consumption on the compute manager side, but it's ugly because we would be implementing Ironic's resource consumption method in two places. If we move the claiming into the scheduler this becomes easy: we can just provide an extension point for different consumption methods (if I understood the IRC discussion correctly). And since gantt will be a standalone service, validating a resource shouldn't be spread across different services. So I agree with you. But for now, as you said, this is a long-term plan.
We can't provide different resource consumption on the compute manager side now, and we can't move the claiming into the scheduler now either. So the method I proposed is easier for now; at least we won't have different resource consumption between the scheduler (IronicHostManager) and the compute side (ResourceTracker) for Ironic, and Ironic can work fine. The method I propose has a small problem: when all the nodes are allocated, we can still see some free resources if the flavor's resources are less than the baremetal node's resources. But that can be addressed by exposing max_instances through the hypervisor API (running instances are already exposed), so users will know why they can't allocate more instances. And if we could configure max_instances per node, that sounds useful for operators too :) I think that if you don't want to wait for the claiming system to happen in the Scheduler, then at least you need to fix the current way of using the ResourceTracker, like what Jay Pipes is working on in his spec. I'm on the same line as you guys now :) -Sylvain -Sylvain [1] https://bugs.launchpad.net/nova/+bug/1341420 Thanks Alex
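To make the "extension point for different consumption methods" idea mentioned above concrete, here is a minimal, purely hypothetical sketch; none of these classes exist in Nova, and the claim-in-scheduler wiring is only the long-term plan being discussed, not current code:

```python
# Hypothetical sketch: if claiming moved into the scheduler, each virt
# driver could plug in its own consumption strategy. Illustrative only.

class ConsumeStrategy(object):
    def consume(self, host_state, requested_ram_mb):
        raise NotImplementedError


class VirtConsumeStrategy(ConsumeStrategy):
    def consume(self, host_state, requested_ram_mb):
        # Regular virtual instances deduct only what the flavor requests.
        host_state.free_ram_mb -= requested_ram_mb


class BaremetalConsumeStrategy(ConsumeStrategy):
    def consume(self, host_state, requested_ram_mb):
        # One instance exhausts the whole baremetal node.
        host_state.free_ram_mb = 0


def claim_in_scheduler(host_state, requested_ram_mb, strategy):
    """Single decision point: the fit check and the consumption happen
    together in the scheduler, instead of being split between the
    scheduler and the compute-side ResourceTracker."""
    if host_state.free_ram_mb < requested_ram_mb:
        raise ValueError("host cannot fit the requested instance")
    strategy.consume(host_state, requested_ram_mb)
```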
[openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node
Hi all, There is a bug when running Nova with Ironic: https://bugs.launchpad.net/nova/+bug/1402658 The case is simple: one baremetal node with 1024MB RAM, then boot two instances with a 512MB RAM flavor. Both instances will be scheduled to the same baremetal node. The problem is that on the scheduler side the IronicHostManager consumes all the resources of that node, regardless of how much resource the instance actually uses. But on the compute node side, the ResourceTracker doesn't consume resources that way; it consumes them as it would for a normal virtual instance. The ResourceTracker then updates the resource usage once the instance's resources are claimed, so the scheduler sees free resources on that node and will try to schedule another new instance to it. Looking at this, there is the NumInstancesFilter, which limits how many instances can be scheduled to one host. So can we just use this filter to achieve the goal? The maximum is configured by the option 'max_instances_per_host'; we could make the virt driver report how many instances it supports. The Ironic driver would just report max_instances_per_host=1, and the libvirt driver would report max_instances_per_host=-1, meaning no limit. Then we could remove the IronicHostManager and make the scheduler side simpler. Does that make sense, or are there more traps? Thanks in advance for any feedback and suggestions. Thanks Alex
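As a rough illustration of the proposal, a driver-reported limit could look like the sketch below. The NumInstancesFilter and the max_instances_per_host option exist in Nova; the get_max_instances_per_host method and the host_state attribute wiring are hypothetical names used only for illustration:

```python
# Hypothetical sketch of the proposal: each virt driver reports its own
# instance limit, and a NumInstancesFilter-style check consumes it.

class IronicDriver(object):
    def get_max_instances_per_host(self):
        # A baremetal node can only ever run one instance.
        return 1


class LibvirtDriver(object):
    def get_max_instances_per_host(self):
        # -1 means no driver-imposed limit; the existing
        # max_instances_per_host config option would still apply.
        return -1


class NumInstancesFilter(object):
    def host_passes(self, host_state, filter_properties):
        limit = host_state.max_instances_per_host  # reported by the driver
        if limit < 0:
            return True
        return host_state.num_instances < limit
```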
Re: [openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node
On 09/01/2015 09:01, Alex Xu wrote: Hi all, There is a bug when running Nova with Ironic: https://bugs.launchpad.net/nova/+bug/1402658 The case is simple: one baremetal node with 1024MB RAM, then boot two instances with a 512MB RAM flavor. Both instances will be scheduled to the same baremetal node. The problem is that on the scheduler side the IronicHostManager consumes all the resources of that node, regardless of how much resource the instance actually uses. But on the compute node side, the ResourceTracker doesn't consume resources that way; it consumes them as it would for a normal virtual instance. The ResourceTracker then updates the resource usage once the instance's resources are claimed, so the scheduler sees free resources on that node and will try to schedule another new instance to it. Looking at this, there is the NumInstancesFilter, which limits how many instances can be scheduled to one host. So can we just use this filter to achieve the goal? The maximum is configured by the option 'max_instances_per_host'; we could make the virt driver report how many instances it supports. The Ironic driver would just report max_instances_per_host=1, and the libvirt driver would report max_instances_per_host=-1, meaning no limit. Then we could remove the IronicHostManager and make the scheduler side simpler. Does that make sense, or are there more traps? Thanks in advance for any feedback and suggestions. Mmm, I think I disagree with your proposal. Let me explain why as best I can. tl;dr: any proposal that doesn't do the claiming at the scheduler level tends to be wrong. The ResourceTracker should only be a module that provides stats about compute nodes to the Scheduler. How the Scheduler consumes these resources to make a decision should be purely a Scheduler concern. Here, the problem is that the decision making is also shared with the ResourceTracker because of the claiming system managed by the context manager when booting an instance. It means that we have 2 distinct decision makers for validating a resource. Let's step away from being realistic for a moment and discuss what a decision could mean for something other than a compute node. OK, let's say a volume. Provided that *something* reported the volume statistics to the Scheduler, it would be the Scheduler that decides whether a volume manager can accept a volume request. There is no sense in re-validating the Scheduler's decision on the volume manager, beyond maybe some error handling. We know that the current model is somewhat racy with Ironic because there is a 2-stage validation (see [1]). I'm not in favor of making the model more complex, but rather of putting all the claiming logic in the scheduler, which is a longer path to win, but a safer one. -Sylvain [1] https://bugs.launchpad.net/nova/+bug/1341420 Thanks Alex
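The mismatch reported in the bug, and the "2 distinct decision makers" point above, can be illustrated with a small sketch. This is not Nova code, just the two bookkeeping rules placed side by side, assuming the 1024MB node and 512MB flavor from the bug report:

```python
# Illustrative only: scheduler-side and compute-side accounting disagree.

class IronicHostState(object):
    """Scheduler view: IronicHostManager consumes the whole node."""
    def __init__(self, total_ram_mb):
        self.free_ram_mb = total_ram_mb

    def consume_from_instance(self, instance_ram_mb):
        self.free_ram_mb = 0  # whole node is used, whatever the flavor asks


class ResourceTracker(object):
    """Compute view: the claim deducts only what the flavor requests."""
    def __init__(self, total_ram_mb):
        self.free_ram_mb = total_ram_mb

    def instance_claim(self, instance_ram_mb):
        self.free_ram_mb -= instance_ram_mb  # 1024 -> 512, looks half free


scheduler_view = IronicHostState(1024)
compute_view = ResourceTracker(1024)

scheduler_view.consume_from_instance(512)  # scheduler: 0 MB free
compute_view.instance_claim(512)           # compute: 512 MB free

# The compute node later reports its usage back to the scheduler,
# overwriting the scheduler's view with free_ram_mb=512, so a second
# 512MB instance can be scheduled onto the same baremetal node.
```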
Re: [openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node
There is a bug when running Nova with Ironic: https://bugs.launchpad.net/nova/+bug/1402658 I filed this bug – it has been a problem for us. The problem is that on the scheduler side the IronicHostManager consumes all the resources of that node, regardless of how much resource the instance actually uses. But on the compute node side, the ResourceTracker doesn't consume resources that way; it consumes them as it would for a normal virtual instance. The ResourceTracker then updates the resource usage once the instance's resources are claimed, so the scheduler sees free resources on that node and will try to schedule another new instance to it. You have summed up the problem nicely – i.e. the resource availability is calculated incorrectly for ironic nodes. Looking at this, there is the NumInstancesFilter, which limits how many instances can be scheduled to one host. So can we just use this filter to achieve the goal? The maximum is configured by the option 'max_instances_per_host'; we could make the virt driver report how many instances it supports. The Ironic driver would just report max_instances_per_host=1, and the libvirt driver would report max_instances_per_host=-1, meaning no limit. Then we could remove the IronicHostManager and make the scheduler side simpler. Does that make sense, or are there more traps? Makes sense, but it solves the wrong problem. The problem is what you said above – i.e. the resource availability is calculated incorrectly for ironic nodes. The right solution would be to fix the resource tracker. The RAM resource on an ironic node has different allocation behavior from a regular node. The test to see if a new instance fits is the same, but instead of deducting the requested amount to get the remaining availability it should simply return 0. This should be dealt with in the new resource objects ([2] below), either by having a different version of the resource object for ironic nodes (certainly doable and the most sensible option – resources should be presented according to the resources on the host), or by having the RAM resource object cater for the difference in its calculations. I have a local fix for this that I was too shy to propose upstream because it's a bit hacky and will hopefully be obsolete soon. I could share it if you like. Paul [2] https://review.openstack.org/#/c/127609/ From: Sylvain Bauza sba...@redhat.com Date: 9 January 2015 at 09:17 Subject: Re: [openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org On 09/01/2015 09:01, Alex Xu wrote: Hi all, There is a bug when running Nova with Ironic: https://bugs.launchpad.net/nova/+bug/1402658 The case is simple: one baremetal node with 1024MB RAM, then boot two instances with a 512MB RAM flavor. Both instances will be scheduled to the same baremetal node. The problem is that on the scheduler side the IronicHostManager consumes all the resources of that node, regardless of how much resource the instance actually uses. But on the compute node side, the ResourceTracker doesn't consume resources that way; it consumes them as it would for a normal virtual instance. The ResourceTracker then updates the resource usage once the instance's resources are claimed, so the scheduler sees free resources on that node and will try to schedule another new instance to it.
Looking at this, there is the NumInstancesFilter, which limits how many instances can be scheduled to one host. So can we just use this filter to achieve the goal? The maximum is configured by the option 'max_instances_per_host'; we could make the virt driver report how many instances it supports. The Ironic driver would just report max_instances_per_host=1, and the libvirt driver would report max_instances_per_host=-1, meaning no limit. Then we could remove the IronicHostManager and make the scheduler side simpler. Does that make sense, or are there more traps? Thanks in advance for any feedback and suggestions. Mmm, I think I disagree with your proposal. Let me explain why as best I can. tl;dr: any proposal that doesn't do the claiming at the scheduler level tends to be wrong. The ResourceTracker should only be a module that provides stats about compute nodes to the Scheduler. How the Scheduler consumes these resources to make a decision should be purely a Scheduler concern. Here, the problem is that the decision making is also shared with the ResourceTracker because of the claiming system managed by the context manager when booting an instance. It means that we have 2 distinct decision makers for validating a resource. Let's step away from being realistic for a moment and discuss what a decision could mean for something other than a compute node. OK, let's say a volume. Provided that *something* reported the volume statistics to the Scheduler, it would be the Scheduler that decides whether a volume manager can accept a volume request.
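A minimal sketch of the behaviour Paul describes for the RAM resource: the fit test stays the same, but consuming on an ironic node leaves zero remaining availability. The class names here are made up for illustration and do not match the extensible resource tracking spec [2]:

```python
# Sketch only: how a RAM "resource object" could differ for ironic nodes.

class RamResource(object):
    """Regular node: fit test and deduction both use the requested amount."""
    def __init__(self, total_mb):
        self.free_mb = total_mb

    def fits(self, requested_mb):
        return self.free_mb >= requested_mb

    def consume(self, requested_mb):
        self.free_mb -= requested_mb


class IronicRamResource(RamResource):
    """Ironic node: same fit test, but any claim exhausts the node."""
    def consume(self, requested_mb):
        self.free_mb = 0  # remaining availability is simply zero


node = IronicRamResource(1024)
assert node.fits(512)
node.consume(512)
assert not node.fits(512)  # a second 512MB instance no longer fits
```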
Re: [openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node
On 09/01/2015 15:07, Murray, Paul (HP Cloud) wrote: There is a bug when running Nova with Ironic: https://bugs.launchpad.net/nova/+bug/1402658 I filed this bug – it has been a problem for us. The problem is that on the scheduler side the IronicHostManager consumes all the resources of that node, regardless of how much resource the instance actually uses. But on the compute node side, the ResourceTracker doesn't consume resources that way; it consumes them as it would for a normal virtual instance. The ResourceTracker then updates the resource usage once the instance's resources are claimed, so the scheduler sees free resources on that node and will try to schedule another new instance to it. You have summed up the problem nicely – i.e. the resource availability is calculated incorrectly for ironic nodes. Looking at this, there is the NumInstancesFilter, which limits how many instances can be scheduled to one host. So can we just use this filter to achieve the goal? The maximum is configured by the option 'max_instances_per_host'; we could make the virt driver report how many instances it supports. The Ironic driver would just report max_instances_per_host=1, and the libvirt driver would report max_instances_per_host=-1, meaning no limit. Then we could remove the IronicHostManager and make the scheduler side simpler. Does that make sense, or are there more traps? Makes sense, but it solves the wrong problem. The problem is what you said above – i.e. the resource availability is calculated incorrectly for ironic nodes. The right solution would be to fix the resource tracker. The RAM resource on an ironic node has different allocation behavior from a regular node. The test to see if a new instance fits is the same, but instead of deducting the requested amount to get the remaining availability it should simply return 0. This should be dealt with in the new resource objects ([2] below), either by having a different version of the resource object for ironic nodes (certainly doable and the most sensible option – resources should be presented according to the resources on the host), or by having the RAM resource object cater for the difference in its calculations. I have a local fix for this that I was too shy to propose upstream because it's a bit hacky and will hopefully be obsolete soon. I could share it if you like. Paul [2] https://review.openstack.org/#/c/127609/ Agreed, I think that [2] will help a lot. Until it's done, are we really sure we want to fix the bug? It can be worked around by creating flavors that take up at least half of a compute node's resources, and I really wouldn't like to add more tech debt. -Sylvain From: Sylvain Bauza sba...@redhat.com Date: 9 January 2015 at 09:17 Subject: Re: [openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org On 09/01/2015 09:01, Alex Xu wrote: Hi all, There is a bug when running Nova with Ironic: https://bugs.launchpad.net/nova/+bug/1402658 The case is simple: one baremetal node with 1024MB RAM, then boot two instances with a 512MB RAM flavor. Both instances will be scheduled to the same baremetal node. The problem is that on the scheduler side the IronicHostManager consumes all the resources of that node, regardless of how much resource the instance actually uses. But on the compute node side, the ResourceTracker doesn't consume resources that way; it consumes them as it would for a normal virtual instance.
The ResourceTracker then updates the resource usage once the instance's resources are claimed, so the scheduler sees free resources on that node and will try to schedule another new instance to it. Looking at this, there is the NumInstancesFilter, which limits how many instances can be scheduled to one host. So can we just use this filter to achieve the goal? The maximum is configured by the option 'max_instances_per_host'; we could make the virt driver report how many instances it supports. The Ironic driver would just report max_instances_per_host=1, and the libvirt driver would report max_instances_per_host=-1, meaning no limit. Then we could remove the IronicHostManager and make the scheduler side simpler. Does that make sense, or are there more traps? Thanks in advance for any feedback and suggestions. Mmm, I think I disagree with your proposal. Let me explain why as best I can. tl;dr: any proposal that doesn't do the claiming at the scheduler level tends to be wrong. The ResourceTracker should only be a module that provides stats about compute nodes to the Scheduler. How the Scheduler consumes these resources to make a decision should be purely a Scheduler concern. Here, the problem is that the decision making is also shared with the ResourceTracker because of the claiming system managed by the context manager when booting an instance.
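For completeness, the workaround Sylvain mentions amounts to sizing flavors so that only one instance can ever fit on a baremetal node (strictly more than half of the node's RAM, or simply the full node). A rough python-novaclient sketch could look like the following; the credentials, auth URL and sizes are placeholders, not values from the thread:

```python
# Workaround sketch: give the flavor the node's full RAM so the scheduler
# can never fit a second instance. All values below are placeholders.
from novaclient import client

nova = client.Client('2', 'admin', 'secret', 'admin',
                     'http://keystone.example.com:5000/v2.0')

# Matches the 1024MB baremetal node from the bug report.
nova.flavors.create(name='baremetal-1024', ram=1024, vcpus=1, disk=10)
```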
Re: [openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node
On 09/01/2015 14:58, Alex Xu wrote: 2015-01-09 17:17 GMT+08:00 Sylvain Bauza sba...@redhat.com: On 09/01/2015 09:01, Alex Xu wrote: Hi all, There is a bug when running Nova with Ironic: https://bugs.launchpad.net/nova/+bug/1402658 The case is simple: one baremetal node with 1024MB RAM, then boot two instances with a 512MB RAM flavor. Both instances will be scheduled to the same baremetal node. The problem is that on the scheduler side the IronicHostManager consumes all the resources of that node, regardless of how much resource the instance actually uses. But on the compute node side, the ResourceTracker doesn't consume resources that way; it consumes them as it would for a normal virtual instance. The ResourceTracker then updates the resource usage once the instance's resources are claimed, so the scheduler sees free resources on that node and will try to schedule another new instance to it. Looking at this, there is the NumInstancesFilter, which limits how many instances can be scheduled to one host. So can we just use this filter to achieve the goal? The maximum is configured by the option 'max_instances_per_host'; we could make the virt driver report how many instances it supports. The Ironic driver would just report max_instances_per_host=1, and the libvirt driver would report max_instances_per_host=-1, meaning no limit. Then we could remove the IronicHostManager and make the scheduler side simpler. Does that make sense, or are there more traps? Thanks in advance for any feedback and suggestions. Mmm, I think I disagree with your proposal. Let me explain why as best I can. tl;dr: any proposal that doesn't do the claiming at the scheduler level tends to be wrong. The ResourceTracker should only be a module that provides stats about compute nodes to the Scheduler. How the Scheduler consumes these resources to make a decision should be purely a Scheduler concern. Agreed, but we can't implement this for now, for the reason you describe below. Here, the problem is that the decision making is also shared with the ResourceTracker because of the claiming system managed by the context manager when booting an instance. It means that we have 2 distinct decision makers for validating a resource. Totally agreed! This is the root cause. Let's step away from being realistic for a moment and discuss what a decision could mean for something other than a compute node. OK, let's say a volume. Provided that *something* reported the volume statistics to the Scheduler, it would be the Scheduler that decides whether a volume manager can accept a volume request. There is no sense in re-validating the Scheduler's decision on the volume manager, beyond maybe some error handling. We know that the current model is somewhat racy with Ironic because there is a 2-stage validation (see [1]). I'm not in favor of making the model more complex, but rather of putting all the claiming logic in the scheduler, which is a longer path to win, but a safer one. Yeah, I thought about adding the same resource consumption on the compute manager side, but it's ugly because we would be implementing Ironic's resource consumption method in two places. If we move the claiming into the scheduler this becomes easy: we can just provide an extension point for different consumption methods (if I understood the IRC discussion correctly). And since gantt will be a standalone service, validating a resource shouldn't be spread across different services. So I agree with you. But for now, as you said, this is a long-term plan.
We can't provide different resource consumption on the compute manager side now, and we can't move the claiming into the scheduler now either. So the method I proposed is easier for now; at least we won't have different resource consumption between the scheduler (IronicHostManager) and the compute side (ResourceTracker) for Ironic, and Ironic can work fine. The method I propose has a small problem: when all the nodes are allocated, we can still see some free resources if the flavor's resources are less than the baremetal node's resources. But that can be addressed by exposing max_instances through the hypervisor API (running instances are already exposed), so users will know why they can't allocate more instances. And if we could configure max_instances per node, that sounds useful for operators too :) I think that if you don't want to wait for the claiming system to happen in the Scheduler, then at least you need to fix the current way of using the ResourceTracker, like what Jay Pipes is working on in his spec. -Sylvain -Sylvain [1] https://bugs.launchpad.net/nova/+bug/1341420 Thanks Alex
Re: [openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node
2015-01-09 17:17 GMT+08:00 Sylvain Bauza sba...@redhat.com: On 09/01/2015 09:01, Alex Xu wrote: Hi all, There is a bug when running Nova with Ironic: https://bugs.launchpad.net/nova/+bug/1402658 The case is simple: one baremetal node with 1024MB RAM, then boot two instances with a 512MB RAM flavor. Both instances will be scheduled to the same baremetal node. The problem is that on the scheduler side the IronicHostManager consumes all the resources of that node, regardless of how much resource the instance actually uses. But on the compute node side, the ResourceTracker doesn't consume resources that way; it consumes them as it would for a normal virtual instance. The ResourceTracker then updates the resource usage once the instance's resources are claimed, so the scheduler sees free resources on that node and will try to schedule another new instance to it. Looking at this, there is the NumInstancesFilter, which limits how many instances can be scheduled to one host. So can we just use this filter to achieve the goal? The maximum is configured by the option 'max_instances_per_host'; we could make the virt driver report how many instances it supports. The Ironic driver would just report max_instances_per_host=1, and the libvirt driver would report max_instances_per_host=-1, meaning no limit. Then we could remove the IronicHostManager and make the scheduler side simpler. Does that make sense, or are there more traps? Thanks in advance for any feedback and suggestions. Mmm, I think I disagree with your proposal. Let me explain why as best I can. tl;dr: any proposal that doesn't do the claiming at the scheduler level tends to be wrong. The ResourceTracker should only be a module that provides stats about compute nodes to the Scheduler. How the Scheduler consumes these resources to make a decision should be purely a Scheduler concern. Agreed, but we can't implement this for now, for the reason you describe below. Here, the problem is that the decision making is also shared with the ResourceTracker because of the claiming system managed by the context manager when booting an instance. It means that we have 2 distinct decision makers for validating a resource. Totally agreed! This is the root cause. Let's step away from being realistic for a moment and discuss what a decision could mean for something other than a compute node. OK, let's say a volume. Provided that *something* reported the volume statistics to the Scheduler, it would be the Scheduler that decides whether a volume manager can accept a volume request. There is no sense in re-validating the Scheduler's decision on the volume manager, beyond maybe some error handling. We know that the current model is somewhat racy with Ironic because there is a 2-stage validation (see [1]). I'm not in favor of making the model more complex, but rather of putting all the claiming logic in the scheduler, which is a longer path to win, but a safer one. Yeah, I thought about adding the same resource consumption on the compute manager side, but it's ugly because we would be implementing Ironic's resource consumption method in two places. If we move the claiming into the scheduler this becomes easy: we can just provide an extension point for different consumption methods (if I understood the IRC discussion correctly). And since gantt will be a standalone service, validating a resource shouldn't be spread across different services. So I agree with you. But for now, as you said, this is a long-term plan. We can't provide different resource consumption on the compute manager side now, and we can't move the claiming into the scheduler now either.
So the method I proposed is easier for now; at least we won't have different resource consumption between the scheduler (IronicHostManager) and the compute side (ResourceTracker) for Ironic, and Ironic can work fine. The method I propose has a small problem: when all the nodes are allocated, we can still see some free resources if the flavor's resources are less than the baremetal node's resources. But that can be addressed by exposing max_instances through the hypervisor API (running instances are already exposed), so users will know why they can't allocate more instances. And if we could configure max_instances per node, that sounds useful for operators too :) -Sylvain [1] https://bugs.launchpad.net/nova/+bug/1341420 Thanks Alex
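To illustrate what exposing this through the hypervisor API could look like: the os-hypervisors details already include running_vms, and the max_instances field below is purely the suggestion made above, not an existing Nova field:

```python
# Hypothetical os-hypervisors style payload. Only "max_instances" is new;
# the other fields already exist in the hypervisor details today.
hypervisor_detail = {
    "hypervisor_hostname": "ironic-node-0001",
    "hypervisor_type": "ironic",
    "memory_mb": 1024,
    "memory_mb_used": 512,   # looks half free to a user...
    "running_vms": 1,        # ...but the node already runs its one instance
    "max_instances": 1,      # proposed: the driver-reported limit
}
print(hypervisor_detail)
```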
Re: [openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node
2015-01-09 22:07 GMT+08:00 Murray, Paul (HP Cloud) pmur...@hp.com: There is a bug when running Nova with Ironic: https://bugs.launchpad.net/nova/+bug/1402658 I filed this bug – it has been a problem for us. The problem is that on the scheduler side the IronicHostManager consumes all the resources of that node, regardless of how much resource the instance actually uses. But on the compute node side, the ResourceTracker doesn't consume resources that way; it consumes them as it would for a normal virtual instance. The ResourceTracker then updates the resource usage once the instance's resources are claimed, so the scheduler sees free resources on that node and will try to schedule another new instance to it. You have summed up the problem nicely – i.e. the resource availability is calculated incorrectly for ironic nodes. Looking at this, there is the NumInstancesFilter, which limits how many instances can be scheduled to one host. So can we just use this filter to achieve the goal? The maximum is configured by the option 'max_instances_per_host'; we could make the virt driver report how many instances it supports. The Ironic driver would just report max_instances_per_host=1, and the libvirt driver would report max_instances_per_host=-1, meaning no limit. Then we could remove the IronicHostManager and make the scheduler side simpler. Does that make sense, or are there more traps? Makes sense, but it solves the wrong problem. The problem is what you said above – i.e. the resource availability is calculated incorrectly for ironic nodes. The right solution would be to fix the resource tracker. The RAM resource on an ironic node has different allocation behavior from a regular node. The test to see if a new instance fits is the same, but instead of deducting the requested amount to get the remaining availability it should simply return 0. This should be dealt with in the new resource objects ([2] below), either by having a different version of the resource object for ironic nodes (certainly doable and the most sensible option – resources should be presented according to the resources on the host), or by having the RAM resource object cater for the difference in its calculations. Dang it, I reviewed that spec, why didn't I find that :( You totally beat me to it! I have a local fix for this that I was too shy to propose upstream because it's a bit hacky and will hopefully be obsolete soon. I could share it if you like. Paul [2] https://review.openstack.org/#/c/127609/ From: Sylvain Bauza sba...@redhat.com Date: 9 January 2015 at 09:17 Subject: Re: [openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org On 09/01/2015 09:01, Alex Xu wrote: Hi all, There is a bug when running Nova with Ironic: https://bugs.launchpad.net/nova/+bug/1402658 The case is simple: one baremetal node with 1024MB RAM, then boot two instances with a 512MB RAM flavor. Both instances will be scheduled to the same baremetal node. The problem is that on the scheduler side the IronicHostManager consumes all the resources of that node, regardless of how much resource the instance actually uses. But on the compute node side, the ResourceTracker doesn't consume resources that way; it consumes them as it would for a normal virtual instance. The ResourceTracker then updates the resource usage once the instance's resources are claimed, so the scheduler sees free resources on that node and will try to schedule another new instance to it.
Looking at this, there is the NumInstancesFilter, which limits how many instances can be scheduled to one host. So can we just use this filter to achieve the goal? The maximum is configured by the option 'max_instances_per_host'; we could make the virt driver report how many instances it supports. The Ironic driver would just report max_instances_per_host=1, and the libvirt driver would report max_instances_per_host=-1, meaning no limit. Then we could remove the IronicHostManager and make the scheduler side simpler. Does that make sense, or are there more traps? Thanks in advance for any feedback and suggestions. Mmm, I think I disagree with your proposal. Let me explain why as best I can. tl;dr: any proposal that doesn't do the claiming at the scheduler level tends to be wrong. The ResourceTracker should only be a module that provides stats about compute nodes to the Scheduler. How the Scheduler consumes these resources to make a decision should be purely a Scheduler concern. Here, the problem is that the decision making is also shared with the ResourceTracker because of the claiming system managed by the context manager when booting an instance. It means that we have 2 distinct decision makers for validating a resource. Let's step away from being realistic for a moment and discuss what a decision could mean for something other than a compute node.