Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?
Hi Nikola,

Firstly, thanks for waiting for me to respond, and sorry I was absent for the last couple of weeks.

The extensible resource tracker bp deals with two distinct information flows:

1. information about resources that is passed from the compute node to the scheduler,
2. information about resource requirements passed to the scheduler and the compute node.

If I understand your email below correctly, you are saying that information such as extra_specs is not made available to the compute node or the resource plugins. That is specifically about the second item (2.) above. The patch that you propose to revert addresses the first item (1.), i.e. it provides a means to select which resources are tracked and to pass that information to the scheduler.

It gives us two things: we can add resource plugins and pass information to the scheduler without having to change the resource tracker or the scheduler, and we can pick and choose which resource plugins to use, and so decide what information we want to write to the database and pass to the scheduler.

The ability to omit resource information is as useful as the ability to add it. If a new resource plugin is added, operators that do not use that information do not need to configure it. As an operator myself, I would be happy to omit the proliferation of compute node details that are coming, while benefiting from those that are of use to me.

The interface for the plugins does not need to be considered a fixed external interface; it is not. It is ok to add necessary parameters if there is no other sensible way to pass the information you need.

So, in short, the ability to add resource information without impacting everybody is the value that the patch you want to revert brings. If in the future another design is settled on for resource tracking and scheduling, it will still have to face the same requirement.
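For readers who have not looked at the ERT patch, the pluggable model Paul is describing can be sketched roughly as follows. This is an illustrative sketch only - the class and method names (Resource, reset(), add_instance_usage(), report(), build_compute_node_record()) are hypothetical, not Nova's actual plugin API:

```python
# Illustrative sketch of an extensible-resource-tracker style plugin model.
# All names here are hypothetical, not Nova's actual API.

class Resource:
    """Base class: one pluggable, trackable resource."""
    def reset(self, host_info):
        """Start a tracking pass from the host's current view."""
        raise NotImplementedError

    def add_instance_usage(self, usage):
        """Account for one instance's usage of this resource."""
        raise NotImplementedError

    def report(self):
        """Return the key/values to persist and send to the scheduler."""
        raise NotImplementedError


class MemoryMbResource(Resource):
    """Toy plugin tracking host memory in MB."""
    def reset(self, host_info):
        self.total = host_info["memory_mb"]
        self.used = 0

    def add_instance_usage(self, usage):
        self.used += usage.get("memory_mb", 0)

    def report(self):
        return {"memory_mb": self.total, "memory_mb_used": self.used}


def build_compute_node_record(plugins, host_info, instances):
    """Operators enable only the plugins they want; a plugin that is
    not configured simply contributes nothing to the record."""
    record = {}
    for plugin in plugins:
        plugin.reset(host_info)
        for usage in instances:
            plugin.add_instance_usage(usage)
        record.update(plugin.report())
    return record
```

The point of the sketch is the last function: what ends up in the database and on the scheduler's desk is exactly the union of what the enabled plugins report, no more.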
The compute node will have a set of resource information that could be tracked and used, but not everyone will want the overhead of discovering and reporting all of it, so they should not need to have all of it.

Paul

On 19 August 2014 10:11, Nikola Đipanov ndipa...@redhat.com wrote:

[quoted text snipped; Nikola's revert proposal and original message appear in full later in this thread]
Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?
On 15/08/2014 15:35, Andrew Laski wrote:
On 08/14/2014 03:21 AM, Nikola Đipanov wrote:
On 08/13/2014 06:05 PM, Sylvain Bauza wrote:
On 13/08/2014 12:21, Sylvain Bauza wrote:
On 12/08/2014 22:06, Sylvain Bauza wrote:
On 12/08/2014 18:54, Nikola Đipanov wrote:
On 08/12/2014 04:49 PM, Sylvain Bauza wrote:

(sorry for reposting, missed 2 links...)

Hi Nikola,

On 12/08/2014 12:21, Nikola Đipanov wrote:

[Nikola's original message snipped; it appears in full elsewhere in this thread]

The result will be that any resource that we add and needs user-supplied info for scheduling an instance against it will need a buggy re-implementation of gathering all the bits from the request that the scheduler sees, to be able to work properly.

Well, ERT does provide a plugin mechanism for testing resources at the claim level. It is the plugin's responsibility to implement a test() method [2.1], which will be called by test_claim() [2.2]. So, provided this method is implemented, a local host check can be done based on the host's view of resources.

Yes - the problem is there is no clear API to get all the needed bits to do so - especially the user-supplied ones from images and flavors. On top of that, in the current implementation we only pass a hand-wavy 'usage' blob in. This makes anyone wanting to use this in conjunction with some of the user-supplied bits roll their own 'extract_data_from_instance_metadata_flavor_image' or similar, which is horrible and also likely bad for performance.

I see your concern that there is no interface for user-facing resources like flavor or image metadata. I also agree that the big 'usage' blob is not a good choice for the long term. That said, I don't think we should - as we say in French - throw the baby out with the bathwater: the problem is with the RT, not the ERT (apart from the mention of the third-party API that you noted - I'll get to it later below).

This is obviously a bigger concern when we want to allow users to pass data (through image or flavor) that can affect scheduling, but still a huge concern IMHO.

And here is where I agree with you: at the moment, the ResourceTracker (and consequently the Extensible RT) only provides the view of the resources the host knows about (see my point above), and possibly some other resources are missing. So, whatever your choice of going with or without ERT, your patch [3] still deserves to land if we don't want to look up the DB each time a claim is made.

As I see that there are already BPs proposing to use this IMHO broken ERT ([4] for example), which will surely add to the proliferation of code that hacks around these design shortcomings in what is already a messy, but also crucial (for perf as well as features), bit of Nova code.

Two distinct implementations of that spec (i.e. instances and flavors) have been proposed [2.3] [2.4], so reviews are welcome. If you look at the test() method, it's a no-op for both plugins. I'm open to comments, because I have the stated problem: how can we define a limit on just a counter of instances and flavors?

Will look at these - but none of them seem to hit the issue I am complaining about, which is that they will need to consider other request data for claims, not only data available on instances. Also, the fact that you don't implement test() in the flavor one tells me that the implementation is indeed racy (but it is racy atm as well): two requests can indeed race for the same host, and since no claims are done, both can succeed. This is, I believe (at least in the case of single-flavor hosts), unlikely to happen in practice, but you get the idea.

Agreed, these 2 patches probably require another iteration, in particular how we make sure that it won't be racy. So I need another run to think
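The race Nikola describes falls directly out of a no-op test(): nothing is ever rejected at claim time, so two requests that both passed the scheduler can both land on the same host. A toy sketch (hypothetical names, not the proposed plugins) of the difference between a no-op and an enforcing test():

```python
# Toy illustration of why a no-op test() is racy: without an enforcing
# check at claim time, two concurrent requests both "succeed" on a host
# that only has room for one. Names are hypothetical, not the ERT API.

class InstanceCountResource:
    """Tracks a simple counter of instances on the host."""
    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def claim(self):
        self.count += 1

    def test_noop(self, usage):
        # No-op variant: always passes, so a second racing request is
        # never rejected at claim time.
        return None

    def test(self, usage):
        # Enforcing variant: reject when the limit would be exceeded,
        # returning a reason string (None means the claim is accepted).
        if self.count + 1 > self.limit:
            return "instance count %d exceeds limit %d" % (
                self.count + 1, self.limit)
        return None
```

With the enforcing variant, the second of two racing requests fails its claim and triggers a re-schedule instead of overcommitting the host.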
Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?
Since after a week of discussing it I see no compelling argument against reverting it - here's the proposal:

https://review.openstack.org/115218

Thanks,
N.

On 08/12/2014 12:21 PM, Nikola Đipanov wrote:

Hey Nova-istas,

While I was hacking on [1] I was considering how to approach the fact that we now need to track one more thing (NUMA node utilization) in our resources. I went with "I'll add it to the compute nodes table", thinking it's a fundamental enough property of a compute host that it deserves to be there, although I was considering the Extensible Resource Tracker at one point (ERT from now on - see [2]). But looking at the code, it did not seem to provide anything I desperately needed, so I went with keeping it simple.

So fast-forward a few days, and I caught myself solving a problem that I kept thinking ERT should have solved - but apparently hasn't, and I think it is fundamentally a broken design without it - so I'd really like to see it re-visited.

The problem can be described by the following lemma (if you take 'lemma' to mean 'a sentence I came up with just now' :)):

Due to the way scheduling works in Nova (roughly: pick a host based on stale(ish) data, rely on claims to trigger a re-schedule), the _same exact_ information that the scheduling service used when making a placement decision needs to be available to the compute service when testing the placement.

This is not the case right now, and the ERT does not propose any way to solve it (see how I hacked around needing to be able to get extra_specs when making claims in [3], without hammering the DB). The result will be that any resource that we add and needs user-supplied info for scheduling an instance against it will need a buggy re-implementation of gathering all the bits from the request that the scheduler sees, to be able to work properly. This is obviously a bigger concern when we want to allow users to pass data (through image or flavor) that can affect scheduling, but still a huge concern IMHO.
As I see that there are already BPs proposing to use this IMHO broken ERT ([4] for example), which will surely add to the proliferation of code that hacks around these design shortcomings in what is already a messy, but also crucial (for perf as well as features), bit of Nova code.

I propose to revert [2] ASAP since it is still fresh, and see how we can come up with a cleaner design. Would like to hear opinions on this before I propose the patch, though!

Thanks all,
Nikola

[1] https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement
[2] https://review.openstack.org/#/c/109643/
[3] https://review.openstack.org/#/c/111782/
[4] https://review.openstack.org/#/c/89893

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
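For anyone new to the thread, the 'lemma' above can be illustrated with a much-simplified toy model of Nova's schedule-then-claim flow (all names are illustrative; this is not Nova code). The scheduler picks from a stale snapshot; the claim re-tests against live data and either succeeds or triggers a re-schedule - and it can only re-test faithfully if it sees the same request information the scheduler saw:

```python
# Much-simplified toy model of schedule-then-claim. The scheduler works
# from a stale snapshot; the claim is the authoritative re-test on the
# compute node. Illustrative only - not Nova's actual code.

class RetryNeeded(Exception):
    """Raised when a claim fails and a re-schedule is needed."""


class Host:
    def __init__(self, free_ram):
        self.free_ram = free_ram          # live view on the compute node
        self.stale_free_ram = free_ram    # snapshot the scheduler sees


def schedule(hosts, request_spec):
    """Pick a host based on the (possibly stale) reported data."""
    for host in hosts:
        if host.stale_free_ram >= request_spec["memory_mb"]:
            return host
    raise RetryNeeded("no host passed the filters")


def claim(host, request_spec):
    """On the compute node: re-test the placement against live data.
    Note the claim needs request_spec too - if it could only see part
    of it (e.g. no extra_specs), it could not faithfully re-test the
    scheduler's decision, which is the thread's central complaint."""
    if host.free_ram < request_spec["memory_mb"]:
        raise RetryNeeded("claim failed, re-schedule")
    host.free_ram -= request_spec["memory_mb"]


def boot(hosts, request_spec):
    host = schedule(hosts, request_spec)
    claim(host, request_spec)
    return host
```

In this toy model the stale snapshot is simply never refreshed, so a second request can be scheduled onto a host that can no longer fit it; the claim catches that and raises the retry.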
Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?
On 08/14/2014 10:25 PM, Sylvain Bauza wrote:

Hi mikal,

On 14 August 2014 01:49, Michael Still mi...@stillhq.com wrote:

So, there's been a lot of email in the last few days and I feel I am not keeping up. Sylvain, can you summarise for me what the plan is here? Can we roll forward or do we need to revert?

Well, as we agreed with Nikola, the problem is not with the ERT but the RT, as the request data needs to be passed when claiming a resource. I'm proposing to keep ERT and only consider plugins that do not need the request_spec when claiming, but there is no agreement on that yet.

Yes - we could do this, but I still see no benefit in it. FWIW, Jay Pipes made a comment on the patch itself that highlights much of the same issues I raised in this thread, even before I started it (scroll down): https://review.openstack.org/#/c/109643/ It's easy to miss since it was added post-merge.

Unfortunately, I'm on PTO till Tuesday, and Paul Murray is this week as well. So I propose to delay the discussion by these days, as that's not impacted by FPF. In the meantime, I created a patch for discussing a workaround [1] for Juno until we correctly figure out how to fix that issue, as it deserves a spec.

Time is running out for Juno. Indeed, I'm mostly concerned by the example exception spec that Nikola mentioned [2] (isolate-scheduler-db), as it still needs a second +2 while FPF is in 1 week... I'm planning to deliver an alternative implementation without ERT wrt this discussion.

Ripping it out will make it more difficult for the Gantt team to go ahead with the current plan for the split - yes, but maybe that actually means you might want to re-visit some of your decisions (did not follow all of it, so I don't want to comment in depth at this point, but throwing it out there)?

N.
-Sylvain

[1] https://review.openstack.org/#/c/113936/
[2] https://review.openstack.org/#/c/89893

Thanks,
Michael
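The workaround in [3] - getting extra_specs at claim time "without hammering the DB" - is essentially a cache-on-miss lookaside. A generic sketch of that pattern follows (not the actual patch; ExtraSpecsCache and db_lookup are made-up names):

```python
# Generic sketch of a lookaside cache a compute node might use so that
# flavor extra_specs do not cost a DB round trip on every claim.
# This illustrates the pattern only; it is not the code from [3].

class ExtraSpecsCache:
    def __init__(self, db_lookup):
        self._db_lookup = db_lookup   # function: flavor_id -> extra_specs
        self._cache = {}
        self.db_hits = 0              # for observing cache effectiveness

    def get(self, flavor_id):
        # Only the first claim for a given flavor pays the DB cost;
        # subsequent claims are served from memory.
        if flavor_id not in self._cache:
            self.db_hits += 1
            self._cache[flavor_id] = self._db_lookup(flavor_id)
        return self._cache[flavor_id]
```

The obvious caveat, and part of why this is a workaround rather than a design, is invalidation: a flavor updated in the DB is stale in the cache until it is evicted.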
Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?
On 15 August 2014 08:16, Nikola Đipanov ndipa...@redhat.com wrote:

[earlier quoted text snipped; Nikola's message appears in full elsewhere in this thread]

Yes - we could do this, but I still see no benefit in it. FWIW, Jay Pipes made a comment on the patch itself that highlights much of the same issues I raised in this thread, even before I started it (scroll down): https://review.openstack.org/#/c/109643/ It's easy to miss since it was added post-merge.

As said previously, I'm not saying that the current interface would not have to change because of some issues. My personal concern is that I just don't want to see technical debt blocking all attempts to move forward, and I would prefer to see that debt treated more easily thanks to a separate project with new velocity. As a summary, I don't trust big-bang rewrites in Nova and prefer small iterations from the current state, so that's why I'm in favour of a negotiated approach.

Take the filters and the work we are doing to remove direct access to the DB: since anyone can propose a patch breaking that (and you know how easy it is to propose a filter, and how many people are doing so at the moment...), it is the reviewers' duty - mine in particular - to make their voices heard so such a patch is not merged. Here, same idea. This is not a REST API that anyone can consume; new plugins still need to be merged if they want to go upstream.

[quoted text snipped]

Ripping it out will make it more difficult for the Gantt team to go ahead with the current plan for the split - yes, but maybe that actually means you might want to re-visit some of your decisions (did not follow all of it, so I don't want to comment in depth at this point, but throwing it out there)?

N.

Well, you hit a good point: what alternative, if so? This spec had many proposals, as many solutions can be found, but we decided to go with ERT because of its good integration with the existing RT. I'm really open to discussion in the spec itself, as I really like hearing other voices.

-Sylvain

[1] https://review.openstack.org/#/c/113936/
[2] https://review.openstack.org/#/c/89893
Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?
Hi mikal,

On 14 August 2014 01:49, Michael Still mi...@stillhq.com wrote:

So, there's been a lot of email in the last few days and I feel I am not keeping up. Sylvain, can you summarise for me what the plan is here? Can we roll forward or do we need to revert?

Well, as we agreed with Nikola, the problem is not with the ERT but the RT, as the request data needs to be passed when claiming a resource. I'm proposing to keep ERT and only consider plugins that do not need the request_spec when claiming, but there is no agreement on that yet. Unfortunately, I'm on PTO till Tuesday, and Paul Murray is this week as well. So I propose to delay the discussion by these days, as that's not impacted by FPF. In the meantime, I created a patch for discussing a workaround [1] for Juno until we correctly figure out how to fix that issue, as it deserves a spec.

Time is running out for Juno. Indeed, I'm mostly concerned by the example exception spec that Nikola mentioned [2] (isolate-scheduler-db), as it still needs a second +2 while FPF is in 1 week... I'm planning to deliver an alternative implementation without ERT wrt this discussion.

-Sylvain

[1] https://review.openstack.org/#/c/113936/
[2] https://review.openstack.org/#/c/89893

Thanks,
Michael

On Thu, Aug 14, 2014 at 3:40 AM, Sylvain Bauza sba...@redhat.com wrote:
On 13/08/2014 18:40, Brian Elliott wrote:
On Aug 12, 2014, at 5:21 AM, Nikola Đipanov ndipa...@redhat.com wrote:

[Nikola's original message snipped; it appears in full elsewhere in this thread]

Due to the way scheduling works in Nova (roughly: pick a host based on stale(ish) data, rely on claims to trigger a re-schedule), the _same exact_ information that the scheduling service used when making a placement decision needs to be available to the compute service when testing the placement.

Correct.

This is not the case right now, and the ERT does not propose any way to solve it (see how I hacked around needing to be able to get extra_specs when making claims in [3], without hammering the DB). The result will be that any resource that we add and needs user-supplied info for scheduling an instance against it will need a buggy re-implementation of gathering all the bits from the request that the scheduler sees, to be able to work properly.

Agreed, ERT does not attempt to solve this problem of ensuring the RT has an identical set of information for testing claims. I don't think it was intended to. ERT does solve the issue of bloat in the RT when adding just-one-more-thing to test usage-wise. It gives a nice hook for inserting your claim logic for your specific use case.

I think Nikola and I agreed on the fact that ERT is not responsible for this design. That said, I can't talk on behalf of Nikola...

This is obviously a bigger concern when we want to allow users to pass data (through image or flavor) that can affect scheduling, but still a huge concern IMHO.

I think passing additional data through to compute just wasn't a problem that ERT aimed to solve. (Paul Murray?)

That being said, coordinating the passing of any extra data required to test a claim that is *not* sourced from the host itself would be a very nice addition. You are working around it with some caching in your flavor db lookup use case, although one could of course cook up a cleaner patch to pass such data through on the "build this" request to the compute.

Indeed, and that's why I think the problem can be resolved thanks to 2 different things:

1. Filters need to look at what ERT is giving them; that's what isolate-scheduler-db is trying to do (see my patches [2.3] and [2.4] in the previous emails).
2. Some extra user request data needs to be checked in the test() method of the ERT plugins (where claims are done), so I provided a WIP patch for discussing it: https://review.openstack.org/#/c/113936/

As I see that there are already BPs proposing to use this IMHO broken ERT ([4] for example), which will surely add to the proliferation of code that hacks around these design shortcomings in what is already a messy, but also crucial (for perf as well as features), bit of Nova code.
Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?
On 12/08/2014 22:06, Sylvain Bauza wrote: On 12/08/2014 18:54, Nikola Đipanov wrote: On 08/12/2014 04:49 PM, Sylvain Bauza wrote: (sorry for reposting, missed 2 links...)

Hi Nikola,

On 12/08/2014 12:21, Nikola Đipanov wrote:

Hey Nova-istas, While I was hacking on [1] I was considering how to approach the fact that we now need to track one more thing (NUMA node utilization) in our resources. I went with - I'll add it to compute nodes table thinking it's a fundamental enough property of a compute host that it deserves to be there, although I was considering Extensible Resource Tracker at one point (ERT from now on - see [2]) but looking at the code - it did not seem to provide anything I desperately needed, so I went with keeping it simple. So fast-forward a few days, and I caught myself solving a problem that I kept thinking ERT should have solved - but apparently hasn't, and I think it is fundamentally a broken design without it - so I'd really like to see it re-visited. The problem can be described by the following lemma (if you take 'lemma' to mean 'a sentence I came up with just now' :)): Due to the way scheduling works in Nova (roughly: pick a host based on stale(ish) data, rely on claims to trigger a re-schedule), _same exact_ information that scheduling service used when making a placement decision, needs to be available to the compute service when testing the placement. This is not the case right now, and the ERT does not propose any way to solve it - (see how I hacked around needing to be able to get extra_specs when making claims in [3], without hammering the DB). The result will be that any resource that we add and needs user supplied info for scheduling an instance against it, will need a buggy re-implementation of gathering all the bits from the request that scheduler sees, to be able to work properly.

Well, ERT does provide a plugin mechanism for testing resources at the claim level.
It is the plugin's responsibility to implement a test() method [2.1], which will be called by test_claim() [2.2]. So, provided this method is implemented, a local host check can be done based on the host's view of resources.

Yes - the problem is there is no clear API to get all the needed bits to do so - especially the user-supplied ones from images and flavors. On top of that, the current implementation only passes a hand-wavy 'usage' blob in. This makes anyone wanting to use this in conjunction with some of the user-supplied bits roll their own 'extract_data_from_instance_metadata_flavor_image' or similar, which is horrible and also likely bad for performance.

I see your concern that there is no interface for user-facing resources like flavor or image metadata. I also agree that the big 'usage' blob is not a good choice for the long term. That said, I don't think we should, as we say in French, throw the baby out with the bathwater... i.e. the problem is with the RT, not the ERT (apart from the mention of the third-party API that you noted - I'll get to it below).

This is obviously a bigger concern when we want to allow users to pass data (through image or flavor) that can affect scheduling, but still a huge concern IMHO.

And here is where I agree with you: at the moment, the ResourceTracker (and consequently the Extensible RT) only provides the view of the resources the host knows about (see my point above), and possibly some other resources are missing. So, whichever way you choose to go, with or without the ERT, your patch [3] is still worthwhile if we want to avoid a DB lookup on every claim.

As I see that there are already BPs proposing to use this IMHO broken ERT ([4] for example), which will surely add to the proliferation of code that hacks around these design shortcomings in what is already a messy, but also crucial (for perf as well as features) bit of Nova code.

Two distinct implementations of that spec (i.e. instances and flavors) have been proposed [2.3] [2.4], so reviews are welcome.
If you look at the test() method, it's a no-op for both plugins. I'm open to comments, because I have the stated problem: how can we define a limit on just a counter of instances and flavors?

Will look at these - but none of them seem to hit the issue I am complaining about, which is that they will need to consider other request data for claims, not only data available on instances. Also - the fact that you don't implement test() in the flavor one tells me that the implementation is indeed racy (but it is racy atm as well): two requests can indeed race for the same host, and since no claims are done, both can succeed. This is, I believe (at least in the case of single-flavor hosts), unlikely to happen in practice, but you get the idea.

Agreed, these 2 patches probably require another iteration, in particular to make sure they won't be racy. So I need another run to think about what to test() for these 2 examples. Another patch has to be done for aggregates, but it's still WIP so
Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?
On Aug 12, 2014, at 5:21 AM, Nikola Đipanov ndipa...@redhat.com wrote:

Hey Nova-istas, While I was hacking on [1] I was considering how to approach the fact that we now need to track one more thing (NUMA node utilization) in our resources. I went with - I'll add it to compute nodes table thinking it's a fundamental enough property of a compute host that it deserves to be there, although I was considering Extensible Resource Tracker at one point (ERT from now on - see [2]) but looking at the code - it did not seem to provide anything I desperately needed, so I went with keeping it simple. So fast-forward a few days, and I caught myself solving a problem that I kept thinking ERT should have solved - but apparently hasn't, and I think it is fundamentally a broken design without it - so I'd really like to see it re-visited. The problem can be described by the following lemma (if you take 'lemma' to mean 'a sentence I came up with just now' :)): Due to the way scheduling works in Nova (roughly: pick a host based on stale(ish) data, rely on claims to trigger a re-schedule), _same exact_ information that scheduling service used when making a placement decision, needs to be available to the compute service when testing the placement.

Correct.

This is not the case right now, and the ERT does not propose any way to solve it - (see how I hacked around needing to be able to get extra_specs when making claims in [3], without hammering the DB). The result will be that any resource that we add and needs user supplied info for scheduling an instance against it, will need a buggy re-implementation of gathering all the bits from the request that scheduler sees, to be able to work properly.

Agreed, the ERT does not attempt to solve this problem of ensuring the RT has an identical set of information for testing claims. I don't think it was intended to. The ERT does solve the issue of bloat in the RT from adding just-one-more-thing to test usage-wise.
It gives a nice hook for inserting your claim logic for your specific use case.

This is obviously a bigger concern when we want to allow users to pass data (through image or flavor) that can affect scheduling, but still a huge concern IMHO.

I think passing additional data through to compute just wasn't a problem that ERT aimed to solve. (Paul Murray?) That being said, coordinating the passing of any extra data required to test a claim that is *not* sourced from the host itself would be a very nice addition. You are working around it with some caching in your flavor db lookup use case, although one could of course cook up a cleaner patch to pass such data through on the "build this" request to the compute.

As I see that there are already BPs proposing to use this IMHO broken ERT ([4] for example), which will surely add to the proliferation of code that hacks around these design shortcomings in what is already a messy, but also crucial (for perf as well as features) bit of Nova code. I propose to revert [2] ASAP since it is still fresh, and see how we can come up with a cleaner design.

I think the ERT is forward-progress here, but am willing to review patches/specs on improvements/replacements.

Would like to hear opinions on this, before I propose the patch tho!

Thanks all, Nikola

[1] https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement
[2] https://review.openstack.org/#/c/109643/
[3] https://review.openstack.org/#/c/111782/
[4] https://review.openstack.org/#/c/89893

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
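The caching workaround Brian refers to (Nikola's patch [3]) amounts to memoizing the flavor lookup on the compute node so that testing a claim does not hammer the DB. A minimal sketch of the idea, where `db_lookup` is a stand-in for the real flavor database query (not an actual Nova function):

```python
# Minimal sketch of the caching workaround discussed above: memoize flavor
# extra_specs per flavor id so that only the first claim per flavor reaches
# the database. 'db_lookup' is a hypothetical stand-in for the real query.

class ExtraSpecsCache(object):
    def __init__(self, db_lookup):
        self._db_lookup = db_lookup
        self._cache = {}

    def get(self, flavor_id):
        # Only the first lookup for a given flavor id hits the database;
        # subsequent claims for the same flavor are served from memory.
        if flavor_id not in self._cache:
            self._cache[flavor_id] = self._db_lookup(flavor_id)
        return self._cache[flavor_id]
```

The obvious limitation, which is the heart of the thread, is that this re-fetches (or caches) data the scheduler already had, instead of passing it along with the request.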
Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?
On 13/08/2014 18:40, Brian Elliott wrote: On Aug 12, 2014, at 5:21 AM, Nikola Đipanov ndipa...@redhat.com wrote:

Hey Nova-istas, While I was hacking on [1] I was considering how to approach the fact that we now need to track one more thing (NUMA node utilization) in our resources. I went with - I'll add it to compute nodes table thinking it's a fundamental enough property of a compute host that it deserves to be there, although I was considering Extensible Resource Tracker at one point (ERT from now on - see [2]) but looking at the code - it did not seem to provide anything I desperately needed, so I went with keeping it simple. So fast-forward a few days, and I caught myself solving a problem that I kept thinking ERT should have solved - but apparently hasn't, and I think it is fundamentally a broken design without it - so I'd really like to see it re-visited. The problem can be described by the following lemma (if you take 'lemma' to mean 'a sentence I came up with just now' :)): Due to the way scheduling works in Nova (roughly: pick a host based on stale(ish) data, rely on claims to trigger a re-schedule), _same exact_ information that scheduling service used when making a placement decision, needs to be available to the compute service when testing the placement.

Correct.

This is not the case right now, and the ERT does not propose any way to solve it - (see how I hacked around needing to be able to get extra_specs when making claims in [3], without hammering the DB). The result will be that any resource that we add and needs user supplied info for scheduling an instance against it, will need a buggy re-implementation of gathering all the bits from the request that scheduler sees, to be able to work properly.

Agreed, the ERT does not attempt to solve this problem of ensuring the RT has an identical set of information for testing claims. I don't think it was intended to. The ERT does solve the issue of bloat in the RT from adding just-one-more-thing to test usage-wise.
It gives a nice hook for inserting your claim logic for your specific use case.

I think Nikola and I agreed on the fact that the ERT is not responsible for this design. That said, I can't speak on behalf of Nikola...

This is obviously a bigger concern when we want to allow users to pass data (through image or flavor) that can affect scheduling, but still a huge concern IMHO.

I think passing additional data through to compute just wasn't a problem that ERT aimed to solve. (Paul Murray?) That being said, coordinating the passing of any extra data required to test a claim that is *not* sourced from the host itself would be a very nice addition. You are working around it with some caching in your flavor db lookup use case, although one could of course cook up a cleaner patch to pass such data through on the "build this" request to the compute.

Indeed, and that's why I think the problem can be resolved thanks to 2 different things:
1. Filters need to look at what the ERT is giving them; that's what isolate-scheduler-db is trying to do (see my patches [2.3] and [2.4] in the previous emails).
2. Some extra user request data needs to be checked in the test() method of the ERT plugins (where claims are done), so I provided a WIP patch for discussing it: https://review.openstack.org/#/c/113936/

As I see that there are already BPs proposing to use this IMHO broken ERT ([4] for example), which will surely add to the proliferation of code that hacks around these design shortcomings in what is already a messy, but also crucial (for perf as well as features) bit of Nova code. I propose to revert [2] ASAP since it is still fresh, and see how we can come up with a cleaner design.

I think the ERT is forward-progress here, but am willing to review patches/specs on improvements/replacements.
Sure, your comments are welcome on https://review.openstack.org/#/c/113373/ - you can find an example there where the TypeAffinity filter is modified to look at the HostState, and where the ERT is used both for updating the HostState and for claiming the resource.

Would like to hear opinions on this, before I propose the patch tho!

Thanks all, Nikola

[1] https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement
[2] https://review.openstack.org/#/c/109643/
[3] https://review.openstack.org/#/c/111782/
[4] https://review.openstack.org/#/c/89893
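Sylvain's second point - letting a plugin's test() see the user's request, not only the host-sourced usage blob - could look roughly like the sketch below. All names here are hypothetical illustrations of the idea, not code from the WIP patch linked above.

```python
# Rough illustration: pass the user's request (flavor, image properties)
# into the plugin's test() so the compute-side check sees the same data
# the scheduler filters saw. Names and the request_spec shape are
# hypothetical, not taken from the actual WIP patch.

class TypeAffinityResource(object):
    """Rejects a claim when the host is already running a different flavor."""

    def __init__(self):
        self.hosted_flavor_id = None

    def test(self, usage, limit, request_spec):
        # Unlike the current ERT contract, the claim-time check can now
        # consult user-supplied request data directly.
        flavor_id = request_spec['instance_type']['id']
        if (self.hosted_flavor_id is not None
                and self.hosted_flavor_id != flavor_id):
            return ('host already dedicated to flavor %s'
                    % self.hosted_flavor_id)
        return None

    def add_instance(self, request_spec):
        self.hosted_flavor_id = request_spec['instance_type']['id']
```

With test() claiming on the same request data the filter used, the scheduler-side and compute-side checks can no longer drift apart, which is exactly the consistency property Nikola's "lemma" asks for.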
Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?
So, there's been a lot of email in the last few days and I feel I am not keeping up. Sylvain, can you summarise for me what the plan is here? Can we roll forward or do we need to revert? Time is running out for Juno.

Thanks,
Michael
Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?
-----Original Message-----
From: Nikola Đipanov [mailto:ndipa...@redhat.com]
Sent: Tuesday, August 12, 2014 3:22 AM
To: OpenStack Development Mailing List
Subject: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

This is obviously a bigger concern when we want to allow users to pass data (through image or flavor) that can affect scheduling, but still a huge concern IMHO.

I'd think this is not an ERT issue itself, but more an RT issue. And the issue applies to PCI also, which has to save the PCI request in the system metadata (no ERT at that time). It would be great to have a more generic solution.

Thanks
-jyh
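The pattern jyh describes for PCI - saving the scheduling-time request in the instance's system metadata so the compute node can test the claim against exactly what the scheduler saw, without another DB round trip - can be sketched as below. The key name and the dict-based instance are illustrative assumptions, not Nova's actual PCI implementation.

```python
# Sketch of the "stash the request in system_metadata" pattern mentioned
# above: serialize the scheduler-visible request onto the instance so the
# compute node can reload it when testing a claim. Key name and instance
# representation are illustrative.
import json

REQUEST_KEY = 'resource_request'


def stash_request(instance, request):
    # Done once, when the request is built; travels with the instance.
    instance['system_metadata'][REQUEST_KEY] = json.dumps(request)


def load_request(instance):
    # Done on the compute node at claim time; no DB query needed.
    raw = instance['system_metadata'].get(REQUEST_KEY)
    return json.loads(raw) if raw is not None else None
```

A generic version of this would give any resource plugin the same guarantee the PCI code carved out for itself.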
Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?
(sorry for reposting, missed 2 links...)

Hi Nikola,

On 12/08/2014 12:21, Nikola Đipanov wrote:

Hey Nova-istas, While I was hacking on [1] I was considering how to approach the fact that we now need to track one more thing (NUMA node utilization) in our resources. I went with - I'll add it to compute nodes table thinking it's a fundamental enough property of a compute host that it deserves to be there, although I was considering Extensible Resource Tracker at one point (ERT from now on - see [2]) but looking at the code - it did not seem to provide anything I desperately needed, so I went with keeping it simple. So fast-forward a few days, and I caught myself solving a problem that I kept thinking ERT should have solved - but apparently hasn't, and I think it is fundamentally a broken design without it - so I'd really like to see it re-visited. The problem can be described by the following lemma (if you take 'lemma' to mean 'a sentence I came up with just now' :)): Due to the way scheduling works in Nova (roughly: pick a host based on stale(ish) data, rely on claims to trigger a re-schedule), _same exact_ information that scheduling service used when making a placement decision, needs to be available to the compute service when testing the placement. This is not the case right now, and the ERT does not propose any way to solve it - (see how I hacked around needing to be able to get extra_specs when making claims in [3], without hammering the DB). The result will be that any resource that we add and needs user supplied info for scheduling an instance against it, will need a buggy re-implementation of gathering all the bits from the request that scheduler sees, to be able to work properly.

Well, ERT does provide a plugin mechanism for testing resources at the claim level.
It is the plugin's responsibility to implement a test() method [2.1], which is called from test_claim() [2.2]. So, provided this method is implemented, a local host-side check can be done based on the host's view of its resources.

This is obviously a bigger concern when we want to allow users to pass data (through the image or flavor) that can affect scheduling, but it is still a huge concern IMHO. And here is where I agree with you: at the moment, the ResourceTracker (and consequently the Extensible RT) only provides the view of the resources that the host knows about (see my point above), so some other resources are possibly missing. So, whether you go with or without ERT, your patch [3] is still worthwhile if we want to avoid a DB lookup on every claim.

As I see that there are already BPs proposing to use this IMHO broken ERT ([4] for example), which will surely add to the proliferation of code that hacks around these design shortcomings in what is already a messy, but also crucial (for perf as well as features) bit of Nova code.

Two distinct implementations of that spec (i.e. instances and flavors) have been proposed [2.3] [2.4], so reviews are welcome. If you look at the test() method, it is a no-op for both plugins. I'm open to comments, because I have the stated problem: how can we define a limit on just a counter of instances and flavors?

I propose to revert [2] ASAP since it is still fresh, and see how we can come up with a cleaner design. Would like to hear opinions on this, before I propose the patch tho!

IMHO, I think the problem is more likely that the regular RT misses some information about each host, so it has to be handled on a case-by-case basis; but I don't think the ERT increases complexity or creates any new issue.
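To make the plugin contract above concrete, here is a self-contained sketch of a test()-implementing resource plugin. The base class is a local stand-in modelled on the abstract interface referenced at [2.1]; the VcpuResource plugin, the method signatures, and the dict shapes are illustrative assumptions, not the exact Nova API:

```python
import abc


class Resource(abc.ABC):
    """Local stand-in for the ERT plugin base class referenced at [2.1]."""

    @abc.abstractmethod
    def reset(self, resources, driver):
        """Re-initialise tracked state from the host's resource view."""

    @abc.abstractmethod
    def test(self, usage, limits):
        """Return None if the claim fits on this host, else a reason string."""

    @abc.abstractmethod
    def add_instance(self, usage):
        """Account for a new instance's usage."""

    @abc.abstractmethod
    def remove_instance(self, usage):
        """Release an instance's usage."""


class VcpuResource(Resource):
    """Hypothetical plugin tracking vCPU consumption on the host."""

    def reset(self, resources, driver):
        self.total = resources.get('vcpus', 0)
        self.used = 0

    def test(self, usage, limits):
        # The local host-side check: compare the host's own view of
        # free resources against what the request asks for.
        limit = limits.get('vcpu', self.total)
        requested = usage.get('vcpus', 0)
        if self.used + requested > limit:
            return ('Free vcpus %d < requested %d' %
                    (limit - self.used, requested))
        return None  # claim fits

    def add_instance(self, usage):
        self.used += usage.get('vcpus', 0)

    def remove_instance(self, usage):
        self.used -= usage.get('vcpus', 0)


plugin = VcpuResource()
plugin.reset({'vcpus': 8}, driver=None)
plugin.add_instance({'vcpus': 6})
print(plugin.test({'vcpus': 4}, {'vcpu': 8}))  # rejected: only 2 vcpus free
```

Note that the claim decision here uses only the host's local view plus whatever the caller packs into `usage` - which is exactly where the discussion below about user-supplied request data comes in.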
Thanks,
-Sylvain

Thanks all,
Nikola

[1] https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement
[2] https://review.openstack.org/#/c/109643/
[3] https://review.openstack.org/#/c/111782/
[4] https://review.openstack.org/#/c/89893
[2.1] https://github.com/openstack/nova/blob/master/nova/compute/resources/__init__.py#L75
[2.2] https://github.com/openstack/nova/blob/master/nova/compute/claims.py#L134
[2.3] https://review.openstack.org/112578
[2.4] https://review.openstack.org/113373

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?
On 08/12/2014 04:49 PM, Sylvain Bauza wrote:

(sorry for reposting, missed 2 links...)

Hi Nikola,

On 12/08/2014 12:21, Nikola Đipanov wrote:

[...]

The result will be that any resource we add that needs user-supplied info for scheduling an instance against it will need a buggy re-implementation of gathering all the bits of the request that the scheduler sees, in order to work properly.

Well, ERT does provide a plugin mechanism for testing resources at the claim level.
It is the plugin's responsibility to implement a test() method [2.1], which is called from test_claim() [2.2]. So, provided this method is implemented, a local host check can be done based on the host's view of resources.

Yes - the problem is that there is no clear API to get all the needed bits to do so, especially the user-supplied ones from the image and flavor. On top of that, in the current implementation we only pass in a hand-wavy 'usage' blob. This makes anyone wanting to use this in conjunction with some of the user-supplied bits roll their own 'extract_data_from_instance_metadata_flavor_image' or similar, which is horrible and also likely bad for performance.

[...]

Two distinct implementations of that spec (i.e. instances and flavors) have been proposed [2.3] [2.4], so reviews are welcome.

Will look at these - but none of them seem to hit the issue I am complaining about, which is that they will need to consider other request data for claims, not only data available on instances.
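For illustration, the hand-rolled extraction each plugin author ends up writing might look something like the following. Every name and dict shape here is hypothetical - it sketches the boilerplate of digging user-supplied scheduling inputs out of the instance when only a 'usage' blob is passed in, not Nova's actual API:

```python
# Hypothetical sketch: with no clear API for user-facing request data,
# each plugin must fish the user-supplied scheduling inputs (flavor
# extra_specs, image properties) out of the instance by hand.

def extract_request_data(instance):
    """Collect the user-supplied bits the scheduler saw for this request."""
    flavor = instance.get('flavor', {})
    image_meta = instance.get('image_meta', {})
    return {
        'extra_specs': flavor.get('extra_specs', {}),
        'image_props': image_meta.get('properties', {}),
    }

# Each plugin repeats this dance before it can test a claim properly.
instance = {
    'flavor': {'extra_specs': {'hw:numa_nodes': '2'}},
    'image_meta': {'properties': {'hw_numa_nodes': '2'}},
}
request = extract_request_data(instance)
print(request['extra_specs'])  # {'hw:numa_nodes': '2'}
```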
Also - the fact that you don't implement test() in the flavor one tells me that the implementation is indeed racy (though it is racy at the moment as well): two requests can race for the same host, and since no claims are done, both can succeed. I believe this is unlikely to happen in practice (at least in the case of single-flavor hosts), but you get the idea.

I propose to revert [2] ASAP since it is still fresh, and see how we can come up with a cleaner design. Would like to hear opinions on this, before I propose the patch tho!

IMHO, I think the problem is more likely that the regular RT misses some information about each host, so it has to be handled on a case-by-case basis; but I don't think the ERT increases complexity or creates any new issue.

The RT does not miss info about the host, but about the particular request, which we have to fish out of different places like image_metadata, extra_specs, etc. - yet it can't really work without them.

This is definitely a RT issue that is not specific to ERT. However, I still see several issues
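The race Nikola describes with a no-op test() can be illustrated with a toy counter. The class and its limit here are hypothetical, showing only why the second of two racing claims must be rejected when test() actually checks the limit:

```python
import threading


class InstanceCounter:
    """Toy stand-in for a counting resource plugin with a per-host limit."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0
        self._lock = threading.Lock()

    def claim_noop_test(self):
        # With a no-op test(), nothing ever rejects a claim, so two
        # racing requests for the last slot both succeed.
        self.count += 1
        return True

    def claim_with_test(self):
        # Testing and claiming under one lock closes the race: the
        # second request sees the updated count and is rejected.
        with self._lock:
            if self.count + 1 > self.limit:
                return False
            self.count += 1
            return True


a = InstanceCounter(limit=1)
print(a.claim_noop_test(), a.claim_noop_test())  # True True  -> host overcommitted
b = InstanceCounter(limit=1)
print(b.claim_with_test(), b.claim_with_test())  # True False -> second claim rejected
```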
Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?
On 12/08/2014 18:54, Nikola Đipanov wrote:

On 08/12/2014 04:49 PM, Sylvain Bauza wrote:

(sorry for reposting, missed 2 links...)

Hi Nikola,

On 12/08/2014 12:21, Nikola Đipanov wrote:

[...]

Well, ERT does provide a plugin mechanism for testing resources at the claim level.
It is the plugin's responsibility to implement a test() method [2.1], which is called from test_claim() [2.2]. So, provided this method is implemented, a local host check can be done based on the host's view of resources.

Yes - the problem is that there is no clear API to get all the needed bits to do so, especially the user-supplied ones from the image and flavor. On top of that, in the current implementation we only pass in a hand-wavy 'usage' blob. This makes anyone wanting to use this in conjunction with some of the user-supplied bits roll their own 'extract_data_from_instance_metadata_flavor_image' or similar, which is horrible and also likely bad for performance.

I see your concern that there is no interface for user-facing resources like flavor or image metadata. I also agree that the big 'usage' blob is not a good choice for the long-term vision. That said, I don't think we should, as we say in French, throw the baby out with the bathwater; i.e. the problem is with the RT, not the ERT (apart from the third-party API concern you noted - I'll get to it below).

[...]

Two distinct implementations of that spec (i.e. instances and flavors) have been proposed [2.3] [2.4], so reviews are welcome.
If you look at the test() method, it is a no-op for both plugins. I'm open to comments, because I have the stated problem: how can we define a limit on just a counter of instances and flavors?

Will look at these - but none of them seem to hit the issue I am complaining about, which is that they will need to consider other request data for claims, not only data available on instances. Also - the fact that you don't implement test() in the flavor one tells me that the implementation is indeed racy (though it is racy at the moment as well): two requests can race for the same host, and since no claims are done, both can succeed. I believe this is unlikely to happen in practice (at least in the case of single-flavor hosts), but you get the idea.

Agreed - these 2 patches probably require another iteration, in particular to make sure they won't be racy. So I need another pass to think about what to test() for these 2 examples. Another patch still has to be done for aggregates, but it's still WIP so it is not mentioned here.

Anyway, as discussed during today's