Re: [openstack-dev] [nova][placement] scheduling with custom resource classes
On Wed, Jul 19, 2017 at 3:54 PM, Chris Dent wrote:
> On Wed, 19 Jul 2017, Balazs Gibizer wrote:
>> I added more info to the bug report and the review as it seems the
>> test is fluctuating.
>
> (Reflecting some conversation gibi and I have had in IRC) I've made a
> gabbi-based replication of the desired functionality. It also flaps,
> with a >50% failure rate: https://review.openstack.org/#/c/485209/
>
>> Sorry, I copy-pasted the wrong link; the correct link is
>> https://bugs.launchpad.net/nova/+bug/1705231
>
> This has been updated (by gibi) to show that the generated SQL is
> different between the failure and success cases.

Thanks Jay for proposing the fix https://review.openstack.org/#/c/485088/.
It works for me both in the functional env and in devstack.

Cheers,
gibi
Re: [openstack-dev] [nova][placement] scheduling with custom resource classes
On Wed, 19 Jul 2017, Balazs Gibizer wrote:
> I added more info to the bug report and the review as it seems the
> test is fluctuating.

(Reflecting some conversation gibi and I have had in IRC) I've made a
gabbi-based replication of the desired functionality. It also flaps,
with a >50% failure rate: https://review.openstack.org/#/c/485209/

> Sorry, I copy-pasted the wrong link; the correct link is
> https://bugs.launchpad.net/nova/+bug/1705231

This has been updated (by gibi) to show that the generated SQL is
different between the failure and success cases.

--
Chris Dent ┬──┬◡ノ(° -°ノ) https://anticdent.org/
freenode: cdent tw: @anticdent
Re: [openstack-dev] [nova][placement] scheduling with custom resource classes
On Wed, Jul 19, 2017 at 1:13 PM, Chris Dent wrote:
> On Wed, 19 Jul 2017, Balazs Gibizer wrote:
>> We are trying to get some help from the related functional test [5]
>> but honestly we still need some time to digest those LOCs. So any
>> direct help is appreciated.
>>
>> I managed to create a functional test case that reproduces the above
>> problem: https://review.openstack.org/#/c/485088/
>
> Excellent, thank you. I was planning to look into repeating this
> today, will first look at this test and see what I can see. Your
> experimentation is exactly the sort of stuff we need right now, so
> thank you very much.

I added more info to the bug report and the review as it seems the
test is fluctuating.

>> BTW, should I open a bug for it?
>>
>> I also filed a bug so that we can track this work:
>> https://bugs.launchpad.net/nova/+bug/1705071
>
> I guess Jay and Matt have already fixed a part of this, but not the
> whole thing.

Sorry, I copy-pasted the wrong link; the correct link is
https://bugs.launchpad.net/nova/+bug/1705231

Cheers,
gibi
Re: [openstack-dev] [nova][placement] scheduling with custom resource classes
On Wed, 19 Jul 2017, Balazs Gibizer wrote:
> We are trying to get some help from the related functional test [5]
> but honestly we still need some time to digest those LOCs. So any
> direct help is appreciated.
>
> I managed to create a functional test case that reproduces the above
> problem: https://review.openstack.org/#/c/485088/

Excellent, thank you. I was planning to look into repeating this
today, will first look at this test and see what I can see. Your
experimentation is exactly the sort of stuff we need right now, so
thank you very much.

> BTW, should I open a bug for it?
>
> I also filed a bug so that we can track this work:
> https://bugs.launchpad.net/nova/+bug/1705071

I guess Jay and Matt have already fixed a part of this, but not the
whole thing.

--
Chris Dent ┬──┬◡ノ(° -°ノ) https://anticdent.org/
freenode: cdent tw: @anticdent
Re: [openstack-dev] [nova][placement] scheduling with custom resource classes
On Tue, Jul 18, 2017 at 2:39 PM, Balazs Gibizer wrote:
> On Mon, Jul 17, 2017 at 6:40 PM, Jay Pipes wrote:
>> On 07/17/2017 11:31 AM, Balazs Gibizer wrote:
>>> On Thu, Jul 13, 2017 at 11:37 AM, Chris Dent wrote:
>>>> On Thu, 13 Jul 2017, Balazs Gibizer wrote:
>>>>
>>>>> /placement/allocation_candidates?resources=CUSTOM_MAGIC%3A512%2CMEMORY_MB%3A64%2CVCPU%3A1"
>>>>> but placement returns an empty response. Then nova scheduler falls
>>>>> back to legacy behavior [4] and places the instance without
>>>>> considering the custom resource request.
>>>>
>>>> As far as I can tell at least one missing piece of the puzzle here
>>>> is that your MAGIC provider does not have the
>>>> 'MISC_SHARES_VIA_AGGREGATE' trait. It's not enough for the compute
>>>> and MAGIC to be in the same aggregate, the MAGIC needs to announce
>>>> that its inventory is for sharing. The comments here have a bit more
>>>> on that:
>>>>
>>>> https://github.com/openstack/nova/blob/master/nova/objects/resource_provider.py#L663-L678
>>>
>>> Thanks a lot for the detailed answer. Yes, this was the missing piece.
>>> However I had to add that trait both to the MAGIC provider and to my
>>> compute provider to make it work. Is it intentional that the compute
>>> also has to have that trait?
>>
>> No. The compute node doesn't need that trait. It only needs to be
>> associated to an aggregate that is associated to the provider that is
>> marked with the MISC_SHARES_VIA_AGGREGATE trait.
>>
>> In other words, you need to do this:
>>
>> 1) Create the provider record for the thing that is going to share
>>    the CUSTOM_MAGIC resources
>>
>> 2) Create an inventory record on that provider
>>
>> 3) Set the MISC_SHARES_VIA_AGGREGATE trait on that provider
>>
>> 4) Create an aggregate
>>
>> 5) Associate both the above provider and the compute node provider
>>    with the aggregate
>>
>> That's it. The compute node provider will now have access to the
>> CUSTOM_MAGIC resources that the other provider has in inventory.
>
> Something doesn't add up. We tried exactly your order of actions (see
> the script [1]) but placement returns an empty result (see the logs
> of the script [2], of the scheduler [3], of the placement [4]).
> However, as soon as we add the MISC_SHARES_VIA_AGGREGATE trait to the
> compute provider as well, placement-api returns allocation candidates
> as expected.
>
> We are trying to get some help from the related functional test [5]
> but honestly we still need some time to digest those LOCs. So any
> direct help is appreciated.

I managed to create a functional test case that reproduces the above
problem: https://review.openstack.org/#/c/485088/

> BTW, should I open a bug for it?

I also filed a bug so that we can track this work:
https://bugs.launchpad.net/nova/+bug/1705071

Cheers,
gibi

> As a related question: I looked at the claim in the scheduler patch
> https://review.openstack.org/#/c/483566 and I'm wondering if that
> patch wants to claim not just the resources a compute provider
> provides but also custom resources like MAGIC at [6]. In the meantime
> I will go and test that patch to see what it actually does with some
> MAGIC. :)
>
> Thanks for the help!
>
> Cheers,
> gibi
>
> [1] http://paste.openstack.org/show/615707/
> [2] http://paste.openstack.org/show/615708/
> [3] http://paste.openstack.org/show/615709/
> [4] http://paste.openstack.org/show/615710/
> [5] https://github.com/openstack/nova/blob/0e6cac5fde830f1de0ebdd4eebc130de1eb0198d/nova/tests/functional/db/test_resource_provider.py#L1969
> [6] https://review.openstack.org/#/c/483566/3/nova/scheduler/filter_scheduler.py@167
>
>> Magic. :)
>>
>> Best,
>> -jay
>>
>>> I updated my script with the trait. [3]
>>>
>>>> It's quite likely this is not well documented yet as this style of
>>>> declaring that something is shared was a later development. The
>>>> initial code that added the support for GET /resource_providers
>>>> was around, it was later reused for GET /allocation_candidates:
>>>>
>>>> https://review.openstack.org/#/c/460798/
>>>
>>> What would be a good place to document this? I think I can help with
>>> enhancing the documentation from this perspective.
>>>
>>> Thanks again.
>>> Cheers,
>>> gibi
>>>
>>> [3] http://paste.openstack.org/show/615629/
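To make gibi's related question concrete: if the scheduler claimed the
custom resources too, the resulting placement call would look roughly
like the sketch below of PUT /allocations with python-requests at
microversion 1.10. This is not the actual code in 483566; the endpoint,
token, and all UUIDs/ids are made up for illustration, and the MAGIC
amount is claimed against the sharing provider as it would appear in
the allocation request.

import requests

PLACEMENT = 'http://192.0.2.10/placement'  # hypothetical endpoint
HEADERS = {'x-auth-token': 'ADMIN_TOKEN',  # hypothetical token
           'openstack-api-version': 'placement 1.10',
           'content-type': 'application/json'}

CONSUMER = '07bf3bc9-8b5e-4a3c-a7ae-1b070ffab93c'    # instance uuid (made up)
COMPUTE_RP = '5b3b2f70-31e9-4357-8b03-8b2d7b3f2a11'  # compute node provider
MAGIC_RP = '4fa0ca42-ee5f-4d42-be5e-d41cd88d0f66'    # MAGIC provider

requests.put(PLACEMENT + '/allocations/%s' % CONSUMER, headers=HEADERS,
             json={
                 'allocations': [
                     # VCPU and MEMORY_MB are claimed against the
                     # compute node provider...
                     {'resource_provider': {'uuid': COMPUTE_RP},
                      'resources': {'VCPU': 1, 'MEMORY_MB': 64}},
                     # ...while CUSTOM_MAGIC is claimed against the
                     # sharing provider.
                     {'resource_provider': {'uuid': MAGIC_RP},
                      'resources': {'CUSTOM_MAGIC': 512}},
                 ],
                 # required from microversion 1.8 onwards (made up ids)
                 'project_id': '8bb6c4cf841d4a2f9c9e4c78e1f34b26',
                 'user_id': '24e633de73114f26bba5e95dbbc85a1a',
             })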
Re: [openstack-dev] [nova][placement] scheduling with custom resource classes
On Mon, Jul 17, 2017 at 6:40 PM, Jay Pipes wrote:
> On 07/17/2017 11:31 AM, Balazs Gibizer wrote:
>> On Thu, Jul 13, 2017 at 11:37 AM, Chris Dent wrote:
>>> On Thu, 13 Jul 2017, Balazs Gibizer wrote:
>>>
>>>> /placement/allocation_candidates?resources=CUSTOM_MAGIC%3A512%2CMEMORY_MB%3A64%2CVCPU%3A1"
>>>> but placement returns an empty response. Then nova scheduler falls
>>>> back to legacy behavior [4] and places the instance without
>>>> considering the custom resource request.
>>>
>>> As far as I can tell at least one missing piece of the puzzle here
>>> is that your MAGIC provider does not have the
>>> 'MISC_SHARES_VIA_AGGREGATE' trait. It's not enough for the compute
>>> and MAGIC to be in the same aggregate, the MAGIC needs to announce
>>> that its inventory is for sharing. The comments here have a bit more
>>> on that:
>>>
>>> https://github.com/openstack/nova/blob/master/nova/objects/resource_provider.py#L663-L678
>>
>> Thanks a lot for the detailed answer. Yes, this was the missing piece.
>> However I had to add that trait both to the MAGIC provider and to my
>> compute provider to make it work. Is it intentional that the compute
>> also has to have that trait?
>
> No. The compute node doesn't need that trait. It only needs to be
> associated to an aggregate that is associated to the provider that is
> marked with the MISC_SHARES_VIA_AGGREGATE trait.
>
> In other words, you need to do this:
>
> 1) Create the provider record for the thing that is going to share
>    the CUSTOM_MAGIC resources
>
> 2) Create an inventory record on that provider
>
> 3) Set the MISC_SHARES_VIA_AGGREGATE trait on that provider
>
> 4) Create an aggregate
>
> 5) Associate both the above provider and the compute node provider
>    with the aggregate
>
> That's it. The compute node provider will now have access to the
> CUSTOM_MAGIC resources that the other provider has in inventory.

Something doesn't add up. We tried exactly your order of actions (see
the script [1]) but placement returns an empty result (see the logs of
the script [2], of the scheduler [3], of the placement [4]). However,
as soon as we add the MISC_SHARES_VIA_AGGREGATE trait to the compute
provider as well, placement-api returns allocation candidates as
expected.

We are trying to get some help from the related functional test [5]
but honestly we still need some time to digest those LOCs. So any
direct help is appreciated.

BTW, should I open a bug for it?

As a related question: I looked at the claim in the scheduler patch
https://review.openstack.org/#/c/483566 and I'm wondering if that
patch wants to claim not just the resources a compute provider
provides but also custom resources like MAGIC at [6]. In the meantime
I will go and test that patch to see what it actually does with some
MAGIC. :)

Thanks for the help!

Cheers,
gibi

[1] http://paste.openstack.org/show/615707/
[2] http://paste.openstack.org/show/615708/
[3] http://paste.openstack.org/show/615709/
[4] http://paste.openstack.org/show/615710/
[5] https://github.com/openstack/nova/blob/0e6cac5fde830f1de0ebdd4eebc130de1eb0198d/nova/tests/functional/db/test_resource_provider.py#L1969
[6] https://review.openstack.org/#/c/483566/3/nova/scheduler/filter_scheduler.py@167

> Magic. :)
>
> Best,
> -jay
>
>> I updated my script with the trait. [3]
>>
>>> It's quite likely this is not well documented yet as this style of
>>> declaring that something is shared was a later development. The
>>> initial code that added the support for GET /resource_providers
>>> was around, it was later reused for GET /allocation_candidates:
>>>
>>> https://review.openstack.org/#/c/460798/
>>
>> What would be a good place to document this? I think I can help with
>> enhancing the documentation from this perspective.
>>
>> Thanks again.
>> Cheers,
>> gibi
>>
>> [3] http://paste.openstack.org/show/615629/
Re: [openstack-dev] [nova][placement] scheduling with custom resource classes
On 07/17/2017 11:31 AM, Balazs Gibizer wrote:
> On Thu, Jul 13, 2017 at 11:37 AM, Chris Dent wrote:
>> On Thu, 13 Jul 2017, Balazs Gibizer wrote:
>>
>>> /placement/allocation_candidates?resources=CUSTOM_MAGIC%3A512%2CMEMORY_MB%3A64%2CVCPU%3A1"
>>> but placement returns an empty response. Then nova scheduler falls
>>> back to legacy behavior [4] and places the instance without
>>> considering the custom resource request.
>>
>> As far as I can tell at least one missing piece of the puzzle here
>> is that your MAGIC provider does not have the
>> 'MISC_SHARES_VIA_AGGREGATE' trait. It's not enough for the compute
>> and MAGIC to be in the same aggregate, the MAGIC needs to announce
>> that its inventory is for sharing. The comments here have a bit more
>> on that:
>>
>> https://github.com/openstack/nova/blob/master/nova/objects/resource_provider.py#L663-L678
>
> Thanks a lot for the detailed answer. Yes, this was the missing piece.
> However I had to add that trait both to the MAGIC provider and to my
> compute provider to make it work. Is it intentional that the compute
> also has to have that trait?

No. The compute node doesn't need that trait. It only needs to be
associated to an aggregate that is associated to the provider that is
marked with the MISC_SHARES_VIA_AGGREGATE trait.

In other words, you need to do this:

1) Create the provider record for the thing that is going to share
   the CUSTOM_MAGIC resources

2) Create an inventory record on that provider

3) Set the MISC_SHARES_VIA_AGGREGATE trait on that provider

4) Create an aggregate

5) Associate both the above provider and the compute node provider
   with the aggregate

That's it. The compute node provider will now have access to the
CUSTOM_MAGIC resources that the other provider has in inventory.

Magic. :)

Best,
-jay

> I updated my script with the trait. [3]
>
>> It's quite likely this is not well documented yet as this style of
>> declaring that something is shared was a later development. The
>> initial code that added the support for GET /resource_providers
>> was around, it was later reused for GET /allocation_candidates:
>>
>> https://review.openstack.org/#/c/460798/
>
> What would be a good place to document this? I think I can help with
> enhancing the documentation from this perspective.
>
> Thanks again.
> Cheers,
> gibi
>
> [3] http://paste.openstack.org/show/615629/
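For concreteness, Jay's five steps map onto placement REST calls
roughly as in the sketch below, using python-requests at microversion
1.10. The endpoint, token, provider UUIDs, and aggregate UUID are all
made up, steps 1-2 (provider and inventory) are assumed done as in the
original post further down, and the list-shaped body for the
aggregates PUT is the pre-1.19 form that Pike-era placement expects.

import requests

PLACEMENT = 'http://192.0.2.10/placement'  # hypothetical endpoint
HEADERS = {'x-auth-token': 'ADMIN_TOKEN',  # hypothetical token
           'openstack-api-version': 'placement 1.10',
           'content-type': 'application/json'}

MAGIC_RP = '4fa0ca42-ee5f-4d42-be5e-d41cd88d0f66'    # sharing provider
COMPUTE_RP = '5b3b2f70-31e9-4357-8b03-8b2d7b3f2a11'  # compute node provider
AGG = 'f3dc0f36-97d4-4fcd-9a8b-6b3c2e1d0f4a'         # made-up aggregate uuid

# Step 3: mark the sharing provider. The traits PUT needs the
# provider's current generation, so read it back first.
gen = requests.get(PLACEMENT + '/resource_providers/%s' % MAGIC_RP,
                   headers=HEADERS).json()['generation']
requests.put(PLACEMENT + '/resource_providers/%s/traits' % MAGIC_RP,
             headers=HEADERS,
             json={'resource_provider_generation': gen,
                   'traits': ['MISC_SHARES_VIA_AGGREGATE']})

# Steps 4-5: a placement aggregate exists implicitly once a provider
# references its uuid, so associating both providers with the same
# uuid covers both steps.
for rp in (MAGIC_RP, COMPUTE_RP):
    requests.put(PLACEMENT + '/resource_providers/%s/aggregates' % rp,
                 json=[AGG], headers=HEADERS)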
Re: [openstack-dev] [nova][placement] scheduling with custom resource classes
On Thu, Jul 13, 2017 at 11:37 AM, Chris Dent wrote:
> On Thu, 13 Jul 2017, Balazs Gibizer wrote:
>
>> /placement/allocation_candidates?resources=CUSTOM_MAGIC%3A512%2CMEMORY_MB%3A64%2CVCPU%3A1"
>> but placement returns an empty response. Then nova scheduler falls
>> back to legacy behavior [4] and places the instance without
>> considering the custom resource request.
>
> As far as I can tell at least one missing piece of the puzzle here
> is that your MAGIC provider does not have the
> 'MISC_SHARES_VIA_AGGREGATE' trait. It's not enough for the compute
> and MAGIC to be in the same aggregate, the MAGIC needs to announce
> that its inventory is for sharing. The comments here have a bit more
> on that:
>
> https://github.com/openstack/nova/blob/master/nova/objects/resource_provider.py#L663-L678

Thanks a lot for the detailed answer. Yes, this was the missing piece.
However I had to add that trait both to the MAGIC provider and to my
compute provider to make it work. Is it intentional that the compute
also has to have that trait?

I updated my script with the trait. [3]

> It's quite likely this is not well documented yet as this style of
> declaring that something is shared was a later development. The
> initial code that added the support for GET /resource_providers
> was around, it was later reused for GET /allocation_candidates:
>
> https://review.openstack.org/#/c/460798/

What would be a good place to document this? I think I can help with
enhancing the documentation from this perspective.

Thanks again.
Cheers,
gibi

[3] http://paste.openstack.org/show/615629/
Re: [openstack-dev] [nova][placement] scheduling with custom resource classes
On Thu, 13 Jul 2017, Balazs Gibizer wrote:

> /placement/allocation_candidates?resources=CUSTOM_MAGIC%3A512%2CMEMORY_MB%3A64%2CVCPU%3A1"
> but placement returns an empty response. Then nova scheduler falls
> back to legacy behavior [4] and places the instance without
> considering the custom resource request.

As far as I can tell at least one missing piece of the puzzle here
is that your MAGIC provider does not have the
'MISC_SHARES_VIA_AGGREGATE' trait. It's not enough for the compute
and MAGIC to be in the same aggregate, the MAGIC needs to announce
that its inventory is for sharing. The comments here have a bit more
on that:

https://github.com/openstack/nova/blob/master/nova/objects/resource_provider.py#L663-L678

It's quite likely this is not well documented yet as this style of
declaring that something is shared was a later development. The
initial code that added the support for GET /resource_providers
was around, it was later reused for GET /allocation_candidates:

https://review.openstack.org/#/c/460798/

The other thing to be aware of is that GET /allocation_candidates is
in flight. It should be stable on the placement service side, but the
way the data is being used on the scheduler side is undergoing change
as we speak:

https://review.openstack.org/#/c/482381/

> Then I tried to connect the compute provider and the MAGIC provider
> to the same aggregate via the placement API, but the above placement
> request still resulted in an empty response. See my exact steps in [5].

This still needs to happen, but you also need to put the trait
mentioned above on the magic provider. The docs for that are in
progress on this review https://review.openstack.org/#/c/474550/ and a
rendered version:

http://docs-draft.openstack.org/50/474550/8/check/gate-placement-api-ref-nv/2d2a7ea//placement-api-ref/build/html/#update-resource-provider-traits

> Am I still missing some environment setup on my side to make it work?
> Is the work in [1] incomplete? Are the missing pieces in [2] needed
> to make this use case work? If more implementation is needed then I
> can offer some help during the Queens cycle.

There's definitely more to do and your help would be greatly
appreciated. It's _fantastic_ that you are experimenting with this and
sharing what's happening.

> To make the above use case fully functional I realized that I need a
> service that periodically updates the placement service with the
> state of the MAGIC resource, like the resource tracker in Nova. Are
> there any existing plans for creating a generic service or framework
> that can be used for the tracking and reporting purposes?

As you've probably discovered from your experiments with curl,
updating inventory is pretty straightforward (if you have a TOKEN), so
we decided to forego making a framework at this point. I had some code
long ago that demonstrated one way to do it, but it didn't get any
traction:

https://review.openstack.org/#/c/382613/

That tried to be a simple python script using requests that did the
bare minimum and would be amenable to cron jobs and other simple
scripts.

I hope some of the above is helpful. Jay, Ed, Sylvain or Dan may come
along with additional info.

--
Chris Dent ┬──┬◡ノ(° -°ノ) https://anticdent.org/
freenode: cdent tw: @anticdent
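In that cron-able spirit, a periodic MAGIC reporter might look roughly
like the sketch below. It is not the script from 382613, just a
minimal illustration with python-requests at microversion 1.10; the
endpoint, token, and provider UUID are made up, and
get_magic_capacity() is a hypothetical stand-in for however the real
MAGIC capacity gets measured.

import requests

PLACEMENT = 'http://192.0.2.10/placement'  # hypothetical endpoint
HEADERS = {'x-auth-token': 'ADMIN_TOKEN',  # hypothetical token
           'openstack-api-version': 'placement 1.10',
           'content-type': 'application/json'}
RP_UUID = '4fa0ca42-ee5f-4d42-be5e-d41cd88d0f66'  # made-up provider uuid

def get_magic_capacity():
    """Hypothetical: ask the MAGIC device/driver how much it really has."""
    return 1024

def refresh_inventory(retries=3):
    url = PLACEMENT + '/resource_providers/%s/inventories' % RP_UUID
    for _ in range(retries):
        # Placement guards inventory writes with the provider
        # generation: read the current one, write, retry on 409.
        gen = requests.get(url, headers=HEADERS).json()[
            'resource_provider_generation']
        total = get_magic_capacity()
        resp = requests.put(url, headers=HEADERS, json={
            'resource_provider_generation': gen,
            'inventories': {'CUSTOM_MAGIC': {
                'total': total, 'reserved': 0, 'min_unit': 1,
                'max_unit': total, 'step_size': 1,
                'allocation_ratio': 1.0}}})
        if resp.status_code != 409:  # 409 means our generation was stale
            return resp

if __name__ == '__main__':
    refresh_inventory()  # suitable for running from cron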
[openstack-dev] [nova][placement] scheduling with custom resource classes
Dear Placement developers,

I'm trying to build on top of the custom resource class implementation
[1][2] from the current master [3]. I'd like to place instances based
on normal resources (cpu, ram) and on a custom resource I will call
MAGIC for this discussion. So far I managed to use the placement API
to define the CUSTOM_MAGIC resource class, create a provider, and
report some inventory of MAGIC from that provider. Then I added
'resources:CUSTOM_MAGIC=512' to the flavor's extra_specs. During
server boot the scheduler builds a seemingly good placement request
"GET /placement/allocation_candidates?resources=CUSTOM_MAGIC%3A512%2CMEMORY_MB%3A64%2CVCPU%3A1"
but placement returns an empty response. Then nova scheduler falls
back to legacy behavior [4] and places the instance without
considering the custom resource request.

Then I tried to connect the compute provider and the MAGIC provider to
the same aggregate via the placement API, but the above placement
request still resulted in an empty response. See my exact steps in [5].

Am I still missing some environment setup on my side to make it work?
Is the work in [1] incomplete? Are the missing pieces in [2] needed to
make this use case work? If more implementation is needed then I can
offer some help during the Queens cycle.

To make the above use case fully functional I realized that I need a
service that periodically updates the placement service with the state
of the MAGIC resource, like the resource tracker in Nova. Are there
any existing plans for creating a generic service or framework that
can be used for the tracking and reporting purposes?

Cheers,
gibi

[1] https://review.openstack.org/#/q/topic:bp/custom-resource-classes-pike
[2] https://review.openstack.org/#/q/topic:bp/custom-resource-classes-in-flavors
[3] 0ffe7b27892fde243fc1006f800f309c10d66028
[4] https://github.com/openstack/nova/blob/48268c73e3f43fa763d071422816942942987f4a/nova/scheduler/manager.py#L116
[5] http://paste.openstack.org/show/615152/
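For concreteness, the setup described in the original post maps onto
placement REST calls roughly as in the sketch below, using
python-requests at microversion 1.10. This is a minimal sketch, not
the actual script in [5]: the endpoint, token, and provider UUID are
made up for illustration, and the inventory numbers are arbitrary.

import requests

PLACEMENT = 'http://192.0.2.10/placement'  # hypothetical endpoint
TOKEN = 'ADMIN_TOKEN'                      # hypothetical keystone token
RP_UUID = '4fa0ca42-ee5f-4d42-be5e-d41cd88d0f66'  # made-up provider uuid

HEADERS = {
    'x-auth-token': TOKEN,
    'openstack-api-version': 'placement 1.10',
    'content-type': 'application/json',
}

# Define the custom resource class (needs microversion >= 1.2).
requests.post(PLACEMENT + '/resource_classes',
              json={'name': 'CUSTOM_MAGIC'}, headers=HEADERS)

# Create the provider that owns the MAGIC inventory.
requests.post(PLACEMENT + '/resource_providers',
              json={'name': 'magic-provider', 'uuid': RP_UUID},
              headers=HEADERS)

# Report MAGIC inventory; a freshly created provider has generation 0.
requests.put(PLACEMENT + '/resource_providers/%s/inventories' % RP_UUID,
             headers=HEADERS,
             json={'resource_provider_generation': 0,
                   'inventories': {'CUSTOM_MAGIC': {
                       'total': 1024, 'reserved': 0, 'min_unit': 1,
                       'max_unit': 1024, 'step_size': 1,
                       'allocation_ratio': 1.0}}})

# The query the scheduler builds for a flavor carrying
# resources:CUSTOM_MAGIC=512 in its extra_specs; requests URL-encodes
# it into the form shown in the message above.
resp = requests.get(
    PLACEMENT + '/allocation_candidates',
    params={'resources': 'CUSTOM_MAGIC:512,MEMORY_MB:64,VCPU:1'},
    headers=HEADERS)
print(resp.json())  # empty allocation_requests reproduces the problem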