Re: [openstack-dev] [neutron] [nova] scheduling bandwidth resources / NIC_BW_KB resource class
On 22/04/2016 16:14, Matt Riedemann wrote:
On 4/22/2016 2:48 AM, Sylvain Bauza wrote:
On 22/04/2016 02:49, Jay Pipes wrote:
On 04/20/2016 06:40 PM, Matt Riedemann wrote:
[...]

Funny, we do *deallocate* if an exception is raised when trying to find a destination in the conductor, but since the port is not allocated yet, I guess it's a no-op at the moment.

https://github.com/openstack/nova/blob/d57a4e8be9147bd79be12d3f5adccc9289a375b6/nova/conductor/manager.py#L423-L424

Is this here for rebuilds, where we set up networks on a compute node but something else failed, maybe setting up block devices? Although we have a lot of checks in the build flow in the compute manager for deallocating the network on failure.

Yeah, after git blaming, the reason is explained in the commit message: https://review.openstack.org/#/c/243477/

Fair enough, I just think it's another good reason to discuss where and when we should allocate and deallocate networks, because I'm not super comfortable with the above.

Or, another option could be to track that a port was already allocated for a specific instance and skip that deallocation if it hasn't happened yet, instead of just doing what was necessary there: https://review.openstack.org/#/c/269462/1/nova/conductor/manager.py ?

-Sylvain

Clarifying the above and making the conductor responsible for placing calls to Neutron is something I'd love to see before moving further with the routed networks and the QoS specs, and yes, doing that in the conductor seems to me the best fit.
-Sylvain

Best,
-jay
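A rough sketch of the kind of guard Sylvain describes, skipping the conductor-side network cleanup unless allocation has actually happened; the helper name and the 'network_allocated' system_metadata key are illustrative assumptions rather than actual nova code.

    # Illustrative only: helper name and metadata key are assumptions.
    def cleanup_networks_if_allocated(context, network_api, instance,
                                      requested_networks):
        # Hypothetical marker the compute manager would set once
        # allocate_for_instance() has completed for this instance.
        if instance.system_metadata.get('network_allocated') != 'True':
            # Scheduling failed before any port was allocated, so there is
            # nothing to deallocate; avoid the pointless Neutron round trip.
            return
        network_api.deallocate_for_instance(
            context, instance, requested_networks=requested_networks)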
Re: [openstack-dev] [neutron] [nova] scheduling bandwidth resources / NIC_BW_KB resource class
On 4/22/2016 2:48 AM, Sylvain Bauza wrote:
On 22/04/2016 02:49, Jay Pipes wrote:
[...]

Funny, we do *deallocate* if an exception is raised when trying to find a destination in the conductor, but since the port is not allocated yet, I guess it's a no-op at the moment.

https://github.com/openstack/nova/blob/d57a4e8be9147bd79be12d3f5adccc9289a375b6/nova/conductor/manager.py#L423-L424

Is this here for rebuilds, where we set up networks on a compute node but something else failed, maybe setting up block devices? Although we have a lot of checks in the build flow in the compute manager for deallocating the network on failure.

[...]

-- 
Thanks,
Matt Riedemann
Re: [openstack-dev] [neutron] [nova] scheduling bandwidth resources / NIC_BW_KB resource class
On 22/04/2016 02:49, Jay Pipes wrote:
On 04/20/2016 06:40 PM, Matt Riedemann wrote:
[...]

I totally agree with that plan. I never replied to Ajo's point (thanks Matt for doing that), but I was struggling to figure out an allocation call in the Compute API service. Thanks Jay for clarifying this.

Funny, we do *deallocate* if an exception is raised when trying to find a destination in the conductor, but since the port is not allocated yet, I guess it's a no-op at the moment.

https://github.com/openstack/nova/blob/d57a4e8be9147bd79be12d3f5adccc9289a375b6/nova/conductor/manager.py#L423-L424

Clarifying the above and making the conductor responsible for placing calls to Neutron is something I'd love to see before moving further with the routed networks and the QoS specs, and yes, doing that in the conductor seems to me the best fit.

-Sylvain
Re: [openstack-dev] [neutron] [nova] scheduling bandwidth resources / NIC_BW_KB resource class
On 04/20/2016 06:40 PM, Matt Riedemann wrote:

Note that I think the only time Nova gets details about ports in the API during a server create request is when doing the network request validation, and that's only if there is a fixed IP address or specific port(s) in the request; otherwise Nova just gets the networks. [1]

[1] https://github.com/openstack/nova/blob/ee7a01982611cdf8012a308fa49722146c51497f/nova/network/neutronv2/api.py#L1123

Actually, nova.network.neutronv2.api.API.allocate_for_instance() is *never* called by the Compute API service (though, strangely, deallocate_for_instance() *is* called by the Compute API service). allocate_for_instance() is *only* ever called in the nova-compute service:

https://github.com/openstack/nova/blob/7be945b53944a44b26e49892e8a685815bf0cacb/nova/compute/manager.py#L1388

I was actually on a hangout today with Carl, Miguel and Dan Smith talking about just this particular section of code with regards to routed networks IPAM handling. What I believe we'd like to do is move to a model where we call out to Neutron here in the conductor:

https://github.com/openstack/nova/blob/7be945b53944a44b26e49892e8a685815bf0cacb/nova/conductor/manager.py#L397

and ask Neutron to give us as much information about available subnet allocation pools and segment IDs as it can *before* we end up calling the scheduler here:

https://github.com/openstack/nova/blob/7be945b53944a44b26e49892e8a685815bf0cacb/nova/conductor/manager.py#L415

Not only will the segment IDs allow us to more properly use network affinity in placement decisions, but doing this kind of "probing" for network information in the conductor is inherently more scalable than doing it all in allocate_for_instance() on the compute node while holding the giant COMPUTE_NODE_SEMAPHORE lock.

Best,
-jay
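A sketch of the conductor-side "probing" Jay describes, just to make the shape of the idea concrete. The helper name and the returned structure are assumptions; list_subnets is a standard python-neutronclient call, while how segment information would be exposed was still being designed for routed networks.

    # Sketch only: not actual nova conductor code.
    def probe_network_info(neutron, requested_network_ids):
        """Collect subnet allocation pools (and eventually segment IDs) for
        the requested networks before the scheduler is called."""
        info = {}
        for net_id in requested_network_ids:
            subnets = neutron.list_subnets(network_id=net_id)['subnets']
            info[net_id] = {
                'allocation_pools': [s.get('allocation_pools', [])
                                     for s in subnets],
                # Hypothetical: segment IDs would let the scheduler apply
                # network affinity once routed networks land.
                'segment_ids': [s.get('segment_id') for s in subnets],
            }
        return info

The point is that this runs in the conductor, before select_destinations(), rather than on the compute node under COMPUTE_NODE_SEMAPHORE.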
Re: [openstack-dev] [neutron] [nova] scheduling bandwidth resources / NIC_BW_KB resource class
On 4/20/2016 8:25 AM, Miguel Angel Ajo Pelayo wrote:
[...]

I had a talk yesterday with @iharchys, @dansmith, and @sbauzas about this, and we believe the synthesis of resource usage / scheduling constraints from neutron makes sense.

We should probably look into providing those details in a read-only dictionary during port creation/update/show in general; that way, we would not be adding an extra API call to neutron from the nova scheduler to figure out any of those details. That extra optimization is something we may need to discuss with the neutron community.

Note that I think the only time Nova gets details about ports in the API during a server create request is when doing the network request validation, and that's only if there is a fixed IP address or specific port(s) in the request; otherwise Nova just gets the networks. [1]

[...]

[1] https://github.com/openstack/nova/blob/ee7a01982611cdf8012a308fa49722146c51497f/nova/network/neutronv2/api.py#L1123

-- 
Thanks,
Matt Riedemann
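To make the proposed contract concrete, here is a hypothetical rendering of the standardized per-port response and how the conductor might fold it into what the scheduler compares against inventory. The port UUID key and the merge helper are made up for illustration; none of this is an agreed interface.

    # Hypothetical example of the standardized per-port constraints response.
    port_constraints = {
        'resources': {
            'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee': {   # hypothetical port UUID
                'NIC_BW_KB': 2048,
                'IPV4_ADDRESS': 1,
            },
        },
    }

    def merge_port_constraints(requested_resources, port_constraints):
        """Sum per-port amounts into the totals the scheduler would compare
        against available NIC_BW_KB / IPV4_ADDRESS on each candidate."""
        for amounts in port_constraints['resources'].values():
            for resource_class, amount in amounts.items():
                requested_resources[resource_class] = (
                    requested_resources.get(resource_class, 0) + amount)
        return requested_resources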
Re: [openstack-dev] [neutron] [nova] scheduling bandwidth resources / NIC_BW_KB resource class
On Wed, Apr 20, 2016 at 4:25 PM, Miguel Angel Ajo Pelayo <majop...@redhat.com> wrote:
> Inline update.
>
> [...]
>
> I had a talk yesterday with @iharchys, @dansmith, and @sbauzas about
> this, and we believe the synthesis of resource usage / scheduling
> constraints from neutron makes sense.
>
> We should probably look into providing those details in a read-only
> dictionary during port creation/update/show in general; that way,
> we would not be adding an extra API call to neutron from the nova
> scheduler to figure out any of those details. That extra optimization
> is something we may need to discuss with the neutron community.

What about the caller context? I believe these details should be visible to the admin user only.

> [...]
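One possible answer to the caller-context question, sketched as an API-layer scrub; the 'resource_request' field name and the helper are hypothetical, and the same effect could be had with a policy.json rule restricting the attribute to admins.

    # Hypothetical: hide the scheduling-constraints field from non-admins.
    def scrub_port_view(port_dict, context):
        if not context.is_admin:
            port_dict.pop('resource_request', None)   # hypothetical field name
        return port_dict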
Re: [openstack-dev] [neutron] [nova] scheduling bandwidth resources / NIC_BW_KB resource class
Inline update.

On Mon, Apr 11, 2016 at 4:22 PM, Miguel Angel Ajo Pelayo wrote:
> On Mon, Apr 11, 2016 at 1:46 PM, Jay Pipes wrote:
>> [...]
>>
>> Not sure Stevedore makes sense in this context. Really, we want *less*
>> extensibility and *more* consistency. So, I would envision rather a system
>> where Nova would call to Neutron before scheduling when it has received a
>> port or network ID in the boot request and ask Neutron whether the port or
>> network has any resource constraints on it. Neutron would return a
>> standardized response containing each resource class and the amount
>> requested in a dictionary (or better yet, an os_vif.objects.* object,
>> serialized). Something like:
>>
>> {
>>     'resources': {
>>         '': {
>>             'NIC_BW_KB': 2048,
>>             'IPV4_ADDRESS': 1
>>         }
>>     }
>> }
>
> Oh, true, that's a great idea, having some API that translates a
> neutron resource to scheduling constraints. The external call will
> still be required, but the coupling issue is removed.

I had a talk yesterday with @iharchys, @dansmith, and @sbauzas about this, and we believe the synthesis of resource usage / scheduling constraints from neutron makes sense.

We should probably look into providing those details in a read-only dictionary during port creation/update/show in general; that way, we would not be adding an extra API call to neutron from the nova scheduler to figure out any of those details. That extra optimization is something we may need to discuss with the neutron community.

>> [...]
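A hypothetical picture of what that read-only dictionary could look like on a port-show response, with the amounts derived by Neutron from the attached QoS policy; the 'resource_request' attribute name and the exact layout are assumptions, not an agreed Neutron API.

    # Hypothetical port-show payload (names and layout are assumptions).
    port = {
        'id': 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee',
        'network_id': '11111111-2222-3333-4444-555555555555',
        'qos_policy_id': '99999999-8888-7777-6666-555555555555',
        # Read-only, computed by Neutron from the QoS policy rules, so Nova
        # never has to interpret the policy itself:
        'resource_request': {
            'NIC_BW_KB': 2048,
            'IPV4_ADDRESS': 1,
        },
    }

With something along these lines, the conductor would already hold everything the scheduler needs when it fetches the port, which is exactly the extra-API-call saving described above.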
Re: [openstack-dev] [neutron] [nova] scheduling bandwidth resources / NIC_BW_KB resource class
On Mon, Apr 11, 2016 at 1:46 PM, Jay Pipes wrote:
> Hi Miguel Angel, comments/answers inline :)
>
> [...]
>
> Not sure Stevedore makes sense in this context. Really, we want *less*
> extensibility and *more* consistency. So, I would envision rather a system
> where Nova would call to Neutron before scheduling when it has received a
> port or network ID in the boot request and ask Neutron whether the port or
> network has any resource constraints on it. Neutron would return a
> standardized response containing each resource class and the amount
> requested in a dictionary (or better yet, an os_vif.objects.* object,
> serialized). [...]

Oh, true, that's a great idea, having some API that translates a neutron resource to scheduling constraints. The external call will still be required, but the coupling issue is removed.

> In the case of the NIC_BW_KB resource class, Nova's scheduler would look for
> compute nodes that had a NIC with that amount of bandwidth still available.
> In the case of the IPV4_ADDRESS resource class, Nova's scheduler would use
> the generic-resource-pools interface to find a resource pool of IPV4_ADDRESS
> resources (i.e. a Neutron routed network or subnet allocation pool) that has
> available IP space for the request.

Not sure about the IPV4_ADDRESS part because I still haven't looked at how they resolve routed networks with this new framework, but for other constraints it makes perfect sense to me.
Re: [openstack-dev] [neutron] [nova] scheduling bandwidth resources / NIC_BW_KB resource class
Hi Miguel Angel, comments/answers inline :)

On 04/08/2016 09:17 AM, Miguel Angel Ajo Pelayo wrote:

Hi!,

In the context of [1] (generic resource pools / scheduling in nova) and [2] (minimum bandwidth guarantees -egress- in neutron), I had a talk a few weeks ago with Jay Pipes,

The idea was leveraging the generic resource pools and scheduling mechanisms defined in [1] to find the right hosts and track the total available bandwidth per host (and per host "physical network"); something in neutron (still to be defined where) would notify the new API about the total amount of "NIC_BW_KB" available on every host/physnet.

Yes, what we discussed was making it initially per host, meaning the host would advertise a total aggregate bandwidth amount for all NICs that it uses for the data plane as a single amount.

The other way to track this resource class (NIC_BW_KB) would be to make the NICs themselves be resource providers, and then the scheduler could pick a specific NIC to bind the port to based on available NIC_BW_KB on a particular NIC.

The former method makes things conceptually easier at the expense of introducing greater potential for retrying placement decisions (since the specific NIC to bind a port to wouldn't be known until the claim is made on the compute host). The latter method adds complexity to the filtering and scheduler in order to make more accurate placement decisions that would result in fewer retries.

That part is quite clear to me,

From [1] I'm not sure which blueprint introduces the ability to schedule based on the resource allocation/availability itself ("resource-providers-scheduler" seems more like an optimization to the scheduler/DB interaction, right?)

Yes, you are correct about the above blueprint; it's only for moving the Python-side filters to be a DB query.

The resource-providers-allocations blueprint:

https://review.openstack.org/300177

is the one where we convert the various consumed resource amount fields to live in the single allocations table that may be queried for usage information.

We aim to use the ComputeNode object as a facade that hides the migration of these data fields as much as possible, so that the scheduler actually does not need to know that the schema has changed underneath it. Of course, this only works for *existing* resource classes, like vCPU, RAM, etc. It won't work for *new* resource classes like the discussed NIC_BW_KB because, clearly, we don't have an existing field in the instance_extra or other tables that contains that usage amount, and therefore we can't use the ComputeNode object as a facade over a non-existing piece of data.

Eventually, the intent is to change the ComputeNode object to return a new AllocationList object that would contain all of the compute node's resources in a tabular format (mimicking the underlying allocations table):

https://review.openstack.org/#/c/282442/20/nova/objects/resource_provider.py

Once this is done, the scheduler can be fitted to query this AllocationList object to make resource usage and placement decisions in the Python-side filters.

We are still debating on the resource-providers-scheduler-db-filters blueprint:

https://review.openstack.org/#/c/300178/

whether to change the existing FilterScheduler or create a brand new scheduler driver. I could go either way, frankly. If we made a brand new scheduler driver, it would do a query against the compute_nodes table in the DB directly. The legacy FilterScheduler would manipulate the AllocationList object returned by the ComputeNode.allocations attribute. Either way we get to where we want to go: representing all quantitative resources in a standardized and consistent fashion.

And, that brings me to another point: at the moment of filtering hosts, nova, I guess, will have the neutron port information; it has to somehow identify if the port is tied to a minimum bandwidth QoS policy.

Yes, Nova's conductor gathers information about the requested networks *before* asking the scheduler where to place hosts:

https://github.com/openstack/nova/blob/stable/mitaka/nova/conductor/manager.py#L362

That would require identifying that the port has a "qos_policy_id" attached to it, and then asking neutron for the specific QoS policy [3], then looking for a minimum bandwidth rule (still to be defined), and extracting the required bandwidth from it.

Yep, exactly correct.

That moves, again, some of the responsibility to examine and understand external resources to nova.

Yep, it does. The alternative is more retries for placement decisions because accurate decisions cannot be made until the compute node is already selected and the claim happens on the compute node.

Could it make sense to make that part pluggable via stevedore, so we would provide something that takes the "resource id" (for a port in this case) and returns the requirements translated to resource classes (NIC_BW_KB in this case)?

Not sure Stevedore makes sense in this context. Really, we want *less* extensibility and *more* consistency. So, I would envision rather a system where Nova would call to Neutron before scheduling when it has received a port or network ID in the boot request and ask Neutron whether the port or network has any resource constraints on it. Neutron would return a standardized response containing each resource class and the amount requested in a dictionary (or better yet, an os_vif.objects.* object, serialized). Something like:

{
    'resources': {
        '': {
            'NIC_BW_KB': 2048,
            'IPV4_ADDRESS': 1
        }
    }
}

In the case of the NIC_BW_KB resource class, Nova's scheduler would look for compute nodes that had a NIC with that amount of bandwidth still available. In the case of the IPV4_ADDRESS resource class, Nova's scheduler would use the generic-resource-pools interface to find a resource pool of IPV4_ADDRESS resources (i.e. a Neutron routed network or subnet allocation pool) that has available IP space for the request.

Best,
-jay
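A simplified, illustrative view of the allocations-table model described above, and of the kind of per-provider usage summary an AllocationList would give the Python-side filters; the row layout loosely mirrors the resource-providers work and is not the final schema.

    from collections import defaultdict

    # Illustrative rows: one per (resource provider, consumer, resource class).
    allocations = [
        {'provider': 'compute-1', 'consumer': 'instance-a',
         'resource_class': 'NIC_BW_KB', 'used': 2048},
        {'provider': 'compute-1', 'consumer': 'instance-b',
         'resource_class': 'NIC_BW_KB', 'used': 4096},
        {'provider': 'compute-1', 'consumer': 'instance-a',
         'resource_class': 'IPV4_ADDRESS', 'used': 1},
    ]

    def used_by_class(allocations, provider):
        """Total consumed amount per resource class on one provider."""
        totals = defaultdict(int)
        for row in allocations:
            if row['provider'] == provider:
                totals[row['resource_class']] += row['used']
        return dict(totals)

    # A filter could then check, for example:
    #   total_nic_bw_kb - used_by_class(allocations, host)['NIC_BW_KB'] >= 2048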
Re: [openstack-dev] [neutron] [nova] scheduling bandwidth resources / NIC_BW_KB resource class
On Sun, Apr 10, 2016 at 10:07 AM, Moshe Levi <mosh...@mellanox.com> wrote:
> [...]
>
> I believe that NIC bandwidth can be taken from libvirt, see [4], and the
> only piece that is missing is to tell nova the mapping of physnet to
> network interface name. (In the case of SR-IOV this is already known.)
>
> I see bandwidth (speed) as one of many capabilities of a NIC, therefore I
> think we should take all of them in the same way, in this case from libvirt.
> I was thinking of adding the NIC as a new resource to nova.

Yes, at the low level, that's one way to do it. We may need neutron agents or plugins to collect such information, since, in some cases one device will be tied to one physical network, other devices will be tied to other physical networks, or even some devices could be connected to the same physnet. In some cases, connectivity depends on L3 tunnels, and in that case bandwidth calculation is more complicated (depending on routes, etc. -I'm not even looking at that case yet-).

> [4] - [...]
>
> My understanding is that the resource provider blueprint is just a rough
> filter of compute nodes before passing them to the scheduler filters. The
> existing filters here [6] will do the accurate filtering of resources.
> See [5].
>
> [5] - http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2016-04-04.log.html#t2016-04-04T16:24:10
> [6] - http://docs.openstack.org/developer/nova/filter_scheduler.html

Thanks, yes, if those filters can operate on the generic resource pools, then, great, we will just need to write the right filters.

> I am not sure if that is the correct way to do it, but you can create a NIC
> bandwidth filter (or NIC capabilities filter) and in it you can implement
> the way to retrieve the QoS policy information by using the neutron client.

That's my concern: that logic would have to live on the nova side, again, and it's tightly coupled to the neutron models. I'd be glad to find a way to uncouple nova from that as much as possible. And, even better, if we could find a way to avoid the need for nova to retrieve policies as it discovers ports.

> [...]
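For concreteness, this is roughly the shape of the filter Moshe suggests, and it also makes the coupling concern visible: the nova-side filter has to walk Neutron's QoS model. Everything here is a sketch: the filter does not exist, the minimum-bandwidth rule type and its min_kbps field were still to be defined, and host_state.nic_bw_kb_free is an invented attribute.

    from nova.scheduler import filters

    class NICBandwidthFilter(filters.BaseHostFilter):
        """Sketch of a possible NIC bandwidth filter (not an existing filter)."""

        # self.neutron is assumed to be a python-neutronclient instance;
        # wiring it into the scheduler is left out of the sketch.

        def host_passes(self, host_state, spec_obj):
            required_kb = 0
            for port in getattr(spec_obj, 'requested_ports', []):  # hypothetical attr
                policy_id = port.get('qos_policy_id')
                if not policy_id:
                    continue
                policy = self.neutron.show_qos_policy(policy_id)['policy']
                for rule in policy.get('rules', []):
                    if rule.get('type') == 'minimum_bandwidth':     # assumed rule type
                        required_kb += rule.get('min_kbps', 0)
            return getattr(host_state, 'nic_bw_kb_free', 0) >= required_kb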
Re: [openstack-dev] [neutron] [nova] scheduling bandwidth resources / NIC_BW_KB resource class
From: Miguel Angel Ajo Pelayo [mailto:majop...@redhat.com]
Sent: Friday, April 08, 2016 4:17 PM
To: OpenStack Development Mailing List (not for usage questions) <openstack-dev@lists.openstack.org>
Subject: [openstack-dev] [neutron] [nova] scheduling bandwidth resources / NIC_BW_KB resource class

Hi!,

In the context of [1] (generic resource pools / scheduling in nova) and [2] (minimum bandwidth guarantees -egress- in neutron), I had a talk a few weeks ago with Jay Pipes,

The idea was leveraging the generic resource pools and scheduling mechanisms defined in [1] to find the right hosts and track the total available bandwidth per host (and per host "physical network"); something in neutron (still to be defined where) would notify the new API about the total amount of "NIC_BW_KB" available on every host/physnet.

I believe that NIC bandwidth can be taken from libvirt, see [4], and the only piece that is missing is to tell nova the mapping of physnet to network interface name. (In the case of SR-IOV this is already known.)

I see bandwidth (speed) as one of many capabilities of a NIC, therefore I think we should take all of them in the same way, in this case from libvirt. I was thinking of adding the NIC as a new resource to nova.

[4] - libvirt node device XML for the NIC: name net_enp129s0_e4_1d_2d_2d_8c_41, sysfs path /sys/devices/pci:80/:80:01.0/:81:00.0/net/enp129s0, interface enp129s0, MAC address e4:1d:2d:2d:8c:41.

That part is quite clear to me,

From [1] I'm not sure which blueprint introduces the ability to schedule based on the resource allocation/availability itself ("resource-providers-scheduler" seems more like an optimization to the scheduler/DB interaction, right?)

My understanding is that the resource provider blueprint is just a rough filter of compute nodes before passing them to the scheduler filters. The existing filters here [6] will do the accurate filtering of resources. See [5].

[5] - http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2016-04-04.log.html#t2016-04-04T16:24:10
[6] - http://docs.openstack.org/developer/nova/filter_scheduler.html

And, that brings me to another point: at the moment of filtering hosts, nova, I guess, will have the neutron port information; it has to somehow identify if the port is tied to a minimum bandwidth QoS policy.

That would require identifying that the port has a "qos_policy_id" attached to it, and then asking neutron for the specific QoS policy [3], then looking for a minimum bandwidth rule (still to be defined), and extracting the required bandwidth from it.

I am not sure if that is the correct way to do it, but you can create a NIC bandwidth filter (or NIC capabilities filter) and in it you can implement the way to retrieve the QoS policy information by using the neutron client.

That moves, again, some of the responsibility to examine and understand external resources to nova.

Could it make sense to make that part pluggable via stevedore, so we would provide something that takes the "resource id" (for a port in this case) and returns the requirements translated to resource classes (NIC_BW_KB in this case)?

Best regards,
Miguel Ángel Ajo

[1] http://lists.openstack.org/pipermail/openstack-dev/2016-February/086371.html
[2] https://bugs.launchpad.net/neutron/+bug/1560963
[3] http://developer.openstack.org/api-ref-networking-v2-ext.html#showPolicy
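For reference, [4] is libvirt's node-device XML for the NIC. Below is a hand-written example of that kind of output, reconstructed from the fields quoted above (the PCI domain digits and the link speed value are assumptions), plus one way the link speed could be read through the libvirt Python bindings; whether the <link speed=.../> element is populated depends on the libvirt version.

    import libvirt
    from xml.etree import ElementTree

    # Hand-reconstructed example of 'virsh nodedev-dumpxml' style output for
    # the NIC referenced in [4]; details such as the PCI domain and the link
    # speed value are assumptions.
    EXAMPLE_XML = """
    <device>
      <name>net_enp129s0_e4_1d_2d_2d_8c_41</name>
      <path>/sys/devices/pci0000:80/0000:80:01.0/0000:81:00.0/net/enp129s0</path>
      <parent>pci_0000_81_00_0</parent>
      <capability type='net'>
        <interface>enp129s0</interface>
        <address>e4:1d:2d:2d:8c:41</address>
        <link speed='10000' state='up'/>
      </capability>
    </device>
    """

    def nic_link_speed_mbps(dev_name='net_enp129s0_e4_1d_2d_2d_8c_41'):
        conn = libvirt.open('qemu:///system')
        xml = conn.nodeDeviceLookupByName(dev_name).XMLDesc(0)
        link = ElementTree.fromstring(xml).find(
            ".//capability[@type='net']/link")
        # libvirt reports the link speed in Mbit/s; a NIC_BW_KB style value
        # would be derived from this plus the physnet mapping Moshe mentions.
        return int(link.get('speed')) if link is not None else None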
[openstack-dev] [neutron] [nova] scheduling bandwidth resources / NIC_BW_KB resource class
Hi!,

In the context of [1] (generic resource pools / scheduling in nova) and [2] (minimum bandwidth guarantees -egress- in neutron), I had a talk a few weeks ago with Jay Pipes,

The idea was leveraging the generic resource pools and scheduling mechanisms defined in [1] to find the right hosts and track the total available bandwidth per host (and per host "physical network"); something in neutron (still to be defined where) would notify the new API about the total amount of "NIC_BW_KB" available on every host/physnet.

That part is quite clear to me,

From [1] I'm not sure which blueprint introduces the ability to schedule based on the resource allocation/availability itself ("resource-providers-scheduler" seems more like an optimization to the scheduler/DB interaction, right?)

And, that brings me to another point: at the moment of filtering hosts, nova, I guess, will have the neutron port information; it has to somehow identify if the port is tied to a minimum bandwidth QoS policy.

That would require identifying that the port has a "qos_policy_id" attached to it, and then asking neutron for the specific QoS policy [3], then looking for a minimum bandwidth rule (still to be defined), and extracting the required bandwidth from it.

That moves, again, some of the responsibility to examine and understand external resources to nova.

Could it make sense to make that part pluggable via stevedore, so we would provide something that takes the "resource id" (for a port in this case) and returns the requirements translated to resource classes (NIC_BW_KB in this case)?

Best regards,
Miguel Ángel Ajo

[1] http://lists.openstack.org/pipermail/openstack-dev/2016-February/086371.html
[2] https://bugs.launchpad.net/neutron/+bug/1560963
[3] http://developer.openstack.org/api-ref-networking-v2-ext.html#showPolicy
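To make the first paragraph's idea concrete, here is a sketch of the per-host / per-physnet report that Neutron (wherever this ends up living) might push to the generic-resource-pools API; the payload shape, the per-physnet keying, and the helper name are assumptions only.

    # Sketch only: not an agreed payload.
    def build_nic_bw_inventory(hostname, physnet_link_speed_mbps):
        """physnet_link_speed_mbps: e.g. {'physnet1': 10000}, as measured on
        the host (from libvirt, ethtool, or the agent configuration)."""
        return {
            'host': hostname,
            'inventories': {
                # One NIC_BW_KB total per physical network on the host;
                # Mbit/s converted to kbit/s.
                physnet: {'NIC_BW_KB': mbps * 1000}
                for physnet, mbps in physnet_link_speed_mbps.items()
            },
        }

    # Example: build_nic_bw_inventory('compute-1', {'physnet1': 10000})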