Re: [openstack-dev] [nova] Resource tracker
On Tue, Oct 7, 2014 at 10:56 AM, Vishvananda Ishaya wrote:
>
> On Oct 7, 2014, at 6:21 AM, Daniel P. Berrange wrote:
>
> > On Mon, Oct 06, 2014 at 02:55:20PM -0700, Joe Gordon wrote:
> >> On Mon, Oct 6, 2014 at 6:03 AM, Gary Kotton wrote:
> >>
> >>> Hi,
> >>> At the moment the resource tracker in Nova ignores the statistics that
> >>> are returned by the hypervisor and it calculates the values on its own.
> >>> Not only is this highly error prone but it is also very costly – all of
> >>> the resources on the host are read from the database. Not only is the
> >>> fact that we are doing something very costly troubling, the fact that
> >>> we are over-calculating resources used by the hypervisor is also an
> >>> issue. In my opinion this leads us to not fully utilize hosts at our
> >>> disposal. I have a number of concerns with this approach and would like
> >>> to know why we are not using the actual resources reported by the
> >>> hypervisor.
> >>> The reason for asking this is that I have added a patch which uses the
> >>> actual hypervisor resources returned and it led to a discussion on the
> >>> particular review (https://review.openstack.org/126237).
> >>>
> >>
> >> So it sounds like you have mentioned two concerns here:
> >>
> >> 1. The current method to calculate hypervisor usage is expensive in
> >>    terms of database access.
> >> 2. Nova ignores the statistics that are returned by the hypervisor and
> >>    uses its own calculations.
> >>
> >> To #1, maybe we can do something better, optimize the query, cache the
> >> result etc. As for #2, nova intentionally doesn't use the hypervisor
> >> statistics for a few reasons:
> >>
> >> * Make scheduling more deterministic, make it easier to reproduce issues
> >>   etc.
> >> * Things like memory ballooning and thin provisioning in general mean
> >>   that the hypervisor is not reporting how much of the resources can be
> >>   allocated but rather how much is currently in use (this behavior can
> >>   vary from hypervisor to hypervisor today AFAIK -- which makes things
> >>   confusing). So if I don't want to oversubscribe RAM, and the
> >>   hypervisor is using memory ballooning, the hypervisor statistics are
> >>   mostly useless. I am sure there are more complex schemes that we can
> >>   come up with that allow us to factor in the properties of thin
> >>   provisioning, but is the extra complexity worth it?
> >
> > That is just an example of problems with the way Nova virt drivers
> > /currently/ report usage to the scheduler. It is easily within the
> > realm of possibility for the virt drivers to be changed so that they
> > report stats which take into account things like ballooning and thin
> > provisioning so that we don't oversubscribe. Ignoring the hypervisor
> > stats entirely and re-doing the calculations in the resource tracker
> > code is just a crude workaround really. It is just swapping one set
> > of problems for a new set of problems.
>

I agree, let's make reported hypervisor stats actually useful for
scheduling. This would mean we can have fewer config options (currently
the operator has to set aside resources for the underlying OS via a config
option).

>
> +1 let's make the hypervisors report detailed enough information that we
> can do it without having to recalculate.
>

Do we have any idea of how expensive recalculating this information is?

>
> Vish
>
> >
> >> That being said I am fine with discussing in a spec the idea of adding
> >> an option to use the hypervisor reported statistics, as long as it is
> >> off by default.
> >
> > I'm against the idea of adding config options to switch between multiple
> > codepaths because it is just punting the problem to the admins who are
> > in an even worse position to decide what is best. It is saying would you
> > rather your cloud have bug A or have bug B. We should be fixing the data
> > the hypervisors report so that the resource tracker doesn't have to
> > ignore them, and give the admins something which just works and avoid
> > having to choose between 2 differently broken options.
> >
> > Regards,
> > Daniel
> > --
> > |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> > |: http://libvirt.org -o- http://virt-manager.org :|
> > |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> > |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
> >
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
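The point in the message above about the operator having to set aside resources for the underlying OS can be sketched as follows. This is only an illustration and not Nova code: the function names, the `reserved_mb` parameter (a stand-in for whatever reservation option the operator sets) and the `host_os_used_mb` figure (a stand-in for a value a virt driver might report) are all hypothetical, as are the numbers.

```python
def usable_mb_calculated(total_mb, reserved_mb, guest_allocated_mb):
    """Free memory when the operator guesses the host OS overhead up front."""
    return total_mb - reserved_mb - guest_allocated_mb


def usable_mb_reported(total_mb, host_os_used_mb, guest_allocated_mb):
    """Free memory when the (hypothetical) driver reports the real overhead."""
    return total_mb - host_os_used_mb - guest_allocated_mb


total_mb = 16384
guest_allocated_mb = 8192

# The operator reserved 512 MB, but the host OS actually uses 1536 MB.
print(usable_mb_calculated(total_mb, 512, guest_allocated_mb))   # 7680
print(usable_mb_reported(total_mb, 1536, guest_allocated_mb))    # 6656
```

In this made-up case the static reservation overstates free memory by 1024 MB; a reported figure would remove the need for the operator to guess, which is the reduction in config options the message is asking for.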
Re: [openstack-dev] [nova] Resource tracker
On Oct 7, 2014, at 6:21 AM, Daniel P. Berrange wrote:

> On Mon, Oct 06, 2014 at 02:55:20PM -0700, Joe Gordon wrote:
>> On Mon, Oct 6, 2014 at 6:03 AM, Gary Kotton wrote:
>>
>>> Hi,
>>> At the moment the resource tracker in Nova ignores the statistics that
>>> are returned by the hypervisor and it calculates the values on its own.
>>> Not only is this highly error prone but it is also very costly – all of
>>> the resources on the host are read from the database. Not only is the
>>> fact that we are doing something very costly troubling, the fact that
>>> we are over-calculating resources used by the hypervisor is also an
>>> issue. In my opinion this leads us to not fully utilize hosts at our
>>> disposal. I have a number of concerns with this approach and would like
>>> to know why we are not using the actual resources reported by the
>>> hypervisor.
>>> The reason for asking this is that I have added a patch which uses the
>>> actual hypervisor resources returned and it led to a discussion on the
>>> particular review (https://review.openstack.org/126237).
>>>
>>
>> So it sounds like you have mentioned two concerns here:
>>
>> 1. The current method to calculate hypervisor usage is expensive in
>>    terms of database access.
>> 2. Nova ignores the statistics that are returned by the hypervisor and
>>    uses its own calculations.
>>
>> To #1, maybe we can do something better, optimize the query, cache the
>> result etc. As for #2, nova intentionally doesn't use the hypervisor
>> statistics for a few reasons:
>>
>> * Make scheduling more deterministic, make it easier to reproduce issues
>>   etc.
>> * Things like memory ballooning and thin provisioning in general mean
>>   that the hypervisor is not reporting how much of the resources can be
>>   allocated but rather how much is currently in use (this behavior can
>>   vary from hypervisor to hypervisor today AFAIK -- which makes things
>>   confusing). So if I don't want to oversubscribe RAM, and the
>>   hypervisor is using memory ballooning, the hypervisor statistics are
>>   mostly useless. I am sure there are more complex schemes that we can
>>   come up with that allow us to factor in the properties of thin
>>   provisioning, but is the extra complexity worth it?
>
> That is just an example of problems with the way Nova virt drivers
> /currently/ report usage to the scheduler. It is easily within the
> realm of possibility for the virt drivers to be changed so that they
> report stats which take into account things like ballooning and thin
> provisioning so that we don't oversubscribe. Ignoring the hypervisor
> stats entirely and re-doing the calculations in the resource tracker
> code is just a crude workaround really. It is just swapping one set
> of problems for a new set of problems.

+1 let's make the hypervisors report detailed enough information that we
can do it without having to recalculate.

Vish

>
>> That being said I am fine with discussing in a spec the idea of adding
>> an option to use the hypervisor reported statistics, as long as it is
>> off by default.
>
> I'm against the idea of adding config options to switch between multiple
> codepaths because it is just punting the problem to the admins who are
> in an even worse position to decide what is best. It is saying would you
> rather your cloud have bug A or have bug B. We should be fixing the data
> the hypervisors report so that the resource tracker doesn't have to
> ignore them, and give the admins something which just works and avoid
> having to choose between 2 differently broken options.
>
> Regards,
> Daniel
> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [openstack-dev] [nova] Resource tracker
On Mon, Oct 06, 2014 at 02:55:20PM -0700, Joe Gordon wrote:
> On Mon, Oct 6, 2014 at 6:03 AM, Gary Kotton wrote:
>
> > Hi,
> > At the moment the resource tracker in Nova ignores the statistics that
> > are returned by the hypervisor and it calculates the values on its own.
> > Not only is this highly error prone but it is also very costly – all of
> > the resources on the host are read from the database. Not only is the
> > fact that we are doing something very costly troubling, the fact that
> > we are over-calculating resources used by the hypervisor is also an
> > issue. In my opinion this leads us to not fully utilize hosts at our
> > disposal. I have a number of concerns with this approach and would like
> > to know why we are not using the actual resources reported by the
> > hypervisor.
> > The reason for asking this is that I have added a patch which uses the
> > actual hypervisor resources returned and it led to a discussion on the
> > particular review (https://review.openstack.org/126237).
> >
>
> So it sounds like you have mentioned two concerns here:
>
> 1. The current method to calculate hypervisor usage is expensive in
>    terms of database access.
> 2. Nova ignores the statistics that are returned by the hypervisor and
>    uses its own calculations.
>
> To #1, maybe we can do something better, optimize the query, cache the
> result etc. As for #2, nova intentionally doesn't use the hypervisor
> statistics for a few reasons:
>
> * Make scheduling more deterministic, make it easier to reproduce issues
>   etc.
> * Things like memory ballooning and thin provisioning in general mean
>   that the hypervisor is not reporting how much of the resources can be
>   allocated but rather how much is currently in use (this behavior can
>   vary from hypervisor to hypervisor today AFAIK -- which makes things
>   confusing). So if I don't want to oversubscribe RAM, and the
>   hypervisor is using memory ballooning, the hypervisor statistics are
>   mostly useless. I am sure there are more complex schemes that we can
>   come up with that allow us to factor in the properties of thin
>   provisioning, but is the extra complexity worth it?

That is just an example of problems with the way Nova virt drivers
/currently/ report usage to the scheduler. It is easily within the
realm of possibility for the virt drivers to be changed so that they
report stats which take into account things like ballooning and thin
provisioning so that we don't oversubscribe. Ignoring the hypervisor
stats entirely and re-doing the calculations in the resource tracker
code is just a crude workaround really. It is just swapping one set
of problems for a new set of problems.

> That being said I am fine with discussing in a spec the idea of adding
> an option to use the hypervisor reported statistics, as long as it is
> off by default.

I'm against the idea of adding config options to switch between multiple
codepaths because it is just punting the problem to the admins who are
in an even worse position to decide what is best. It is saying would you
rather your cloud have bug A or have bug B. We should be fixing the data
the hypervisors report so that the resource tracker doesn't have to
ignore them, and give the admins something which just works and avoid
having to choose between 2 differently broken options.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [openstack-dev] [nova] Resource tracker
On Mon, Oct 6, 2014 at 6:03 AM, Gary Kotton wrote:

> Hi,
> At the moment the resource tracker in Nova ignores the statistics that
> are returned by the hypervisor and it calculates the values on its own.
> Not only is this highly error prone but it is also very costly – all of
> the resources on the host are read from the database. Not only is the
> fact that we are doing something very costly troubling, the fact that
> we are over-calculating resources used by the hypervisor is also an
> issue. In my opinion this leads us to not fully utilize hosts at our
> disposal. I have a number of concerns with this approach and would like
> to know why we are not using the actual resources reported by the
> hypervisor.
> The reason for asking this is that I have added a patch which uses the
> actual hypervisor resources returned and it led to a discussion on the
> particular review (https://review.openstack.org/126237).
>

So it sounds like you have mentioned two concerns here:

1. The current method to calculate hypervisor usage is expensive in terms
   of database access.
2. Nova ignores the statistics that are returned by the hypervisor and
   uses its own calculations.

To #1, maybe we can do something better, optimize the query, cache the
result etc. As for #2, nova intentionally doesn't use the hypervisor
statistics for a few reasons:

* Make scheduling more deterministic, make it easier to reproduce issues
  etc.
* Things like memory ballooning and thin provisioning in general mean that
  the hypervisor is not reporting how much of the resources can be
  allocated but rather how much is currently in use (this behavior can
  vary from hypervisor to hypervisor today AFAIK -- which makes things
  confusing). So if I don't want to oversubscribe RAM, and the hypervisor
  is using memory ballooning, the hypervisor statistics are mostly
  useless. I am sure there are more complex schemes that we can come up
  with that allow us to factor in the properties of thin provisioning, but
  is the extra complexity worth it?

That being said I am fine with discussing in a spec the idea of adding an
option to use the hypervisor reported statistics, as long as it is off by
default.

> Thanks
> Gary
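The ballooning argument in the message above can be made concrete with a small sketch. Nothing here is Nova code: the `Guest` class, both function names and all the numbers are hypothetical. The sketch only contrasts counting what was promised to guests with counting what guests happen to be using at the moment.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    allocated_mb: int  # memory promised to the guest (e.g. by its flavor)
    in_use_mb: int     # memory the guest currently holds, balloon included


def free_by_allocation(total_mb, guests):
    """Free memory if every guest is charged for its full allocation."""
    return total_mb - sum(g.allocated_mb for g in guests)


def free_by_current_use(total_mb, guests):
    """Free memory if guests are charged only for what they use right now."""
    return total_mb - sum(g.in_use_mb for g in guests)


total_mb = 8192
guests = [Guest(allocated_mb=4096, in_use_mb=1024),
          Guest(allocated_mb=4096, in_use_mb=2048)]

print(free_by_allocation(total_mb, guests))   # 0
print(free_by_current_use(total_mb, guests))  # 5120
```

With these made-up figures, a scheduler trusting the second number would place another 4096 MB guest on a host whose existing guests are already entitled to all 8192 MB, which is the oversubscription the message wants to avoid.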
Re: [openstack-dev] [nova] Resource tracker
On Mon, Oct 06, 2014 at 01:03:01PM +, Gary Kotton wrote:
> Hi,
> At the moment the resource tracker in Nova ignores the statistics that
> are returned by the hypervisor and it calculates the values on its own.
> Not only is this highly error prone but it is also very costly - all of
> the resources on the host are read from the database. Not only is the
> fact that we are doing something very costly troubling, the fact that
> we are over-calculating resources used by the hypervisor is also an
> issue. In my opinion this leads us to not fully utilize hosts at our
> disposal. I have a number of concerns with this approach and would like
> to know why we are not using the actual resources reported by the
> hypervisor.
> The reason for asking this is that I have added a patch which uses the
> actual hypervisor resources returned and it led to a discussion on the
> particular review (https://review.openstack.org/126237).

If I'm interpreting git history correctly, this behaviour was
(re-)introduced in this commit:

  commit 8e851409f3a8a345ec954a880c81232fbf9e27b4
  Author: Brian Elliott
  Date:   Fri Sep 14 15:17:07 2012 +

    Fix bugs in resource tracker and cleanup

    Fixes bugs in resource tracker:
    * Handle disk oversubscription
    * Handle suspended/powered off instances

    The usage model is changed to the old style that is
    based on actual instance usage on a compute host.  (Not the current
    point in time of the hypervisor's reported host stats)

    There is now a 'limits' filter property that can be passed from the
    scheduler to the compute node to indicate that oversubscription of
    resources is desired:

    The 'limits' filter property is a dict with the following possible
    keys:

    * memory_mb - Specifies the memory ceiling for the compute node.
    * disk_gb - Specifies the disk space ceiling for the compute node.
    * vcpu - Specifies the max number of vcpus for the compute node.

    There is also some general cleanup and additional unit tests in an
    attempt to simplify down this function.

    bug 1048842
    bug 1052157

    Change-Id: I6ee851b8c03234a78a64d9f5c494dfc7059cdda4

Unfortunately that commit message isn't very informative as to why this
change was made, and the bugs don't seem to have any real detail either.
Perhaps Brian himself remembers?

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
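The 'limits' dict described in the commit message above can be illustrated with a short sketch. This is not the resource tracker's actual code: the function name `fits_within_limits` and the shape of the `used` and `requested` dicts are assumptions made for illustration, and only the three keys (memory_mb, disk_gb, vcpu) come from the commit message. Treating a key that is absent from `limits` as "no ceiling" is also an assumption.

```python
def fits_within_limits(used, requested, limits):
    """Return True if used + requested stays at or below every given ceiling."""
    for key in ("memory_mb", "disk_gb", "vcpu"):
        ceiling = limits.get(key)
        if ceiling is None:
            # Assumption: a missing key means this resource is not capped.
            continue
        if used.get(key, 0) + requested.get(key, 0) > ceiling:
            return False
    return True


# Hypothetical host state and a hypothetical new instance request.
used = {"memory_mb": 6144, "disk_gb": 80, "vcpu": 6}
requested = {"memory_mb": 4096, "disk_gb": 20, "vcpu": 2}

# Ceiling equal to physical RAM versus a ceiling that allows 1.5x RAM.
physical_only = {"memory_mb": 8192, "disk_gb": 200, "vcpu": 8}
oversubscribed = {"memory_mb": 12288, "disk_gb": 200, "vcpu": 8}

print(fits_within_limits(used, requested, physical_only))   # False
print(fits_within_limits(used, requested, oversubscribed))  # True
```

The request needs 10240 MB in total, so it is rejected against the 8192 MB ceiling and accepted once the scheduler passes a higher memory_mb ceiling to signal that oversubscription is wanted.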