Great writeup @Mathieu and thanks @sean and @jrolls! -d
On Mon, Aug 29, 2016 at 3:34 PM, Mathieu Gagné <mga...@calavera.ca> wrote:
> Hi,
>
> For those who attended the OpenStack Ops meetup, you probably heard
> me complaining about a serious performance issue we had with the Nova
> scheduler (Kilo) with Ironic.
>
> Thanks to Sean Dague and Matt Riedemann, we found the root cause.
>
> It was caused by this block of code [1], which hits the database
> for each node loaded by the scheduler. This block of code is called if
> no instance info is found in the scheduler cache.
>
> I found that this instance info is only populated if the
> scheduler_tracks_instance_changes config [2] is enabled, which it is by
> default. But being a good operator (wink wink), I followed the Ironic
> install guide, which recommends disabling it [3], unknowingly getting
> myself into deep trouble.
>
> There isn't much information about the purpose of this config in the
> kilo branch. Fortunately, you can find more info in the master branch
> [4], thanks to the config documentation effort. This instance info
> cache is used by filters which rely on instance location to perform
> affinity/anti-affinity placement, or by anything that cares about the
> instances running on the destination node.
>
> Enabling this option makes the Nova scheduler load instance
> info asynchronously at start up. Depending on the number of
> hypervisors and instances, it can take several minutes. (We are
> talking about 10-15 minutes with 600+ Ironic nodes, or ~1s per node,
> in our case.)
>
> So Jim Roll jumped into the discussion on IRC and found a bug [5] he
> had opened and fixed in Liberty. It makes it so the Nova scheduler
> never populates the instance info cache if the Ironic host manager is
> loaded. For those running Nova with Ironic, you will agree that there
> is no known use case where affinity/anti-affinity is used.
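For anyone following along: the setting in question lives in nova.conf. A minimal sketch, assuming the Kilo-era option name and section (it later moved in newer releases):

```ini
[DEFAULT]
# Enabled by default. The Ironic install guide recommended disabling it,
# which on Kilo leaves the instance info cache empty and triggers the
# per-node database lookups that caused the slowdown.
scheduler_tracks_instance_changes = True
```

On Liberty and later with the Ironic host manager, the fix for bug [5] skips populating this cache entirely, so the setting matters mostly on Kilo.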
> (Please reply if you know of one.)
>
> To summarize, the poor performance of the Nova scheduler will only
> show if you are running the Kilo version of Nova and you disable
> scheduler_tracks_instance_changes, which might be the case if you are
> running Ironic too.
>
> For those curious about our Nova scheduler + Ironic setup, we have
> done the following to get the Nova scheduler to ludicrous speed:
>
> 1) Use CachingScheduler
>
> There was a great talk at the OpenStack Summit about why you would
> want to use it. [6]
>
> By default, the Nova scheduler will load ALL nodes (hypervisors) from
> the database into memory before each scheduling pass. If you have A
> LOT of hypervisors, this process can take a while. This means
> scheduling won't happen until this step is completed. It could also
> mean that scheduling will always fail if you have a lot of hypervisors
> and don't tweak service_down_time (see 3 below).
>
> This driver makes it so nodes (hypervisors) are loaded into memory
> every ~60 seconds. Since the information is now pre-cached, the
> scheduling process can happen right away; it is super fast.
>
> There are a lot of side-effects to using it though. For example:
> - You can only run ONE nova-scheduler process, since cache state won't
> be shared between processes and you don't want instances to be
> scheduled twice to the same node/hypervisor.
> - It can take ~1m before new capacity (new or freed nodes) is
> recognized by the scheduler. The cache is refreshed every 60 seconds
> by a periodic task. (This can be changed with
> scheduler_driver_task_period.)
>
> In the context of Ironic, it is a compromise we are willing to accept.
> We are not adding Ironic nodes that often, and nodes aren't
> created/deleted as often as virtual machines.
>
> 2) Run a single nova-compute service
>
> I strongly suggest you DO NOT run multiple nova-compute services. If
> you do, you will have duplicated hypervisors loaded by the scheduler
> and you could end up with conflicting scheduling.
> You will also have twice as many hypervisors to load in the scheduler.
>
> Note: I heard about multiple compute host support in Nova for Ironic
> with the use of a hash ring, but I don't have many details about it.
> So this recommendation might not apply to you if you are using a
> recent version of Nova.
>
> 3) Increase service_down_time
>
> If you have a lot of nodes, you might have to increase this value,
> which is set to 60 seconds by default. This value is used by the
> ComputeFilter filter to exclude nodes it hasn't heard from. If it
> takes more than 60 seconds to load the list of nodes, you might guess
> what will happen: the scheduler will reject all of them, since node
> info is already outdated by the time it finally hits the filtering
> steps. I strongly suggest you tweak this setting, regardless of the
> use of CachingScheduler.
>
> 4) Tweak the scheduler to only load empty nodes/hypervisors
>
> This is a hack [7] we did before finding out about the bug [5] we
> described and identified earlier. When investigating our performance
> issue, we enabled debug logging and saw that the periodic task was
> taking forever to complete (10-15m) with the CachingScheduler driver.
>
> We knew (strongly suspected) the Nova scheduler was spending a huge
> amount of time loading nodes/hypervisors. We (unfortunately) didn't
> push our investigation further and jumped right away to the
> optimization phase.
>
> So we came up with the idea of only loading empty nodes/hypervisors.
> Remember, we are still in the context of Ironic, not cloud and virtual
> machines. So it made perfect sense for us to stop spending time
> loading nodes/hypervisors we would discard anyway.
>
> Thanks to all who helped us debug our scheduling performance
> issues; it is now crazy fast.
> =)
>
> [1] https://github.com/openstack/nova/blob/kilo-eol/nova/scheduler/host_manager.py#L589-L592
> [2] https://github.com/openstack/nova/blob/kilo-eol/nova/scheduler/host_manager.py#L65-L68
> [3] http://docs.openstack.org/developer/ironic/deploy/install-guide.html#configure-compute-to-use-the-bare-metal-service
> [4] https://github.com/openstack/nova/blob/282c257aff6b53a1b6bb4b4b034a670c450d19d8/nova/conf/scheduler.py#L166-L185
> [5] https://bugs.launchpad.net/nova/+bug/1479124
> [6] https://www.youtube.com/watch?v=BcHyiOdme2s
> [7] https://gist.github.com/mgagne/1fbeca4c0b60af73f019bc2e21eb4a80
>
> --
> Mathieu
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
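Putting Mathieu's points 1 and 3 together, the relevant nova.conf knobs would look roughly like this on Kilo. A sketch under stated assumptions, not a drop-in config: option names and sections moved around in later releases, and the 300s value is illustrative, not Mathieu's.

```ini
[DEFAULT]
# Point 1: swap the default FilterScheduler for the CachingScheduler,
# which refreshes its host list from a periodic task instead of
# reloading every node from the database on each scheduling request.
scheduler_driver = nova.scheduler.caching_scheduler.CachingScheduler

# How often (seconds) the periodic task refreshes the cache; new or
# freed nodes can take up to this long to become visible.
scheduler_driver_task_period = 60

# Point 3: give compute services more headroom before ComputeFilter
# considers them dead (default is 60).
service_down_time = 300
```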
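To make the service_down_time failure mode concrete: ComputeFilter's check boils down to a freshness test on each service's last heartbeat. A toy Python sketch of that logic (not Nova's actual code; the function and field names are illustrative):

```python
from datetime import datetime, timedelta

SERVICE_DOWN_TIME = 60  # nova.conf default, in seconds


def service_is_up(last_heartbeat, now, down_time=SERVICE_DOWN_TIME):
    """A service counts as up if we heard from it within down_time seconds."""
    return (now - last_heartbeat) <= timedelta(seconds=down_time)


# If merely loading the host list takes 90s, every heartbeat snapshot is
# already 90s stale by the time filtering runs, so with the default 60s
# window every node gets rejected -- exactly the scenario in point 3.
now = datetime(2016, 8, 29, 15, 34, 0)
snapshot = now - timedelta(seconds=90)  # stale because of the slow load
print(service_is_up(snapshot, now))                  # rejected at 60s
print(service_is_up(snapshot, now, down_time=300))   # accepted at 300s
```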
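As for point 4, the gist [7] has the actual patch; the idea reduces to filtering the compute-node list down to empty hypervisors before the scheduler pays the cost of loading the rest. A hypothetical Python sketch of that idea (the node dicts are made up for illustration; `running_vms` mirrors the compute-node field of that name):

```python
def empty_nodes_only(compute_nodes):
    """Keep only hypervisors with no instances on them.

    With Ironic, every node hosts at most one instance, so any non-empty
    node would be discarded during scheduling anyway; dropping it early
    avoids spending time loading it.
    """
    return [node for node in compute_nodes if node["running_vms"] == 0]


nodes = [
    {"hypervisor_hostname": "ironic-node-1", "running_vms": 0},
    {"hypervisor_hostname": "ironic-node-2", "running_vms": 1},  # deployed
    {"hypervisor_hostname": "ironic-node-3", "running_vms": 0},
]
print([n["hypervisor_hostname"] for n in empty_nodes_only(nodes)])
# -> ['ironic-node-1', 'ironic-node-3']
```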