Andres tested the changes and has logs showing the improvement.
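
A minimal sketch of the caching described in the quoted message below: the search for the template by purpose, the read from disk, and the parse are done once and the parsed result is reused on later requests. This is not the MAAS code. The directory, file name, `get_template` and `render` are hypothetical, and `string.Template` from the standard library stands in for tempita so the sketch runs on its own.

```python
import functools
import tempfile
from pathlib import Path
from string import Template

# Hypothetical template directory and file; MAAS's real layout is not shown
# in this thread.
TEMPLATE_DIR = Path(tempfile.mkdtemp())
(TEMPLATE_DIR / "config.local.template").write_text("set default=$default\n")

parse_count = 0  # counts how often the uncached path actually runs


@functools.lru_cache(maxsize=None)
def get_template(purpose: str) -> Template:
    """Search for, read and parse the template for `purpose` once."""
    global parse_count
    parse_count += 1
    # 1. Search the filesystem for a file matching the purpose (syscalls).
    candidates = sorted(TEMPLATE_DIR.glob(f"config.{purpose}.template"))
    if not candidates:
        raise FileNotFoundError(f"no template for purpose {purpose!r}")
    # 2. Read the file (a syscall even when served from the kernel cache).
    text = candidates[0].read_text()
    # 3. Parse the template; string.Template is a stand-in for tempita.
    return Template(text)


def render(purpose: str, **params: str) -> str:
    """Render from the cached, already-parsed template."""
    return get_template(purpose).substitute(**params)


first = render("local", default="0")
second = render("local", default="1")  # no search, read or parse this time
```

After the two calls, `parse_count` is 1: the second request only substitutes values into the cached template. A cache like this never notices changes to the file on disk, so it assumes the templates do not change while the process is running.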

On Tue, Feb 6, 2018 at 4:43 PM, Jason Hobbs <[email protected]> wrote:
> Blake, that's great. Do you have before and after numbers showing the
> improvement this change made?
>
> Do you have any data or logs that led you to believe this was the
> culprit in the slow responses I saw on my cluster?
>
> On Tue, Feb 6, 2018 at 3:12 PM, Blake Rouse <[email protected]>
> wrote:
> > Actually, caching does make a difference. That method is not just caching
> > the reading of a file; it caches the search for the file based on the
> > purpose, the reading of that file from disk (which, sure, can be in the
> > kernel cache), and the parsing of the template by tempita.
> >
> > All of that is redundant work that is being done on every single request.
> > Searching the filesystem and reading the file are all syscalls, even if
> > the results come from the kernel cache. Since MAAS is async based, that
> > means the coroutine will be placed on hold while we wait for the result
> > to be loaded from the kernel into the memory of the process. That gives
> > other coroutines time to do other things, which means that coroutine
> > doesn't get to execute until the others are done or blocked by their own
> > async requests.
> >
> > Caching this information can greatly improve that by not requiring the
> > coroutine to be pushed back into the event loop while it is waiting for
> > data from the kernel. Without this change, when the data comes back it
> > still has to be processed by tempita, which takes time and blocks the
> > event loop from completing other work.
> >
> > So it's not simply that we should use the kernel to cache reads from the
> > disk; there is a lot more involved here. We have noticed improvements
> > with this change on systems that are being run with a large number of
> > VMs, because of the reduction in IO.
> >
> > --
> > You received this bug notification because you are subscribed to the bug
> > report.
> > https://bugs.launchpad.net/bugs/1743249
> >
> > Title:
> >   Failed Deployment after timeout trying to retrieve grub cfg
> >
> > Status in MAAS:
> >   New
> > Status in grub2 package in Ubuntu:
> >   Fix Released
> >
> > Bug description:
> >   A node failed to deploy after it failed to retrieve a grub.cfg from
> >   MAAS due to a timeout. In the logs, it's clear that the server tried
> >   to retrieve the grub cfg many times, over about 30 seconds:
> >
> >   http://paste.ubuntu.com/26387256/
> >
> >   We see the same thing for other hosts around the same time:
> >
> >   http://paste.ubuntu.com/26387262/
> >
> >   It seems like MAAS is taking way too long to respond to these
> >   requests.
> >
> >   This is very similar to bug 1724677, which was happening
> >   pre-meltdown/spectre. The only difference is we don't see
> >   "[critical] TFTP back-end failed" in the logs anymore.
> >
> >   I connected to the console on this system and it had errors about
> >   timing out retrieving the grub-cfg, then it had an error message along
> >   the lines of "error not an ip" and then "double free". After I
> >   connected but before I could get a screenshot the system rebooted and
> >   was directed by maas to power off, which it did successfully after
> >   booting to linux.
> >
> >   Full logs are available here:
> >   https://10.245.162.101/artifacts/14a34b5a-9321-4d1a-b2fa-ed277a020e7c/cpe_cloud_395/infra-logs.tar
> >
> >   This is with 2.3.0-6434-gd354690-0ubuntu1~16.04.1.
> >
> > To manage notifications about this bug go to:
> > https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1743249

Title:
  Failed Deployment after timeout trying to retrieve grub cfg

To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions

--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
