@Jason, the pcap shows exactly the behavior I was hoping to see: grub tries to get the X config first and, since it doesn't get a response, moves on and tries to get the Y config.
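As an illustration only, here is a minimal Python sketch of that fallback, assuming a generic "fetch with timeout" primitive. The `fetch` callable, the attempt count, and the config file names are hypothetical stand-ins. They are not GRUB's code or MAAS's actual file naming. Each name is retried a few times, and only when every attempt goes unanswered does the loop fall back to the next name.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical transport: returns the file contents, or None if no reply
# arrived within `timeout` seconds (a stand-in for one TFTP read request).
Fetch = Callable[[str, float], Optional[bytes]]


def fetch_first_available(
    candidates: Sequence[str],
    fetch: Fetch,
    attempts: int = 3,
    timeout: float = 1.0,
) -> Optional[Tuple[str, bytes]]:
    """Return (name, data) for the first candidate that answers, else None.

    Each name is retried up to `attempts` times (an assumed value); only
    when every attempt goes unanswered does the loop fall back to the next.
    """
    for name in candidates:
        for _ in range(attempts):
            data = fetch(name, timeout)
            if data is not None:
                return name, data
    return None


if __name__ == "__main__":
    sent: List[str] = []

    def fake_fetch(name: str, timeout: float) -> Optional[bytes]:
        sent.append(name)
        # Simulate a server that never answers the per-MAC config.
        if name.startswith("grub.cfg-52-54"):
            return None
        return b"# default config"

    # Both file names are hypothetical placeholders for "X" and "Y".
    result = fetch_first_available(
        ["grub.cfg-52-54-00-12-34-56", "grub.cfg-default"], fake_fetch
    )
    print(result)  # ('grub.cfg-default', b'# default config')
    print(len(sent))  # 4: three tries for the MAC file, one for the default
```

The point of the sketch is the ordering: the retries for the first name all happen before the second name is ever requested. That is the pattern to look for in the pcap.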
On Mon, Feb 5, 2018 at 4:45 PM, Jason Hobbs <[email protected]> wrote:

> On Mon, Feb 5, 2018 at 3:27 PM, Andres Rodriguez
> <[email protected]> wrote:
> > @Steve,
> >
> > MAAS already has a mechanism to collapse retries into the initial request.
> > In this case, it is the rack that grabs the requests and makes a request to
> > the region. If retries come within the time that the rack is waiting for a
> > response from the region, these requests get "ignored" and the rack will
> > only answer the first request. This is what the logs show after testing
> > with fixed grub, where grub makes multiple requests and MAAS answers
> > seconds after those requests, but only answers once. This is because the
> > requests were collapsed on the MAAS side.
> >
> > If, however, the retries come in after the region has answered the rack,
> > then these requests will be served.
>
> This is not true. MAAS is responding to every single request grub
> makes for the file - the tcpdump logs show it. And these are not
> "read 4 times" requests - they are retries because grub didn't get a
> response.
>
> This pcap shows MAAS responding to every request for grub.cfg-<mac>:
> https://bugs.launchpad.net/maas/+bug/1743249/+attachment/5046952/+files/spearow-fall-back-to-default-amd64.pcap
>
> Jason
>
> >
> > On Mon, Feb 5, 2018 at 2:34 PM, Steve Langasek <[email protected]> wrote:
> >
> >> Jason's feedback was that, after making the changes to the storage
> >> configuration of his environment, deploying the test grubx64.efi doesn't
> >> have any effect on the MAAS server's response time to tftp requests. So
> >> at this point it's not at all clear that the grub change, while correct,
> >> helps with this high-level symptom.
> >>
> >> It has also been suggested that each udp retry is generating a separate
> >> database query from MAAS. That is absolutely a MAAS bug if true, and
> >> not something that can or should be fixed in GRUB.
> >>
> >> ** Changed in: grub2 (Ubuntu)
> >> Importance: Critical => Medium
> >>
> >> --
> >> You received this bug notification because you are subscribed to MAAS.
> >> https://bugs.launchpad.net/bugs/1743249
> >>
> >> Title:
> >> Failed Deployment after timeout trying to retrieve grub cfg
> >>
> >> To manage notifications about this bug go to:
> >> https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions
> >>
> >> Launchpad-Notification-Type: bug
> >> Launchpad-Bug: product=maas; milestone=2.4.x; status=Incomplete; importance=Undecided; assignee=None;
> >> Launchpad-Bug: distribution=ubuntu; sourcepackage=grub2; component=main; status=In Progress; importance=Medium; [email protected];
> >> Launchpad-Bug-Tags: cdo-qa cdo-qa-blocker foundations-engine patch
> >> Launchpad-Bug-Information-Type: Public
> >> Launchpad-Bug-Private: no
> >> Launchpad-Bug-Security-Vulnerability: no
> >> Launchpad-Bug-Commenters: andreserl blake-rouse cgregan jason-hobbs vorlon
> >> Launchpad-Bug-Reporter: Jason Hobbs (jason-hobbs)
> >> Launchpad-Bug-Modifier: Steve Langasek (vorlon)
> >> Launchpad-Message-Rationale: Subscriber (MAAS)
> >> Launchpad-Message-For: andreserl
> >>
> >
> >
> > --
> > Andres Rodriguez (RoAkSoAx)
> > Ubuntu Server Developer
> > MSc. Telecom & Networking
> > Systems Engineer
> >
> > --
> > You received this bug notification because you are subscribed to the bug
> > report.
> > https://bugs.launchpad.net/bugs/1743249
> >
> > Title:
> > Failed Deployment after timeout trying to retrieve grub cfg
> >
> > Status in MAAS:
> > New
> > Status in grub2 package in Ubuntu:
> > In Progress
> >
> > Bug description:
> > A node failed to deploy after it failed to retrieve a grub.cfg from
> > MAAS due to a timeout.
> > In the logs, it's clear that the server tried to retrieve the grub cfg
> > many times, over about 30 seconds:
> >
> > http://paste.ubuntu.com/26387256/
> >
> > We see the same thing for other hosts around the same time:
> >
> > http://paste.ubuntu.com/26387262/
> >
> > It seems like MAAS is taking way too long to respond to these
> > requests.
> >
> > This is very similar to bug 1724677, which was happening
> > pre-Meltdown/Spectre. The only difference is we don't see "[critical] TFTP
> > back-end failed" in the logs anymore.
> >
> > I connected to the console on this system and it had errors about
> > timing out retrieving the grub-cfg, then it had an error message along
> > the lines of "error not an ip" and then "double free". After I
> > connected but before I could get a screenshot the system rebooted and
> > was directed by maas to power off, which it did successfully after
> > booting to linux.
> >
> > Full logs are available here:
> > https://10.245.162.101/artifacts/14a34b5a-9321-4d1a-b2fa-ed277a020e7c/cpe_cloud_395/infra-logs.tar
> >
> > This is with 2.3.0-6434-gd354690-0ubuntu1~16.04.1.
> >
> > To manage notifications about this bug go to:
> > https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions

--
Andres Rodriguez (RoAkSoAx)
Ubuntu Server Developer
MSc. Telecom & Networking
Systems Engineer
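For illustration only, a minimal Python (asyncio) sketch of the retry-collapsing behavior as described in the thread above. Duplicate requests that arrive while the rack is still waiting on the region are ignored, and a request that arrives after the region has answered starts a fresh lookup. `RetryCollapser`, `slow_region_lookup`, and the key format are hypothetical names, not MAAS's implementation. The sketch models the behavior Andres describes, not what Jason's pcap shows MAAS actually doing.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Set

# Hypothetical region call: resolves a requested file name to its contents.
Lookup = Callable[[str], Awaitable[bytes]]


class RetryCollapser:
    """Answer only the first request per key while its lookup is in flight."""

    def __init__(self, lookup: Lookup) -> None:
        self._lookup = lookup
        self._in_flight: Set[str] = set()
        self.lookups = 0  # how many region (and so database) queries were issued

    async def handle(self, key: str) -> Optional[bytes]:
        if key in self._in_flight:
            # A retry arrived while the region is still working on the first
            # request: ignore it, so it costs no extra region query.
            return None
        self._in_flight.add(key)
        self.lookups += 1
        try:
            return await self._lookup(key)
        finally:
            # Once answered, a later retry for the same key starts a new lookup.
            self._in_flight.discard(key)


async def demo() -> None:
    async def slow_region_lookup(key: str) -> bytes:
        await asyncio.sleep(0.1)  # stand-in for a slow region round trip
        return b"config for " + key.encode()

    rack = RetryCollapser(slow_region_lookup)
    key = "grub.cfg-52-54-00-12-34-56"  # hypothetical per-MAC file name

    first = asyncio.create_task(rack.handle(key))
    await asyncio.sleep(0.01)  # let the first request reach the region
    retries = [await rack.handle(key) for _ in range(2)]
    print(await first is not None, retries, rack.lookups)  # True [None, None] 1

    # A retry arriving after the region answered is served with a new lookup.
    await rack.handle(key)
    print(rack.lookups)  # 2


if __name__ == "__main__":
    asyncio.run(demo())
```

Counting `lookups` is one way to test Steve's concern. With collapsing in place, three requests that overlap one in-flight lookup cost a single region query, not three. A retry that lands after the answer still costs another one.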
