OK - I was able to repro again, this time with MAAS 2.6. Here are the steps:

PREP WORK
1) Have 50 machines in Ready state, each with one interface enabled and configured as 'Autoassign' on the Default VLAN PXE subnet (auto assign so that every deploy/release causes MAAS to reload DNS)
2) Clear out any DNS entries in the PXE subnet (this forces nodes to send DNS queries to MAAS)
3) Settings -> Network Services -> DNS -> Upstream DNS -> enter a valid upstream DNS IP
4) Settings -> Network Services -> DNS -> DNSSEC -> Automatic (for some reason this breaks Upstream DNS)
5) Verify that Upstream DNS is broken
   a) Rescue Mode one machine
   b) ssh to the Rescue machine
   c) dig www.google.com
   d) (dig should time out/fail)
   e) MAAS -> Settings -> Network Services -> DNS -> DNSSEC -> Disable
   f) dig www.google.com
   g) (dig should succeed)
   h) MAAS -> Settings -> Network Services -> DNS -> DNSSEC -> Automatic
   i) Release the Rescue machine

REPRO
1) Run repro.py (attached; WARNING: this code will use all machines available to MAAS)
2) Wait up to 3 hours, checking whether bind9 is hung by regularly running `sudo rndc status` on the MAAS host

Monitoring steps (optional)
(See DNS query activity) In one ssh window to MAAS, run
    sudo tcpdump -i ens3 dst <your-rack-controller-ip> and dst port 53
(See DNS reloads, and why) In another ssh window to MAAS, run
    sudo tail -f /var/log/maas/regiond.log | grep Reloaded -A 3

--
https://bugs.launchpad.net/bugs/1710278
Title: [2.3a1] named stuck on reload, DNS broken
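
For REPRO step 2, the periodic `sudo rndc status` check could be automated roughly as follows. This is only a sketch and not part of the attached repro.py: the function names `rndc_responds` and `watch`, the 10-second timeout, and the 60-second polling interval are all my own assumptions. It treats "no answer within the timeout" as "bind9 is hung", and it assumes sudo will not prompt for a password (a prompt would also hit the timeout and be reported as a hang).

```python
import subprocess
import sys
import time


def rndc_responds(cmd=("sudo", "rndc", "status"), timeout=10.0):
    """Return True if `cmd` exits with status 0 within `timeout` seconds.

    Assumption: a hung named never answers rndc, so a timeout means "hung".
    """
    try:
        result = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # no answer in time: treat as hung
    except OSError:
        return False  # command missing or not executable
    return result.returncode == 0


def watch(interval=60.0, max_seconds=3 * 3600, check=rndc_responds):
    """Poll `check` until it fails or `max_seconds` have elapsed.

    Returns the number of seconds elapsed at the first failed check,
    or None if every check passed for the whole window.
    """
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        if elapsed >= max_seconds:
            return None
        if not check():
            return elapsed
        time.sleep(interval)


def report(hung_after, out=sys.stdout):
    """Print a one-line summary of the result returned by watch()."""
    if hung_after is None:
        print("rndc status kept responding for the whole window", file=out)
    else:
        print(f"rndc status stopped responding after {hung_after:.0f}s", file=out)
```

Running `report(watch())` on the MAAS host while repro.py is cycling machines would poll for up to 3 hours, matching the wait in step 2. A non-zero exit from `rndc status` is also counted as a failure, so a stopped bind9 is reported the same way as a hung one.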
