Re: bind9 startup problems: /var/cache /bind
I tested my suspicion that bind9-resolvconf was somehow implicated in the bind9 start problems by returning bind9-resolvconf to its original, disabled, state and restarting the system. Unfortunately, it didn't help:

May 25 19:05:34 barley named[804]: /etc/bind/named.conf.options:2: change directory to '/var/cache/bind' failed: file not found

But at least one theory has been eliminated.

I also reviewed permissions and ownership of /var/cache/bind and the "directory" directive in named.conf.options for consistency with Debian post-install scripts and packaged files. There weren't any differences.

Ross
Re: bind9 startup problems: /var/cache /bind
On Wed, May 22, 2019 at 2:47 PM Richard Hector wrote:
>
> RequiresMountsFor=/absolute/path/of/mount
>
> .. to go in the unit file - or IIRC running:
>
> sudo systemctl edit bind9.service
>
> ... and putting in:
>
> ---8<
> [Unit]
> RequiresMountsFor=/var
> ---8<
>
> ... followed by:
> sudo systemctl daemon-reload
>

Thank you for the very clear instructions. I wish README.Debian for systemd said how you're supposed to handle /etc, since it's somewhat non-standard.

I tried it. Unfortunately, it didn't work. After rebooting I still get

---
May 22 18:38:49 barley named[829]: loading configuration from '/etc/bind/named.conf'
May 22 18:38:49 barley named[829]: /etc/bind/named.conf.options:2: change directory to '/var/cache/bind' failed: file not found
May 22 18:38:49 barley named[829]: /etc/bind/named.conf.options:2: parsing failed: file not found
May 22 18:38:49 barley named[829]: loading configuration: file not found
May 22 18:38:49 barley named[829]: exiting (due to fatal error)
---

Again, when I restart the service it succeeds.

As a check to see if my options took, systemctl show bind9 does have the lines (leaving in some resolvconf stuff because of my suspicions):

---
Requires=sysinit.target -.mount var.mount system.slice
Wants=nss-lookup.target bind9-resolvconf.service
ConsistsOf=bind9-resolvconf.service
Before=bind9-resolvconf.service multi-user.target shutdown.target nss-lookup.target
After=basic.target system.slice -.mount systemd-journald.socket sysinit.target var.mount network.target
RequiresMountsFor=/var
---

Finally,

ls -ld /var/cache/bind
drwxrwxr-x 2 root bind 4096 May 22 16:50 /var/cache/bind

The manual restart of bind began just before 18:50 local time.

Ross
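
Since var.mount already shows up in After= and the failure persists, one way to test the "directory is merely late" idea would be a small pre-start check that polls for the directory and logs how long it took to appear. The following is only a sketch: wait_for_dir is a hypothetical helper, not something shipped with bind9 or systemd, and the 10-second default is an arbitrary assumption.

```shell
#!/bin/sh
# wait_for_dir DIR [TIMEOUT_SECONDS]
# Hypothetical helper: poll once per second until DIR exists as a
# directory or the timeout (default 10s, an arbitrary choice) expires.
# Returns 0 if DIR appeared, 1 if it was still missing at the timeout.
wait_for_dir() {
    dir=$1
    timeout=${2:-10}
    waited=0
    while [ ! -d "$dir" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "wait_for_dir: $dir still missing after ${waited}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # Report on stderr so the message ends up in the service's journal.
    echo "wait_for_dir: $dir present after ${waited}s" >&2
    return 0
}
```

If this were saved as a script that ends with a call such as wait_for_dir "$@", it could presumably be run from an ExecStartPre= line in the [Service] section of the same drop-in, with /var/cache/bind as the argument. A non-zero wait in the journal would point to a timing problem; a timeout would suggest the directory really is absent at that moment.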
Re: bind9 startup problems: /var/cache /bind
On 23/05/19 9:08 AM, Ross Boylan wrote:
> /var is a separate file system, and like / it's encrypted, so it might
> take a bit of time to activate it. Whether it's available when
> needed, I don't know, though the error suggests it might not be.
> Could systemd be launching services while some of the mounts (and the
> required decryption) are still to be done?
>
> Is there some systemd way to ensure the file system is mounted before
> launching bind? But I'd think if /var weren't available, bind
> wouldn't be the only one with a problem.

Well, I don't see anything in bind9.service to prevent it starting. I'm not sure what the best dependency is ...

A bit of web searching finds:

https://unix.stackexchange.com/questions/246935/set-systemd-service-to-execute-after-fstab-mount

... which suggests:

RequiresMountsFor=/absolute/path/of/mount

.. to go in the unit file - or IIRC running:

sudo systemctl edit bind9.service

... and putting in:

---8<
[Unit]
RequiresMountsFor=/var
---8<

... followed by:

sudo systemctl daemon-reload

Not tested.

Cheers,
Richard
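
For anyone who prefers not to go through the interactive editor, the same drop-in can be written by hand: systemctl edit stores its result as override.conf in a <unit>.d directory under /etc/systemd/system. A minimal sketch follows; write_dropin and its root argument are hypothetical names introduced here so the result can be inspected in a scratch directory before touching the real system.

```shell
#!/bin/sh
# write_dropin ROOT UNIT MOUNT_PATH
# Hypothetical helper: write the drop-in that "systemctl edit UNIT"
# would otherwise create interactively.
#   ROOT        "" for the real system, or a scratch directory for a dry run
#   UNIT        e.g. bind9.service
#   MOUNT_PATH  e.g. /var
write_dropin() {
    root=$1
    unit=$2
    mount_path=$3
    dir="$root/etc/systemd/system/$unit.d"
    mkdir -p "$dir" || return 1
    # Same two lines as the snippet suggested for the editor.
    printf '[Unit]\nRequiresMountsFor=%s\n' "$mount_path" > "$dir/override.conf"
}

# Dry run into a scratch directory:
#   write_dropin /tmp/scratch bind9.service /var
# On the real system (as root), then make systemd re-read its units:
#   write_dropin "" bind9.service /var && systemctl daemon-reload
```

Afterwards, systemctl cat bind9.service should list the drop-in below the packaged unit file.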
Re: bind9 startup problems: /var/cache /bind
/var is a separate file system, and like / it's encrypted, so it might take a bit of time to activate it. Whether it's available when needed, I don't know, though the error suggests it might not be. Could systemd be launching services while some of the mounts (and the required decryption) are still to be done?

Is there some systemd way to ensure the file system is mounted before launching bind? But I'd think if /var weren't available, bind wouldn't be the only one with a problem.

Ross
Re: bind9 startup problems: /var/cache /bind
On 23/05/19 8:00 AM, Ross Boylan wrote:
> At system start, bind9 fails to start on a recently created buster
> system. Some of the local bind is based on configuration from an
> earlier bind. The logs show
> /etc/bind/named.conf.options:2: change directory to '/var/cache/bind'
> failed: file not found
>
> But if I then start it manually via systemctl, it starts. But then I
> need to fix up other services which were counting on working name
> resolution when they started.

Is /var/cache (or /var) a separate filesystem, that might not be mounted in time at boot?

Richard
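
To narrow down which part of the path is absent when the chdir fails, something like the following could be run at the point of failure. It is a sketch only: first_missing is a hypothetical helper, it assumes an absolute path, and it reports only the first component that does not exist as a directory.

```shell
#!/bin/sh
# first_missing ABSOLUTE_PATH
# Hypothetical helper: walk the path one component at a time and print
# the first prefix that is not an existing directory.
# Returns 0 (and prints nothing) if every component exists, 1 otherwise.
first_missing() {
    rest=${1#/}       # drop the leading slash
    current=""
    while [ -n "$rest" ]; do
        part=${rest%%/*}              # next component
        case $rest in
            */*) rest=${rest#*/} ;;   # more components follow
            *)   rest="" ;;           # this was the last one
        esac
        [ -n "$part" ] || continue    # skip empty parts from "//"
        current="$current/$part"
        if [ ! -d "$current" ]; then
            echo "$current"
            return 1
        fi
    done
    return 0
}

# Example: first_missing /var/cache/bind
```

If this printed /var, the mount itself would be late; if it printed /var/cache/bind, the file system would be there but the directory would not.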
bind9 startup problems: /var/cache /bind
At system start, bind9 fails to start on a recently created buster system. Some of the local bind is based on configuration from an earlier bind. The logs show

/etc/bind/named.conf.options:2: change directory to '/var/cache/bind' failed: file not found

But if I then start it manually via systemctl, it starts. But then I need to fix up other services which were counting on working name resolution when they started.

/var/cache/bind exists (at least right now, with bind running). Somewhat oddly it has a recent timestamp that coincided with a cron.hourly run, though I don't see much other activity in the log.

I started experiencing this problem when I activated the bind9-resolvconf service, though it's very simple and I don't see how it could matter.

Internet search turned up
https://serverfault.com/questions/404219/bind-9-8-not-loading-var-cache-bind-failed-file-not-found
with the response "make the directory". Except I have the directory.

Also, README.Debian says "The working directory for named is now /var/cache/bind." So it seems like something the package should have created on installation, or at least dynamically as it starts.

Double-checked that apparmor seems to have entries that match. Unless the trailing slash is a problem?

/var/cache/bind/** lrw,
/var/cache/bind/ rw,

That is, the program is trying to open /var/cache/bind, but the pattern is /var/cache/bind/. Of course, if it were an apparmor problem then my later restarts would have failed too, and they didn't.

Some kind of race condition?

The bind9 daemon is running as the bind user.

Ideas?

Thanks.
Ross Boylan