Re: bind9 startup problems: /var/cache /bind

2019-05-25 Thread Ross Boylan
I tested my suspicion that bind9-resolvconf was somehow implicated in
the bind9 start problems by returning bind9-resolvconf to its
original, disabled, state and restarting the system.  Unfortunately,
it didn't help:
May 25 19:05:34 barley named[804]: /etc/bind/named.conf.options:2:
change directory to '/var/cache/bind' failed: file not found

But at least one theory has been eliminated.

I also reviewed permissions and ownership of /var/cache/bind and the
"directory" directive in named.conf.options for consistency with
Debian post-install scripts and packaged files.  There weren't any
differences.

Ross



Re: bind9 startup problems: /var/cache /bind

2019-05-22 Thread Ross Boylan
On Wed, May 22, 2019 at 2:47 PM Richard Hector wrote:
>
> RequiresMountsFor=/absolute/path/of/mount
>
> .. to go in the unit file - or IIRC running:
>
> sudo systemctl edit bind9.service
>
> ... and putting in:
>
> ---8<
> [Unit]
> RequiresMountsFor=/var
> ---8<
>
> ... followed by:
> sudo systemctl daemon-reload
>
Thank you for the very clear instructions.  I wish README.Debian for
systemd said how you're supposed to handle /etc, since it's somewhat
non-standard.

I tried it.  Unfortunately, it didn't work.  After rebooting I still get
--
May 22 18:38:49 barley named[829]: loading configuration from
'/etc/bind/named.conf'
May 22 18:38:49 barley named[829]: /etc/bind/named.conf.options:2:
change directory to '/var/cache/bind' failed: file not found
May 22 18:38:49 barley named[829]: /etc/bind/named.conf.options:2:
parsing failed: file not found
May 22 18:38:49 barley named[829]: loading configuration: file not found
May 22 18:38:49 barley named[829]: exiting (due to fatal error)

Again, when I restart the service it succeeds.
As a check to see if my options took, systemctl show bind9 does have
the lines (leaving in some resolvconf stuff because of my suspicions):
-
Requires=sysinit.target -.mount var.mount system.slice
Wants=nss-lookup.target bind9-resolvconf.service
ConsistsOf=bind9-resolvconf.service
Before=bind9-resolvconf.service multi-user.target shutdown.target
nss-lookup.target
After=basic.target system.slice -.mount systemd-journald.socket
sysinit.target var.mount network.target
RequiresMountsFor=/var
---
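
A rough way to script that check: the function below reads
`systemctl show` output on stdin and succeeds only if the named unit
is listed on the After= line.  The name ordered_after is made up for
illustration, and the sketch assumes After= arrives on a single line
(it is wrapped above only by the mail client).

```shell
# Succeed if unit $1 appears in the After= property.
# stdin: output of `systemctl show <service>`.
ordered_after() {
    # Keep the After= line, split it into one word per line,
    # then look for an exact, literal match of the unit name.
    grep '^After=' | tr ' =' '\n\n' | grep -qxF -- "$1"
}
```

Possible use: systemctl show bind9 | ordered_after var.mount && echo ordered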

Finally, ls -ld /var/cache/bind
drwxrwxr-x 2 root bind 4096 May 22 16:50 /var/cache/bind
The manual restart of bind began just before 18:50 local time.

Ross



Re: bind9 startup problems: /var/cache /bind

2019-05-22 Thread Richard Hector
On 23/05/19 9:08 AM, Ross Boylan wrote:
> /var is a separate file system, and like / it's encrypted, so it might
> take a bit of time to activate it.  Whether it's available when
> needed, I don't know, though the error suggests it might not be.
> Could systemd be launching services while some of the mounts (and the
> required decryption) are still to be done?
> 
> Is there some systemd way to ensure the file system is mounted before
> launching bind?  But I'd think if /var weren't available, bind
> wouldn't be the only one with a problem.

Well, I don't see anything in bind9.service to prevent it starting. I'm
not sure what the best dependency is ...

A bit of web searching finds:
https://unix.stackexchange.com/questions/246935/set-systemd-service-to-execute-after-fstab-mount

... which suggests:
RequiresMountsFor=/absolute/path/of/mount

.. to go in the unit file - or IIRC running:

sudo systemctl edit bind9.service

... and putting in:

---8<
[Unit]
RequiresMountsFor=/var
---8<

... followed by:
sudo systemctl daemon-reload

Not tested.
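
For a scripted setup, the same drop-in could be written without the
interactive editor.  A minimal sketch, likewise untested on a real
boot, assuming the location that systemctl edit uses
(/etc/systemd/system/<unit>.d/override.conf).  The function name
write_mount_dropin and its optional third argument (an alternate root,
for trying it in a scratch directory) are hypothetical.

```shell
# Write a drop-in that makes a unit require the mounts for a path.
# $1: unit name, e.g. bind9.service
# $2: absolute path whose mounts must be up, e.g. /var
# $3: optional configuration root (defaults to /etc/systemd/system)
write_mount_dropin() {
    unit=$1
    path=$2
    root=${3:-/etc/systemd/system}
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    # Same two lines as the snippet above.
    printf '[Unit]\nRequiresMountsFor=%s\n' "$path" > "$dir/override.conf"
}
```

Run as root, it would be followed by the same reload step:
write_mount_dropin bind9.service /var && systemctl daemon-reload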

Cheers,
Richard




Re: bind9 startup problems: /var/cache /bind

2019-05-22 Thread Ross Boylan
/var is a separate file system, and like / it's encrypted, so it might
take a bit of time to activate it.  Whether it's available when
needed, I don't know, though the error suggests it might not be.
Could systemd be launching services while some of the mounts (and the
required decryption) are still to be done?

Is there some systemd way to ensure the file system is mounted before
launching bind?  But I'd think if /var weren't available, bind
wouldn't be the only one with a problem.

Ross



Re: bind9 startup problems: /var/cache /bind

2019-05-22 Thread Richard Hector
On 23/05/19 8:00 AM, Ross Boylan wrote:
> At system start, bind9 fails to start on a recently created buster
> system.  Some of the local bind is based on configuration from an
> earlier bind.  The logs show
> /etc/bind/named.conf.options:2: change directory to '/var/cache/bind'
> failed: file not found
> 
> But if I then start it manually via systemctl, it starts.  But then I
> need to fix up other services which were counting on working name
> resolution when they started.

Is /var/cache (or /var) a separate filesystem, that might not be mounted
in time at boot?
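
One way to check, sketched with POSIX df: the function prints the
mount point of the file system that holds a given path.  mount_of is a
made-up name, and the awk field split assumes mount points without
spaces.

```shell
# Print the mount point of the file system containing path $1.
mount_of() {
    # -P gives one line per file system; the mount point is field 6
    # of the line after the header.
    df -P -- "$1" | awk 'NR == 2 { print $6 }'
}
```

If mount_of /var/cache/bind prints something other than /, the
directory lives on a separate file system.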

Richard




bind9 startup problems: /var/cache /bind

2019-05-22 Thread Ross Boylan
At system start, bind9 fails to start on a recently created buster
system.  Some of the local bind is based on configuration from an
earlier bind.  The logs show
/etc/bind/named.conf.options:2: change directory to '/var/cache/bind'
failed: file not found

But if I then start it manually via systemctl, it starts.  But then I
need to fix up other services which were counting on working name
resolution when they started.

/var/cache/bind exists (at least right now, with bind running).
Somewhat oddly, it has a recent timestamp that coincides with a
cron.hourly run, but I don't see much other activity in the log.

I started experiencing this problem when I activated the
bind9-resolvconf service, though it's very simple and I don't see how
it could matter.

An Internet search turned up
https://serverfault.com/questions/404219/bind-9-8-not-loading-var-cache-bind-failed-file-not-found
with the response "make the directory".  Except I have the directory.
Also, README.Debian says "The working directory for named is now
/var/cache/bind." So it seems like something the package should have
created on installation, or at least dynamically as it starts.

I double-checked that apparmor seems to have entries that match.  Unless
the trailing slash is a problem?
  /var/cache/bind/** lrw,
  /var/cache/bind/ rw,
That is, the program is trying to open /var/cache/bind, but the
pattern is /var/cache/bind/.
Of course, if it were an apparmor problem then my later restarts would
have failed too, and they didn't.

Some kind of race condition?

The bind9 daemon is running as the bind user.

Ideas?

Thanks.
Ross Boylan