Re: [Dnsmasq-discuss] [PATCH] Refuse to start with EADDRINUSE in --bind-dynamic mode

Petr Menšík Sat, 25 Nov 2023 09:46:48 -0800

Yes, the problem is 3) has a condition we wait until it changes thenretry. But for a lot (most?) of errors we lack any indication from thesystem it has changed.

For example insufficient memory or insufficient file descriptors. It maychange, but unlike watching up and down interfaces, there is no hookwhich would retry listener creation. It fails once and then just mayberetries on explicit reload. That is why I think it is absolutelynecessary to log any failure we pass somewhere, unless we know we willretry later.


More below...

On 11/23/23 13:47, Simon Kelley wrote:

That's a good point, but I don't think there needs to be any non-fatalerror logging. There are three situations during startup.
1) bind() succeeds.
2) bind fails for a reason which won't change - fatal error.
3) bind fails for a reason which may change - startup and wait untilit does change and try again.
The canonical example of 3) is the one I gave before,--listen-address=1.2.3.4 but not local interface has address 1.2.3.4.The intention is that when a new interface comes up with address1.2.3.4 then a new socket will be created and bound. This is longafter startup, so the only option if it fails then is to log the event.

Of course, this is very special case somehow well handled. I agree thereis not much else to do. We could only make the error fatal, but I don'tthink that is desired.

If the only situation where we want to wait is the one above, then thesolution to to make EADDRNOTAVAIL at startup the only one where wekeep waiting, and all the others are fatal. I think when I originallywrote this I wasn't sure if that was the only non-fatal error which iswhy the code is as it is.
This is not a complete solution to your original problem of enforcingonly one dnsmasq daemon process in any case. For example if youconfigure a single listen-address which doesn't exist on the machine,then you can start as many dnsmasq processes as you like and they'llall start up and be waiting for the interface with that address to becreated. Once it is, all will try and bind it, and all but one willfail, but they'll all still exist. Managing daemon processes is reallythe job of sysvinit or systemd, but the authors of the bug seem tosant protection from just running the binary from the command line.

We at Fedora support only services managed by systemd. But even forthat, it needs to get some feedback of failure. If the processterminates with non-zero status code, unit will be marked failed. We*need* that. Alternative might be support for libsystemd with notifysocket, which would work with Type=notify services. Now it will reportfailed startup only with Type=forking. Later failure is logged only as awarning regardless of type of the error. I think we want unexpectederror types to be logged as errors, especially for insufficientresources errors like ENOMEM. Or made them fatal. With systemd unitRestart=on-failure, it might be able to recover from memory leaks ifsuch errors were fatal. Not sure we want that, might break a lot ofdeployments, but also fix some.

TLDR;
We either pick a set of errors which are Ok to continue(EADDRNOTAVAIL, what others?) and fail fatally at startup for allothers, or we pick a set of errors to fail fatally at startup(EADDRINUSE, EACCESS, what others?) and continue for all others.
Cheers,

Simon.

I would say safer would be to fatal error everything except explicitlywaived, for now just EADDRNOTAVAIL and EINTR? I think most of theseerrors means incomplete degraded service anyway, without reliableself-repair code present. If it had repeat timer with exponentiallyincreasing time of retry (with some upper bound), then we might want itto start anyway. But I think it is safer to prevent half-initializedservice. Systemd can provide autorecovery with smart settings. Do wehave a way to specify I do not require TCP listening socket for DNS? Itshould be clearly discouraged, but for some kinds of tests it might beacceptable.


Cheers,
Petr

On 23/11/2023 11:13, Petr Menšík wrote:
To fix problem with multiple instances correctly refusing running onthe same machine and namespaces, yes, it would be sufficient.
But I think part of the problem is hiding all problems during startupand not showing them at all, in any source. I think that is okay forEADDRNOTAVAIL to not be printed. But I think in other cases we wantat least warning somewhere. This way you also get exact error messageprinted. For example selinux policy hardening may prevent yourprocess to listen on port 53, even though it has NET_BIND_SERVICE.
With my modification it will print errors for listeners used. Note10.1.2.3 is hidden at that phase. You would not know it withoutstrace analysis. I expect there can be different errors, for examplerunning out of file descriptors or memory. Hiding somethingnon-standard happened during startup is quite a bad design. Only somekinds of errors are okay during startup.
$ sudo -u nobody fedora-like/dnsmasq -d --bind-dynamic--listen-address=10.1.2.3dnsmasq: failed to create listening socket for 127.0.0.1: Permissiondenieddnsmasq: failed to create listening socket for 127.0.0.1: Permissiondenied
dnsmasq: failed to create listening socket for ::1: Permission denied
dnsmasq: failed to create listening socket for ::1: Permission denied

dnsmasq: process is missing required capability NET_BIND_SERVICE

# Compare this with:
$ sudo -u nobody fedora-like/dnsmasq -d --bind-interfaces--listen-address=10.1.2.3dnsmasq: failed to create listening socket for 127.0.0.1: Permissiondenied
I think we want any errors printed, even if they are not made fatal.Except carefully chosen type of errors, which are expected and wouldraise just false alarms. Not sure how to trigger other types oferrors, but I am sure I would like to see them, even if they did notcause the process to die. That is why I have used more complicatedapproach, which should print everything unexpected, even when dnsmasqis not stopped. In order to investigate you first have to knowsomething unusual has happened.
On 23. 11. 23 0:29, Simon Kelley wrote:
Isn't this sufficient to fix the problem?
Not calling die() when bind-dynamic is set is intended to handle thecase that bind returns EADDRNOTAVAIL because you've configured--listen-address=1.2.3.4 but there's not a local interface with thataddress. dnsmasq runs anyway in the expectation that such aninterface will appear in the future and a socket will be bound then.
I don't think there's a die()/syslog() conflict at all.


diff --git a/src/network.c b/src/network.c
index ca9fada..db1d528 100644
--- a/src/network.c
+++ b/src/network.c
@@ -924,7 +924,7 @@ static int make_sock(union mysockaddr *addr, inttype, int dienow)
        {
/* failure to bind addresses given by --listen-address atthis point
             is OK if we're doing bind-dynamic */
-         if (!option_bool(OPT_CLEVERBIND))
+         if (!option_bool(OPT_CLEVERBIND) || errno == EADDRINUSE)
            die(s, daemon->addrbuff, EC_BADNET);
        }
       else

Cheers,

Simon.

On 22/11/2023 19:27, Petr Menšík wrote:
Hello everyone,
I have received error report RHEL-16398 [1], which I think makessense to fix even in the lastest version. I believe it allowsnon-intentional another instance running without error. What isworse, it does not even show any warning that initialization isincomplete.
Of course the problem at start is those errors happen in time whenno log is available. I think that can be fixed easily by usingstderr at that time. That is patch #1.
Second makes EADDRNOTAVAIL bind errors still hidden, but prints allother errors at least to stderr. On a system with systemd thatshould make it present in journalctl -u dnsmasq anyway. EADDRINUSEis made fatal, because that would not be usually handled by newaddresses added later. If there is a need to start another dnsmasqinstance without TCP listeners, I think that should be specifiedmore explicitly. Makes EADDRINUSE fatal the same way as with--bind-interfaces.
Would you find any other errors, which should be hidden or madefatal? What would you think of those changes?
1. https://issues.redhat.com/browse/RHEL-16398

--
Petr Menšík
Software Engineer, RHEL
Red Hat, https://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB


_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss

Re: [Dnsmasq-discuss] [PATCH] Refuse to start with EADDRINUSE in --bind-dynamic mode

Reply via email to