Re: [Dnsmasq-discuss] [PATCH] Re: Issues with dnsmasq under NM and domain redirection: REFUSED

2023-11-27 Thread Simon Kelley



On 31/10/2023 16:39, Petr Menšík wrote:
I am still not sure what exactly causes this problem, but I have hit it 
again. It definitely happens sometimes when I disconnect from my Lenovo 
docking station and then reconnect to it.


One interesting thing I have found is that it gets unblocked by sending 
a simple dig -4 @localhost +tcp fedoraproject.org query. A TCP query 
seems to call enumerate_interfaces(0) on every query, which fixes the 
incorrect ifindex and unblocks dnsmasq.


I am not sure why the check_servers(0); call from dbus.c does not fix 
this reliably. It seems to me it should. It may just be delayed or run 
too soon. I think we can afford to enumerate interfaces on a fatal 
error, which results in a REFUSED response anyway.


It runs with these parameters:

/usr/sbin/dnsmasq --no-resolv --keep-in-foreground --no-hosts 
--bind-interfaces --pid-file=/run/NetworkManager/dnsmasq.pid 
--listen-address=127.0.0.1 --cache-size=400 --clear-on-reload 
--conf-file=/dev/null --proxy-dnssec 
--enable-dbus=org.freedesktop.NetworkManager.dnsmasq 
--conf-dir=/etc/NetworkManager/dnsmasq.d


But it seems to me local_bind would bind to the interface whether 
--bind-interfaces or --bind-dynamic is present. So I think the 
enumerate_interfaces(0); call should be unconditional in this case as 
well.


If that's sufficient to fix this bug, then I can't see a reason not to 
make the change. The other way to fix it is to 
s/--bind-interfaces/--bind-dynamic/. That's maybe a better fix, since 
there are platforms which can't enumerate interfaces, so the problem 
will still be there. At least if you set --bind-dynamic on such a 
platform it will warn you as it falls back to bind-interfaces behaviour.


Cheers,

Simon.



I have created bug #2247269 [1] to track this.

1. https://bugzilla.redhat.com/show_bug.cgi?id=2247269

On 16. 10. 23 15:02, Petr Menšík wrote:

Hello everyone.

Today I returned to work, where I am running dnsmasq 2.89 on my 
Fedora 37 laptop. It is configured by NetworkManager through its 
dns=dnsmasq plugin. But when I returned today, I found that it refused 
to resolve any name on our internal network. I dug into what dnsmasq 
was doing. The problem is that it did not fix itself after a while, 
but kept failing stubbornly.


It was failing quite often in the local_bind call in random_sock(). 
The errno returned was 99. I noticed that it had failed to pick up the 
change of ifindex of the interface it should be bound to.
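
The stale-ifindex condition described above can be checked from outside 
dnsmasq. What follows is only a minimal sketch, assuming Linux on 
x86_64, where errno 99 is EADDRNOTAVAIL ("Cannot assign requested 
address"). ifindex_is_stale() is a hypothetical helper, not a dnsmasq 
function; it simply compares a cached index with what the kernel 
reports for the interface name right now.

```c
#include <net/if.h>

/* Hypothetical helper, not part of dnsmasq: returns 1 when the cached
   index no longer matches the kernel's current index for this
   interface name, or when the interface does not exist at all. */
static int ifindex_is_stale(const char *name, unsigned int cached)
{
  unsigned int current = if_nametoindex(name); /* 0 when not found */

  return current == 0 || current != cached;
}
```

Calling ifindex_is_stale("enp9s0u1", 7) with the interface name and 
ifindex held in the server record would show whether the interface was 
re-created with a new index after re-docking.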


(gdb) bt
#0  0x7f53305e7020 in strerror () from /lib64/libc.so.6
#1  0x5557a3ec2c4b in random_sock (s=s@entry=0x5557a43fef50)
    at /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/forward.c:2511
#2  0x5557a3ec62f2 in allocate_rfd (fdlp=fdlp@entry=0x5557a43f5280, serv=serv@entry=0x5557a43fef50)
    at /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/forward.c:2607
#3  0x5557a3ec72dc in forward_query (udpfd=4, udpaddr=0x7ffdb6bfbd30, dst_addr=0x7ffdb6bfbd00, dst_iface=0, header=0x5557a43e03d0, plen=51,
    limit=0x5557a43e0880 "", now=1697453089, forward=0x5557a43f5230, ad_reqd=1, do_bit=0, fast_retry=0)
    at /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/forward.c:498
#4  0x5557a3ed0ebd in receive_query (now=1697453089, listen=0x5557a43e0cc0)
    at /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/forward.c:1869
#5  check_dns_listeners (now=1697453089)
    at /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/dnsmasq.c:1845
#6  0x5557a3eac9ef in main (argc=<optimized out>, argv=<optimized out>)
    at /usr/src/debug/dnsmasq-2.89-5.fc37.x86_64/src/dnsmasq.c:1266


(gdb) p *$d->servers->next->next->next->next->next->next
$8 = {flags = 800, domain_len = 14, domain = 0x5557a43f5eb0 
"brq.redhat.com", next = 0x5557a43ffa10, serial = 6, arrayposn = 23,
  last_server = -1, addr = {sa = {sa_family = 2, sa_data = 
"\0005\n&\005\032\226\r\2170S\177\000"}, in = {sin_family = 2, 
sin_port = 13568,
  sin_addr = {s_addr = 436545034}, sin_zero = 
"\226\r\2170S\177\000"}, in6 = {sin6_family = 2, sin6_port = 13568, 
sin6_flowinfo = 436545034,
  sin6_addr = {__in6_u = {__u6_addr8 = 
"\226\r\2170S\177\000\\275\001\a\220\000\000", __u6_addr16 = 
{3478, 12431, 32595, 0, 48432, 1793,
    144, 0}, __u6_addr32 = {814681494, 32595, 117554480, 
144}}}, sin6_scope_id = 3446832640}}, source_addr = {sa = {sa_family = 2,
  sa_data = "\000\000\000\000\000\000@\274\277\266\375\177\000"}, 
in = {sin_family = 2, sin_port = 0, sin_addr = {s_addr = 0},
  sin_zero = "@\274\277\266\375\177\000"}, in6 = {sin6_family = 2, 
sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = {
  __u6_addr8 = 
"@\274\277\266\375\177\000\000@\274\277\266\375\177\000", __u6_addr16 
= {48192, 46783, 32765, 0, 48192, 46783, 32765, 0},
  __u6_addr32 = {3066018880, 32765, 3066018880, 32765}}}, 
sin6_scope_id = 814672583}},
  interface = "enp9s0u1\000\000\000\000\000\000\000\000", ifindex = 7, 
sfd = 0x0, tcpfd = 0, edns_pktsz = 1232, pktsz_reduced = 0, queries = 
446,
  failed_queries = 0, nxdomain_replies = 0, retrys = 4, query_latency 
= 0, mma_latency 

Re: [Dnsmasq-discuss] [PATCH] Refuse to start with EADDRINUSE in --bind-dynamic mode

2023-11-27 Thread Simon Kelley



On 25/11/2023 16:51, Petr Menšík wrote:
Yes, the problem is that in case 3) we wait until the condition changes 
and then retry. But for a lot (most?) of errors we get no indication 
from the system that it has changed.


Take insufficient memory or insufficient file descriptors, for example. 
That may change, but unlike interfaces going up and down, which we 
watch, there is no hook which would retry listener creation. It fails 
once and then maybe retries only on an explicit reload. That is why I 
think it is absolutely necessary to log any failure we let pass, unless 
we know we will retry later.


You're right, the only error from bind() that should be ignored is 
EADDRNOTAVAIL. Everything else should be a fatal error during startup or 
logged once the daemon is running.


I've just pushed a patch to that effect.
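
The rule stated above can be sketched as a small decision function. 
This is not the pushed patch, only an illustration of the policy as 
described in this message; the enum and classify_bind_result() are 
hypothetical names that do not exist in dnsmasq.

```c
#include <errno.h>

/* Hypothetical names, for illustration only. */
enum bind_action { BIND_OK, BIND_WAIT, BIND_FATAL, BIND_LOG };

/* err is 0 on success, otherwise the errno left by bind().
   at_startup is non-zero while the daemon is still initialising. */
static enum bind_action classify_bind_result(int err, int at_startup)
{
  if (err == 0)
    return BIND_OK;

  /* The address may appear later, so ignore this one and retry when
     the set of interfaces changes. */
  if (err == EADDRNOTAVAIL)
    return BIND_WAIT;

  /* Anything else: fatal during startup, logged once running. */
  return at_startup ? BIND_FATAL : BIND_LOG;
}
```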

Cheers,

Simon.



More below...

On 11/23/23 13:47, Simon Kelley wrote:
That's a good point, but I don't think there needs to be any non-fatal 
error logging. There are three situations during startup.


1) bind() succeeds.
2) bind() fails for a reason which won't change - fatal error.
3) bind() fails for a reason which may change - start up and wait 
until it does change and try again.


The canonical example of 3) is the one I gave before, 
--listen-address=1.2.3.4 but no local interface has address 1.2.3.4. 
The intention is that when a new interface comes up with address 
1.2.3.4 then a new socket will be created and bound. This is long 
after startup, so the only option if it fails then is to log the event.


Of course, this is a very special case which happens to be handled 
well. I agree there is not much else to do. The only alternative would 
be to make the error fatal, but I don't think that is desired.


If the only situation where we want to wait is the one above, then the 
solution is to make EADDRNOTAVAIL at startup the only one where we 
keep waiting, and all the others are fatal. I think when I originally 
wrote this I wasn't sure if that was the only non-fatal error, which is 
why the code is as it is.


This is not a complete solution to your original problem of enforcing 
only one dnsmasq daemon process in any case. For example if you 
configure a single listen-address which doesn't exist on the machine, 
then you can start as many dnsmasq processes as you like and they'll 
all start up and be waiting for the interface with that address to be 
created. Once it is, all will try and bind it, and all but one will 
fail, but they'll all still exist. Managing daemon processes is really 
the job of sysvinit or systemd, but the authors of the bug seem to 
want protection from just running the binary from the command line.


We at Fedora support only services managed by systemd. But even for 
that, it needs to get some feedback about failure. If the process 
terminates with a non-zero status code, the unit will be marked failed. 
We *need* that. An alternative might be support for libsystemd with a 
notify socket, which would work with Type=notify services. For now it 
reports a failed startup only with Type=forking. A later failure is 
logged only as a warning regardless of the type of the error. I think 
we want unexpected error types to be logged as errors, especially 
insufficient-resource errors like ENOMEM. Or make them fatal. With 
Restart=on-failure in the systemd unit, it might be able to recover 
from memory leaks if such errors were fatal. I am not sure we want 
that; it might break a lot of deployments, but also fix some.




TLDR;

We either pick a set of errors which are OK to continue 
(EADDRNOTAVAIL, what others?) and fail fatally at startup for all 
others, or we pick a set of errors to fail fatally at startup 
(EADDRINUSE, EACCES, what others?) and continue for all others.



Cheers,

Simon.


I would say it is safer to make everything a fatal error except what is 
explicitly waived, for now just EADDRNOTAVAIL and EINTR? I think most 
of these errors mean an incomplete, degraded service anyway, with no 
reliable self-repair code present. If it had a retry timer with an 
exponentially increasing delay (with some upper bound), then we might 
want it to start anyway. But I think it is safer to prevent a 
half-initialized service. Systemd can provide auto-recovery with smart 
settings. Do we have a way to specify that I do not require a TCP 
listening socket for DNS? It should be clearly discouraged, but for 
some kinds of tests it might be acceptable.
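
The retry timer mentioned above, with an exponentially increasing delay 
and an upper bound, could look roughly like this. It is only a sketch 
of the idea; retry_delay() is a hypothetical helper, nothing like it 
exists in dnsmasq today, and the choice of seconds as the unit is an 
assumption.

```c
/* Hypothetical helper: delay in seconds before retry number `attempt`
   (counted from 0), doubling from `base` and never exceeding `cap`. */
static unsigned int retry_delay(unsigned int attempt, unsigned int base,
                                unsigned int cap)
{
  unsigned int delay = base;

  while (attempt-- > 0 && delay < cap)
    {
      /* Doubling would go past cap (and could overflow), so stop. */
      if (delay > cap / 2)
        return cap;
      delay *= 2;
    }

  return delay < cap ? delay : cap;
}
```

With base 1 and cap 60 the delays would be 1, 2, 4, 8, 16, 32 and then 
60 for every later attempt.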


Cheers,
Petr




On 23/11/2023 11:13, Petr Menšík wrote:
To make multiple instances correctly refuse to run on the same machine 
and in the same namespaces, yes, it would be sufficient.


But I think part of the problem is that all problems during startup 
are hidden and not shown anywhere at all. I think it is okay for 
EADDRNOTAVAIL not to be printed. But I think in other cases we want 
at least a warning somewhere. This way you also get the exact error 
message printed. For example, SELinux policy hardening may prevent 
your process from listening on port 53, even though it has 
NET_BIND_SERVICE.


With my modification it will print 

[Dnsmasq-discuss] small typo in dnsmasq.conf.example

2023-11-27 Thread Brenton Bostick
Near the end of dnsmasq.conf.example, there is this section:

# Provide an alias for a "local" DNS name. Note that this _only_ works
# for targets which are names from DHCP or /etc/hosts. Give host
# "bert" another name, bertrand
#cname=bertand,bert


It seems that the example has a typo, i.e., it has "bertand" instead of
"bertrand". It should say:

#cname=bertrand,bert


Brenton
___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss