Re: [Dnsmasq-discuss] [PATCH] Offered IPv4 DHCP address ping fix

2021-12-08 Thread Geert Stappers via Dnsmasq-discuss
On Wed, Dec 08, 2021 at 01:18:42AM +0100, Petr Menšík wrote:
> Hi Simon and others,
> 
> I am debugging a strange issue that happens inside OpenStack in certain
> situations. Under conditions I cannot yet define precisely, dnsmasq
> returns a "no address available" error even when not all leases are
> in use.
   
> From 18e49004782549068450cded3c12ff65e44a4308 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Petr=20Men=C5=A1=C3=ADk?= 
> Date: Wed, 8 Dec 2021 00:11:46 +0100
> Subject: [PATCH 2/2] Simplify ICMP ping from dhcp

Should there be a PATCH 1/2?



Groeten
Geert Stappers
-- 
Silence is hard to parse

___
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss


Re: [Dnsmasq-discuss] dnsmasq on large scale network

2021-12-08 Thread Geert Stappers via Dnsmasq-discuss
On Thu, Dec 09, 2021 at 03:28:40AM +0100, Petr Menšík wrote:
> On 12/5/21 19:44, Fabian Druschke wrote:
> > Hey friends, I hope you are all doing fine.
> >
> > Currently I'm facing a little challenge. I have a large network with
> > more or less 30k clients, and I need a router for NAT from the LAN
> > subnets in the 10.0.0.0/8 address space to the outside WAN public IP
> > address. So it's quite a simple scenario.
> >
> > I've already purchased a Juniper MX150 router, but it was the wrong
> > choice because it lacks NAT support entirely. So I wanted to use
> > OpenWrt for this scenario, because it is really simple to set up for
> > this use.
> >
> > What I'm struggling with is the DHCP server included in OpenWrt. By
> > default it's dnsmasq, and it's easy to configure through the LuCI web
> > interface. Before going into production I'd like to make sure that
> > dnsmasq is designed for, or at least capable of handling, this number
> > of clients (peak 30k / 5 requests per second).
> >
> > Does anyone have experience with such a scenario, and is there a
> > proper tool to benchmark DHCP?
> >
> 
> Interesting, I am just debugging a situation where multiple instances
> start and request DHCP at about the same time. Without the no-ping
> option it works quite badly. Even starting 16 instances at the same
> time does not work reliably for us with ping enabled. It seems our 2.79
> version is broken and that this was fixed in 2.81. But if you have 5
> requests per second, I would use a heavier-duty server. Dnsmasq is great
> for small networks, but I think it was not designed for high
> performance. It does not scale well to hundreds or thousands of clients.
> 
> Make sure you use no-ping for good performance. The ping code handles
> just one ping at a time, which makes it quite slow when enabled. I would
> try dhcp-server or kea; your network seems big enough to justify it.

I would avoid single points of failure.
 

Groeten
Geert Stappers
-- 
Silence is hard to parse



Re: [Dnsmasq-discuss] 2.80 dnspooq v3 problem

2021-12-08 Thread Petr Menšík
Hi,

yes, there were some fixes related to the bind-to-device option. I would
suggest looking at the CentOS 8/RHEL 8 patches for 2.79 [1], which
hopefully also fixed the regressions caused by the CVE fixes. The
description of the problem matches something I had to fix later, so it
should be among the recent patches.

I think it might be referenced by this:

# http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=3f535da79e7a42104543ef5c7b5fa2bed819a78b
# http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=04490bf622ac84891aad6f2dd2edf83725decdee
Patch27: dnsmasq-2.79-mixed-family-failed.patch

Not all fixes are always backported. I think the issue you describe was
about the sfd->fd socket not being matched properly. One was ignored
because of SO_BINDTODEVICE, the other because of a mismatched socket
number. The result was ignored responses. I cannot remember the exact
commit, I am sorry. I think Simon fixed it together with the random
sockets of the source device, so it has no separate commit.
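
As a rough illustration of the two binding styles discussed in this
thread, the Python sketch below contrasts a UDP socket bound to a source
address with one tied to a device via SO_BINDTODEVICE. The function names
and the loopback address are assumptions made for the example; this is
not dnsmasq's own code, which is written in C.

```python
import socket


def udp_socket_bound_to_address(source_ip, port=0):
    """Bind a UDP socket to a local IP address (port 0 = any free port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((source_ip, port))
    return sock


def udp_socket_bound_to_device(ifname):
    """Tie a UDP socket to one interface (Linux only, may need privileges)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
                    ifname.encode() + b"\0")
    return sock


if __name__ == "__main__":
    # 127.0.0.1 is only a stand-in for a real forwarding source address.
    s = udp_socket_bound_to_address("127.0.0.1")
    print(s.getsockname())
    s.close()
```

The address-bound socket fixes the source address of outgoing queries
without naming an interface, while the device-bound one restricts the
socket to traffic on that interface.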

Cheers,
Petr

1. https://git.centos.org/rpms/dnsmasq/blob/c8s/f/SOURCES

On 12/3/21 12:32, sunil rathod wrote:
> Hi Petr,
> I have used the following patches for the 2.80 release, along with the
> dnspooq patch, to resolve the bugs.
>
> Does this patch have any implications for the "SO_BINDTODEVICE"
> option on sockets? On my system, when DNS replies arrive on the
> interface, the kernel seems to drop them because of a mismatched
> socket. I have seen this problem since the kernel upgrade. Is there a way
> we can bind to an IP address rather than interface for forwarding interf
>
>
> 1.
> https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2021q1/014789.html
> 2.
> http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=74d4fcd756a85bc1823232ea74334f7ccfb9d5d2
> 3.
> http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=8f9bd615053cd13aba82a111ec20bb79d25a2d1e
>
> Regards,
> Sunil
>
> On Fri, 2 Apr 2021 at 05:21, Simon Kelley wrote:
>
> On 31/03/2021 08:50, Petr Menšík wrote:
> > Hi Sunil,
> >
> > This is exactly the same issue I reported on thread [1]. Unfortunately
> > it has not been merged separately, but it should be patched by the
> > CVE-2021-3448 fix [2]. It happens only when you have rp_filter set to 1.
> > The root cause is the lookup_frec change in commit 8f9bd615053cd [3],
> > together with the part added previously by commit [2].
> >
> > Yes, these are bugs that were not found when the dnspooq patches were
> > tested. The root of the issue existed before, but things stopped
> > working only after the dnspooq patches. They are related.
> >
>
> Thanks Petr. Given the above:
>
> 1) This is not fixed in the 2.80 dnspooq v3 patches.
> 2) It is fixed in the forthcoming 2.85 release.
>
> Simon.
>
>
>
-- 
Petr Menšík
Software Engineer
Red Hat, http://www.redhat.com/
email: pemen...@redhat.com
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB


Re: [Dnsmasq-discuss] [PATCH] Offered IPv4 DHCP address ping fix

2021-12-08 Thread Petr Menšík
I am afraid I have found some implementation problems in IPv4 DHCP
allocation. It is problematic when multiple machines start at the same
time (I have a script that launches 16), especially with
dhcp-sequential-ip. ISC dhclient copes with it reasonably well, but some
PXE boot firmware is less patient.

I think the icmp_ping code needs to be reworked. It needs to be
asynchronous and it has to support multiple pings pending at the same
time. I have attached two dumps. The no-ping dump shows how well it works
with no-ping: it takes under 8 seconds. The test2 dump shows how it pings
for just a single discover, waits ages (3 seconds), sends the offer and
only then tries the next one. This algorithm may work on small networks,
but with it no more than about 10 clients should be started at the same
time, because some of them may fail to start.
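
To put rough numbers on the difference, here is a small Python sketch
that models when each DHCPOFFER could go out if every DHCPDISCOVER needs
one ping with a 3 second timeout. The model (one ping per discover, no
ping cache, all 16 clients starting at once) and the function name
offer_times are simplifying assumptions for illustration, not a
description of dnsmasq's actual code path.

```python
def offer_times(arrivals, ping_timeout, parallel):
    """Return the time at which each offer could be sent.

    arrivals     -- times (seconds) at which DHCPDISCOVERs arrive
    ping_timeout -- seconds spent waiting for a ping reply per discover
    parallel     -- True if ping timeouts may overlap, False if the
                    server waits for one ping before starting the next
    """
    offers = []
    server_free_at = 0.0
    for t in sorted(arrivals):
        if parallel:
            # Every ping starts on arrival; waits overlap.
            offers.append(t + ping_timeout)
        else:
            # The next ping cannot start until the previous wait is over.
            start = max(t, server_free_at)
            server_free_at = start + ping_timeout
            offers.append(server_free_at)
    return offers


if __name__ == "__main__":
    discovers = [0.0] * 16  # 16 clients starting at the same moment
    print("sequential, last offer at", offer_times(discovers, 3.0, False)[-1])
    print("parallel,   last offer at", offer_times(discovers, 3.0, True)[-1])
```

Under these assumptions the last of 16 clients waits 48 seconds for its
offer when the waits are sequential, but only 3 seconds when they
overlap.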

I think the hash should be removed from the ping cache. Instead, a weaker
form of lease should be created for offers as well, with just a short 3
second lifetime. It would ensure we do not offer the same address to
different clients while more addresses are still available. This is
needed with dhcp-sequential-ip mode. I think some kind of timeout action
is needed for pings. Sending 15 pings in two seconds is not a problem
when each has a different target MAC, but the timeouts should be waited
for in parallel. At the moment it waits sequentially, and while it waits
it does not respond to DHCP. Any idea how that could be improved without
a major rewrite? I tested version 2.79, which needs the fixes from 2.81,
but it seems nothing important has changed since then in the current
code.
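
The short-lived offer reservation suggested above could look roughly
like the following Python sketch. The class name OfferPool, its methods
and the example addresses are hypothetical; confirmed leases and the ping
check are deliberately left out, so this only illustrates the idea of
holding an offered address for about 3 seconds.

```python
class OfferPool:
    """Hands out addresses in order and holds each offer for a short time."""

    def __init__(self, addresses, hold_seconds=3.0):
        self.addresses = list(addresses)
        self.hold_seconds = hold_seconds
        self.pending = {}  # address -> (client MAC, expiry time)

    def offer(self, mac, now):
        """Return an address to offer to `mac` at time `now`, or None."""
        # Forget reservations whose short lifetime has run out.
        self.pending = {addr: (m, exp)
                        for addr, (m, exp) in self.pending.items()
                        if exp > now}
        # A client that retries gets the same address again.
        for addr, (m, _) in self.pending.items():
            if m == mac:
                self.pending[addr] = (mac, now + self.hold_seconds)
                return addr
        # Otherwise take the first address nobody else holds an offer for.
        for addr in self.addresses:
            if addr not in self.pending:
                self.pending[addr] = (mac, now + self.hold_seconds)
                return addr
        return None


if __name__ == "__main__":
    pool = OfferPool(["10.0.0.2", "10.0.0.3"])
    print(pool.offer("aa:aa", now=0.0))
    print(pool.offer("bb:bb", now=1.0))
```

Two clients that send a discover within the hold time get different
addresses, which is the property wanted for dhcp-sequential-ip.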

Cheers,
Petr

On 12/8/21 01:18, Petr Menšík wrote:
>
> Hi Simon and others,
>
> I am debugging a strange issue that happens inside OpenStack in
> certain situations. Under conditions I cannot yet define precisely,
> dnsmasq returns a "no address available" error even when not all
> leases are in use.
>
> It seems do_icmp_ping is responsible for ruling out recently tried IP
> addresses. It seems a bit odd that address allocation happens only for
> addresses that were not pinged recently. I have found another place
> that calls do_icmp_ping but does not use the hash value computed from
> the hardware address, even though it is already known at that point.
> The first attached patch adds the hash to that second place as well,
> so that a single address should use a shared ping. The second patch
> slightly simplifies do_icmp_ping and its return value. Instead of
> artificially ensuring that the hash matches, it just returns the
> correct value when the hash matches. The second change is only an
> optional optimization.
>
> A few details are in RH bug #2028704 [1]. The originally tested version,
> 2.79, did not contain commit 0669ee7a69a [2], which improves the
> situation. But I think there remain cases where a ping is not accepted
> when it should be. According to the report, testing with the latest
> release did not work either. I think the first patch may fix the part
> that is still missing.
>
> Cheers,
> Petr
>
> 1. https://bugzilla.redhat.com/show_bug.cgi?id=2028704
> 2.
> http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=0669ee7a69a004ce34fed41e50aa575f8e04427b
>
> -- 
> Petr Menšík
> Software Engineer
> Red Hat, http://www.redhat.com/
> email: pemen...@redhat.com
> PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB

-- 
Petr Menšík
Software Engineer
Red Hat, http://www.redhat.com/
email: pemen...@redhat.com
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB


dnsmasq-16-no-ping.pcapng
Description: application/pcapng


dnsmasq-16-test2.pcapng
Description: application/pcapng


Re: [Dnsmasq-discuss] dnsmasq on large scale network

2021-12-08 Thread Petr Menšík
Interesting, I am just debugging a situation where multiple instances
start and request DHCP at about the same time. Without the no-ping
option it works quite badly. Even starting 16 instances at the same
time does not work reliably for us with ping enabled. It seems our 2.79
version is broken and that this was fixed in 2.81. But if you have 5
requests per second, I would use a heavier-duty server. Dnsmasq is great
for small networks, but I think it was not designed for high
performance. It does not scale well to hundreds or thousands of clients.

Make sure you use no-ping for good performance. The ping code handles
just one ping at a time, which makes it quite slow when enabled. I would
try dhcp-server or kea; your network seems big enough to justify it.

Just my 2 cents.

Cheers,
Petr

On 12/5/21 19:44, Fabian Druschke wrote:
> Hey friends, I hope you are all doing fine.
>
> Currently I'm facing a little challenge. I have a large network with
> more or less 30k clients, and I need a router for NAT from the LAN
> subnets in the 10.0.0.0/8 address space to the outside WAN public IP
> address. So it's quite a simple scenario.
>
> I've already purchased a Juniper MX150 router, but it was the wrong
> choice because it lacks NAT support entirely. So I wanted to use
> OpenWrt for this scenario, because it is really simple to set up for
> this use.
>
> What I'm struggling with is the DHCP server included in OpenWrt. By
> default it's dnsmasq, and it's easy to configure through the LuCI web
> interface. Before going into production I'd like to make sure that
> dnsmasq is designed for, or at least capable of handling, this number
> of clients (peak 30k / 5 requests per second).
>
> Does anyone have experience with such a scenario, and is there a
> proper tool to benchmark DHCP?
>
>
> Thanks in advance!
>
>
>
-- 
Petr Menšík
Software Engineer
Red Hat, http://www.redhat.com/
email: pemen...@redhat.com
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB




Re: [Dnsmasq-discuss] [PATCH] Fix segfault and regressions in option --rebind-domain-ok

2021-12-08 Thread Simon Kelley
On 18/11/2021 18:21, Sung Pae wrote:
> Hello list,
> 
> The --rebind-domain-ok option is broken in v2.86 and on master in the
> following ways:
> 
> * In v2.85, --stop-dns-rebind --rebind-domain-ok=test.me would only allow
>   "test.me" and subdomains of "test.me" to return private addresses to the
>   user. A query for localtest.me, which is known to return 127.0.0.1, is
>   blocked as expected.
> 
>   In v2.86, the --rebind-domain-ok feature is implemented with a simple suffix
>   comparison, which means that --stop-dns-rebind --rebind-domain-ok=test.me
>   fails to block the response of "127.0.0.1" for "localtest.me" because
>   "test.me" is a suffix of "localtest.me".
> 
>   Here is a reproducible example:
> 
> v2.85$ src/dnsmasq -C /dev/null -a 127.0.0.1 -p 5353 -S 1.1.1.1 -qd 
> --no-resolv --stop-dns-rebind --rebind-domain-ok=test.me
> dnsmasq: started, version 2.85 cachesize 150
> dnsmasq: compile time options: IPv6 GNU-getopt no-DBus no-UBus no-i18n 
> no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-cryptohash 
> no-DNSSEC loop-detect inotify dumpfile
> dnsmasq: using nameserver 1.1.1.1#53
> dnsmasq: read /etc/hosts - 1 addresses
> dnsmasq: query[A] localtest.me from 127.0.0.1
> dnsmasq: forwarded localtest.me to 1.1.1.1
> dnsmasq: possible DNS-rebind attack detected: localtest.me
> 
> master$ src/dnsmasq -C /dev/null -a 127.0.0.1 -p 5353 -S 1.1.1.1 -qd 
> --no-resolv --stop-dns-rebind --rebind-domain-ok=test.me
> dnsmasq: started, version 2.87test4-4-gc0409fa cachesize 150
> dnsmasq: compile time options: IPv6 GNU-getopt no-DBus no-UBus no-i18n 
> no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset no-nftset auth 
> no-cryptohash no-DNSSEC loop-detect inotify dumpfile
> dnsmasq: using nameserver 1.1.1.1#53
> dnsmasq: read /etc/hosts - 1 addresses
> dnsmasq: query[A] localtest.me from 127.0.0.1
> dnsmasq: forwarded localtest.me to 1.1.1.1
> dnsmasq: reply localtest.me is 127.0.0.1
> 
> * In v2.85, --stop-dns-rebind --rebind-domain-ok=// means "stop potential DNS
>   rebinding attacks, but allow private responses for dotless domains", which
>   mirrors the special meaning of // in the --server option.
> 
>   In v2.86, --stop-dns-rebind --rebind-domain-ok=// crashes dnsmasq during
>   resolution.
> 
> v2.85$ src/dnsmasq -C /dev/null -a 127.0.0.1 -p 5353 -S 192.168.0.1 -qd 
> --no-resolv --stop-dns-rebind --rebind-domain-ok=//
> dnsmasq: started, version 2.85 cachesize 150
> dnsmasq: compile time options: IPv6 GNU-getopt no-DBus no-UBus no-i18n 
> no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-cryptohash 
> no-DNSSEC loop-detect inotify dumpfile
> dnsmasq: using nameserver 1.1.1.1#53
> dnsmasq: read /etc/hosts - 1 addresses
> dnsmasq: query[A] brother-laser-printer from 127.0.0.1
> dnsmasq: forwarded brother-laser-printer to 192.168.0.1
> dnsmasq: reply brother-laser-printer is 192.168.0.50
> 
> master$ src/dnsmasq -C /dev/null -a 127.0.0.1 -p 5353 -S 192.168.0.1 -qd 
> --no-resolv --stop-dns-rebind --rebind-domain-ok=//
> dnsmasq: started, version 2.87test4-2-g9560658 cachesize 150
> dnsmasq: compile time options: IPv6 GNU-getopt no-DBus no-UBus no-i18n 
> no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset no-nftset auth 
> no-cryptohash no-DNSSEC loop-detect inotify dumpfile
> dnsmasq: using nameserver 1.1.1.1#53
> dnsmasq: read /etc/hosts - 1 addresses
> dnsmasq: query[A] brother-laser-printer from 127.0.0.1
> Segmentation fault (core dumped)
> 
>   Note that the new suffix-matching algorithm of --rebind-domain-ok means that
>   even if the crash above is fixed, an empty option value effectively negates
>   --stop-dns-rebind because the empty string is a suffix of all possible
>   strings.
> 
> The attached patches address the issues above and restore the behavior of
> --rebind-domain-ok back to the semantics of v2.85. The patches are also
> available on Github:
> 
> https://github.com/guns/dnsmasq/compare/master...fix-option-rebind-domain-ok
> https://github.com/guns/dnsmasq/commit/3abd86eb9e53efeea270767fd251284851d706d4
> https://github.com/guns/dnsmasq/commit/4afb5b4ce50a4d3b7f917d2ce83ea1b27dd55db5
> 

Thanks for the clear and complete bug report. I accept that these are
bugs and regressions. I applied your first patch as is, but the second
one seems to be a way over-complicated fix. I've therefore taken the
liberty of fixing the issue a different way. Please see
https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=1176cd58c90fc37bf98a6f774b26fc1adc8fd8e9

If you could test that, and make sure I didn't break things differently,
I'd be very grateful.

Thanks again.


Simon.
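
For readers following the regression described in the report, the
difference between a plain string-suffix test and a match on label
boundaries can be sketched in Python as below. Both function names are
made up for this example, and treating an empty domain as "dotless names
only" follows the v2.85 behaviour described in the report; this is not
the code from either patch or from the commit linked above.

```python
def naive_suffix_match(name, domain):
    """Plain string-suffix test, the kind of comparison the report blames."""
    return name.lower().endswith(domain.lower())


def label_boundary_match(name, domain):
    """True if `name` equals `domain` or is a subdomain of it."""
    name = name.rstrip(".").lower()
    domain = domain.strip(".").lower()
    if domain == "":
        # Empty domain (written // on the command line): dotless names only.
        return "." not in name
    return name == domain or name.endswith("." + domain)


if __name__ == "__main__":
    for query in ("test.me", "www.test.me", "localtest.me"):
        print(query,
              naive_suffix_match(query, "test.me"),
              label_boundary_match(query, "test.me"))
```

With the naive test, "localtest.me" is accepted for "test.me" and every
name is accepted for the empty domain; the label-boundary version rejects
both cases while still accepting "www.test.me" and dotless names such as
"brother-laser-printer".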

