Re: unbound/dns issue (malformed packets?)

2019-09-16 Thread Peter J. Philipp
Hi Joe,

The domain whatsapp.com doesn't guarantee integrity to you (they have DNSSEC
turned off, at least last I checked).  It's possible that someone got in the
middle and inserted a bogus record.  That said, I'm not aware that NLnet Labs
have changed their internal database, so this is likely not a corruption issue
but stems from the wire.

Hopefully my 2 cents are helpful.

-peter


unbound/dns issue (malformed packets?)

2019-09-15 Thread Joe Barnett
I've been seeing some issues which I believe to be related to 
dns/resolving.  The short of it is that the results of


# dig web.whatsapp.com

start out as:

; <<>> DiG 9.4.2-P2 <<>> web.whatsapp.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57665
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;web.whatsapp.com.  IN  A

;; ANSWER SECTION:
web.whatsapp.com.   3595 IN  CNAME   mmx-ds.cdn.whatsapp.net.
mmx-ds.cdn.whatsapp.net. 55 IN  A   31.13.70.49

;; Query time: 6 msec
;; SERVER: 192.168.254.254#53(192.168.254.254)
;; WHEN: Sun Sep 15 14:46:24 2019
;; MSG SIZE  rcvd: 87

which seems reasonable (and functional), but then soon become:

;; Warning: Message parser reports malformed message packet.

; <<>> DiG 9.4.2-P2 <<>> web.whatsapp.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40939
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;web.whatsapp.com.  IN  A

;; ANSWER SECTION:
web.whatsapp.com.   3528 IN  CNAME   mmx-ds.cdn.whatsapp.net.
mmx-ds.cdn.whatsapp.net. 30772  RESERVED0 A \# 4 1F0D4631

;; Query time: 2 msec
;; SERVER: 192.168.254.254#53(192.168.254.254)
;; WHEN: Sun Sep 15 14:47:31 2019
;; MSG SIZE  rcvd: 87

At which point I am no longer able to access web.whatsapp.com.  Given
that WhatsApp is a Facebook property, I tried the above against
facebook.com, www.facebook.com, instagram.com, and www.instagram.com as
well.  With the exception of instagram.com, the other three (facebook.com,
www.facebook.com, www.instagram.com) return a hex (?) formatted version of
the IP address, similar to what is seen in the latter of the above examples.
My thinking is (or was) that there are some issues relating to fb's DNS.
From outside of my network, however, other resolvers seem to be able to
continually resolve the above names correctly.  I don't know what those
resolvers are, but specifically I am referring to whatever Linode and
DigitalOcean use in the nameservers they provide to their basic Linux
VMs (I am using the default network config in my VMs at Linode and
DigitalOcean).  I have a suspicion that Linode uses unbound, but I do
not know how to verify that.  Oh, as far as I can tell, those
facebook-family names *seem* to be the only names for which I see this
behavior -- all other names that I have tried to run through dig (and
nslookup) seem to return reasonable and seemingly correct results.
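
For reference, the "hex (?)" form in the second dig output looks like the generic RDATA text format from RFC 3597 (`\# <length> <hex bytes>`), which dig appears to fall back to here because the record's class field was reported as RESERVED0 (class 0) rather than IN. The following is a minimal Python sketch, not anything from dig or unbound, that decodes such a string back into a dotted-quad address; the function names `decode_generic_rdata` and `rdata_as_ipv4` are made up for this illustration.

```python
import ipaddress


def decode_generic_rdata(text):
    """Parse RFC 3597 generic RDATA text such as r'\# 4 1F0D4631' into bytes."""
    parts = text.split()
    if len(parts) < 2 or parts[0] != "\\#":
        raise ValueError("not RFC 3597 generic RDATA: %r" % (text,))
    length = int(parts[1])
    # The hex data may be split into several whitespace-separated groups.
    data = bytes.fromhex("".join(parts[2:]))
    if len(data) != length:
        raise ValueError("length field says %d, got %d bytes" % (length, len(data)))
    return data


def rdata_as_ipv4(text):
    """Interpret 4 bytes of generic RDATA as an IPv4 address (an A record)."""
    data = decode_generic_rdata(text)
    if len(data) != 4:
        raise ValueError("an A record carries exactly 4 bytes of RDATA")
    return str(ipaddress.IPv4Address(data))


if __name__ == "__main__":
    # The RDATA from the malformed answer quoted above.
    print(rdata_as_ipv4(r"\# 4 1F0D4631"))  # prints 31.13.70.49
```

The decoded value is the same 31.13.70.49 that the first, well-formed answer returned, which suggests the address bytes themselves arrived intact and that the damage is in the class and TTL fields of that record.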


A bit about my (home) network.  I have Cox cable internet service, an
Arris SBG7580-AC, and an OpenBSD 6.5 machine that sits between the modem
and the rest of the network.  I (we) do use the modem in router mode (but
without using the built-in WiFi) as my wife's work set-up consists of a
pre-configured black box of a Juniper device.  Not wanting that device
in the rest of our network, I set the modem to "RoutedWithNAT" and the
two network devices plug into the modem, but provide two separate
networks.  For remote ingress into the rest of the network, I set the
modem's DMZ to point to the OpenBSD box.  My pf.conf does the usual
small network stuff including NAT, a bit of redirection, etc.  It has
changed very little in the past several years.  My unbound.conf is also
nearly unchanged since I first set it up when OpenBSD dropped bind and
replaced it with unbound.  My OpenBSD machine provides name resolving
for the rest of the network.  My unbound.conf follows:


server:
interface: 0.0.0.0
interface: ::1
do-ip6: no

access-control: 0.0.0.0/0 refuse
access-control: 127.0.0.0/8 allow
access-control: 192.168.0.0/16 allow
access-control: 10.0.0.0/24 allow
access-control: 172.16.0.0/24 allow
access-control: ::0/0 refuse
access-control: ::1 allow

hide-identity: yes
hide-version: yes

# ftp://FTP.INTERNIC.NET/domain/named.cache
root-hints: "/var/unbound/etc/named.cache"

# uncomment to enable DNSSEC
auto-trust-anchor-file: "/var/unbound/db/root.key"

### various local-zone, local-data, and local-data-ptr ###

remote-control:
control-enable: yes
control-use-cert: yes
control-interface: /var/run/unbound.sock

do-ip6, root-hints, and auto-trust-anchor-file are somewhat recent 
additions to my unbound.conf, but I experience the same behavior with 
unbound.conf as above, and also when I comment out those three additions 
(bringing it back to a configuration that has worked for several years).
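
As an aside on the access-control lines in the configuration above: unbound.conf(5) documents that the most specific matching netblock decides the action. The following is a rough Python sketch of that selection rule applied to the rules as posted. It is an illustration only, not unbound's code; the `RULES` list and the `action_for` helper are hypothetical names, and the "refuse" fallback when nothing matches is an assumption of this sketch.

```python
import ipaddress

# The access-control entries from the unbound.conf above, in file order.
RULES = [
    ("0.0.0.0/0", "refuse"),
    ("127.0.0.0/8", "allow"),
    ("192.168.0.0/16", "allow"),
    ("10.0.0.0/24", "allow"),
    ("172.16.0.0/24", "allow"),
    ("::0/0", "refuse"),
    ("::1", "allow"),
]


def action_for(client, rules=RULES):
    """Return the action of the most specific netblock containing client."""
    addr = ipaddress.ip_address(client)
    best = None  # (network, action) with the longest prefix seen so far
    for netblock, action in rules:
        net = ipaddress.ip_network(netblock)
        # Compare only within the same address family (IPv4 vs IPv6).
        if addr.version == net.version and addr in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, action)
    # Assumption for this sketch: refuse when no netblock matches.
    return best[1] if best else "refuse"


if __name__ == "__main__":
    for client in ("192.168.254.254", "10.0.0.7", "10.0.1.5", "172.16.0.9", "::1"):
        print(client, action_for(client))
```

Under this rule the resolver address seen in the dig output (192.168.254.254) falls inside 192.168.0.0/16 and is allowed, while the /24 entries cover only 10.0.0.x and 172.16.0.x, so a client such as 10.0.1.5 would match only 0.0.0.0/0 and be refused.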


My OpenBSD machine is an APU2 which I have been using without issue for
over a year.  My backup machine is, I think, an ALIX2D3.  Other than the
APU running amd64 and the ALIX running i386, the machines are
configured exactly the same.  The APU2 has been
consistently maintained, and this