Bug#834098: libc6: name resolution fails for keys.gnupg.net on some machines / networks

2016-08-13 Thread Vincent Lefevre
On 2016-08-13 16:46:46 +0200, Aurelien Jarno wrote:
> On 2016-08-13 04:23, Vincent Lefevre wrote:
> > I was not suggesting not to return all addresses. But in case of error
> > (which could just be a temporary network error, not necessarily due to
> > a bug in the nameserver, e.g. due to network congestion), if some of
> > the IP addresses are known, they could be made available to the calling
> > application in case they could be useful (e.g. for a connection). If
> > the application wants all the addresses, it can check error conditions
> > as usual.
> 
> The glibc provides getaddrinfo() which is a POSIX interface, also
> described in RFC2553. You can't change it just because you think it's
> better. Alternatively some other resolver libraries might provide the
> behaviour you need. Anyway in both cases it requires some changes on the
> application side too, which is clearly out of scope of this bug report.

It seems that POSIX doesn't specify the answer in the struct addrinfo
in case of error. But anyway, I was thinking more of an alternative
function, which could be more efficient when the goal is to do a
connection, since the applications need to be modified. Since many
applications could benefit from this, having such a function in the
GNU libc may be better than another resolver library.

Now, in the present case of keys.gnupg.net, this may be unnecessary
(see below about the 11/9/5 and the truncation bit).

> Also note that in your case the getaddrinfo() function returns an
> EAI_AGAIN error aka "Temporary failure in name resolution". The
> application (in your case gnupg) can try to handle the failure or at
> least display a better error message than "Host not found" which is
> clearly misleading in that case.

Indeed. Now, with the new gnupg 2.x that has just replaced the old
one in Debian/unstable, resolving seems to be done differently and
I no longer get an error (I've checked that "ping" still fails to
be sure that this wasn't due to something else). So, there's no bug
to report to gnupg. :)

> > And I would say that it could be the opposite. Imagine a host with
> > hundreds of millions of IP addresses...
> 
> I am sure there is a limit somewhere in one of the RFCs.

I haven't found a limit (though I didn't check everything).

According to

  
http://serverfault.com/questions/652237/whats-the-maximum-number-of-ips-a-dns-a-record-can-have

there isn't a limit, but this doesn't seem to be based on RFC's,
more on testing. With the example, 1000 records are obtained per
TCP query; other records are obtained with additional TCP queries,
but only one more at a time (rotation by 1). Well, this is rather
ugly with this client.

> Anyway if such a DNS entry exists, I don't think returning a failure
> is really a problem.

And this is what the nameserver of our router is doing! Its chosen
limit may seem low, but in the absence of a specification, how should
a practical limit be chosen? It seems to be rare to have more than
4 A or AAAA records. Even www.google.org has only one. BTW, I'd be
interested in some statistics.

> > > The point is that the local resolver is supposed to be working
> > > correctly.
> > 
> > and the network quality is good, which is not always the case.
> > 
> > > If it doesn't, one can easily set up a local recursive name server
> > > like unbound.
> > 
> > Unfortunately, this is not a general solution due to buggy ISP's.
> > 
> > > > 11:55:59.097743 IP 192.168.0.6.41008 > 192.168.0.1.domain: 60367+ A? 
> > > > keys.gnupg.net. (32)
> > > > 11:55:59.097796 IP 192.168.0.6.41008 > 192.168.0.1.domain: 31606+ AAAA? 
> > > > keys.gnupg.net. (32)
> > > > 11:55:59.098339 IP 192.168.0.6.38010 > 192.168.0.1.domain: 4217+ PTR? 
> > > > 1.0.168.192.in-addr.arpa. (42)
> > > > 11:55:59.143100 IP 192.168.0.1.domain > 192.168.0.6.38010: 4217 
> > > > NXDomain* 0/1/0 (94)
> > > > 11:55:59.143325 IP 192.168.0.6.43592 > 192.168.0.1.domain: 23396+ PTR? 
> > > > 6.0.168.192.in-addr.arpa. (42)
> > > > 11:55:59.161082 IP 192.168.0.1.domain > 192.168.0.6.41008: 60367 11/9/5 
> > > > CNAME pool.sks-keyservers.net., A 198.128.3.63, A 93.94.119.246, A 
> > > > 78.46.223.54, A 131.175.15.4, A 151.252.40.184, A 5.9.50.141, A 
> > > > 209.135.211.141, A 5.135.158.148, A 68.187.0.77, A 193.17.17.6 (502)
> > > 
> > > This tcpdump trace doesn't show the answer header, so we don't know if
> > > the truncation flag is set. That said the 11/9/5 says that the answer
> > > contains 11 answer records, 9 name server records and 5 additional
> > > records. This clearly doesn't fit. A normal DNS server would just return
> > > 11 answers, so 11/0/0.
> > > 
> > > That said I just realized that the strace entry in your previous email
> > > contains the beginning of the answer:
> > > 
> > > > 30419 recvfrom(4, 
> > > > "'J\203\200\0\1\0\v\0\10\0\0\4keys\5gnupg\3net\0\0\34\0\1"..., 2048, 0, 
> > > > {sa_family=AF_INET, sin_port=htons(53), 
> > > > sin_addr=inet_addr("192.168.0.1")}, [16]) = 500
> > > 
> > > Converted into 

Bug#834098: libc6: name resolution fails for keys.gnupg.net on some machines / networks

2016-08-13 Thread Aurelien Jarno
On 2016-08-13 04:23, Vincent Lefevre wrote:
> On 2016-08-12 23:24:29 +0200, Aurelien Jarno wrote:
> > On 2016-08-12 12:15, Vincent Lefevre wrote:
> > > According to tcpdump output below, there is no truncation: the number
> > > of A's and AAAA's (10 for each) matches what "host keys.gnupg.net"
> > > gives. BTW, even if there were a truncation, there shouldn't be a
> > > failure: using one of the returned IP addresses would be sufficient to
> > > connect.
> > 
> > That's a wrong assumption. The libc getaddrinfo interface is not meant to
> > connect to an IP, but rather to return *all* addresses corresponding to
> > a query. The returned IPs are not necessarily used for a connection
> > later. 
> 
> I was not suggesting not to return all addresses. But in case of error
> (which could just be a temporary network error, not necessarily due to
> a bug in the nameserver, e.g. due to network congestion), if some of
> the IP addresses are known, they could be made available to the calling
> application in case they could be useful (e.g. for a connection). If
> the application wants all the addresses, it can check error conditions
> as usual.

The glibc provides getaddrinfo() which is a POSIX interface, also
described in RFC2553. You can't change it just because you think it's
better. Alternatively some other resolver libraries might provide the
behaviour you need. Anyway in both cases it requires some changes on the
application side too, which is clearly out of scope of this bug report.

Also note that in your case the getaddrinfo() function returns an
EAI_AGAIN error aka "Temporary failure in name resolution". The
application (in your case gnupg) can try to handle the failure or at
least display a better error message than "Host not found" which is
clearly misleading in that case.
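
As an illustration of the point above, an application could tell
EAI_AGAIN apart from a permanent failure along the following lines.
This is only a sketch in Python, not what gnupg does; the function
names describe_resolution_error and resolve are made up for the
example.

```python
import socket


def describe_resolution_error(err: socket.gaierror) -> str:
    """Turn a getaddrinfo() failure into a message that keeps
    temporary and permanent errors apart."""
    if err.errno == socket.EAI_AGAIN:
        return "temporary failure in name resolution (a retry may succeed)"
    if err.errno == socket.EAI_NONAME:
        return "host not found"
    return f"name resolution failed: {err.strerror}"


def resolve(host: str, port: int):
    """Resolve host/port, re-raising failures with a clearer message."""
    try:
        return socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as err:
        raise RuntimeError(f"{host}: {describe_resolution_error(err)}") from err
```

With such a mapping, the case discussed here would be reported as a
temporary failure rather than as "Host not found".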

> > Not returning all addresses might lead to data loss or
> > a security issue.
> 
> Well, an application should not base its security on the nameserver.
> It is well-known that nameservers can return fake answers.

The local recursive nameserver is by definition trusted. If additional
security is required, DNSSEC can be used.

> And I would say that it could be the opposite. Imagine a host with
> hundreds of millions of IP addresses...

I am sure there is a limit somewhere in one of the RFCs. Anyway if such
a DNS entry exists, I don't think returning a failure is really a
problem.

> > The point is that the local resolver is supposed to be working
> > correctly.
> 
> and the network quality is good, which is not always the case.
> 
> > If it doesn't, one can easily set up a local recursive name server
> > like unbound.
> 
> Unfortunately, this is not a general solution due to buggy ISP's.
> 
> > > 11:55:59.097743 IP 192.168.0.6.41008 > 192.168.0.1.domain: 60367+ A? 
> > > keys.gnupg.net. (32)
> > > 11:55:59.097796 IP 192.168.0.6.41008 > 192.168.0.1.domain: 31606+ AAAA? 
> > > keys.gnupg.net. (32)
> > > 11:55:59.098339 IP 192.168.0.6.38010 > 192.168.0.1.domain: 4217+ PTR? 
> > > 1.0.168.192.in-addr.arpa. (42)
> > > 11:55:59.143100 IP 192.168.0.1.domain > 192.168.0.6.38010: 4217 NXDomain* 
> > > 0/1/0 (94)
> > > 11:55:59.143325 IP 192.168.0.6.43592 > 192.168.0.1.domain: 23396+ PTR? 
> > > 6.0.168.192.in-addr.arpa. (42)
> > > 11:55:59.161082 IP 192.168.0.1.domain > 192.168.0.6.41008: 60367 11/9/5 
> > > CNAME pool.sks-keyservers.net., A 198.128.3.63, A 93.94.119.246, A 
> > > 78.46.223.54, A 131.175.15.4, A 151.252.40.184, A 5.9.50.141, A 
> > > 209.135.211.141, A 5.135.158.148, A 68.187.0.77, A 193.17.17.6 (502)
> > 
> > This tcpdump trace doesn't show the answer header, so we don't know if
> > the truncation flag is set. That said the 11/9/5 says that the answer
> > contains 11 answer records, 9 name server records and 5 additional
> > records. This clearly doesn't fit. A normal DNS server would just return
> > 11 answers, so 11/0/0.
> > 
> > That said I just realized that the strace entry in your previous email
> > contains the beginning of the answer:
> > 
> > > 30419 recvfrom(4, 
> > > "'J\203\200\0\1\0\v\0\10\0\0\4keys\5gnupg\3net\0\0\34\0\1"..., 2048, 0, 
> > > {sa_family=AF_INET, sin_port=htons(53), 
> > > sin_addr=inet_addr("192.168.0.1")}, [16]) = 500
> > 
> > Converted into hexadecimal, this is:
> >   27 4a 83 80 00 01 00 0b 00 08 00 00 04 6b 65 79
> >   73 05 67 6e 75 70 67 03 6e 65 74 00 00 1c 00 01
> > 
> > 274a is the identification. The flags are 8380 and correspond to QR,
> > TC, RD, RA. Your name server clearly says that the answer is truncated.
> > On a working nameserver, the flags are 8180 for this query, so the same
> > without the truncation flag.
> 
> I don't understand here. You said above "This clearly doesn't fit.",
> so that it is normal that the truncation flag is set, isn't it?
> Or do you mean that the answer should have been 11/0/0, so that
> the truncation flag wouldn't be set as a consequence?

Your recursive DNS nameserver got asked to resolve keys.gnupg.net. As
all A records fit inside the 512 bytes limit, your 

Bug#834098: libc6: name resolution fails for keys.gnupg.net on some machines / networks

2016-08-12 Thread Vincent Lefevre
On 2016-08-12 23:24:29 +0200, Aurelien Jarno wrote:
> On 2016-08-12 12:15, Vincent Lefevre wrote:
> > According to tcpdump output below, there is no truncation: the number
> > of A's and AAAA's (10 for each) matches what "host keys.gnupg.net"
> > gives. BTW, even if there were a truncation, there shouldn't be a
> > failure: using one of the returned IP addresses would be sufficient to
> > connect.
> 
> That's a wrong assumption. The libc getaddrinfo interface is not meant to
> connect to an IP, but rather to return *all* addresses corresponding to
> a query. The returned IPs are not necessarily used for a connection
> later. 

I was not suggesting not to return all addresses. But in case of error
(which could just be a temporary network error, not necessarily due to
a bug in the nameserver, e.g. due to network congestion), if some of
the IP addresses are known, they could be made available to the calling
application in case they could be useful (e.g. for a connection). If
the application wants all the addresses, it can check error conditions
as usual.

> Not returning all addresses might lead to data loss or
> a security issue.

Well, an application should not base its security on the nameserver.
It is well-known that nameservers can return fake answers.

And I would say that it could be the opposite. Imagine a host with
hundreds of millions of IP addresses...

> The point is that the local resolver is supposed to be working
> correctly.

and the network quality is good, which is not always the case.

> If it doesn't, one can easily set up a local recursive name server
> like unbound.

Unfortunately, this is not a general solution due to buggy ISP's.

> > 11:55:59.097743 IP 192.168.0.6.41008 > 192.168.0.1.domain: 60367+ A? 
> > keys.gnupg.net. (32)
> > 11:55:59.097796 IP 192.168.0.6.41008 > 192.168.0.1.domain: 31606+ AAAA? 
> > keys.gnupg.net. (32)
> > 11:55:59.098339 IP 192.168.0.6.38010 > 192.168.0.1.domain: 4217+ PTR? 
> > 1.0.168.192.in-addr.arpa. (42)
> > 11:55:59.143100 IP 192.168.0.1.domain > 192.168.0.6.38010: 4217 NXDomain* 
> > 0/1/0 (94)
> > 11:55:59.143325 IP 192.168.0.6.43592 > 192.168.0.1.domain: 23396+ PTR? 
> > 6.0.168.192.in-addr.arpa. (42)
> > 11:55:59.161082 IP 192.168.0.1.domain > 192.168.0.6.41008: 60367 11/9/5 
> > CNAME pool.sks-keyservers.net., A 198.128.3.63, A 93.94.119.246, A 
> > 78.46.223.54, A 131.175.15.4, A 151.252.40.184, A 5.9.50.141, A 
> > 209.135.211.141, A 5.135.158.148, A 68.187.0.77, A 193.17.17.6 (502)
> 
> This tcpdump trace doesn't show the answer header, so we don't know if
> the truncation flag is set. That said the 11/9/5 says that the answer
> contains 11 answer records, 9 name server records and 5 additional
> records. This clearly doesn't fit. A normal DNS server would just return
> 11 answers, so 11/0/0.
> 
> That said I just realized that the strace entry in your previous email
> contains the beginning of the answer:
> 
> > 30419 recvfrom(4, 
> > "'J\203\200\0\1\0\v\0\10\0\0\4keys\5gnupg\3net\0\0\34\0\1"..., 2048, 0, 
> > {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.0.1")}, 
> > [16]) = 500
> 
> Converted into hexadecimal, this is:
>   27 4a 83 80 00 01 00 0b 00 08 00 00 04 6b 65 79
>   73 05 67 6e 75 70 67 03 6e 65 74 00 00 1c 00 01
> 
> 274a is the identification. The flags are 8380 and correspond to QR,
> TC, RD, RA. Your name server clearly says that the answer is truncated.
> On a working nameserver, the flags are 8180 for this query, so the same
> without the truncation flag.

I don't understand here. You said above "This clearly doesn't fit.",
so that it is normal that the truncation flag is set, isn't it?
Or do you mean that the answer should have been 11/0/0, so that
the truncation flag wouldn't be set as a consequence?

> Even if it is a quite standard setup, you have to admit it doesn't
> behave according to the RFC.

I wonder which part of the RFC you are talking about.

> You should complain to the manufacturer and try to get a firmware
> update.

I'll see what I can do.

> Trying to work around things on the libc side just gives even less value
> to the RFCs, and encourages selling broken hardware.

I doubt that GNU libc would make any difference. What matters is
how MS-Windows behaves, and probably nowadays Android and iOS too.
Also, if there were conformance tests, e.g. from the Linux
community, this could help. At least the buyers would have a way
to choose, and it could be easier to report issues to the vendors.

> > FYI, I also often get 5-second timeouts in name resolution whatever
> > the host (you can see it above): I get the answer for A or AAAA, but
> > sometimes, the other answer is lost. I have a DHCP hook that tests
> > whether I'm using this router:
> > 
> > [...]
> >   ping -n -c 1 -I "$interface" "$new_routers" > /dev/null
> >   if grep -i -q $mac /proc/net/arp; then
> > logger "Google Public DNS with TCP to avoid recurrent timeout"
> > [...]
> 
> This shows how broken your name server is. It probably has problems 

Processed: Re: Bug#834098: libc6: name resolution fails for keys.gnupg.net on some machines / networks

2016-08-12 Thread Debian Bug Tracking System
Processing control commands:

> retitle -1 libc6: support for non-compliant nameserver should be improved
Bug #834098 [libc6] libc6: name resolution fails for keys.gnupg.net on some 
machines / networks
Changed Bug title to 'libc6: support for non-compliant nameserver should be 
improved' from 'libc6: name resolution fails for keys.gnupg.net on some 
machines / networks'.
> severity -1 wishlist
Bug #834098 [libc6] libc6: support for non-compliant nameserver should be 
improved
Severity set to 'wishlist' from 'normal'

-- 
834098: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=834098
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems



Bug#834098: libc6: name resolution fails for keys.gnupg.net on some machines / networks

2016-08-12 Thread Aurelien Jarno
control: retitle -1 libc6: support for non-compliant nameserver should be 
improved
control: severity -1 wishlist

On 2016-08-12 12:15, Vincent Lefevre wrote:
> On 2016-08-12 09:26:10 +0200, Aurelien Jarno wrote:
> > The libc does a first connection to the configured name server
> > (192.168.0.1) using UDP. Note the size of the packet, very close to
> > the 512 bytes limit without EDNS0 support. This very likely means the
> > answer is marked as truncated (look at the number of entries in the
> > host answer).
> 
> According to tcpdump output below, there is no truncation: the number
> of A's and AAAA's (10 for each) matches what "host keys.gnupg.net"
> gives. BTW, even if there were a truncation, there shouldn't be a
> failure: using one of the returned IP addresses would be sufficient to
> connect.

That's a wrong assumption. The libc getaddrinfo interface is not meant to
connect to an IP, but rather to return *all* addresses corresponding to
a query. The returned IPs are not necessarily used for a connection
later. Not returning all addresses might lead to data loss or
a security issue. One example among others is the forward-confirmed reverse
DNS method used, for example, by some mail servers. Not returning all IPs
might lead to a rejected or a discarded mail depending on the policy.

The point is that the local resolver is supposed to be working
correctly. If it doesn't, one can easily set up a local recursive name
server like unbound.

> 11:55:59.097743 IP 192.168.0.6.41008 > 192.168.0.1.domain: 60367+ A? 
> keys.gnupg.net. (32)
> 11:55:59.097796 IP 192.168.0.6.41008 > 192.168.0.1.domain: 31606+ AAAA? 
> keys.gnupg.net. (32)
> 11:55:59.098339 IP 192.168.0.6.38010 > 192.168.0.1.domain: 4217+ PTR? 
> 1.0.168.192.in-addr.arpa. (42)
> 11:55:59.143100 IP 192.168.0.1.domain > 192.168.0.6.38010: 4217 NXDomain* 
> 0/1/0 (94)
> 11:55:59.143325 IP 192.168.0.6.43592 > 192.168.0.1.domain: 23396+ PTR? 
> 6.0.168.192.in-addr.arpa. (42)
> 11:55:59.161082 IP 192.168.0.1.domain > 192.168.0.6.41008: 60367 11/9/5 CNAME 
> pool.sks-keyservers.net., A 198.128.3.63, A 93.94.119.246, A 78.46.223.54, A 
> 131.175.15.4, A 151.252.40.184, A 5.9.50.141, A 209.135.211.141, A 
> 5.135.158.148, A 68.187.0.77, A 193.17.17.6 (502)

This tcpdump trace doesn't show the answer header, so we don't know if
the truncation flag is set. That said the 11/9/5 says that the answer
contains 11 answer records, 9 name server records and 5 additional
records. This clearly doesn't fit. A normal DNS server would just return
11 answers, so 11/0/0.

That said I just realized that the strace entry in your previous email
contains the beginning of the answer:

> 30419 recvfrom(4, 
> "'J\203\200\0\1\0\v\0\10\0\0\4keys\5gnupg\3net\0\0\34\0\1"..., 2048, 0, 
> {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.0.1")}, 
> [16]) = 500

Converted into hexadecimal, this is:
  27 4a 83 80 00 01 00 0b 00 08 00 00 04 6b 65 79
  73 05 67 6e 75 70 67 03 6e 65 74 00 00 1c 00 01

274a is the identification. The flags are 8380 and correspond to QR,
TC, RD, RA. Your name server clearly says that the answer is truncated.
On a working nameserver, the flags are 8180 for this query, so the same
without the truncation flag.
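
The decoding above can be checked mechanically. The following Python
sketch unpacks the first 12 bytes of the hexadecimal dump according to
the RFC 1035 header layout; the function name decode_dns_header is
made up for the example.

```python
import struct

# First 12 bytes of the reply, taken from the hexadecimal dump above.
REPLY_HEADER = bytes.fromhex("27 4a 83 80 00 01 00 0b 00 08 00 00")


def decode_dns_header(data: bytes) -> dict:
    """Decode the fixed 12-byte DNS header (RFC 1035, section 4.1.1)."""
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", data[:12])
    return {
        "id": ident,
        "qr": bool(flags & 0x8000),   # response
        "aa": bool(flags & 0x0400),   # authoritative answer
        "tc": bool(flags & 0x0200),   # truncated
        "rd": bool(flags & 0x0100),   # recursion desired
        "ra": bool(flags & 0x0080),   # recursion available
        "rcode": flags & 0x000F,
        # question / answer / authority / additional record counts
        "counts": (qd, an, ns, ar),
    }


header = decode_dns_header(REPLY_HEADER)
print(f"id={header['id']:04x} tc={header['tc']} counts={header['counts']}")
```

For these bytes the TC bit comes out set, with 1 question, 11 answer,
8 name server and 0 additional records. With flags 8180 instead of
8380 the same function reports TC as clear.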

> > It therefore looks to me like a bug with your network setup, not a
> > libc one.
> 
> Well, though I didn't want that, this is quite a standard network
> setup: my machine just uses DHCP with some standard ADSL modem
> router. And given that many users have similar issues and there
> isn't any problem with Android, I suppose that there's some bug
> on the libc side (or libc can be improved).

Even if it is a quite standard setup, you have to admit it doesn't
behave according to the RFC. You should complain to the manufacturer
and try to get a firmware update.

Trying to work around things on the libc side just gives even less value
to the RFCs, and encourages selling broken hardware.


> FYI, I also often get 5-second timeouts in name resolution whatever
> the host (you can see it above): I get the answer for A or AAAA, but
> sometimes, the other answer is lost. I have a DHCP hook that tests
> whether I'm using this router:
> 
> [...]
>   ping -n -c 1 -I "$interface" "$new_routers" > /dev/null
>   if grep -i -q $mac /proc/net/arp; then
> logger "Google Public DNS with TCP to avoid recurrent timeout"
> [...]

This shows how broken your name server is. It probably has problems with
AAAA requests. Note that the RFC explicitly allows a server not to support some
request types (including AAAA ones), but in that case the router must
answer that it doesn't support them and not simply drop the query.
You might want to try to work around this by using "options
single-request" or "options single-request-reopen" in /etc/resolv.conf.

In short it clearly shows that the problem comes from the name server and
not the GNU libc:
- the nameserver sets the truncation bit
- the nameserver doesn't answer on the TCP port
- the nameserver sometimes drops AAAA queries

With such a 

Bug#834098: libc6: name resolution fails for keys.gnupg.net on some machines / networks

2016-08-12 Thread Vincent Lefevre
On 2016-08-12 09:26:10 +0200, Aurelien Jarno wrote:
> The libc does a first connection to the configured name server
> (192.168.0.1) using UDP. Note the size of the packet, very close to
> the 512 bytes limit without EDNS0 support. This very likely means the
> answer is marked as truncated (look at the number of entries in the
> host answer).

According to tcpdump output below, there is no truncation: the number
of A's and AAAA's (10 for each) matches what "host keys.gnupg.net"
gives. BTW, even if there were a truncation, there shouldn't be a
failure: using one of the returned IP addresses would be sufficient to
connect.

11:55:59.097743 IP 192.168.0.6.41008 > 192.168.0.1.domain: 60367+ A? 
keys.gnupg.net. (32)
11:55:59.097796 IP 192.168.0.6.41008 > 192.168.0.1.domain: 31606+ AAAA? 
keys.gnupg.net. (32)
11:55:59.098339 IP 192.168.0.6.38010 > 192.168.0.1.domain: 4217+ PTR? 
1.0.168.192.in-addr.arpa. (42)
11:55:59.143100 IP 192.168.0.1.domain > 192.168.0.6.38010: 4217 NXDomain* 0/1/0 
(94)
11:55:59.143325 IP 192.168.0.6.43592 > 192.168.0.1.domain: 23396+ PTR? 
6.0.168.192.in-addr.arpa. (42)
11:55:59.161082 IP 192.168.0.1.domain > 192.168.0.6.41008: 60367 11/9/5 CNAME 
pool.sks-keyservers.net., A 198.128.3.63, A 93.94.119.246, A 78.46.223.54, A 
131.175.15.4, A 151.252.40.184, A 5.9.50.141, A 209.135.211.141, A 
5.135.158.148, A 68.187.0.77, A 193.17.17.6 (502)
11:55:59.184491 IP 192.168.0.1.domain > 192.168.0.6.43592: 23396 NXDomain* 
0/1/0 (94)
11:56:04.102206 IP 192.168.0.6.41008 > 192.168.0.1.domain: 60367+ A? 
keys.gnupg.net. (32)
11:56:04.141278 ARP, Request who-has 192.168.0.6 tell 192.168.0.1, length 28
11:56:04.141296 ARP, Reply 192.168.0.6 is-at cc:3d:82:a9:e3:ea (oui Unknown), 
length 28
11:56:04.144746 IP 192.168.0.1.domain > 192.168.0.6.41008: 60367 11/9/5 CNAME 
pool.sks-keyservers.net., A 193.17.17.6, A 68.187.0.77, A 5.135.158.148, A 
151.252.40.184, A 198.128.3.63, A 78.46.223.54, A 131.175.15.4, A 
209.135.211.141, A 5.9.50.141, A 93.94.119.246 (502)
11:56:04.144795 IP 192.168.0.6.41008 > 192.168.0.1.domain: 31606+ AAAA? 
keys.gnupg.net. (32)
11:56:04.186687 IP 192.168.0.1.domain > 192.168.0.6.41008: 31606| 11/8/0 CNAME 
pool.sks-keyservers.net., AAAA 2a01:7e00::f03c:91ff:fe69:8da9, AAAA 
2a05:8b81:1000:76::d239, AAAA 2001:6f8:1c3c:babe::62:1, AAAA 
2001:4c80:40:628:5c70:d1ff:fe44:1424, AAAA 2001:67c:26b4::2c6b, AAAA 
2a01:4f8:161:4283:1000::203, AAAA 2a02:c200:1:10:2:6:4251:1, AAAA 
2001:720:418:caf1::8, AAAA 2001:470:d:367::555, AAAA 2a01:7a0:1::6 (500)
11:56:04.186787 IP 192.168.0.6.36060 > 192.168.0.1.domain: Flags [S], seq 
206201484, win 29200, options [mss 1460,sackOK,TS val 69369420 ecr 0,nop,wscale 
7], length 0
11:56:04.188240 IP 192.168.0.1.domain > 192.168.0.6.36060: Flags [R.], seq 0, 
ack 206201485, win 0, length 0
11:56:04.188296 IP 192.168.0.6.36366 > 192.168.0.1.domain: 19382+ A? 
keys.gnupg.net. (32)
11:56:04.229939 IP 192.168.0.1.domain > 192.168.0.6.36366: 19382 11/9/5 CNAME 
pool.sks-keyservers.net., A 93.94.119.246, A 209.135.211.141, A 78.46.223.54, A 
5.135.158.148, A 151.252.40.184, A 5.9.50.141, A 193.17.17.6, A 131.175.15.4, A 
198.128.3.63, A 68.187.0.77 (502)
11:56:04.229992 IP 192.168.0.6.36366 > 192.168.0.1.domain: 13056+ AAAA? 
keys.gnupg.net. (32)
11:56:04.271424 IP 192.168.0.1.domain > 192.168.0.6.36366: 13056| 11/8/0 CNAME 
pool.sks-keyservers.net., AAAA 2a01:7e00::f03c:91ff:fe69:8da9, AAAA 
2a02:c200:1:10:2:6:4251:1, AAAA 2001:67c:26b4::2c6b, AAAA 
2a05:8b81:1000:76::d239, AAAA 2001:470:d:367::555, AAAA 
2001:4c80:40:628:5c70:d1ff:fe44:1424, AAAA 2001:6f8:1c3c:babe::62:1, AAAA 
2a01:7a0:1::6, AAAA 2a01:4f8:161:4283:1000::203, AAAA 2001:720:418:caf1::8 (500)
11:56:04.271501 IP 192.168.0.6.36062 > 192.168.0.1.domain: Flags [S], seq 
3864689937, win 29200, options [mss 1460,sackOK,TS val 69369441 ecr 
0,nop,wscale 7], length 0
11:56:04.272936 IP 192.168.0.1.domain > 192.168.0.6.36062: Flags [R.], seq 0, 
ack 3864689938, win 0, length 0

> It therefore looks to me like a bug with your network setup, not a
> libc one.

Well, though I didn't want that, this is quite a standard network
setup: my machine just uses DHCP with some standard ADSL modem
router. And given that many users have similar issues and there
isn't any problem with Android, I suppose that there's some bug
on the libc side (or libc can be improved).

FYI, I also often get 5-second timeouts in name resolution whatever
the host (you can see it above): I get the answer for A or AAAA, but
sometimes, the other answer is lost. I have a DHCP hook that tests
whether I'm using this router:

[...]
  ping -n -c 1 -I "$interface" "$new_routers" > /dev/null
  if grep -i -q $mac /proc/net/arp; then
logger "Google Public DNS with TCP to avoid recurrent timeout"
[...]

and in this case, I use the Google Public DNS with TCP. But for
some reason, it seems that the /proc/net/arp doesn't contain the
MAC address yet (with the ping, this was working in the past, but
this is no longer the case), so that I just get the 

Bug#834098: libc6: name resolution fails for keys.gnupg.net on some machines / networks

2016-08-12 Thread Aurelien Jarno
On 2016-08-12 09:26, Aurelien Jarno wrote:
> On 2016-08-11 23:33, Vincent Lefevre wrote:
> > Package: libc6
> > Version: 2.23-4
> > Severity: normal
> > 
> > I always get the following error on this machine:
> > 
> > zira:~> gpg --keyserver-options verbose,debug --keyserver 
> > hkp://keys.gnupg.net --recv-key 
> > gpg: requesting key  from hkp server keys.gnupg.net
> > gpgkeys: curl version = GnuPG curl-shim
> > * HTTP proxy is "null"
> > * HTTP URL is 
> > "http://keys.gnupg.net:11371/pks/lookup?op=get=mr=0x;
> > * SRV tag is "pgpkey-http": host and port may be overridden
> > * HTTP auth is "null"
> > * HTTP method is GET
> > ?: keys.gnupg.net: Host not found
> > gpgkeys: HTTP fetch error 7: couldn't connect: Connection refused
> > gpg: no valid OpenPGP data found.
> > gpg: Total number processed: 0
> > gpg: keyserver communications error: keyserver unreachable
> > gpg: keyserver communications error: public key not found
> > gpg: keyserver receive failed: public key not found
> > zsh: exit 2 gpg --keyserver-options verbose,debug --keyserver 
> > hkp://keys.gnupg.net  
> > 
> > even though the host exists:
> > 
> > zira:~> host keys.gnupg.net
> > keys.gnupg.net is an alias for pool.sks-keyservers.net.
> > pool.sks-keyservers.net has address 5.9.50.141
> > pool.sks-keyservers.net has address 91.143.92.136
> > pool.sks-keyservers.net has address 108.18.103.116
> > pool.sks-keyservers.net has address 131.155.141.70
> > pool.sks-keyservers.net has address 85.10.205.199
> > pool.sks-keyservers.net has address 163.172.29.20
> > pool.sks-keyservers.net has address 104.236.209.43
> > pool.sks-keyservers.net has address 84.200.66.125
> > pool.sks-keyservers.net has address 5.9.143.170
> > pool.sks-keyservers.net has address 185.95.216.79
> > pool.sks-keyservers.net has IPv6 address 2607:5300:60:490f:1::19
> > pool.sks-keyservers.net has IPv6 address 2604:a880:800:10::227:e001
> > pool.sks-keyservers.net has IPv6 address 2001:41d0:2:55c2:5054:ff:fe12:3
> > pool.sks-keyservers.net has IPv6 address 2a01:4f8:a0:4024::2:0
> > pool.sks-keyservers.net has IPv6 address 2a02:180:a:65:2456:6542:1101:1010
> > pool.sks-keyservers.net has IPv6 address 2a01:4f8:d16:24c1::2
> > pool.sks-keyservers.net has IPv6 address 2a01:7c8:aabc:45a:5054:ff:fe9b:59a3
> > pool.sks-keyservers.net has IPv6 address 2001:470:d:367::555
> > pool.sks-keyservers.net has IPv6 address 2a01:4f8:161:4283:1000::203
> > pool.sks-keyservers.net has IPv6 address 
> > 2001:4c80:40:628:5c70:d1ff:fe44:1424
> > 
> > This seems to be a known issue:
> > 
> >   https://lists.gnupg.org/pipermail/gnupg-users/2015-October/054532.html
> > 
> > (searching for "keys.gnupg.net: Host not found" gives much more
> > examples).
> > 
> > I wondered whether this was a bug from gnupg, until I tried:
> > 
> > zira:~> ping keys.gnupg.net
> > ping: keys.gnupg.net: Temporary failure in name resolution
> > zsh: exit 2 ping keys.gnupg.net
> > 
> > which I always get. Ditto with telnet.
> > 
> > An excerpt of the strace for gnupg:
> > 
> > 30419 stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=23, ...}) = 0
> > 30419 socket(AF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 4
> > 30419 connect(4, {sa_family=AF_INET, sin_port=htons(53), 
> > sin_addr=inet_addr("192.168.0.1")}, 16) = 0
> > 30419 poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}])
> > 30419 sendmmsg(4, {{{msg_name(0)=NULL, 
> > msg_iov(1)=[{"\343\376\1\0\0\1\0\0\0\0\0\0\4keys\5gnupg\3net\0\0\1\0\1", 
> > 32}], msg_controllen=0, msg_flags=0}, 32}, {{msg_name(0)=NULL, 
> > msg_iov(1)=[{"'J\1\0\0\1\0\0\0\0\0\0\4keys\5gnupg\3net\0\0\34\0\1", 32}], 
> > msg_controllen=0, msg_flags=0}, 32}}, 2, MSG_NOSIGNAL) = 2
> > 30419 poll([{fd=4, events=POLLIN}], 1, 5000) = 1 ([{fd=4, revents=POLLIN}])
> > 30419 ioctl(4, FIONREAD, [500]) = 0
> > 30419 recvfrom(4, 
> > "'J\203\200\0\1\0\v\0\10\0\0\4keys\5gnupg\3net\0\0\34\0\1"..., 2048, 0, 
> > {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.0.1")}, 
> > [16]) = 500
> > 30419 close(4)  = 0
> 
> The libc does a first connection to the configured name server
> (192.168.0.1) using UDP. Note the size of the packet, very close to
> the 512 bytes limit without EDNS0 support. This very likely means the
> answer is marked as truncated (look at the number of entries in the
> host answer).

It would be interesting to see what is actually returned by your
name server (for example with tcpdump). Despite the large number of
records, the A and AAAA records are requested in two different
queries, so each answer should fit in less than 512 bytes. This is
actually what the trace from your working server shows. I wouldn't be
surprised if this name server returns additional records that have not
been requested.
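
A rough size estimate supports this. The Python sketch below assumes
that owner names are compressed to 2-byte pointers, that the CNAME
target is written out in full, and that no name server or additional
records are included; the helper names are made up for the example.

```python
HEADER_LEN = 12      # fixed DNS header
POINTER_LEN = 2      # compressed owner name (pointer to an earlier name)
RR_FIXED_LEN = 10    # type (2) + class (2) + TTL (4) + rdlength (2)


def encoded_name_len(name: str) -> int:
    """Length of an uncompressed DNS name: one length byte per label,
    the label bytes, and the terminating root byte."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1


def estimated_reply_size(qname: str, cname_target: str,
                         rdata_len: int, n_records: int) -> int:
    """Estimate the size of a reply made of one CNAME plus n address records."""
    question = encoded_name_len(qname) + 4  # qtype + qclass
    cname_rr = POINTER_LEN + RR_FIXED_LEN + encoded_name_len(cname_target)
    address_rrs = n_records * (POINTER_LEN + RR_FIXED_LEN + rdata_len)
    return HEADER_LEN + question + cname_rr + address_rrs


a_size = estimated_reply_size("keys.gnupg.net", "pool.sks-keyservers.net", 4, 10)
aaaa_size = estimated_reply_size("keys.gnupg.net", "pool.sks-keyservers.net", 16, 10)
print(a_size, aaaa_size)
```

Under these assumptions the answer with 10 A records comes to 229
bytes and the one with 10 AAAA records to 349 bytes, both well under
512.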

I guess you can also work around the issue by activating edns0 (adding an
"options edns0" line in /etc/resolv.conf), which allows UDP answers
bigger than 512 bytes. This however requires that your 

Bug#834098: libc6: name resolution fails for keys.gnupg.net on some machines / networks

2016-08-12 Thread Aurelien Jarno
On 2016-08-11 23:33, Vincent Lefevre wrote:
> Package: libc6
> Version: 2.23-4
> Severity: normal
> 
> I always get the following error on this machine:
> 
> zira:~> gpg --keyserver-options verbose,debug --keyserver 
> hkp://keys.gnupg.net --recv-key 
> gpg: requesting key  from hkp server keys.gnupg.net
> gpgkeys: curl version = GnuPG curl-shim
> * HTTP proxy is "null"
> * HTTP URL is 
> "http://keys.gnupg.net:11371/pks/lookup?op=get=mr=0x;
> * SRV tag is "pgpkey-http": host and port may be overridden
> * HTTP auth is "null"
> * HTTP method is GET
> ?: keys.gnupg.net: Host not found
> gpgkeys: HTTP fetch error 7: couldn't connect: Connection refused
> gpg: no valid OpenPGP data found.
> gpg: Total number processed: 0
> gpg: keyserver communications error: keyserver unreachable
> gpg: keyserver communications error: public key not found
> gpg: keyserver receive failed: public key not found
> zsh: exit 2 gpg --keyserver-options verbose,debug --keyserver 
> hkp://keys.gnupg.net  
> 
> even though the host exists:
> 
> zira:~> host keys.gnupg.net
> keys.gnupg.net is an alias for pool.sks-keyservers.net.
> pool.sks-keyservers.net has address 5.9.50.141
> pool.sks-keyservers.net has address 91.143.92.136
> pool.sks-keyservers.net has address 108.18.103.116
> pool.sks-keyservers.net has address 131.155.141.70
> pool.sks-keyservers.net has address 85.10.205.199
> pool.sks-keyservers.net has address 163.172.29.20
> pool.sks-keyservers.net has address 104.236.209.43
> pool.sks-keyservers.net has address 84.200.66.125
> pool.sks-keyservers.net has address 5.9.143.170
> pool.sks-keyservers.net has address 185.95.216.79
> pool.sks-keyservers.net has IPv6 address 2607:5300:60:490f:1::19
> pool.sks-keyservers.net has IPv6 address 2604:a880:800:10::227:e001
> pool.sks-keyservers.net has IPv6 address 2001:41d0:2:55c2:5054:ff:fe12:3
> pool.sks-keyservers.net has IPv6 address 2a01:4f8:a0:4024::2:0
> pool.sks-keyservers.net has IPv6 address 2a02:180:a:65:2456:6542:1101:1010
> pool.sks-keyservers.net has IPv6 address 2a01:4f8:d16:24c1::2
> pool.sks-keyservers.net has IPv6 address 2a01:7c8:aabc:45a:5054:ff:fe9b:59a3
> pool.sks-keyservers.net has IPv6 address 2001:470:d:367::555
> pool.sks-keyservers.net has IPv6 address 2a01:4f8:161:4283:1000::203
> pool.sks-keyservers.net has IPv6 address 2001:4c80:40:628:5c70:d1ff:fe44:1424
> 
> This seems to be a known issue:
> 
>   https://lists.gnupg.org/pipermail/gnupg-users/2015-October/054532.html
> 
> (searching for "keys.gnupg.net: Host not found" gives many more
> examples).
> 
> I wondered whether this was a bug from gnupg, until I tried:
> 
> zira:~> ping keys.gnupg.net
> ping: keys.gnupg.net: Temporary failure in name resolution
> zsh: exit 2 ping keys.gnupg.net
> 
> which I always get. Ditto with telnet.
> 
> An excerpt of the strace for gnupg:
> 
> 30419 stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=23, ...}) = 0
> 30419 socket(AF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 4
> 30419 connect(4, {sa_family=AF_INET, sin_port=htons(53), 
> sin_addr=inet_addr("192.168.0.1")}, 16) = 0
> 30419 poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}])
> 30419 sendmmsg(4, {{{msg_name(0)=NULL, 
> msg_iov(1)=[{"\343\376\1\0\0\1\0\0\0\0\0\0\4keys\5gnupg\3net\0\0\1\0\1", 
> 32}], msg_controllen=0, msg_flags=0}, 32}, {{msg_name(0)=NULL, 
> msg_iov(1)=[{"'J\1\0\0\1\0\0\0\0\0\0\4keys\5gnupg\3net\0\0\34\0\1", 32}], 
> msg_controllen=0, msg_flags=0}, 32}}, 2, MSG_NOSIGNAL) = 2
> 30419 poll([{fd=4, events=POLLIN}], 1, 5000) = 1 ([{fd=4, revents=POLLIN}])
> 30419 ioctl(4, FIONREAD, [500]) = 0
> 30419 recvfrom(4, 
> "'J\203\200\0\1\0\v\0\10\0\0\4keys\5gnupg\3net\0\0\34\0\1"..., 2048, 0, 
> {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.0.1")}, 
> [16]) = 500
> 30419 close(4)  = 0

The libc does a first connection to the configured name server
(192.168.0.1) using UDP. Note the size of the packet, very close to
the 512-byte limit without EDNS0 support. This very likely means the
answer is marked as truncated (look at the number of entries in the
host answer).
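
The header of the reply in the strace can be decoded to check this. The
sketch below (Python, not part of the original report) unpacks the first 12
bytes exactly as strace printed them; the variable names are illustrative.

```python
import struct

# First 12 bytes of the UDP reply, copied from the recvfrom() line of the
# strace (strace prints non-printable bytes as octal escapes).
header = b"'J\203\200\0\1\0\v\0\10\0\0"

# DNS header: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (network order).
ident, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", header)

truncated = bool(flags & 0x0200)   # TC bit
rcode = flags & 0x000F             # response code, 0 means no error

print(hex(ident), hex(flags), truncated, rcode)
print(qdcount, ancount, nscount, arcount)
```

The flags word is 0x8380, so the TC (truncated) bit is indeed set, and the
counts are 1 question, 11 answer, 8 authority and 0 additional records.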

> 30419 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
> 30419 connect(4, {sa_family=AF_INET, sin_port=htons(53), 
> sin_addr=inet_addr("192.168.0.1")}, 16) = -1 ECONNREFUSED (Connection refused)
> 30419 close(4)  = 0
> 30419 socket(AF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 4

The query is then done using TCP to get the full answer. However the
name server refuses the connection.
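
The UDP-then-TCP sequence visible in the strace can be sketched as follows.
This is not the glibc resolver code, only a minimal Python illustration of
the general mechanism; the function names, the default query ID and the
timeout are assumptions made for the example.

```python
import socket
import struct


def build_query(name, qtype, ident):
    """Build a minimal DNS query (RD set, class IN) for the given name."""
    header = struct.pack("!6H", ident, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\0"
    return header + qname + struct.pack("!2H", qtype, 1)


def is_truncated(reply):
    """True if the TC bit is set in the reply header."""
    (flags,) = struct.unpack("!H", reply[2:4])
    return bool(flags & 0x0200)


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("connection closed early")
        data += chunk
    return data


def query(server, name, qtype, ident=0x274A, timeout=5.0):
    """Ask over UDP first; retry over TCP if the reply is truncated."""
    packet = build_query(name, qtype, ident)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(packet, (server, 53))
        reply, _ = udp.recvfrom(2048)
    if not is_truncated(reply):
        return reply
    # A server that does not listen on TCP port 53 makes this raise
    # ConnectionRefusedError, the ECONNREFUSED seen in the strace.
    with socket.create_connection((server, 53), timeout=timeout) as tcp:
        # Over TCP, each DNS message is preceded by a 2-byte length.
        tcp.sendall(struct.pack("!H", len(packet)) + packet)
        (length,) = struct.unpack("!H", recv_exact(tcp, 2))
        return recv_exact(tcp, length)
```

With ident=0x274A and qtype=28 (AAAA), build_query() produces the same 32
bytes as the second message of the sendmmsg() call in the strace. Calling
query("192.168.0.1", "keys.gnupg.net", 28) on the affected network would be
expected to end in ConnectionRefusedError rather than in an answer.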

> On a machine where this works:
> 
> 20726 stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=114, ...}) = 0
> 20726 socket(AF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 4
> 20726 connect(4, {sa_family=AF_INET, sin_port=htons(53), 
> sin_addr=inet_addr("140.77.1.32")}, 16) = 0
> 20726 poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}])
> 20726 sendmmsg(4, {{{msg_name(0)=NULL, 
> 

Bug#834098: libc6: name resolution fails for keys.gnupg.net on some machines / networks

2016-08-11 Thread Vincent Lefevre
On 2016-08-11 23:33:30 +0200, Vincent Lefevre wrote:
[...]
> I wondered whether this was a bug from gnupg, until I tried:
> 
> zira:~> ping keys.gnupg.net
> ping: keys.gnupg.net: Temporary failure in name resolution
> zsh: exit 2 ping keys.gnupg.net
> 
> which I always get. Ditto with telnet.
[...]

I'd add that there's no such name resolution failure under Android
(same network, same name server 192.168.0.1).

-- 
Vincent Lefèvre
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#834098: libc6: name resolution fails for keys.gnupg.net on some machines / networks

2016-08-11 Thread Vincent Lefevre
Package: libc6
Version: 2.23-4
Severity: normal

I always get the following error on this machine:

zira:~> gpg --keyserver-options verbose,debug --keyserver hkp://keys.gnupg.net 
--recv-key 
gpg: requesting key  from hkp server keys.gnupg.net
gpgkeys: curl version = GnuPG curl-shim
* HTTP proxy is "null"
* HTTP URL is 
"http://keys.gnupg.net:11371/pks/lookup?op=get=mr=0x;
* SRV tag is "pgpkey-http": host and port may be overridden
* HTTP auth is "null"
* HTTP method is GET
?: keys.gnupg.net: Host not found
gpgkeys: HTTP fetch error 7: couldn't connect: Connection refused
gpg: no valid OpenPGP data found.
gpg: Total number processed: 0
gpg: keyserver communications error: keyserver unreachable
gpg: keyserver communications error: public key not found
gpg: keyserver receive failed: public key not found
zsh: exit 2 gpg --keyserver-options verbose,debug --keyserver 
hkp://keys.gnupg.net  

even though the host exists:

zira:~> host keys.gnupg.net
keys.gnupg.net is an alias for pool.sks-keyservers.net.
pool.sks-keyservers.net has address 5.9.50.141
pool.sks-keyservers.net has address 91.143.92.136
pool.sks-keyservers.net has address 108.18.103.116
pool.sks-keyservers.net has address 131.155.141.70
pool.sks-keyservers.net has address 85.10.205.199
pool.sks-keyservers.net has address 163.172.29.20
pool.sks-keyservers.net has address 104.236.209.43
pool.sks-keyservers.net has address 84.200.66.125
pool.sks-keyservers.net has address 5.9.143.170
pool.sks-keyservers.net has address 185.95.216.79
pool.sks-keyservers.net has IPv6 address 2607:5300:60:490f:1::19
pool.sks-keyservers.net has IPv6 address 2604:a880:800:10::227:e001
pool.sks-keyservers.net has IPv6 address 2001:41d0:2:55c2:5054:ff:fe12:3
pool.sks-keyservers.net has IPv6 address 2a01:4f8:a0:4024::2:0
pool.sks-keyservers.net has IPv6 address 2a02:180:a:65:2456:6542:1101:1010
pool.sks-keyservers.net has IPv6 address 2a01:4f8:d16:24c1::2
pool.sks-keyservers.net has IPv6 address 2a01:7c8:aabc:45a:5054:ff:fe9b:59a3
pool.sks-keyservers.net has IPv6 address 2001:470:d:367::555
pool.sks-keyservers.net has IPv6 address 2a01:4f8:161:4283:1000::203
pool.sks-keyservers.net has IPv6 address 2001:4c80:40:628:5c70:d1ff:fe44:1424

This seems to be a known issue:

  https://lists.gnupg.org/pipermail/gnupg-users/2015-October/054532.html

(searching for "keys.gnupg.net: Host not found" gives many more
examples).

I wondered whether this was a bug from gnupg, until I tried:

zira:~> ping keys.gnupg.net
ping: keys.gnupg.net: Temporary failure in name resolution
zsh: exit 2 ping keys.gnupg.net

which I always get. Ditto with telnet.
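
The same condition can be checked from any program that goes through
getaddrinfo(). The following is a minimal Python sketch, assuming that
socket.getaddrinfo() ends up in the libc resolver; the function name and the
"resolver" parameter are hypothetical and only there to make the example
testable without a network. Port 11371 is the one from the HTTP URL above.

```python
import socket


def lookup_status(host, port=11371, resolver=socket.getaddrinfo):
    """Classify a name lookup as ok, temporary failure or other failure."""
    try:
        results = resolver(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # EAI_AGAIN is what libc reports as
        # "Temporary failure in name resolution".
        if exc.errno == socket.EAI_AGAIN:
            return ("temporary", [])
        return ("failed", [])
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return ("ok", sorted({info[4][0] for info in results}))


if __name__ == "__main__":
    print(lookup_status("keys.gnupg.net"))
```

An application that distinguishes the "temporary" case can retry or print a
more accurate message than "Host not found".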

An excerpt of the strace for gnupg:

30419 stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=23, ...}) = 0
30419 socket(AF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 4
30419 connect(4, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("192.168.0.1")}, 16) = 0
30419 poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}])
30419 sendmmsg(4, {{{msg_name(0)=NULL, 
msg_iov(1)=[{"\343\376\1\0\0\1\0\0\0\0\0\0\4keys\5gnupg\3net\0\0\1\0\1", 32}], 
msg_controllen=0, msg_flags=0}, 32}, {{msg_name(0)=NULL, 
msg_iov(1)=[{"'J\1\0\0\1\0\0\0\0\0\0\4keys\5gnupg\3net\0\0\34\0\1", 32}], 
msg_controllen=0, msg_flags=0}, 32}}, 2, MSG_NOSIGNAL) = 2
30419 poll([{fd=4, events=POLLIN}], 1, 5000) = 1 ([{fd=4, revents=POLLIN}])
30419 ioctl(4, FIONREAD, [500]) = 0
30419 recvfrom(4, 
"'J\203\200\0\1\0\v\0\10\0\0\4keys\5gnupg\3net\0\0\34\0\1"..., 2048, 0, 
{sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.0.1")}, 
[16]) = 500
30419 close(4)  = 0
30419 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
30419 connect(4, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("192.168.0.1")}, 16) = -1 ECONNREFUSED (Connection refused)
30419 close(4)  = 0
30419 socket(AF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 4
30419 connect(4, {sa_family=AF_INET, sin_port=htons(53), 
sin_addr=inet_addr("192.168.0.1")}, 16) = 0
30419 poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}])
30419 sendmmsg(4, {{{msg_name(0)=NULL, 
msg_iov(1)=[{"\227\30\1\0\0\1\0\0\0\0\0\0\4keys\5gnupg\3net\0\0\1\0\1", 32}], 
msg_controllen=0, 
msg_flags=MSG_RST|MSG_ERRQUEUE|MSG_NOSIGNAL|MSG_MORE|MSG_WAITFORONE|MSG_BATCH|MSG_FASTOPEN|0x8c7a},
 32}, {{msg_name(0)=NULL, 
msg_iov(1)=[{"\347\360\1\0\0\1\0\0\0\0\0\0\4keys\5gnupg\3net\0\0\34\0\1", 32}], 
msg_controllen=0, 
msg_flags=MSG_OOB|MSG_PEEK|MSG_CTRUNC|MSG_WAITALL|MSG_FIN|MSG_SYN|MSG_CONFIRM|MSG_WAITFORONE|MSG_BATCH|MSG_FASTOPEN|0x85b0},
 32}}, 2, MSG_NOSIGNAL) = 2
30419 poll([{fd=4, events=POLLIN}], 1, 5000) = 1 ([{fd=4, revents=POLLIN}])
30419 ioctl(4, FIONREAD, [500]) = 0
30419 recvfrom(4, 
"\347\360\203\200\0\1\0\v\0\10\0\0\4keys\5gnupg\3net\0\0\34\0\1"..., 2048, 0, 
{sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.0.1")}, 
[16]) = 500
30419 close(4)  =