Bug#526823: libc6 2.9-9 broke DNS resolver again

2009-05-05 Thread Luca Tettamanti
On Tue, May 5, 2009 at 12:38 AM, Bastian Blank wa...@debian.org wrote:
 On Mon, May 04, 2009 at 11:55:22PM +0200, Luca Tettamanti wrote:
 To recap: my ADSL router receives two requests and sends back *two*
 answers; to the A query it replies with the expected data, to the AAAA
 query it replies NotImpl (see the tcpdump in the first email).

 I doubt that this behaviour is allowed. Not implemented is a response
 to the query type, aka standard query or inverse query, not to the
 contents of the query.[1]

I see, I thought that Not Implemented was linked to the qtype, not to
the opcode. So, yes, the router is broken; thanks for explaining.

 This behaviour also violates the transparency considerations.[2]

I'm pretty sure that my router predates that RFC :P

Luca



-- 
To UNSUBSCRIBE, email to debian-glibc-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#526823: libc6 2.9-9 broke DNS resolver again

2009-05-04 Thread Aurelien Jarno
On Mon, May 04, 2009 at 09:59:22PM +0200, Luca Tettamanti wrote:
 The option single-request works, the automagic workaround does not,

That's good news.

 i.e. I always see the two requests going out in parallel.
 Actually I'm not sure I understand how it's supposed to work: if the
 first request fails usually the caller gives up, no?

The first request done by a program should time out, and the second
request by the same program should then be done sequentially, like when
single-request is set.
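The automatic workaround described above can be modelled as one sticky, per-process flag: a timeout while the A and AAAA queries are both in flight switches every later lookup in that process to sequential sending, as if single-request were set. The C sketch below is a hypothetical model of that behaviour, not glibc's implementation; `lookup_state`, `next_lookup_mode` and `record_lookup` are invented names.

```c
#include <stdbool.h>

/* Hypothetical model of the automatic single-request fallback; none of
   these names exist in glibc. */
enum send_mode { SEND_PARALLEL, SEND_SEQUENTIAL };

struct lookup_state {
    bool single_request;   /* sticky: set once a parallel lookup timed out */
};

/* How the next A/AAAA lookup in this process sends its two queries. */
static enum send_mode next_lookup_mode(const struct lookup_state *st)
{
    return st->single_request ? SEND_SEQUENTIAL : SEND_PARALLEL;
}

/* Record how a lookup went. A timeout while both queries were in flight
   switches all later lookups to sequential mode; the timed-out lookup
   itself is slow but, in this model, still expected to get an answer. */
static void record_lookup(struct lookup_state *st, enum send_mode mode,
                          bool timed_out)
{
    if (mode == SEND_PARALLEL && timed_out)
        st->single_request = true;
}
```

Because the flag only lives as long as the process, a one-shot program that performs a single lookup would only ever see the slow first attempt.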

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net






Bug#526823: libc6 2.9-9 broke DNS resolver again

2009-05-04 Thread Luca Tettamanti
On Mon, May 4, 2009 at 6:45 AM, Aurelien Jarno aurel...@aurel32.net wrote:
 Could you please try the glibc from http://temp.aurel32.net/glibc-test/ ?
 I have backported a few more patches from upstream, but I have no way to
 know if they change something or not.

The option single-request works, the automagic workaround does not,
i.e. I always see the two requests going out in parallel.
Actually I'm not sure I understand how it's supposed to work: if the
first request fails usually the caller gives up, no?

Luca






Bug#526823: libc6 2.9-9 broke DNS resolver again

2009-05-04 Thread Luca Tettamanti
On Mon, May 4, 2009 at 10:11 PM, Aurelien Jarno aurel...@aurel32.net wrote:
 On Mon, May 04, 2009 at 09:59:22PM +0200, Luca Tettamanti wrote:
 The option single-request works, the automagic workaround does not,

 That's good news.

 i.e. I always see the two requests going out in parallel.
 Actually I'm not sure I understand how it's supposed to work: if the
 first request fails usually the caller gives up, no?

 The first request done by a program should time out, and the second
 request by the same program should then be done sequentially, like when
 single-request is set.

That's not what is happening though. I try to open a page in konqueror
(I also tried other programs, it's not specific to konqueror): I see
two requests (A and AAAA) going out at the same time; konqueror says
it's unable to resolve the address - so far so good. I try to reload
the page and I still see both requests going out at the same time
(failure again). With single-request I see the first query, its answer
and only then the second query and its reply - as expected.
Furthermore I fear the workaround won't work for one-shot programs,
like apt helpers, right?

Luca






Bug#526823: libc6 2.9-9 broke DNS resolver again

2009-05-04 Thread Aurelien Jarno
On Mon, May 04, 2009 at 10:32:09PM +0200, Luca Tettamanti wrote:
 That's not what is happening though. I try to open a page in konqueror
 (I also tried other programs, it's not specific to konqueror): I see
 two requests (A and AAAA) going out at the same time; konqueror says
 it's unable to resolve the address - so far so good. I try to reload

That's the problem. When I say it should time out, I mean it should take
a long time to resolve, but in the end an answer should be returned.

 the page and I still see both requests going out at the same time
 (failure again). With single-request I see the first query, its answer
 and only then the second query and its reply - as expected.
 Furthermore I fear the workaround won't work for one-shot programs,
 like apt helpers, right?
 

It should work, with just a longer timeout.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net






Bug#526823: libc6 2.9-9 broke DNS resolver again

2009-05-04 Thread Aurelien Jarno
On Mon, May 04, 2009 at 11:55:22PM +0200, Luca Tettamanti wrote:
 Ah ok, __libc_res_nsend should do statp->retry queries, which by
 default is 2 (confirmed by gdb).
 send_dg() returns 1 (reply), the socket is then closed; the return value
 is 1 and control goes back to __libc_res_nquery.
 
 At this point we have the two answers:
 
 hp = {id = 27765, rd = 1, tc = 1, aa = 0, opcode = 5, qr = 0, rcode =
 1, cd = 0, ad = 0, unused = 0, ra = 1,
   qdcount = 128, ancount = 1, nscount = 2, arcount = 0}
 hp2 = {id = 6, rd = 1, tc = 0, aa = 0, opcode = 0, qr = 1, rcode =
 0, cd = 0, ad = 0, unused = 0, ra = 1,
   qdcount = 256, ancount = 512, nscount = 0, arcount = 0}
 
 The error in the first one is FORMERR (I'd expect NOTIMP...), which is
 treated as an unrecoverable failure even if the second one succeeded.
 answer contains 76 bytes of reply:
 
 756c 2b81 8000 0100 0200 0000 0003 6674  ul+...........ft
 7002 6974 0664 6562 6961 6e03 6f72 6700  p.it.debian.org.
 0001 0001 c00c 0005 0001 0000 02f5 000d  ................
 0366 7470 0462 6f66 6802 6974 00c0 2f00  .ftp.bofh.it../.
 0100 0100 0063 9400 04d5 5c08            .....c....\.
 
 which seems sensible to me (ftp.it.debian.org is the name that I
 requested, and it's a CNAME for ftp.bofh.it).
 
 To recap: my ADSL router receives two requests and sends back *two*
 answers; to the A query it replies with the expected data, to the AAAA
 query it replies NotImpl (see the tcpdump in the first email). When
 both queries are sent in parallel, __libc_res_nquery considers an error
 in either of them an unrecoverable failure (even if the other one is
 valid); the logic should be reversed: the query was successful if
 we get _at least_ one response.

That's probably why some people reported success with this version of
the code, as in their case only one answer is received.

The problem is that glibc considers NotImpl an unrecoverable failure,
as it applies to the opcode (here a standard query), not to the
content of the query. Taking into account every possible answer
engineers have imagined to say that a piece of DNS software does not
support AAAA queries (while there is a correct way to say that) is not
something easy to do in glibc.
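In resolver terms, an "unrecoverable failure" is a lookup that ends with h_errno set to NO_RECOVERY, which callers generally treat as permanent. The C sketch below approximates how glibc's res_query.c maps the RCODE of a reply that carries an error or an empty answer section to h_errno; it is a simplified stand-in (`rcode_to_h_errno` and the `RC_*` names are hypothetical), but it shows NOTIMP landing in the same bucket as FORMERR and REFUSED.

```c
#include <netdb.h>   /* HOST_NOT_FOUND, TRY_AGAIN, NO_RECOVERY, NO_DATA */

/* RFC 1035 response codes (RCODE field of the header). */
enum dns_rcode {
    RC_NOERROR  = 0,
    RC_FORMERR  = 1,
    RC_SERVFAIL = 2,
    RC_NXDOMAIN = 3,
    RC_NOTIMP   = 4,
    RC_REFUSED  = 5
};

/* Hypothetical helper modelled on the error switch in glibc's
   res_query.c; only meant for replies with rcode != NOERROR or with an
   empty answer section. */
static int rcode_to_h_errno(int rcode)
{
    switch (rcode) {
    case RC_NXDOMAIN: return HOST_NOT_FOUND; /* name does not exist */
    case RC_SERVFAIL: return TRY_AGAIN;      /* transient, may be retried */
    case RC_NOERROR:  return NO_DATA;        /* name exists, no such records */
    case RC_FORMERR:
    case RC_NOTIMP:
    case RC_REFUSED:
    default:          return NO_RECOVERY;    /* caller gives up */
    }
}
```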

 After careful pondering I think that my router is actually sane (IOW,
 it doesn't discard AAAA requests - it correctly replies that it does
 not support that query); I think that it's a real bug in glibc.

Your router is buggy. According to the RFC, its NotImp answer says that it
does not support standard queries at all.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net






Bug#526823: libc6 2.9-9 broke DNS resolver again

2009-05-04 Thread Luca Tettamanti
On Mon, May 4, 2009 at 10:57 PM, Aurelien Jarno aurel...@aurel32.net wrote:
 That's the problem. When I say it should time out, I mean it should take
 a long time to resolve, but in the end an answer should be returned.

Ah ok, __libc_res_nsend should do statp->retry queries, which by
default is 2 (confirmed by gdb).
send_dg() returns 1 (reply), the socket is then closed; the return value
is 1 and control goes back to __libc_res_nquery.
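The retry behaviour seen in gdb (statp->retry tries, 2 by default) boils down to a loop of this shape: up to `retry` passes over the configured name servers, returning as soon as one send reports a reply. The C sketch is a deliberately simplified, hypothetical model (`nsend_sketch`, `send_fn` and `flaky_server` are invented names); the real __libc_res_nsend also handles TCP fallback, truncated replies and the two queries of a parallel lookup.

```c
/* Hypothetical sender signature: returns >0 when a reply was received
   from name server `ns`, 0 when the attempt timed out. */
typedef int (*send_fn)(int ns, void *ctx);

/* Simplified, hypothetical shape of the resolver retry loop: up to
   `retry` passes over all `nscount` servers, stopping at the first reply. */
static int nsend_sketch(int retry, int nscount, send_fn send_one, void *ctx)
{
    for (int attempt = 0; attempt < retry; attempt++) {
        for (int ns = 0; ns < nscount; ns++) {
            int n = send_one(ns, ctx);
            if (n > 0)
                return n;   /* reply received: hand it back to the caller */
        }
    }
    return -1;              /* every attempt timed out */
}

/* Example sender: times out while *(int *)ctx is positive, decrementing
   it on each call, then replies. */
static int flaky_server(int ns, void *ctx)
{
    int *timeouts_left = ctx;
    (void)ns;
    if (*timeouts_left > 0) {
        (*timeouts_left)--;
        return 0;
    }
    return 1;
}
```

With one server and retry = 2, a server that answers on the second pass still yields a reply, while one that stays silent for both passes makes the lookup fail after two timeouts.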

At this point we have the two answers:

hp = {id = 27765, rd = 1, tc = 1, aa = 0, opcode = 5, qr = 0, rcode =
1, cd = 0, ad = 0, unused = 0, ra = 1,
  qdcount = 128, ancount = 1, nscount = 2, arcount = 0}
hp2 = {id = 6, rd = 1, tc = 0, aa = 0, opcode = 0, qr = 1, rcode =
0, cd = 0, ad = 0, unused = 0, ra = 1,
  qdcount = 256, ancount = 512, nscount = 0, arcount = 0}
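When reading these structs, note that gdb shows each HEADER count field as a host-order integer, while the two bytes in memory are in network (big-endian) order; on this little-endian amd64 machine every count therefore appears byte-swapped and would normally go through ntohs(). The C helper below does the swap by hand so it behaves the same on any host; `count_from_raw` is a hypothetical name. Applied to hp2, qdcount = 256 and ancount = 512 become 1 question and 2 answers; the same conversion turns hp's qdcount = 128 into 32768, which is not a plausible question count.

```c
#include <stdint.h>

/* Hypothetical helper: gdb shows a HEADER count field as a host-order
   integer, but the two bytes in memory are in network (big-endian) order.
   On a little-endian host the real value is the byte-swapped display. */
static uint16_t count_from_raw(uint16_t raw)
{
    return (uint16_t)((raw >> 8) | (raw << 8));
}
```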

The error in the first one is FORMERR (I'd expect NOTIMP...), which is
treated as an unrecoverable failure even if the second one succeeded.
answer contains 76 bytes of reply:

756c 2b81 8000 0100 0200 0000 0003 6674  ul+...........ft
7002 6974 0664 6562 6961 6e03 6f72 6700  p.it.debian.org.
0001 0001 c00c 0005 0001 0000 02f5 000d  ................
0366 7470 0462 6f66 6802 6974 00c0 2f00  .ftp.bofh.it../.
0100 0100 0063 9400 04d5 5c08            .....c....\.

which seems sensible to me (ftp.it.debian.org is the name that I
requested, and it's a CNAME for ftp.bofh.it).

To recap: my ADSL router receives two requests and sends back *two*
answers; to the A query it replies with the expected data, to the AAAA
query it replies NotImpl (see the tcpdump in the first email). When
both queries are sent in parallel, __libc_res_nquery considers an error
in either of them an unrecoverable failure (even if the other one is
valid); the logic should be reversed: the query was successful if
we get _at least_ one response.
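The proposed rule, that a parallel A/AAAA lookup succeeds as long as at least one of the two replies is usable, fits in a small predicate. This is a hypothetical sketch of the proposal only (`reply_summary`, `reply_usable` and `lookup_succeeded` are invented names), not what glibc does; Aurelien's response in this thread argues that a NotImp answer should not be special-cased this way.

```c
#include <stdbool.h>

/* Minimal summary of one DNS reply (hypothetical type). */
struct reply_summary {
    bool received;   /* did a reply arrive at all? */
    int rcode;       /* RCODE from the header, 0 = NOERROR */
    int ancount;     /* number of answer records */
};

/* A reply is usable if it arrived, reports NOERROR and has answers. */
static bool reply_usable(const struct reply_summary *r)
{
    return r->received && r->rcode == 0 && r->ancount > 0;
}

/* Proposed rule: the combined lookup succeeds if at least one of the two
   parallel replies (A and AAAA) is usable, instead of failing as soon as
   either one carries an error. */
static bool lookup_succeeded(const struct reply_summary *a,
                             const struct reply_summary *aaaa)
{
    return reply_usable(a) || reply_usable(aaaa);
}
```

With the replies from the capture in the first email (A: NOERROR with 5 answers, AAAA: NotImp with none), this rule reports success.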

After careful pondering I think that my router is actually sane (IOW,
it doesn't discard AAAA requests - it correctly replies that it does
not support that query); I think that it's a real bug in glibc.

Luca






Bug#526823: libc6 2.9-9 broke DNS resolver again

2009-05-04 Thread Bastian Blank
On Mon, May 04, 2009 at 11:55:22PM +0200, Luca Tettamanti wrote:
 To recap: my ADSL router receives two requests and sends back *two*
 answers; to the A query it replies with the expected data, to the AAAA
 query it replies NotImpl (see the tcpdump in the first email).

I doubt that this behaviour is allowed. Not implemented is a response
to the query type, aka standard query or inverse query, not to the
contents of the query.[1]
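The distinction is visible in the RFC 1035 header layout: OPCODE (4 bits of the first flags byte) gives the kind of query, 0 for a standard query and 1 for an inverse query, while the record type (A, AAAA, ...) lives in the question section, and RCODE (low 4 bits of the second flags byte) carries the status, 4 meaning Not Implemented. A reply with OPCODE 0 and RCODE 4 therefore says the server does not implement standard queries. The C sketch below decodes those fields from raw header bytes; `dns_flags` and `decode_header` are hypothetical names, and the sample header in the test is constructed for illustration, not taken from the capture.

```c
#include <stdint.h>
#include <stddef.h>

/* Selected fields of the 12-byte DNS header (RFC 1035, section 4.1.1). */
struct dns_flags {
    uint16_t id;
    unsigned qr;      /* 0 = query, 1 = response */
    unsigned opcode;  /* kind of query: 0 QUERY, 1 IQUERY, 2 STATUS */
    unsigned rd;      /* recursion desired */
    unsigned rcode;   /* 0 NOERROR, 1 FORMERR, ..., 4 NOTIMP */
};

/* Decode the header from wire format (big-endian). Returns 0 on success,
   -1 if the buffer is too short to hold a header. */
static int decode_header(const uint8_t *buf, size_t len, struct dns_flags *out)
{
    if (len < 12)
        return -1;
    out->id     = (uint16_t)((buf[0] << 8) | buf[1]);
    out->qr     = (buf[2] >> 7) & 0x1;
    out->opcode = (buf[2] >> 3) & 0xF;
    out->rd     = buf[2] & 0x1;
    out->rcode  = buf[3] & 0xF;
    return 0;
}
```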

This behaviour also violates the transparency considerations.[2]
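For comparison, the answer that avoids the problem is a NODATA reply: RCODE 0 (NOERROR) with an empty answer section, which says the name has no records of that type without claiming that queries themselves are unsupported. RFC 4074 lists returning error codes such as Not Implemented to AAAA queries among common misbehaviours and expects this kind of empty NOERROR answer instead. The C sketch below rewrites a query header into such a reply header; `make_nodata_header` is a hypothetical name, it assumes the server offers recursion (RA=1), and a complete negative answer would normally also carry an SOA record in the authority section (RFC 2308).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for a server that has no records of the requested
   type (for example AAAA): turn the 12-byte query header in `hdr` into
   the header of a NODATA reply (NOERROR, empty answer section) instead
   of replying NOTIMP. The question section is echoed unchanged. */
static void make_nodata_header(uint8_t hdr[12])
{
    hdr[2] = (uint8_t)(0x80 | (hdr[2] & 0x79)); /* QR=1, keep OPCODE and RD */
    hdr[3] = 0x80;                              /* RA=1, Z=0, RCODE=0 */
    /* QDCOUNT (bytes 4-5) is kept; ANCOUNT, NSCOUNT, ARCOUNT become 0. */
    memset(hdr + 6, 0, 6);
}
```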

Bastian

[1]: http://tools.ietf.org/html/rfc1035
[2]: http://tools.ietf.org/html/rfc3597

-- 
We have the right to survive!
Not by killing others.
-- Deela and Kirk, Wink of An Eye, stardate 5710.5






Bug#526823: libc6 2.9-9 broke DNS resolver again

2009-05-03 Thread Luca Tettamanti
Package: libc6
Version: 2.9-9
Severity: normal

Hello, I was affected by the resolver bug that was solved in 2.9-7; as of 2.9-9
the resolver stopped working again. The automatic workaround that is mentioned
in the changelog is not working, and single-request in resolv.conf doesn't
seem to have any effect either.
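For reference, `options single-request` in /etc/resolv.conf corresponds to the RES_SNGLKUP flag in the resolver options. The C sketch below sets the same flag from a program for the calling thread's `_res` state; it assumes a libc whose <resolv.h> defines RES_SNGLKUP (current glibc does; whether the Debian 2.9 packages discussed here export it is not confirmed), and `enable_single_request` is a hypothetical helper.

```c
#include <netinet/in.h>
#include <resolv.h>

/* Hypothetical helper: programmatic counterpart of
   "options single-request" in /etc/resolv.conf. Only affects the
   calling thread's resolver state (_res). Returns 0 on success. */
static int enable_single_request(void)
{
    /* Load resolv.conf into _res first if nothing has done so yet. */
    if (!(_res.options & RES_INIT) && res_init() != 0)
        return -1;
    /* Send the A and AAAA queries one after the other, not in parallel. */
    _res.options |= RES_SNGLKUP;
    return 0;
}
```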

Here's a dump of the resolver trying to get the address of google.com:

14:11:00.754265 IP (tos 0x0, ttl 64, id 45448, offset 0, flags [DF], proto UDP (17), length 60)
    10.0.0.3.60486 > 10.0.0.138.53: [udp sum ok] 39108+ A? www.google.com. (32)
14:11:00.754303 IP (tos 0x0, ttl 64, id 45449, offset 0, flags [DF], proto UDP (17), length 60)
    10.0.0.3.60486 > 10.0.0.138.53: [udp sum ok] 48015+ AAAA? www.google.com. (32)
14:11:00.759312 IP (tos 0x0, ttl 64, id 1324, offset 0, flags [none], proto UDP (17), length 60)
    10.0.0.138.53 > 10.0.0.3.60486: [udp sum ok] 48015 NotImp q: AAAA? www.google.com. 0/0/0 (32)
14:11:00.817710 IP (tos 0x0, ttl 64, id 1325, offset 0, flags [none], proto UDP (17), length 144)
    10.0.0.138.53 > 10.0.0.3.60486: 39108 q: A? www.google.com. 5/0/0 www.google.com. CNAME www.l.google.com.[|domain]

The DNS server (it's my ADSL router) responds NotImp to the AAAA query (it does
not support IPv6). The reply (CNAME) to the A query seems correct though.
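The two queries in this capture are ordinary RFC 1035 queries that differ only in QTYPE (1 for A, 28 for AAAA). As a cross-check, the C sketch below encodes such a query: for www.google.com it produces 32 bytes of DNS payload, the (32) tcpdump prints, and adding the 20-byte IPv4 and 8-byte UDP headers gives the length 60 of each request. `encode_query` is a hypothetical helper; it sets only the RD flag, which is what the + after the query ID in tcpdump's output indicates.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

enum { QTYPE_A = 1, QTYPE_AAAA = 28, QCLASS_IN = 1 };

/* Hypothetical helper: write a standard query (OPCODE 0, RD=1) for `name`
   into `buf`. Returns the message length, or 0 if it does not fit or a
   label is empty or longer than 63 bytes. */
static size_t encode_query(uint8_t *buf, size_t cap, uint16_t id,
                           const char *name, uint16_t qtype)
{
    size_t pos = 12;
    if (cap < pos)
        return 0;
    memset(buf, 0, 12);
    buf[0] = (uint8_t)(id >> 8);
    buf[1] = (uint8_t)(id & 0xFF);
    buf[2] = 0x01;               /* RD: recursion desired ("+" in tcpdump) */
    buf[5] = 1;                  /* QDCOUNT = 1 */

    /* QNAME: each label is prefixed by its length; a zero byte ends it. */
    while (*name != '\0') {
        const char *dot = strchr(name, '.');
        size_t lablen = dot ? (size_t)(dot - name) : strlen(name);
        if (lablen == 0 || lablen > 63 || pos + 1 + lablen > cap)
            return 0;
        buf[pos++] = (uint8_t)lablen;
        memcpy(buf + pos, name, lablen);
        pos += lablen;
        name += lablen;
        if (*name == '.')
            name++;              /* skip the separator (or trailing dot) */
    }
    if (pos + 5 > cap)
        return 0;
    buf[pos++] = 0;              /* root label */
    buf[pos++] = (uint8_t)(qtype >> 8);
    buf[pos++] = (uint8_t)(qtype & 0xFF);
    buf[pos++] = 0;
    buf[pos++] = QCLASS_IN;
    return pos;
}
```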

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.30-rc3-00329-g0c8454f (SMP w/2 CPU cores; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libc6 depends on:
ii  libgcc1   1:4.3.3-8  GCC support library

libc6 recommends no packages.

Versions of packages libc6 suggests:
pn  glibc-doc none (no description available)
ii  locales   2.9-7  GNU C Library: National Language (

-- debconf information:
* glibc/upgrade: true
  glibc/disable-screensaver:
  glibc/restart-failed:
* glibc/restart-services: vsftpd openbsd-inetd mysql exim4 cups cron atd







Bug#526823: libc6 2.9-9 broke DNS resolver again

2009-05-03 Thread Aurelien Jarno
On Sun, May 03, 2009 at 08:54:07PM +0200, Luca Tettamanti wrote:
 Package: libc6
 Version: 2.9-9
 Severity: normal
 
 Hello, I was affected by the resolver bug that was solved in 2.9-7; as of 2.9-9
 the resolver stopped working again. The automatic workaround that is mentioned
 in the changelog is not working, and single-request in resolv.conf doesn't
 seem to have any effect either.
 
 The DNS server (it's my ADSL router) responds NotImp to the AAAA query (it does
 not support IPv6). The reply (CNAME) to the A query seems correct though.
 

Could you please try the glibc from http://temp.aurel32.net/glibc-test/ ?
I have backported a few more patches from upstream, but I have no way to
know if they change something or not.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net


