Bug#526823: libc6 2.9-9 broke DNS resolver again
On Tue, May 5, 2009 at 12:38 AM, Bastian Blank wa...@debian.org wrote:
> On Mon, May 04, 2009 at 11:55:22PM +0200, Luca Tettamanti wrote:
>> To recap: my ADSL router receives two requests and sends back *two* answers; to the A query it replies with the expected data, to the AAAA query it replies NotImpl (see the tcpdump in the first email).
>
> I doubt that this behaviour is allowed. Not implemented is a response to the query type, aka standard query or inverse query, not to the contents of the query.[1]

I see, I thought that not implemented was linked to the qtype, not to the opcode. So, yes, the router is broken, thanks for explaining.

> This behaviour also violates the transparency considerations.[2]

I'm pretty sure that my router predates that RFC :P

Luca

-- 
To UNSUBSCRIBE, email to debian-glibc-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#526823: libc6 2.9-9 broke DNS resolver again
On Mon, May 04, 2009 at 09:59:22PM +0200, Luca Tettamanti wrote:
> On Mon, May 4, 2009 at 6:45 AM, Aurelien Jarno aurel...@aurel32.net wrote:
>> On Sun, May 03, 2009 at 08:54:07PM +0200, Luca Tettamanti wrote:
>>> Package: libc6
>>> Version: 2.9-9
>>> Severity: normal
>>>
>>> Hello,
>>> I was affected by the resolver bug that was solved in 2.9-7; as of 2.9-9 the resolver stopped working again. The automatic workaround that is mentioned in the changelog is not working, and single-request in resolv.conf doesn't seem to have any effect either. Here's a dump of the resolver trying to get the address of google.com:
>>>
>>> 14:11:00.754265 IP (tos 0x0, ttl 64, id 45448, offset 0, flags [DF], proto UDP (17), length 60)
>>>     10.0.0.3.60486 > 10.0.0.138.53: [udp sum ok] 39108+ A? www.google.com. (32)
>>> 14:11:00.754303 IP (tos 0x0, ttl 64, id 45449, offset 0, flags [DF], proto UDP (17), length 60)
>>>     10.0.0.3.60486 > 10.0.0.138.53: [udp sum ok] 48015+ AAAA? www.google.com. (32)
>>> 14:11:00.759312 IP (tos 0x0, ttl 64, id 1324, offset 0, flags [none], proto UDP (17), length 60)
>>>     10.0.0.138.53 > 10.0.0.3.60486: [udp sum ok] 48015 NotImp q: AAAA? www.google.com. 0/0/0 (32)
>>> 14:11:00.817710 IP (tos 0x0, ttl 64, id 1325, offset 0, flags [none], proto UDP (17), length 144)
>>>     10.0.0.138.53 > 10.0.0.3.60486: 39108 q: A? www.google.com. 5/0/0 www.google.com. CNAME www.l.google.com.[|domain]
>>>
>>> The DNS server (it's my ADSL router) responds NotImp to the AAAA query (it does not support IPv6). The reply (CNAME) to the A query seems correct though.
>>
>> Could you please try the glibc from http://temp.aurel32.net/glibc-test/ ? I have backported a few more patches from upstream, but I have no way to know if they change something or not.
>
> The option single-request works, the automagic workaround does not,

That's good news.

> i.e. I always see the two requests going out in parallel. Actually I'm not sure I understand how it's supposed to work: if the first request fails usually the caller gives up, no?

The first request done by a program should time out, and the second request by the same program should then be done sequentially, like when single-request is set.

-- 
Aurelien Jarno GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
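For readers who want to check whether the manual workaround discussed above is in effect, here is a minimal sketch that scans resolv.conf-style text for the `single-request` option. The helper name `resolver_options` and the sample text are hypothetical; the nameserver address is the router address from the tcpdump in the report. It only reads `options` lines and is not a full resolv.conf parser.

```python
def resolver_options(resolv_conf_text: str) -> set:
    """Collect the words that follow the 'options' keyword (hypothetical helper)."""
    options = set()
    for raw_line in resolv_conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines (starting with '#' or ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "options":
            options.update(fields[1:])
    return options


# Sample text, assumed for illustration only.
sample = """\
# router acting as DNS forwarder
nameserver 10.0.0.138
options single-request
"""

opts = resolver_options(sample)
print("single-request" in opts)
```

On a real system the text would come from /etc/resolv.conf; the sketch takes a string so it can be tried without touching the system configuration.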
Bug#526823: libc6 2.9-9 broke DNS resolver again
On Mon, May 4, 2009 at 6:45 AM, Aurelien Jarno aurel...@aurel32.net wrote:
> On Sun, May 03, 2009 at 08:54:07PM +0200, Luca Tettamanti wrote:
>> Package: libc6
>> Version: 2.9-9
>> Severity: normal
>>
>> Hello,
>> I was affected by the resolver bug that was solved in 2.9-7; as of 2.9-9 the resolver stopped working again. The automatic workaround that is mentioned in the changelog is not working, and single-request in resolv.conf doesn't seem to have any effect either. Here's a dump of the resolver trying to get the address of google.com:
>>
>> 14:11:00.754265 IP (tos 0x0, ttl 64, id 45448, offset 0, flags [DF], proto UDP (17), length 60)
>>     10.0.0.3.60486 > 10.0.0.138.53: [udp sum ok] 39108+ A? www.google.com. (32)
>> 14:11:00.754303 IP (tos 0x0, ttl 64, id 45449, offset 0, flags [DF], proto UDP (17), length 60)
>>     10.0.0.3.60486 > 10.0.0.138.53: [udp sum ok] 48015+ AAAA? www.google.com. (32)
>> 14:11:00.759312 IP (tos 0x0, ttl 64, id 1324, offset 0, flags [none], proto UDP (17), length 60)
>>     10.0.0.138.53 > 10.0.0.3.60486: [udp sum ok] 48015 NotImp q: AAAA? www.google.com. 0/0/0 (32)
>> 14:11:00.817710 IP (tos 0x0, ttl 64, id 1325, offset 0, flags [none], proto UDP (17), length 144)
>>     10.0.0.138.53 > 10.0.0.3.60486: 39108 q: A? www.google.com. 5/0/0 www.google.com. CNAME www.l.google.com.[|domain]
>>
>> The DNS server (it's my ADSL router) responds NotImp to the AAAA query (it does not support IPv6). The reply (CNAME) to the A query seems correct though.
>
> Could you please try the glibc from http://temp.aurel32.net/glibc-test/ ? I have backported a few more patches from upstream, but I have no way to know if they change something or not.

The option single-request works, the automagic workaround does not, i.e. I always see the two requests going out in parallel. Actually I'm not sure I understand how it's supposed to work: if the first request fails usually the caller gives up, no?

Luca
Bug#526823: libc6 2.9-9 broke DNS resolver again
On Mon, May 4, 2009 at 10:11 PM, Aurelien Jarno aurel...@aurel32.net wrote:
> On Mon, May 04, 2009 at 09:59:22PM +0200, Luca Tettamanti wrote:
>> The option single-request works, the automagic workaround does not,
>
> That's good news.
>
>> i.e. I always see the two requests going out in parallel. Actually I'm not sure I understand how it's supposed to work: if the first request fails usually the caller gives up, no?
>
> The first request done by a program should time out, and the second request by the same program should then be done sequentially, like when single-request is set.

That's not what is happening though. I try to open a page in konqueror (I also tried other programs, it's not specific to konqueror): I see two requests (A and AAAA) going out at the same time; konqueror says it's unable to resolve the address - so far so good. I try to reload the page and I still see both requests going out at the same time (failure again).

With single-request I see the first query, its answer and only then the second query and its reply - as expected.

Furthermore I fear the workaround won't work for one-shot programs, like apt helpers, right?

Luca
Bug#526823: libc6 2.9-9 broke DNS resolver again
On Mon, May 04, 2009 at 10:32:09PM +0200, Luca Tettamanti wrote:
> On Mon, May 4, 2009 at 10:11 PM, Aurelien Jarno aurel...@aurel32.net wrote:
>> On Mon, May 04, 2009 at 09:59:22PM +0200, Luca Tettamanti wrote:
>>> The option single-request works, the automagic workaround does not,
>>
>> That's good news.
>>
>>> i.e. I always see the two requests going out in parallel. Actually I'm not sure I understand how it's supposed to work: if the first request fails usually the caller gives up, no?
>>
>> The first request done by a program should time out, and the second request by the same program should then be done sequentially, like when single-request is set.
>
> That's not what is happening though. I try to open a page in konqueror (I also tried other programs, it's not specific to konqueror): I see two requests (A and AAAA) going out at the same time; konqueror says it's unable to resolve the address - so far so good. I try to reload

That's the problem. When I say it should time out, I mean it should take a long time to resolve, but in the end an answer should be returned.

> the page and I still see both requests going out at the same time (failure again).
>
> With single-request I see the first query, its answer and only then the second query and its reply - as expected.
>
> Furthermore I fear the workaround won't work for one-shot programs, like apt helpers, right?

It should work, with just a longer timeout.

-- 
Aurelien Jarno GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
Bug#526823: libc6 2.9-9 broke DNS resolver again
On Mon, May 04, 2009 at 11:55:22PM +0200, Luca Tettamanti wrote:
> On Mon, May 4, 2009 at 10:57 PM, Aurelien Jarno aurel...@aurel32.net wrote:
>> On Mon, May 04, 2009 at 10:32:09PM +0200, Luca Tettamanti wrote:
>>> On Mon, May 4, 2009 at 10:11 PM, Aurelien Jarno aurel...@aurel32.net wrote:
>>>> On Mon, May 04, 2009 at 09:59:22PM +0200, Luca Tettamanti wrote:
>>>>> The option single-request works, the automagic workaround does not,
>>>>
>>>> That's good news.
>>>>
>>>>> i.e. I always see the two requests going out in parallel. Actually I'm not sure I understand how it's supposed to work: if the first request fails usually the caller gives up, no?
>>>>
>>>> The first request done by a program should time out, and the second request by the same program should then be done sequentially, like when single-request is set.
>>>
>>> That's not what is happening though. I try to open a page in konqueror (I also tried other programs, it's not specific to konqueror): I see two requests (A and AAAA) going out at the same time; konqueror says it's unable to resolve the address - so far so good. I try to reload
>>
>> That's the problem. When I say it should time out, I mean it should take a long time to resolve, but in the end an answer should be returned.
>
> Ah ok, __libc_res_nsend should do statp->retry queries, which by default is 2 (confirmed by gdb). send_dg() returns 1 (reply), the socket is then closed; return value is 1 and control goes back to __libc_res_nquery. At this point we have the two answers:
>
> hp = {id = 27765, rd = 1, tc = 1, aa = 0, opcode = 5, qr = 0, rcode = 1, cd = 0, ad = 0, unused = 0, ra = 1, qdcount = 128, ancount = 1, nscount = 2, arcount = 0}
> hp2 = {id = 6, rd = 1, tc = 0, aa = 0, opcode = 0, qr = 1, rcode = 0, cd = 0, ad = 0, unused = 0, ra = 1, qdcount = 256, ancount = 512, nscount = 0, arcount = 0}
>
> The error in the first one is FORMERR (I'd expect NOTIMP...), which is treated as an unrecoverable failure even if the second one succeeded.
>
> answer contains 76 bytes of reply:
>
> 756c 2b81 8000 0100 0200 0003 6674 ul+...ft
> 7002 6974 0664 6562 6961 6e03 6f72 6700 p.it.debian.org.
> 0001 0001 c00c 0005 0001 02f5 000d 0366 7470 0462 6f66 6802 6974 00c0 2f00 .ftp.bofh.it../.
> 0100 0100 0063 9400 04d5 5c08.c\.
>
> which seems sensible to me (ftp.it.debian.org is the name that I request, and it's a CNAME for ftp.bofh.it).
>
> To recap: my ADSL router receives two requests and sends back *two* answers; to the A query it replies with the expected data, to the AAAA query it replies NotImpl (see the tcpdump in the first email). When both queries are sent in parallel, __libc_res_nquery treats an error in either of the two as an unrecoverable failure (even if the other one is valid). The logic should be reversed: the query was successful if we get _at least_ one response.

That's probably why some people reported success with this version of the code, as in their case only one answer is received.

The problem is that the glibc considers NotImpl an unrecoverable failure, as it applies to the opcode type (here query), not to the content of the query. Taking into account every possible answer engineers have imagined to say that DNS software does not support AAAA queries (while there is a correct way to say that) is not something easy to do in the glibc.

> After careful pondering I think that my router is actually sane (IOW, it doesn't discard requests - it correctly replies that it does not support that query); I think that it's a real bug in glibc.

Your router is buggy. According to the RFC it answers that it does not support queries.

-- 
Aurelien Jarno GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
Bug#526823: libc6 2.9-9 broke DNS resolver again
On Mon, May 4, 2009 at 10:57 PM, Aurelien Jarno aurel...@aurel32.net wrote:
> On Mon, May 04, 2009 at 10:32:09PM +0200, Luca Tettamanti wrote:
>> On Mon, May 4, 2009 at 10:11 PM, Aurelien Jarno aurel...@aurel32.net wrote:
>>> On Mon, May 04, 2009 at 09:59:22PM +0200, Luca Tettamanti wrote:
>>>> The option single-request works, the automagic workaround does not,
>>>
>>> That's good news.
>>>
>>>> i.e. I always see the two requests going out in parallel. Actually I'm not sure I understand how it's supposed to work: if the first request fails usually the caller gives up, no?
>>>
>>> The first request done by a program should time out, and the second request by the same program should then be done sequentially, like when single-request is set.
>>
>> That's not what is happening though. I try to open a page in konqueror (I also tried other programs, it's not specific to konqueror): I see two requests (A and AAAA) going out at the same time; konqueror says it's unable to resolve the address - so far so good. I try to reload
>
> That's the problem. When I say it should time out, I mean it should take a long time to resolve, but in the end an answer should be returned.

Ah ok, __libc_res_nsend should do statp->retry queries, which by default is 2 (confirmed by gdb). send_dg() returns 1 (reply), the socket is then closed; return value is 1 and control goes back to __libc_res_nquery. At this point we have the two answers:

hp = {id = 27765, rd = 1, tc = 1, aa = 0, opcode = 5, qr = 0, rcode = 1, cd = 0, ad = 0, unused = 0, ra = 1, qdcount = 128, ancount = 1, nscount = 2, arcount = 0}
hp2 = {id = 6, rd = 1, tc = 0, aa = 0, opcode = 0, qr = 1, rcode = 0, cd = 0, ad = 0, unused = 0, ra = 1, qdcount = 256, ancount = 512, nscount = 0, arcount = 0}

The error in the first one is FORMERR (I'd expect NOTIMP...), which is treated as an unrecoverable failure even if the second one succeeded.

answer contains 76 bytes of reply:

756c 2b81 8000 0100 0200 0003 6674 ul+...ft
7002 6974 0664 6562 6961 6e03 6f72 6700 p.it.debian.org.
0001 0001 c00c 0005 0001 02f5 000d 0366 7470 0462 6f66 6802 6974 00c0 2f00 .ftp.bofh.it../.
0100 0100 0063 9400 04d5 5c08.c\.

which seems sensible to me (ftp.it.debian.org is the name that I request, and it's a CNAME for ftp.bofh.it).

To recap: my ADSL router receives two requests and sends back *two* answers; to the A query it replies with the expected data, to the AAAA query it replies NotImpl (see the tcpdump in the first email). When both queries are sent in parallel, __libc_res_nquery treats an error in either of the two as an unrecoverable failure (even if the other one is valid). The logic should be reversed: the query was successful if we get _at least_ one response.

After careful pondering I think that my router is actually sane (IOW, it doesn't discard requests - it correctly replies that it does not support that query); I think that it's a real bug in glibc.

Luca
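The difference between the behaviour Luca describes and the "at least one response" rule he proposes can be shown with a toy model. This is only a sketch of the two policies, not glibc's code: the `Reply` type and both function names are hypothetical, and the sample values (5 answers for A, rcode 4 with 0 answers for the IPv6 lookup) are taken from the tcpdump in the original report.

```python
from typing import List, NamedTuple


class Reply(NamedTuple):
    qtype: str    # record type that was asked for
    rcode: int    # DNS response code, 0 means no error
    answers: int  # number of records in the answer section


def strict_lookup_ok(replies: List[Reply]) -> bool:
    """Policy described in the thread: an error in any reply fails the lookup."""
    return (all(r.rcode == 0 for r in replies)
            and any(r.answers > 0 for r in replies))


def lenient_lookup_ok(replies: List[Reply]) -> bool:
    """Proposed policy: succeed if at least one reply is usable."""
    return any(r.rcode == 0 and r.answers > 0 for r in replies)


# Values from the tcpdump: A answered with 5 records, AAAA got NotImp (rcode 4).
replies = [Reply("A", 0, 5), Reply("AAAA", 4, 0)]

print(strict_lookup_ok(replies))   # False
print(lenient_lookup_ok(replies))  # True
```

The lenient rule hides the router's error reply from the caller, which is the trade-off the rest of the thread argues about.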
Bug#526823: libc6 2.9-9 broke DNS resolver again
On Mon, May 04, 2009 at 11:55:22PM +0200, Luca Tettamanti wrote:
> To recap: my ADSL router receives two requests and sends back *two* answers; to the A query it replies with the expected data, to the AAAA query it replies NotImpl (see the tcpdump in the first email).

I doubt that this behaviour is allowed. Not implemented is a response to the query type, aka standard query or inverse query, not to the contents of the query.[1]

This behaviour also violates the transparency considerations.[2]

Bastian

[1]: http://tools.ietf.org/html/rfc1035
[2]: http://tools.ietf.org/html/rfc3597

-- 
We have the right to survive!
Not by killing others.
-- Deela and Kirk, Wink of An Eye, stardate 5710.5
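The distinction Bastian draws is visible in the RFC 1035 header layout: the opcode (kind of query) and the rcode both live in the 16-bit flags word of the header, while the record type being asked for sits in the question section. A minimal decoding sketch follows; the function name `decode_header` is hypothetical, and the sample header is synthetic, built from the id and NotImp status shown in the tcpdump (the RD and RA bits are assumed, the capture does not show them).

```python
import struct

# Response codes 0-5 as defined in RFC 1035.
RCODES = {0: "NoError", 1: "FormErr", 2: "ServFail",
          3: "NXDomain", 4: "NotImp", 5: "Refused"}


def decode_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte DNS header (network byte order)."""
    txid, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", packet[:12])
    rcode = flags & 0xF
    return {
        "id": txid,
        "qr": (flags >> 15) & 1,        # 1 = response
        "opcode": (flags >> 11) & 0xF,  # 0 = standard query
        "rd": (flags >> 8) & 1,
        "ra": (flags >> 7) & 1,
        "rcode": rcode,
        "rcode_name": RCODES.get(rcode, "other"),
        "counts": (qd, an, ns, ar),
    }


# Synthetic header: id 48015, response, opcode 0, RD and RA assumed set,
# rcode 4 (NotImp), one question and no records.
synthetic = struct.pack("!HHHHHH", 48015, 0x8184, 1, 0, 0, 0)

header = decode_header(synthetic)
print(header["opcode"], header["rcode_name"])  # 0 NotImp
```

In this synthetic reply the opcode is 0 (a standard query), so an rcode of NotImp claims that standard queries as a whole are unsupported, which is the reading under which the router's answer is wrong.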
Bug#526823: libc6 2.9-9 broke DNS resolver again
Package: libc6
Version: 2.9-9
Severity: normal

Hello,
I was affected by the resolver bug that was solved in 2.9-7; as of 2.9-9 the resolver stopped working again. The automatic workaround that is mentioned in the changelog is not working, and single-request in resolv.conf doesn't seem to have any effect either. Here's a dump of the resolver trying to get the address of google.com:

14:11:00.754265 IP (tos 0x0, ttl 64, id 45448, offset 0, flags [DF], proto UDP (17), length 60)
    10.0.0.3.60486 > 10.0.0.138.53: [udp sum ok] 39108+ A? www.google.com. (32)
14:11:00.754303 IP (tos 0x0, ttl 64, id 45449, offset 0, flags [DF], proto UDP (17), length 60)
    10.0.0.3.60486 > 10.0.0.138.53: [udp sum ok] 48015+ AAAA? www.google.com. (32)
14:11:00.759312 IP (tos 0x0, ttl 64, id 1324, offset 0, flags [none], proto UDP (17), length 60)
    10.0.0.138.53 > 10.0.0.3.60486: [udp sum ok] 48015 NotImp q: AAAA? www.google.com. 0/0/0 (32)
14:11:00.817710 IP (tos 0x0, ttl 64, id 1325, offset 0, flags [none], proto UDP (17), length 144)
    10.0.0.138.53 > 10.0.0.3.60486: 39108 q: A? www.google.com. 5/0/0 www.google.com. CNAME www.l.google.com.[|domain]

The DNS server (it's my ADSL router) responds NotImp to the AAAA query (it does not support IPv6). The reply (CNAME) to the A query seems correct though.

-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.30-rc3-00329-g0c8454f (SMP w/2 CPU cores; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libc6 depends on:
ii  libgcc1  1:4.3.3-8  GCC support library

libc6 recommends no packages.

Versions of packages libc6 suggests:
pn  glibc-doc  none  (no description available)
ii  locales  2.9-7  GNU C Library: National Language (

-- debconf information:
* glibc/upgrade: true
  glibc/disable-screensaver:
  glibc/restart-failed:
* glibc/restart-services: vsftpd openbsd-inetd mysql exim4 cups cron atd
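To make the capture in the report easier to read, here is a small sketch that builds the two kinds of query the resolver sends in parallel for one name: an A query (record type 1) for IPv4 and an AAAA query (record type 28) for IPv6. The function names are hypothetical and nothing is sent on the network; the ids are the ones shown in the tcpdump, and the resulting 32-byte size matches the "(32)" that tcpdump prints for each query.

```python
import struct

QTYPE_A = 1      # IPv4 address record
QTYPE_AAAA = 28  # IPv6 address record
QCLASS_IN = 1    # Internet class


def encode_name(name: str) -> bytes:
    """Encode a host name as length-prefixed labels ending with a zero byte."""
    labels = [label for label in name.split(".") if label]
    encoded = b"".join(bytes([len(label)]) + label.encode("ascii")
                       for label in labels)
    return encoded + b"\x00"


def build_query(txid: int, name: str, qtype: int) -> bytes:
    """Build a standard query (opcode 0) with recursion desired set."""
    flags = 0x0100  # only the RD bit is set
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, QCLASS_IN)
    return header + question


a_query = build_query(39108, "www.google.com", QTYPE_A)
aaaa_query = build_query(48015, "www.google.com", QTYPE_AAAA)

print(len(a_query), len(aaaa_query))  # 32 32
```

The two packets differ only in the id and in the two-byte record type of the question, which is why a server that rejects one of them with a header-level error code is surprising to the resolver.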
Bug#526823: libc6 2.9-9 broke DNS resolver again
On Sun, May 03, 2009 at 08:54:07PM +0200, Luca Tettamanti wrote:
> Package: libc6
> Version: 2.9-9
> Severity: normal
>
> Hello,
> I was affected by the resolver bug that was solved in 2.9-7; as of 2.9-9 the resolver stopped working again. The automatic workaround that is mentioned in the changelog is not working, and single-request in resolv.conf doesn't seem to have any effect either. Here's a dump of the resolver trying to get the address of google.com:
>
> 14:11:00.754265 IP (tos 0x0, ttl 64, id 45448, offset 0, flags [DF], proto UDP (17), length 60)
>     10.0.0.3.60486 > 10.0.0.138.53: [udp sum ok] 39108+ A? www.google.com. (32)
> 14:11:00.754303 IP (tos 0x0, ttl 64, id 45449, offset 0, flags [DF], proto UDP (17), length 60)
>     10.0.0.3.60486 > 10.0.0.138.53: [udp sum ok] 48015+ AAAA? www.google.com. (32)
> 14:11:00.759312 IP (tos 0x0, ttl 64, id 1324, offset 0, flags [none], proto UDP (17), length 60)
>     10.0.0.138.53 > 10.0.0.3.60486: [udp sum ok] 48015 NotImp q: AAAA? www.google.com. 0/0/0 (32)
> 14:11:00.817710 IP (tos 0x0, ttl 64, id 1325, offset 0, flags [none], proto UDP (17), length 144)
>     10.0.0.138.53 > 10.0.0.3.60486: 39108 q: A? www.google.com. 5/0/0 www.google.com. CNAME www.l.google.com.[|domain]
>
> The DNS server (it's my ADSL router) responds NotImp to the AAAA query (it does not support IPv6). The reply (CNAME) to the A query seems correct though.

Could you please try the glibc from http://temp.aurel32.net/glibc-test/ ? I have backported a few more patches from upstream, but I have no way to know if they change something or not.

-- 
Aurelien Jarno GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net