So for the SRU we ideally want a nice, self-contained, Ubuntu-based test case. Is that possible here? It reads to me as if it's a bit non-deterministic; is that true?
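As a possible starting point for such a test case, here is a minimal Python sketch that resolves a name repeatedly and records the lookups that stall. It is not a confirmed reproducer. The helper name `time_lookups`, the 4-second threshold and the attempt count are assumptions chosen for illustration, and it assumes that Python's `socket.getaddrinfo` ends up in glibc's getaddrinfo() and that `AF_UNSPEC` makes it send both the A and the AAAA query. Since the collision appears to be intermittent, a clean run would not prove the bug is absent.

```python
import socket
import time


def time_lookups(host, attempts=20, threshold=4.0,
                 resolve=socket.getaddrinfo, clock=time.monotonic):
    """Resolve `host` `attempts` times and return the durations (in seconds)
    of the lookups that took at least `threshold` seconds.

    `resolve` and `clock` are injectable so the timing logic can be
    exercised without touching the network (hypothetical design choice).
    """
    slow = []
    for _ in range(attempts):
        start = clock()
        try:
            # AF_UNSPEC asks for both IPv4 and IPv6 results.
            resolve(host, 80, socket.AF_UNSPEC, socket.SOCK_STREAM)
        except socket.gaierror:
            # A failed lookup is still timed; only the duration matters here.
            pass
        elapsed = clock() - start
        if elapsed >= threshold:
            slow.append(elapsed)
    return slow
```

Running `time_lookups("example.com")` inside a bionic or focal container with the affected glibc, and again with the proposed package, would give a rough before/after comparison. The host name is only a placeholder.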
@kjtsanaktsidis, do you think you can write up reproduction instructions? If not, would you be able to test the proposed glibc in your environment? We'll be patching focal first fwiw.

** Description changed:

- When resolving DNS names with getaddrinfo(), I have seen this hang for 5
- seconds and then retry and succeed. The issue is that glibc will issue a
- both an A and AAAA query on the same socket, and in some circumstances
- they can be sent with the same DNS transaction ID as well.
+ [impact]
+ When resolving DNS names with getaddrinfo(), I have seen this hang for 5 seconds and then retry and succeed. The issue is that glibc will issue a both an A and AAAA query on the same socket, and in some circumstances they can be sent with the same DNS transaction ID as well.

- I verified this with a packet capture; in the packet capture, I saw the
- A and AAAA queries for a name be made with the same DNS transaction ID,
- get responses, do nothing for five seconds, and then send the same DNS
- query again. On the glibc side, I confirmed that it's blocked waiting
- for the DNS response by interrupting it with gdb, even though the packet
- capture shows the response has well and truly arrived. I've attached a
- packet capture & a backtrace of the glibc hang.
+ [test case]
+ TBD
+
+ [regression potential]
+ TBD.
+
+ [original description]
+ I verified this with a packet capture; in the packet capture, I saw the A and AAAA queries for a name be made with the same DNS transaction ID, get responses, do nothing for five seconds, and then send the same DNS query again. On the glibc side, I confirmed that it's blocked waiting for the DNS response by interrupting it with gdb, even though the packet capture shows the response has well and truly arrived. I've attached a packet capture & a backtrace of the glibc hang.
I believe this is the same issue reported in these places:

- * In RHEL: https://bugzilla.redhat.com/show_bug.cgi?id=1904153
- * Also RHEL: https://bugzilla.redhat.com/show_bug.cgi?id=1903880
- * Upstream: https://sourceware.org/bugzilla/show_bug.cgi?id=26600
+ * In RHEL: https://bugzilla.redhat.com/show_bug.cgi?id=1904153
+ * Also RHEL: https://bugzilla.redhat.com/show_bug.cgi?id=1903880
+ * Upstream: https://sourceware.org/bugzilla/show_bug.cgi?id=26600

The environment I noticed this bug in was:

- * Docker for Mac on an arm64 m1 Macbook
- * Docker for Mac Linux kernel version is 5.10.76-linuxkit
- * Linux is also arm64, not emulated
- * Container with the buggy DNS environment is Ubuntu bionic (also arm64, not emulated)
- * Glibc 2.27-3ubuntu1.4
+ * Docker for Mac on an arm64 m1 Macbook
+ * Docker for Mac Linux kernel version is 5.10.76-linuxkit
+ * Linux is also arm64, not emulated
+ * Container with the buggy DNS environment is Ubuntu bionic (also arm64, not emulated)
+ * Glibc 2.27-3ubuntu1.4

However one of the redhat reporters noticed this issue in m6 series EC2 instances in AWS.

A patch has been provided upstream for this issue:
https://sourceware.org/pipermail/libc-alpha/2020-September/117547.html

I applied the upstream patch to glibc 2.27-3ubuntu1.4 and rebuilt the package, and the problem went away. I've attached the exact patch I applied, since I had to work through some conflicts.

So, I think that patch just needs to be backported to Bionic and (I think) Focal as well. Is that reasonable?

Thanks!

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1961697

Title:
  Transaction ID collisions cause slow DNS lookups in getaddrinfo

To manage notifications about this bug go to:
https://bugs.launchpad.net/glibc/+bug/1961697/+subscriptions

-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
