Bug#962459: unbound: constantly crashing after about 3 minutes since start
Hi,

We have also seen this issue on our unbound (1.9.0-2+deb10u2) nodes since Feb 28th, 8am CET. Updating to the packages from https://people.debian.org/~edmonds/unbound/1.9.6-0+deb10u0/ solved the issue for us.

Is there a specific query or command to reproduce this on the older version? I'm just wondering why we only started seeing this yesterday. Our unbound nodes take a fair amount of traffic, so we should have noticed this earlier.

Best regards,
Volker
Bug#962459: unbound: constantly crashing after about 3 minutes since start
Kebert Martin wrote:
> Applied '0001-Apply-a-series-of-fixes-for-Unbound-1.9.0-suggested-.patch'
>
> Result:
> Oct 28 20:24:28 debian systemd[1]: Starting Unbound DNS server...
> Oct 28 20:24:28 debian package-helper[464]: /var/lib/unbound/root.key has content
> Oct 28 20:24:28 debian package-helper[464]: fail: the anchor is NOT ok and could not be fixed
> Oct 28 20:24:28 debian unbound[468]: [468:0] notice: init module 0: subnet
> Oct 28 20:24:28 debian unbound[468]: [468:0] notice: init module 1: validator
> Oct 28 20:24:28 debian unbound[468]: [468:0] notice: init module 2: iterator
> Oct 28 20:24:28 debian systemd[1]: Started Unbound DNS server.
> Oct 28 20:24:28 debian unbound[468]: [468:0] info: start of service (unbound 1.9.0).
> ...
> Oct 28 20:31:31 debian kernel: unbound[470]: segfault at 1b0 ip 7fdb28876e48 sp 7fdb26fd6cf0 error 4 in libevent-2.1.so.6.0.2[7fdb28857000+54000]
> [...]

Hi, Kebert:

Thanks for checking that. Sorry it didn't work, and apologies for the delay in getting back to you.

We're now looking into the possibility of updating the version of unbound in buster to a newer upstream release that most likely already includes the right combination of fixes for this issue, rather than trying to backport the right set of fixes needed to the 1.9.0 release.

If you have an opportunity, could you give the candidate unbound package available here a try?

https://people.debian.org/~edmonds/unbound/1.9.6-0+deb10u0/

Thanks!

--
Robert Edmonds
edmo...@debian.org
Bug#962459: unbound: constantly crashing after about 3 minutes since start
Applied '0001-Apply-a-series-of-fixes-for-Unbound-1.9.0-suggested-.patch'

Result:
Oct 28 20:24:28 debian systemd[1]: Starting Unbound DNS server...
Oct 28 20:24:28 debian package-helper[464]: /var/lib/unbound/root.key has content
Oct 28 20:24:28 debian package-helper[464]: fail: the anchor is NOT ok and could not be fixed
Oct 28 20:24:28 debian unbound[468]: [468:0] notice: init module 0: subnet
Oct 28 20:24:28 debian unbound[468]: [468:0] notice: init module 1: validator
Oct 28 20:24:28 debian unbound[468]: [468:0] notice: init module 2: iterator
Oct 28 20:24:28 debian systemd[1]: Started Unbound DNS server.
Oct 28 20:24:28 debian unbound[468]: [468:0] info: start of service (unbound 1.9.0).
...
Oct 28 20:31:31 debian kernel: unbound[470]: segfault at 1b0 ip 7fdb28876e48 sp 7fdb26fd6cf0 error 4 in libevent-2.1.so.6.0.2[7fdb28857000+54000]
Oct 28 20:31:31 debian kernel: Code: 00 00 41 55 41 54 41 89 d5 55 53 41 89 f4 48 89 fb 48 83 ec 08 48 8b 05 76 51 23 00 8b 10 85 d2 0f 85 8c 00 00 00 48 8b 6b 40 <48> 8b bd b0 01 00 00 48 85 ff 74 11 48 8b 05 2d 51 23 00 8b 00 85
Oct 28 20:31:31 debian systemd[1]: unbound.service: Main process exited, code=killed, status=11/SEGV
Oct 28 20:31:31 debian systemd[1]: unbound.service: Failed with result 'signal'.
Oct 28 20:31:31 debian systemd[1]: unbound.service: Service RestartSec=100ms expired, scheduling restart.
Oct 28 20:31:31 debian systemd[1]: unbound.service: Scheduled restart job, restart counter is at 1.
Oct 28 20:31:31 debian systemd[1]: Stopped Unbound DNS server.
Oct 28 20:31:31 debian systemd[1]: Starting Unbound DNS server...
Oct 28 20:31:31 debian package-helper[1994]: /var/lib/unbound/root.key has content
Oct 28 20:31:31 debian package-helper[1994]: success: the anchor is ok
Oct 28 20:31:31 debian unbound[1998]: [1998:0] notice: init module 0: subnet
Oct 28 20:31:31 debian unbound[1998]: [1998:0] notice: init module 1: validator
Oct 28 20:31:31 debian unbound[1998]: [1998:0] notice: init module 2: iterator
Oct 28 20:31:31 debian systemd[1]: Started Unbound DNS server.
Oct 28 20:31:31 debian unbound[1998]: [1998:0] info: start of service (unbound 1.9.0).
...
Oct 28 20:32:41 debian kernel: unbound[2001]: segfault at 7fbb0009 ip 560e7af6bfb0 sp 7fbb29274480 error 4 in unbound[560e7af52000+c6000]
Oct 28 20:32:41 debian kernel: Code: 24 20 0f b7 80 86 00 00 00 66 89 02 41 0f b6 76 20 49 8b 1e 83 e6 02 49 8b 47 28 48 8d 53 02 48 8d 0c ed 00 00 00 00 49 89 16 <48> 8b 04 e8 48 3b 44 24 08 0f 8d 21 05 00 00 40 84 f6 0f 85 48 04
Oct 28 20:32:41 debian systemd[1]: unbound.service: Main process exited, code=killed, status=11/SEGV
Oct 28 20:32:41 debian systemd[1]: unbound.service: Failed with result 'signal'.
Oct 28 20:32:41 debian systemd[1]: unbound.service: Service RestartSec=100ms expired, scheduling restart.
Oct 28 20:32:41 debian systemd[1]: unbound.service: Scheduled restart job, restart counter is at 2.
Oct 28 20:32:41 debian systemd[1]: Stopped Unbound DNS server.
Oct 28 20:32:41 debian systemd[1]: Starting Unbound DNS server...
Oct 28 20:32:41 debian package-helper[2199]: /var/lib/unbound/root.key has content
Oct 28 20:32:41 debian package-helper[2199]: success: the anchor is ok
Oct 28 20:32:41 debian unbound[2203]: [2203:0] notice: init module 0: subnet
Oct 28 20:32:41 debian unbound[2203]: [2203:0] notice: init module 1: validator
Oct 28 20:32:41 debian unbound[2203]: [2203:0] notice: init module 2: iterator
Oct 28 20:32:41 debian systemd[1]: Started Unbound DNS server.
Oct 28 20:32:41 debian unbound[2203]: [2203:0] info: start of service (unbound 1.9.0).
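Each kernel `segfault` line in this log packs the faulting address, the instruction pointer (`ip`), a page-fault error code and the mapping that contained `ip` (`OBJ[START+SIZE]`). A minimal Python sketch for decoding such lines follows. The regular expression is modelled only on the two lines in this log, and the error-code bit meanings assume the x86 page-fault convention (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The computed offset is relative to the start of the mapping the kernel printed; depending on how the object was linked, that mapping may not begin at file offset 0, so check the `PT_LOAD` program headers (for example with `readelf -l`) before handing the value to `addr2line` or gdb with matching debug symbols.

```python
import re
from dataclasses import dataclass

# Modelled on kernel lines such as:
# "unbound[470]: segfault at 1b0 ip 7fdb28876e48 sp 7fdb26fd6cf0 error 4 in libevent-2.1.so.6.0.2[7fdb28857000+54000]"
SEGFAULT_RE = re.compile(
    r"\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<start>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


@dataclass
class Segfault:
    pid: int
    fault_addr: int  # address whose access faulted
    ip: int          # instruction pointer at the time of the fault
    error: int       # page-fault error code printed by the kernel
    obj: str         # file backing the mapping that contains ip
    map_start: int   # start address of that mapping
    map_size: int    # size of that mapping

    @property
    def ip_offset(self) -> int:
        """Offset of the faulting instruction from the start of the mapping."""
        return self.ip - self.map_start

    def describe_error(self) -> str:
        # Assumed x86 page-fault error-code bits:
        #   bit 0 set = protection violation, clear = page not present
        #   bit 1 set = write access,         clear = read access
        #   bit 2 set = user mode,            clear = kernel mode
        cause = "protection violation" if self.error & 1 else "page not present"
        access = "write" if self.error & 2 else "read"
        mode = "user" if self.error & 4 else "kernel"
        return f"{mode}-mode {access}, {cause}"


def parse_segfault(line: str) -> Segfault:
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line: " + line)
    sf = Segfault(
        pid=int(m["pid"]),
        fault_addr=int(m["addr"], 16),
        ip=int(m["ip"], 16),
        error=int(m["err"]),
        obj=m["obj"],
        map_start=int(m["start"], 16),
        map_size=int(m["size"], 16),
    )
    # Sanity check: ip must lie inside the mapping the kernel reported.
    if not 0 <= sf.ip_offset < sf.map_size:
        raise ValueError("ip outside the reported mapping")
    return sf


if __name__ == "__main__":
    line = ("Oct 28 20:31:31 debian kernel: unbound[470]: segfault at 1b0 "
            "ip 7fdb28876e48 sp 7fdb26fd6cf0 error 4 "
            "in libevent-2.1.so.6.0.2[7fdb28857000+54000]")
    sf = parse_segfault(line)
    print(sf.obj, hex(sf.ip_offset), hex(sf.fault_addr), sf.describe_error())
    # libevent-2.1.so.6.0.2 0x1fe48 0x1b0 user-mode read, page not present
```

For the first crash this yields offset 0x1fe48 inside libevent and a user-mode read of an unmapped page at 0x1b0. A fault address that small usually means a NULL pointer was dereferenced at a structure-field offset, which would fit the `Code:` bytes: `48 8b 6b 40` loads a pointer from `[rbx+0x40]` into rbp, and the marked instruction `48 8b bd b0 01 00 00` reads `[rbp+0x1b0]`, which faults at exactly 0x1b0 if that pointer was NULL. The second crash decodes to offset 0x19fb0 inside the unbound binary itself.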
Best regards,
Martin Kebert

The information contained in this e-mail message and all attached files is confidential and is intended solely for the use of the individual or entity to whom they are addressed. Please notify the sender immediately if you have received this e-mail by mistake and delete this e-mail from your system. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is prohibited and may be unlawful.
Bug#962459: unbound: constantly crashing after about 3 minutes since start
Kebert Martin wrote:
> Hi,
> I tried the patch "p1_and_2.diff" from #973052.
> I'm not saying it was extensive test, but 7 minutes after start I got first
> crash:
> Oct 28 17:35:26 debian systemd[1]: Started Unbound DNS server.
> Oct 28 17:35:26 debian unbound[450]: [450:0] info: start of service (unbound 1.9.0).
> ...
> Oct 28 17:42:26 debian systemd[1]: unbound.service: Main process exited, code=killed, status=11/SEGV
> Oct 28 17:42:26 debian systemd[1]: unbound.service: Failed with result 'signal'.
> Oct 28 17:42:26 debian systemd[1]: unbound.service: Service RestartSec=100ms expired, scheduling restart.
> Oct 28 17:42:26 debian systemd[1]: unbound.service: Scheduled restart job, restart counter is at 1.
> ...
> and 10 minutes later flood (about 30/sec) of these messages:
> ...
> Oct 28 17:52:49 debian unbound[1885]: [warn] Epoll ADD(1) on fd 52 failed. Old events were 0; read change was 1 (add); write change was 0 (none); close change was 0 (none): Bad file descriptor
> Oct 28 17:52:49 debian unbound[1885]: [1885:3] error: read (in tcp s): Bad file descriptor for port
> ...
>
> and "unbound" stopped responding to "unbound-control" (even simple
> "unbound-control status" hangs).
> I can't decide whether it was caused by this patch or whether it is someting
> different.
> Anyway I installed version 1.10 back which works.

Hi, Kebert:

Instead of the "p1_and_2.diff" patch, can you try the attached patch which includes additional fixes recommended by upstream? If this works for you we can propose updating the version of unbound in buster with these fixes.

Thanks!
--
Robert Edmonds
edmo...@debian.org

From 0bf0258a54b9e7fd7d596bed3412bbf12ba532b6 Mon Sep 17 00:00:00 2001
From: Robert Edmonds
Date: Wed, 28 Oct 2020 13:36:17 -0400
Subject: [PATCH] Apply a series of fixes for Unbound 1.9.0 suggested by upstream

Per https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=4227#c8, upstream recommends applying the following commits against 1.9.0:

348cbab016f824a336b65d0091310fe5cd58e762
2b47ca080eb91e209fb86cd1dc90a6aff32e2a1f
0b77c9d6763686264d44dfd926c8cb4f2f03a43a
6067ce6d2b82ce2e80055e578fdfd7ba3e67c523
af6c5dea43fc63452d49b2339e607365b6652987
a08fe8ca609b651c8d8c8379780aad508d492421

However, commit 0b77c9d6763686264d44dfd926c8cb4f2f03a43a contains a complete revert of the code changes in cae8361dcd2809c8e266d259370c9ab8660c2c0e (added post-1.9.0), so I applied that patch as well in order to avoid needing to manually resolve the textual conflict when attempting to apply 0b77c9d6763686264d44dfd926c8cb4f2f03a43a to 1.9.0.

Most hunks applied cleanly or with a small offset, excluding the changelog entries. The git-apply session was as follows:

$ git describe
debian/1.9.0-2+deb10u2
$ git apply --verbose --exclude=doc/Changelog \
    /tmp/up/348cbab016f824a336b65d0091310fe5cd58e762.diff \
    /tmp/up/2b47ca080eb91e209fb86cd1dc90a6aff32e2a1f.diff \
    /tmp/up/cae8361dcd2809c8e266d259370c9ab8660c2c0e.diff \
    /tmp/up/0b77c9d6763686264d44dfd926c8cb4f2f03a43a.diff \
    /tmp/up/6067ce6d2b82ce2e80055e578fdfd7ba3e67c523.diff \
    /tmp/up/af6c5dea43fc63452d49b2339e607365b6652987.diff \
    /tmp/up/a08fe8ca609b651c8d8c8379780aad508d492421.diff
Skipped patch 'doc/Changelog'.
Checking patch util/netevent.c...
Applied patch util/netevent.c cleanly.
Skipped patch 'doc/Changelog'.
Checking patch config.h.in...
Hunk #1 succeeded at 83 (offset -3 lines).
Hunk #2 succeeded at 167 (offset -3 lines).
Checking patch configure...
Hunk #1 succeeded at 19010 (offset -3 lines).
Checking patch configure.ac...
Hunk #1 succeeded at 1197 (offset -3 lines).
Checking patch util/ub_event.c...
Applied patch config.h.in cleanly.
Applied patch configure cleanly.
Applied patch configure.ac cleanly.
Applied patch util/ub_event.c cleanly.
Skipped patch 'doc/Changelog'.
Checking patch services/listen_dnsport.c...
Applied patch services/listen_dnsport.c cleanly.
Skipped patch 'doc/Changelog'.
Checking patch services/listen_dnsport.c...
Hunk #1 succeeded at 1779 (offset -7 lines).
Hunk #2 succeeded at 1857 (offset -7 lines).
Applied patch services/listen_dnsport.c cleanly.
Skipped patch 'doc/Changelog'.
Checking patch services/listen_dnsport.c...
Hunk #1 succeeded at 1746 (offset -6 lines).
Checking patch services/mesh.c...
Applied patch services/listen_dnsport.c cleanly.
Applied patch services/mesh.c cleanly.
Skipped patch 'doc/Changelog'.
Checking patch daemon/worker.c...
Hunk #1 succeeded at 770 (offset -2 lines).
Checking patch util/netevent.c...
Hunk #1 succeeded at 1551 (offset -16 lines).
Hunk #2 succeeded at 1617 (offset -16 lines).
Applied patch daemon/worker.c cleanly.
Applied patch util/netevent.c cleanly.
Skipped patch 'doc/Changelog'.
Checking patch services/mesh.c...
Hunk #1 succeeded at 1196 (offset 4
Bug#962459: unbound: constantly crashing after about 3 minutes since start
Hi,

I tried the patch "p1_and_2.diff" from #973052. I'm not saying it was an extensive test, but 7 minutes after start I got the first crash:

Oct 28 17:35:26 debian systemd[1]: Started Unbound DNS server.
Oct 28 17:35:26 debian unbound[450]: [450:0] info: start of service (unbound 1.9.0).
...
Oct 28 17:42:26 debian systemd[1]: unbound.service: Main process exited, code=killed, status=11/SEGV
Oct 28 17:42:26 debian systemd[1]: unbound.service: Failed with result 'signal'.
Oct 28 17:42:26 debian systemd[1]: unbound.service: Service RestartSec=100ms expired, scheduling restart.
Oct 28 17:42:26 debian systemd[1]: unbound.service: Scheduled restart job, restart counter is at 1.
...

and 10 minutes later a flood (about 30/sec) of these messages:

...
Oct 28 17:52:49 debian unbound[1885]: [warn] Epoll ADD(1) on fd 52 failed. Old events were 0; read change was 1 (add); write change was 0 (none); close change was 0 (none): Bad file descriptor
Oct 28 17:52:49 debian unbound[1885]: [1885:3] error: read (in tcp s): Bad file descriptor for port
...

and "unbound" stopped responding to "unbound-control" (even a simple "unbound-control status" hangs). I can't tell whether it was caused by this patch or by something else. Anyway, I reinstalled version 1.10, which works.

BTW, in the meantime a second server had the original "debian stable" version of unbound-1.9.0 installed (to compare with the patched version), with:

...
Oct 28 17:48:45 debian2 unbound[519]: [err] evmap.c:381: Assertion nread >= 0 failed in evmap_io_del_
Oct 28 17:48:45 debian2 systemd[1]: unbound.service: Main process exited, code=killed, status=6/ABRT
...
Oct 28 17:55:13 debian2 unbound[2811]: [err] evmap.c:381: Assertion nread >= 0 failed in evmap_io_del_
Oct 28 17:55:13 debian2 systemd[1]: unbound.service: Main process exited, code=killed, status=6/ABRT
...
Oct 28 18:01:42 debian2 unbound[3951]: [err] evmap.c:381: Assertion nread >= 0 failed in evmap_io_del_
Oct 28 18:01:42 debian2 systemd[1]: unbound.service: Main process exited, code=killed, status=6/ABRT
...
Oct 28 18:07:22 debian2 unbound[5187]: [err] evmap.c:381: Assertion nread >= 0 failed in evmap_io_del_
Oct 28 18:07:22 debian2 systemd[1]: unbound.service: Main process exited, code=killed, status=6/ABRT
...
Oct 28 18:18:03 debian2 unbound[6196]: [err] evmap.c:381: Assertion nread >= 0 failed in evmap_io_del_
Oct 28 18:18:03 debian2 systemd[1]: unbound.service: Main process exited, code=killed, status=6/ABRT
...
Oct 28 18:22:36 debian2 unbound[8178]: [err] evmap.c:381: Assertion nread >= 0 failed in evmap_io_del_
Oct 28 18:22:36 debian2 systemd[1]: unbound.service: Main process exited, code=killed, status=6/ABRT
...

I'd say it is quite consistent (although the frequency might depend on the amount of traffic).

Best regards,
Martin Kebert

On 28 Oct 2020, at 2:04, Daniel Kahn Gillmor <d...@debian.org> wrote:
> Control: forcemerge 973052 962459
>
> Hi Kebert--
>
> On Mon 2020-06-08 12:28:46 +0200, Kebert Martin wrote:
>> unbound constantly crashing with:
>> [err] evmap.c:381: Assertion nread >= 0 failed in evmap_io_del_
>>
>> The issue is fixed in unbound 1.9.2 but this version is not available in
>> debian packages.
>>
>> As a workaround I had unbound from testing but it is not possible now,
>> because currect testing version 1.10.1-1 relies on libpython3.8 which
>> relies on libc6 >= 2.29 whereas stable libc6 is 2.28-10.
>
> Thanks for this note! sorry i missed it when reporting 973052, but it
> looks like it's the same issue.
>
> Would you be up for trying a version of unbound that includes the patch
> from 973052 and letting me know whether the crash is still happening? I
> haven't seen "consistent" failures with the workload where i encountered
> the bug, so it'd be great to hear whether the patch solves the problem
> for you if you've got a repeatable workload.
>
> If you don't know how to rebuild the package with the extra patch,
> please respond here and maybe one of the debian packagers who is used to
> working with unbound can offer a proposed update.
>
> Regards,
>
> --dkg
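To put a number on "quite consistent", the gaps between consecutive SIGABRT exits on debian2 can be computed straight from the journal. A minimal Python sketch, assuming syslog-style lines like the ones in the debian2 log (the default output of `journalctl -u unbound` looks like this) and a hard-coded year, since those timestamps omit it; the function names are illustrative only:

```python
from datetime import datetime
from statistics import mean


def crash_times(lines, marker="status=6/ABRT", year=2020):
    """Timestamps of journal lines that contain `marker`.

    Syslog-style stamps such as "Oct 28 17:48:45" carry no year, so one is
    supplied explicitly (2020 is assumed here, matching this thread).
    """
    times = []
    for line in lines:
        if marker in line:
            stamp = " ".join(line.split()[:3])  # e.g. "Oct 28 17:48:45"
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def gaps_seconds(times):
    """Whole seconds elapsed between consecutive crashes."""
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    # The ABRT exits from the debian2 log (other lines omitted).
    exit_msg = "debian2 systemd[1]: unbound.service: Main process exited, code=killed, status=6/ABRT"
    log = [f"Oct 28 {t} {exit_msg}" for t in
           ("17:48:45", "17:55:13", "18:01:42", "18:07:22", "18:18:03", "18:22:36")]
    gaps = gaps_seconds(crash_times(log))
    print(gaps)               # [388, 389, 340, 641, 273]
    print(round(mean(gaps)))  # 406
```

For these six exits the gaps range from about 4.5 to 10.7 minutes, roughly 6.8 minutes on average; passing `marker="status=11/SEGV"` would do the same for the segfaults seen with the patched build.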
Bug#962459: unbound: constantly crashing after about 3 minutes since start
Control: forcemerge 973052 962459

Hi Kebert--

On Mon 2020-06-08 12:28:46 +0200, Kebert Martin wrote:
> unbound constantly crashing with:
> [err] evmap.c:381: Assertion nread >= 0 failed in evmap_io_del_
>
> The issue is fixed in unbound 1.9.2 but this version is not available in
> debian packages.
>
> As a workaround I had unbound from testing but it is not possible now,
> because currect testing version 1.10.1-1 relies on libpython3.8 which
> relies on libc6 >= 2.29 whereas stable libc6 is 2.28-10.

Thanks for this note! sorry i missed it when reporting 973052, but it looks like it's the same issue.

Would you be up for trying a version of unbound that includes the patch from 973052 and letting me know whether the crash is still happening? I haven't seen "consistent" failures with the workload where i encountered the bug, so it'd be great to hear whether the patch solves the problem for you if you've got a repeatable workload.

If you don't know how to rebuild the package with the extra patch, please respond here and maybe one of the debian packagers who is used to working with unbound can offer a proposed update.

Regards,

--dkg
Bug#962459: unbound: constantly crashing after about 3 minutes since start
Package: unbound
Version: 1.9.0-2+deb10u2
Severity: important

Dear Maintainer,

unbound is constantly crashing with:
[err] evmap.c:381: Assertion nread >= 0 failed in evmap_io_del_

The issue is fixed in unbound 1.9.2, but this version is not available in Debian packages.

As a workaround I had unbound from testing, but that is no longer possible, because the current testing version, 1.10.1-1, depends on libpython3.8, which depends on libc6 >= 2.29, whereas stable libc6 is 2.28-10.

-- System Information:
Debian Release: 10.4
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable'), (100, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.0-9-amd64 (SMP w/2 CPU cores)
Locale: LANG=cs_CZ.UTF-8, LC_CTYPE=cs_CZ.UTF-8 (charmap=UTF-8), LANGUAGE=cs_CZ.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages unbound depends on:
ii  adduser          3.118
ii  dns-root-data    2019031302
ii  libc6            2.28-10
ii  libevent-2.1-6   2.1.8-stable-4
ii  libfstrm0        0.4.0-1
ii  libprotobuf-c1   1.3.1-1+b1
ii  libpython3.7     3.7.3-2+deb10u1
ii  libssl1.1        1.1.1d-0+deb10u3
ii  libsystemd0      241-7~deb10u4
ii  lsb-base         10.2019051400
ii  openssl          1.1.1d-0+deb10u3
ii  unbound-anchor   1.9.0-2+deb10u2

unbound recommends no packages.

Versions of packages unbound suggests:
pn  apparmor

-- no debconf information
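The workaround failure described in the report is a version-ordering question: testing's libpython3.8 requires libc6 >= 2.29, and buster ships 2.28-10. The Python sketch below is a rough re-implementation of dpkg's comparison rules (epoch first, then upstream version, then Debian revision; digit runs compared numerically, letters sorting before other punctuation, and `~` sorting before everything, even the end of the string). It is an illustrative model of dpkg's algorithm, not a replacement for `dpkg --compare-versions`, which remains authoritative; all helper names are hypothetical.

```python
def _char(s: str, i: int) -> str:
    """Character at index i, or "" past the end of the string."""
    return s[i] if i < len(s) else ""


def _isdigit(c: str) -> bool:
    return len(c) == 1 and c in "0123456789"


def _order(c: str) -> int:
    """Sort weight of one character in a non-digit run (dpkg-style)."""
    if c == "" or _isdigit(c):
        return 0
    if c.isascii() and c.isalpha():
        return ord(c)        # letters sort first
    if c == "~":
        return -1            # '~' sorts before everything, even end of string
    return ord(c) + 256      # other punctuation sorts after letters


def _verrevcmp(a: str, b: str) -> int:
    """Compare two upstream-version or revision strings: <0, 0 or >0."""
    i = j = 0
    while i < len(a) or j < len(b):
        first_diff = 0
        # Non-digit run: compare character weights one by one.
        while (i < len(a) and not _isdigit(a[i])) or (j < len(b) and not _isdigit(b[j])):
            ac, bc = _order(_char(a, i)), _order(_char(b, j))
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run: skip leading zeros, then compare as numbers.
        while _char(a, i) == "0":
            i += 1
        while _char(b, j) == "0":
            j += 1
        while _isdigit(_char(a, i)) and _isdigit(_char(b, j)):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if _isdigit(_char(a, i)):
            return 1         # a's number has more digits, so it is larger
        if _isdigit(_char(b, j)):
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str):
    """Split "[epoch:]upstream[-revision]" into its three parts."""
    epoch = 0
    if ":" in version:
        e, version = version.split(":", 1)
        epoch = int(e)
    upstream, dash, revision = version.rpartition("-")
    if not dash:
        upstream, revision = version, ""
    return epoch, upstream, revision


def compare_versions(a: str, b: str) -> int:
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)


if __name__ == "__main__":
    # Does buster's libc6 satisfy "libc6 (>= 2.29)"?
    print(compare_versions("2.28-10", "2.29") >= 0)                    # False
    # Is the candidate package newer than the installed one?
    print(compare_versions("1.9.6-0+deb10u0", "1.9.0-2+deb10u2") > 0)  # True
```

The numeric treatment of digit runs is what makes 1.10.1-1 sort after 1.9.x even though "1" < "9" as characters.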