Bug#962459: unbound: constantly crashing after about 3 minutes since start

2021-03-01 Thread Volker Schwicking
Hi,

We have also been seeing this issue on our unbound (1.9.0-2+deb10u2) nodes 
since Feb 28th, 8am CET.

Updating to the packages from

https://people.debian.org/~edmonds/unbound/1.9.6-0+deb10u0/

solved the issue for us.

Is there a specific query or command to reproduce this on the older version?

I'm just wondering why we only started seeing this yesterday. Our unbound nodes 
take a fair amount of traffic, so we should have noticed it earlier.

Best regards
- Volker


Bug#962459: unbound: constantly crashing after about 3 minutes since start

2021-02-11 Thread Robert Edmonds
Kebert Martin wrote:
> Applied '0001-Apply-a-series-of-fixes-for-Unbound-1.9.0-suggested-.patch' 
> 
> Result:
> Oct 28 20:24:28 debian systemd[1]: Starting Unbound DNS server...
> Oct 28 20:24:28 debian package-helper[464]: /var/lib/unbound/root.key has
> content
> Oct 28 20:24:28 debian package-helper[464]: fail: the anchor is NOT ok and
> could not be fixed
> Oct 28 20:24:28 debian unbound[468]: [468:0] notice: init module 0: subnet
> Oct 28 20:24:28 debian unbound[468]: [468:0] notice: init module 1: validator
> Oct 28 20:24:28 debian unbound[468]: [468:0] notice: init module 2: iterator
> Oct 28 20:24:28 debian systemd[1]: Started Unbound DNS server.
> Oct 28 20:24:28 debian unbound[468]: [468:0] info: start of service (unbound
> 1.9.0).
> ...
> Oct 28 20:31:31 debian kernel: unbound[470]: segfault at 1b0 ip
> 7fdb28876e48 sp 7fdb26fd6cf0 error 4 in libevent-2.1.so.6.0.2
> [7fdb28857000+54000]
> [...]

Hi, Kebert:

Thanks for checking that. Sorry it didn't work, and apologies for the
delay in getting back to you.

We're now looking into the possibility of updating the version of
unbound in buster to a newer upstream release that most likely already
includes the right combination of fixes for this issue, rather than
trying to backport the right set of fixes needed to the 1.9.0 release.

If you have an opportunity, could you give the candidate unbound package
available here a try?

https://people.debian.org/~edmonds/unbound/1.9.6-0+deb10u0/

Thanks!

-- 
Robert Edmonds
edmo...@debian.org



Bug#962459: unbound: constantly crashing after about 3 minutes since start

2020-10-28 Thread Kebert Martin
Applied '0001-Apply-a-series-of-fixes-for-Unbound-1.9.0-suggested-.patch'

Result:
Oct 28 20:24:28 debian systemd[1]: Starting Unbound DNS server...
Oct 28 20:24:28 debian package-helper[464]: /var/lib/unbound/root.key has 
content
Oct 28 20:24:28 debian package-helper[464]: fail: the anchor is NOT ok and 
could not be fixed
Oct 28 20:24:28 debian unbound[468]: [468:0] notice: init module 0: subnet
Oct 28 20:24:28 debian unbound[468]: [468:0] notice: init module 1: validator
Oct 28 20:24:28 debian unbound[468]: [468:0] notice: init module 2: iterator
Oct 28 20:24:28 debian systemd[1]: Started Unbound DNS server.
Oct 28 20:24:28 debian unbound[468]: [468:0] info: start of service (unbound 
1.9.0).
...
Oct 28 20:31:31 debian kernel: unbound[470]: segfault at 1b0 ip 
7fdb28876e48 sp 7fdb26fd6cf0 error 4 in 
libevent-2.1.so.6.0.2[7fdb28857000+54000]
Oct 28 20:31:31 debian kernel: Code: 00 00 41 55 41 54 41 89 d5 55 53 41 89 f4 
48 89 fb 48 83 ec 08 48 8b 05 76 51 23 00 8b 10 85 d2 0f 85 8c 00 00 00 48 8b 
6b 40 <48> 8b bd b0 01 00 00 48 85 ff 74 11 48 8b 05 2d 51 23 00 8b 00 85
Oct 28 20:31:31 debian systemd[1]: unbound.service: Main process exited, 
code=killed, status=11/SEGV
Oct 28 20:31:31 debian systemd[1]: unbound.service: Failed with result 'signal'.
Oct 28 20:31:31 debian systemd[1]: unbound.service: Service RestartSec=100ms 
expired, scheduling restart.
Oct 28 20:31:31 debian systemd[1]: unbound.service: Scheduled restart job, 
restart counter is at 1.
Oct 28 20:31:31 debian systemd[1]: Stopped Unbound DNS server.
Oct 28 20:31:31 debian systemd[1]: Starting Unbound DNS server...
Oct 28 20:31:31 debian package-helper[1994]: /var/lib/unbound/root.key has 
content
Oct 28 20:31:31 debian package-helper[1994]: success: the anchor is ok
Oct 28 20:31:31 debian unbound[1998]: [1998:0] notice: init module 0: subnet
Oct 28 20:31:31 debian unbound[1998]: [1998:0] notice: init module 1: validator
Oct 28 20:31:31 debian unbound[1998]: [1998:0] notice: init module 2: iterator
Oct 28 20:31:31 debian systemd[1]: Started Unbound DNS server.
Oct 28 20:31:31 debian unbound[1998]: [1998:0] info: start of service (unbound 
1.9.0).
...
Oct 28 20:32:41 debian kernel: unbound[2001]: segfault at 7fbb0009 ip 
560e7af6bfb0 sp 7fbb29274480 error 4 in unbound[560e7af52000+c6000]
Oct 28 20:32:41 debian kernel: Code: 24 20 0f b7 80 86 00 00 00 66 89 02 41 0f 
b6 76 20 49 8b 1e 83 e6 02 49 8b 47 28 48 8d 53 02 48 8d 0c ed 00 00 00 00 49 
89 16 <48> 8b 04 e8 48 3b 44 24 08 0f 8d 21 05 00 00 40 84 f6 0f 85 48 04
Oct 28 20:32:41 debian systemd[1]: unbound.service: Main process exited, 
code=killed, status=11/SEGV
Oct 28 20:32:41 debian systemd[1]: unbound.service: Failed with result 'signal'.
Oct 28 20:32:41 debian systemd[1]: unbound.service: Service RestartSec=100ms 
expired, scheduling restart.
Oct 28 20:32:41 debian systemd[1]: unbound.service: Scheduled restart job, 
restart counter is at 2.
Oct 28 20:32:41 debian systemd[1]: Stopped Unbound DNS server.
Oct 28 20:32:41 debian systemd[1]: Starting Unbound DNS server...
Oct 28 20:32:41 debian package-helper[2199]: /var/lib/unbound/root.key has 
content
Oct 28 20:32:41 debian package-helper[2199]: success: the anchor is ok
Oct 28 20:32:41 debian unbound[2203]: [2203:0] notice: init module 0: subnet
Oct 28 20:32:41 debian unbound[2203]: [2203:0] notice: init module 1: validator
Oct 28 20:32:41 debian unbound[2203]: [2203:0] notice: init module 2: iterator
Oct 28 20:32:41 debian systemd[1]: Started Unbound DNS server.
Oct 28 20:32:41 debian unbound[2203]: [2203:0] info: start of service (unbound 
1.9.0).
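
For anyone trying to map the two segfaults above to code: the kernel line reports the faulting instruction pointer ("ip") and, in brackets, the start and size of the mapping it fell into, so the offset inside that mapping is simply the difference of the two. A minimal sketch of that arithmetic follows; the helper name `fault_offset` is made up for illustration, and it assumes a shell with 64-bit arithmetic that accepts hexadecimal constants (bash and dash do).

```shell
# Hypothetical helper: offset of the faulting instruction inside the mapping
# named in a kernel "segfault at ..." line.
#   $1 = "ip" value, $2 = mapping start (both hex, without the 0x prefix)
fault_offset() {
  printf '0x%x\n' $(( 0x$1 - 0x$2 ))
}

fault_offset 7fdb28876e48 7fdb28857000   # crash in libevent-2.1.so.6.0.2
fault_offset 560e7af6bfb0 560e7af52000   # crash in unbound
```

Both results fall inside the mapping sizes the kernel printed (0x54000 and 0xc6000). Whether such an offset can be handed directly to a symbolizer such as `addr2line -e` depends on how the object is mapped and on matching debug symbols being installed, so treat it as a starting point only.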



Best regards
Martin Kebert





Bug#962459: unbound: constantly crashing after about 3 minutes since start

2020-10-28 Thread Robert Edmonds
Kebert Martin wrote:
> Hi,
> I tried the patch "p1_and_2.diff" from #973052.
> I'm not saying it was an extensive test, but 7 minutes after start I got the
> first crash:
> Oct 28 17:35:26 debian systemd[1]: Started Unbound DNS server.
> Oct 28 17:35:26 debian unbound[450]: [450:0] info: start of service (unbound
> 1.9.0).
> ...
> Oct 28 17:42:26 debian systemd[1]: unbound.service: Main process exited, code=
> killed, status=11/SEGV
> Oct 28 17:42:26 debian systemd[1]: unbound.service: Failed with result
> 'signal'.
> Oct 28 17:42:26 debian systemd[1]: unbound.service: Service RestartSec=100ms
> expired, scheduling restart.
> Oct 28 17:42:26 debian systemd[1]: unbound.service: Scheduled restart job,
> restart counter is at 1.
> ...
> and 10 minutes later a flood (about 30/sec) of these messages:
> ...
> Oct 28 17:52:49 debian unbound[1885]: [warn] Epoll ADD(1) on fd 52 failed. Old events were 0; read change was 1 (add); write change was 0 (none); close change was 0 (none): Bad file descriptor
> Oct 28 17:52:49 debian unbound[1885]: [1885:3] error: read (in tcp s): Bad file descriptor for  port 
> ...
> 
> and "unbound" stopped responding to "unbound-control" (even a simple
> "unbound-control status" hangs).
> I can't tell whether this was caused by the patch or whether it is something
> different.
> Anyway, I went back to version 1.10, which works.

Hi, Kebert:

Instead of the "p1_and_2.diff" patch, can you try the attached patch
which includes additional fixes recommended by upstream? If this works
for you we can propose updating the version of unbound in buster with
these fixes.

Thanks!

-- 
Robert Edmonds
edmo...@debian.org
From 0bf0258a54b9e7fd7d596bed3412bbf12ba532b6 Mon Sep 17 00:00:00 2001
From: Robert Edmonds 
Date: Wed, 28 Oct 2020 13:36:17 -0400
Subject: [PATCH] Apply a series of fixes for Unbound 1.9.0 suggested by
 upstream

Per https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=4227#c8,
upstream recommends applying the following commits against 1.9.0:

348cbab016f824a336b65d0091310fe5cd58e762
2b47ca080eb91e209fb86cd1dc90a6aff32e2a1f
0b77c9d6763686264d44dfd926c8cb4f2f03a43a
6067ce6d2b82ce2e80055e578fdfd7ba3e67c523
af6c5dea43fc63452d49b2339e607365b6652987
a08fe8ca609b651c8d8c8379780aad508d492421

However, commit 0b77c9d6763686264d44dfd926c8cb4f2f03a43a contains a
complete revert of the code changes in
cae8361dcd2809c8e266d259370c9ab8660c2c0e (added post-1.9.0), so I
applied that patch as well in order to avoid needing to manually resolve
the textual conflict when attempting to apply
0b77c9d6763686264d44dfd926c8cb4f2f03a43a to 1.9.0.

Most hunks applied cleanly or with a small offset, excluding the
changelog entries. The git-apply session was as follows:

$ git describe
debian/1.9.0-2+deb10u2

$ git apply --verbose --exclude=doc/Changelog \
/tmp/up/348cbab016f824a336b65d0091310fe5cd58e762.diff \
/tmp/up/2b47ca080eb91e209fb86cd1dc90a6aff32e2a1f.diff \
/tmp/up/cae8361dcd2809c8e266d259370c9ab8660c2c0e.diff \
/tmp/up/0b77c9d6763686264d44dfd926c8cb4f2f03a43a.diff \
/tmp/up/6067ce6d2b82ce2e80055e578fdfd7ba3e67c523.diff \
/tmp/up/af6c5dea43fc63452d49b2339e607365b6652987.diff \
/tmp/up/a08fe8ca609b651c8d8c8379780aad508d492421.diff
Skipped patch 'doc/Changelog'.
Checking patch util/netevent.c...
Applied patch util/netevent.c cleanly.
Skipped patch 'doc/Changelog'.
Checking patch config.h.in...
Hunk #1 succeeded at 83 (offset -3 lines).
Hunk #2 succeeded at 167 (offset -3 lines).
Checking patch configure...
Hunk #1 succeeded at 19010 (offset -3 lines).
Checking patch configure.ac...
Hunk #1 succeeded at 1197 (offset -3 lines).
Checking patch util/ub_event.c...
Applied patch config.h.in cleanly.
Applied patch configure cleanly.
Applied patch configure.ac cleanly.
Applied patch util/ub_event.c cleanly.
Skipped patch 'doc/Changelog'.
Checking patch services/listen_dnsport.c...
Applied patch services/listen_dnsport.c cleanly.
Skipped patch 'doc/Changelog'.
Checking patch services/listen_dnsport.c...
Hunk #1 succeeded at 1779 (offset -7 lines).
Hunk #2 succeeded at 1857 (offset -7 lines).
Applied patch services/listen_dnsport.c cleanly.
Skipped patch 'doc/Changelog'.
Checking patch services/listen_dnsport.c...
Hunk #1 succeeded at 1746 (offset -6 lines).
Checking patch services/mesh.c...
Applied patch services/listen_dnsport.c cleanly.
Applied patch services/mesh.c cleanly.
Skipped patch 'doc/Changelog'.
Checking patch daemon/worker.c...
Hunk #1 succeeded at 770 (offset -2 lines).
Checking patch util/netevent.c...
Hunk #1 succeeded at 1551 (offset -16 lines).
Hunk #2 succeeded at 1617 (offset -16 lines).
Applied patch daemon/worker.c cleanly.
Applied patch util/netevent.c cleanly.
Skipped patch 'doc/Changelog'.
Checking patch services/mesh.c...
Hunk #1 succeeded at 1196 (offset 4 

Bug#962459: unbound: constantly crashing after about 3 minutes since start

2020-10-28 Thread Kebert Martin
Hi,
I tried the patch "p1_and_2.diff" from #973052.
I'm not saying it was an extensive test, but 7 minutes after start I got the 
first crash:
Oct 28 17:35:26 debian systemd[1]: Started Unbound DNS server.
Oct 28 17:35:26 debian unbound[450]: [450:0] info: start of service (unbound 
1.9.0).
...
Oct 28 17:42:26 debian systemd[1]: unbound.service: Main process exited, 
code=killed, status=11/SEGV
Oct 28 17:42:26 debian systemd[1]: unbound.service: Failed with result 'signal'.
Oct 28 17:42:26 debian systemd[1]: unbound.service: Service RestartSec=100ms 
expired, scheduling restart.
Oct 28 17:42:26 debian systemd[1]: unbound.service: Scheduled restart job, 
restart counter is at 1.
...
and 10 minutes later a flood (about 30/sec) of these messages:
...
Oct 28 17:52:49 debian unbound[1885]: [warn] Epoll ADD(1) on fd 52 failed. Old events were 0; read change was 1 (add); write change was 0 (none); close change was 0 (none): Bad file descriptor
Oct 28 17:52:49 debian unbound[1885]: [1885:3] error: read (in tcp s): Bad file descriptor for  port 
...

and "unbound" stopped responding to "unbound-control" (even a simple 
"unbound-control status" hangs).
I can't tell whether this was caused by the patch or whether it is something 
different.
Anyway, I went back to version 1.10, which works.


BTW, in the meantime a second server was running the original "debian stable" 
version of unbound 1.9.0 (for comparison with the patched version), with this 
result:
...
Oct 28 17:48:45 debian2 unbound[519]: [err] evmap.c:381: Assertion nread >= 0 
failed in evmap_io_del_
Oct 28 17:48:45 debian2 systemd[1]: unbound.service: Main process exited, 
code=killed, status=6/ABRT
...
Oct 28 17:55:13 debian2 unbound[2811]: [err] evmap.c:381: Assertion nread >= 0 
failed in evmap_io_del_
Oct 28 17:55:13 debian2 systemd[1]: unbound.service: Main process exited, 
code=killed, status=6/ABRT
...
Oct 28 18:01:42 debian2 unbound[3951]: [err] evmap.c:381: Assertion nread >= 0 
failed in evmap_io_del_
Oct 28 18:01:42 debian2 systemd[1]: unbound.service: Main process exited, 
code=killed, status=6/ABRT
...
Oct 28 18:07:22 debian2 unbound[5187]: [err] evmap.c:381: Assertion nread >= 0 
failed in evmap_io_del_
Oct 28 18:07:22 debian2 systemd[1]: unbound.service: Main process exited, 
code=killed, status=6/ABRT
...
Oct 28 18:18:03 debian2 unbound[6196]: [err] evmap.c:381: Assertion nread >= 0 
failed in evmap_io_del_
Oct 28 18:18:03 debian2 systemd[1]: unbound.service: Main process exited, 
code=killed, status=6/ABRT
...
Oct 28 18:22:36 debian2 unbound[8178]: [err] evmap.c:381: Assertion nread >= 0 
failed in evmap_io_del_
Oct 28 18:22:36 debian2 systemd[1]: unbound.service: Main process exited, 
code=killed, status=6/ABRT
...

I'd say it is quite consistent (although the frequency might depend on the 
amount of traffic).
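
To put a number on "consistent", the gaps between the aborts listed above can be worked out from the timestamps. This is only a sketch: the function name `crash_gaps` is made up for illustration, and it assumes syslog-style lines whose third field is HH:MM:SS, all from the same day.

```shell
# Hypothetical helper: read journal lines on stdin and print the number of
# seconds between consecutive "Assertion nread >= 0" lines.
# Assumes field 3 is HH:MM:SS and that no crash pair spans midnight.
crash_gaps() {
  awk '/Assertion nread >= 0/ {
    split($3, t, ":")
    now = t[1] * 3600 + t[2] * 60 + t[3]
    if (seen) print now - prev   # gap since the previous abort
    prev = now
    seen = 1
  }'
}

crash_gaps <<'EOF'
Oct 28 17:48:45 debian2 unbound[519]: [err] evmap.c:381: Assertion nread >= 0 failed in evmap_io_del_
Oct 28 17:55:13 debian2 unbound[2811]: [err] evmap.c:381: Assertion nread >= 0 failed in evmap_io_del_
Oct 28 18:01:42 debian2 unbound[3951]: [err] evmap.c:381: Assertion nread >= 0 failed in evmap_io_del_
Oct 28 18:07:22 debian2 unbound[5187]: [err] evmap.c:381: Assertion nread >= 0 failed in evmap_io_del_
Oct 28 18:18:03 debian2 unbound[6196]: [err] evmap.c:381: Assertion nread >= 0 failed in evmap_io_del_
Oct 28 18:22:36 debian2 unbound[8178]: [err] evmap.c:381: Assertion nread >= 0 failed in evmap_io_del_
EOF
```

For the six aborts above this gives gaps of 388, 389, 340, 641 and 273 seconds, i.e. roughly 4.5 to 10.7 minutes between crashes.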


Best regards
Martin Kebert

On 28. 10. 2020 at 2:04, Daniel Kahn Gillmor <d...@debian.org> wrote:

> Control: forcemerge 973052 962459
>
> Hi Kebert--
>
> On Mon 2020-06-08 12:28:46 +0200, Kebert Martin wrote:
>> unbound constantly crashing with:
>> [err] evmap.c:381: Assertion nread >= 0 failed in evmap_io_del_
>>
>> The issue is fixed in unbound 1.9.2 but this version is not available in
>> debian packages.
>>
>> As a workaround I had unbound from testing but it is not possible now,
>> because current testing version 1.10.1-1 relies on libpython3.8 which
>> relies on libc6 >= 2.29 whereas stable libc6 is 2.28-10.
>
> Thanks for this note!  sorry i missed it when reporting 973052, but it
> looks like it's the same issue.  Would you be up for trying a version of
> unbound that includes the patch from 973052 and letting me know whether
> the crash is still happening?
>
> I haven't seen "consistent" failures with the workload where i
> encountered the bug, so it'd be great to hear whether the patch solves
> the problem for you if you've got a repeatable workload.
>
> If you don't know how to rebuild the package with the extra patch,
> please respond here and maybe one of the debian packagers who is used to
> working with unbound can offer a proposed update.
>
> Regards,
>
>    --dkg



Bug#962459: unbound: constantly crashing after about 3 minutes since start

2020-10-27 Thread Daniel Kahn Gillmor
Control: forcemerge 973052 962459 

Hi Kebert--

On Mon 2020-06-08 12:28:46 +0200, Kebert Martin wrote:
> unbound constantly crashing with:
> [err] evmap.c:381: Assertion nread >= 0 failed in evmap_io_del_
>
> The issue is fixed in unbound 1.9.2 but this version is not available in 
> debian packages.
>
> As a workaround I had unbound from testing but it is not possible now,
> because current testing version 1.10.1-1 relies on libpython3.8 which
> relies on libc6 >= 2.29 whereas stable libc6 is 2.28-10.

Thanks for this note!  sorry i missed it when reporting 973052, but it
looks like it's the same issue.  Would you be up for trying a version of
unbound that includes the patch from 973052 and letting me know whether
the crash is still happening?

I haven't seen "consistent" failures with the workload where i
encountered the bug, so it'd be great to hear whether the patch solves
the problem for you if you've got a repeatable workload.

If you don't know how to rebuild the package with the extra patch,
please respond here and maybe one of the debian packagers who is used to
working with unbound can offer a proposed update.

Regards,

--dkg




Bug#962459: unbound: constantly crashing after about 3 minutes since start

2020-06-08 Thread Kebert Martin
Package: unbound
Version: 1.9.0-2+deb10u2
Severity: important

Dear Maintainer,

unbound is constantly crashing with:
[err] evmap.c:381: Assertion nread >= 0 failed in evmap_io_del_

The issue is fixed in unbound 1.9.2, but this version is not available in 
Debian packages.

As a workaround I was running unbound from testing, but that is no longer
possible, because the current testing version 1.10.1-1 depends on libpython3.8,
which depends on libc6 >= 2.29, whereas stable has libc6 2.28-10.
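
The dependency mismatch can be shown with a quick version comparison. The sketch below is an approximation: `version_ge` is a made-up helper name, and it relies on GNU `sort -V`, which does not implement Debian's version ordering exactly. On a Debian system `dpkg --compare-versions` is the authoritative tool.

```shell
# Hypothetical helper: succeed if version $1 is >= version $2.
# Uses GNU sort -V, which only approximates Debian version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# stable's libc6 versus the libc6 version required by libpython3.8 in testing
if version_ge "2.28-10" "2.29"; then
  echo "libc6 2.28-10 satisfies >= 2.29"
else
  echo "libc6 2.28-10 does not satisfy >= 2.29"
fi
```

With the versions from this report the second branch is taken, which is why the testing package cannot be installed on stable without also pulling in a newer libc6.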



-- System Information:
Debian Release: 10.4
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable'), (100, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.0-9-amd64 (SMP w/2 CPU cores)
Locale: LANG=cs_CZ.UTF-8, LC_CTYPE=cs_CZ.UTF-8 (charmap=UTF-8), 
LANGUAGE=cs_CZ.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages unbound depends on:
ii  adduser         3.118
ii  dns-root-data   2019031302
ii  libc6           2.28-10
ii  libevent-2.1-6  2.1.8-stable-4
ii  libfstrm0       0.4.0-1
ii  libprotobuf-c1  1.3.1-1+b1
ii  libpython3.7    3.7.3-2+deb10u1
ii  libssl1.1       1.1.1d-0+deb10u3
ii  libsystemd0     241-7~deb10u4
ii  lsb-base        10.2019051400
ii  openssl         1.1.1d-0+deb10u3
ii  unbound-anchor  1.9.0-2+deb10u2

unbound recommends no packages.

Versions of packages unbound suggests:
pn  apparmor  <none>

-- no debconf information