Re: [nsd-users] Repeated crashes of NSD, without a clear explanation
On 07/01/2020 19:04, Stephane Bortzmeyer wrote:

> Apparently, it was a lack of memory:

Aah, ok!

>> database: ""
>
> Currently under test and no problem yet (anyway, I'll add RAM).

The "no-database" mode uses less memory, so that explains the stability.

Regards,
Anand
___
nsd-users mailing list
nsd-users@lists.nlnetlabs.nl
https://lists.nlnetlabs.nl/mailman/listinfo/nsd-users
Re: [nsd-users] Repeated crashes of NSD, without a clear explanation
On Tue, Jan 07, 2020 at 09:11:56AM +0300,
 Anand Buddhdev via nsd-users wrote
 a message of 36 lines which said:

> This suggests that an incoming XFR is triggering a bug. Have you saved
> the contents of the nsd-xfr-1974 directory? If not, perhaps you can save
> it the next time it happens. This may help the developers in figuring
> out what causes the crash.

Apparently, it was a lack of memory:

[8374219.385014] Out of memory: Kill process 10677 (nsd) score 66 or sacrifice child
[8374219.385758] Killed process 10678 (nsd) total-vm:37552kB, anon-rss:676kB, file-rss:0kB, shmem-rss:27344kB
[8374219.386779] oom_reaper: reaped process 10678 (nsd), now anon-rss:0kB, file-rss:0kB, shmem-rss:27344kB

> Finally, the database mode is no longer recommended. Could you try
> running your instance of NSD with:
>
> database: ""

Currently under test and no problem yet (anyway, I'll add RAM).
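
For anyone who wants to check kernel logs for the same symptom, here is a minimal sketch in Python that extracts the process name, PID and memory figures from a "Killed process" line. It assumes the line format shown in the excerpt above (Linux 4.19); other kernel versions may print different fields. The function name parse_oom_kill is purely illustrative.

```python
import re

# Matches the "Killed process" line written by the Linux OOM killer,
# in the format seen in the dmesg excerpt above (assumption: kernel 4.19).
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\)\s+(?P<fields>.*)"
)
# Each memory figure looks like "anon-rss:676kB".
FIELD_RE = re.compile(r"(?P<name>[\w-]+):(?P<kb>\d+)kB")


def parse_oom_kill(line):
    """Return a dict describing an OOM kill, or None if the line is not one."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    sizes = {f["name"]: int(f["kb"]) for f in FIELD_RE.finditer(m["fields"])}
    return {"pid": int(m["pid"]), "comm": m["comm"], "kB": sizes}


sample = (
    "[8374219.385758] Killed process 10678 (nsd) total-vm:37552kB, "
    "anon-rss:676kB, file-rss:0kB, shmem-rss:27344kB"
)
info = parse_oom_kill(sample)
print(info["comm"], info["pid"], info["kB"]["shmem-rss"])  # nsd 10678 27344
```

Feeding it the output of "dmesg" line by line and keeping the results where "comm" is "nsd" would show how often the daemon was hit.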
Re: [nsd-users] Repeated crashes of NSD, without a clear explanation
On 06/01/2020 22:39, Stephane Bortzmeyer via nsd-users wrote:

Hi Stephane,

> For a week now, one machine has NSD crashing after a few hours of
> running, corrupting nsd.db.
>
> The log (verbosity 4) says:
>
> Jan 06 20:31:30 ada nsd[1974]: process 1975 exited with status 9
> Jan 06 20:31:30 ada nsd[1974]: [2020-01-06 19:31:30.892] nsd[1974]: error: process 1975 exited with status 9
> Jan 06 20:31:30 ada nsd[1974]: rmdir /tmp/nsd-xfr-1974 failed: Directory not empty

This suggests that an incoming XFR is triggering a bug. Have you saved
the contents of the nsd-xfr-1974 directory? If not, perhaps you can save
it the next time it happens. This may help the developers in figuring
out what causes the crash.

Also, is there any log above this, to indicate which zone it might be?

Note that there are several newer versions of NSD since 4.1.26, so this
bug may also have been fixed in a newer version. If you can upgrade, you
may want to do that.

Finally, the database mode is no longer recommended. Could you try
running your instance of NSD with:

database: ""

Regards,
Anand Buddhdev
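
One possible way to keep the leftover transfer directory for the developers is sketched below in Python. It copies every nsd-xfr-<pid> directory found under /tmp (the location seen in the log) to a separate place before NSD is restarted. The destination /var/tmp/nsd-xfr-saved and the function name snapshot_xfr_dirs are assumptions chosen for illustration, not anything NSD provides.

```python
import shutil
import time
from pathlib import Path


def snapshot_xfr_dirs(tmp_root="/tmp", dest_root="/var/tmp/nsd-xfr-saved"):
    """Copy every leftover nsd-xfr-<pid> directory under tmp_root into dest_root.

    dest_root is a hypothetical location; pick any directory with enough space.
    Returns the list of directories that were created.
    """
    saved = []
    stamp = time.strftime("%Y%m%dT%H%M%S")
    for src in sorted(Path(tmp_root).glob("nsd-xfr-*")):
        if not src.is_dir():
            continue
        # A timestamp suffix keeps successive crashes from overwriting each other.
        dst = Path(dest_root) / f"{src.name}-{stamp}"
        shutil.copytree(src, dst)  # creates dest_root if it does not exist yet
        saved.append(dst)
    return saved
```

Calling snapshot_xfr_dirs() right after a crash, and before starting NSD again, should preserve the files. It needs to run as a user that can read the directory, typically root or nsd.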
[nsd-users] Repeated crashes of NSD, without a clear explanation
For a week now, one machine has NSD crashing after a few hours of
running, corrupting nsd.db.

The log (verbosity 4) says:

Jan 06 20:31:30 ada nsd[1974]: process 1975 exited with status 9
Jan 06 20:31:30 ada nsd[1974]: [2020-01-06 19:31:30.892] nsd[1974]: error: process 1975 exited with status 9
Jan 06 20:31:30 ada nsd[1974]: rmdir /tmp/nsd-xfr-1974 failed: Directory not empty
Jan 06 20:31:30 ada nsd[1974]: [2020-01-06 19:31:30.909] nsd[1974]: warning: rmdir /tmp/nsd-xfr-1974 failed: Directory not empty
Jan 06 20:31:31 ada nsd[2195]: nsd starting (NSD 4.1.26)
Jan 06 20:31:31 ada nsd[2195]: [2020-01-06 19:31:31.418] nsd[2195]: notice: nsd starting (NSD 4.1.26)
Jan 06 20:31:31 ada nsd[2195]: setup SSL certificates
Jan 06 20:31:31 ada nsd[2195]: [2020-01-06 19:31:31.421] nsd[2195]: info: setup SSL certificates
Jan 06 20:31:31 ada nsd[2196]: /var/lib/nsd/nsd.db: not cleanly closed 0
Jan 06 20:31:31 ada nsd[2195]: [2020-01-06 19:31:31.798] nsd[2196]: warning: /var/lib/nsd/nsd.db: not cleanly closed 0
Jan 06 20:31:31 ada nsd[2195]: [2020-01-06 19:31:31.798] nsd[2196]: warning: can not use /var/lib/nsd/nsd.db, will create anew

And then NSD stops. I have to start it manually, which makes it work
for a few more hours.

This machine worked fine, with the same set of zones, for several
years (yes, of course, software was upgraded, but another Debian
machine, with the same version, the same NSD, and almost the same set
of zones, has no problem).

Debian "stable" 10.2, Linux kernel 4.19.0, NSD 4.1.26. As I said, a
very similar machine works fine.

% ls -alt /var/lib/nsd
total 552
-rw-------  1 nsd  nsd  589824 Jan  6 20:33 nsd.db
-rw-r--r--  1 nsd  nsd    6605 Jan  6 20:31 xfrd.state
drwxr-xr-x  2 nsd  nsd    4096 Jan  6 20:31 .
drwxr-xr-x 70 root root   4096 Jan  6 20:18 ..

Deleting everything in /var/lib/nsd and starting from a fresh
directory changes nothing.

What can I investigate?
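
A note on "exited with status 9": if NSD is logging the raw status returned by waitpid() here (an assumption, not checked against the NSD source), then 9 means the child was terminated by signal 9, SIGKILL, rather than exiting on its own. That would fit the out-of-memory diagnosis reached later in the thread, since the Linux OOM killer uses SIGKILL. The Python sketch below decodes such a status using the Linux bit layout; describe_wait_status is an illustrative name.

```python
import signal


def describe_wait_status(status):
    """Decode a raw wait status using the Linux bit layout.

    Low 7 bits: terminating signal (0 means a normal exit).
    Bit 0x80:   set if a core dump was produced.
    Bits 8-15:  exit code (normal exit) or stop signal (stopped child).
    """
    if status & 0xFF == 0x7F:
        return f"stopped by signal {(status >> 8) & 0xFF}"
    sig = status & 0x7F
    if sig == 0:
        return f"exited normally with code {(status >> 8) & 0xFF}"
    core = " (core dumped)" if status & 0x80 else ""
    try:
        name = signal.Signals(sig).name
    except ValueError:
        # Signal number has no symbolic name on this platform.
        name = f"signal {sig}"
    return f"killed by {name}{core}"


print(describe_wait_status(9))  # on Linux: killed by SIGKILL
```

A process killed this way gets no chance to clean up, which would also explain the leftover /tmp/nsd-xfr-1974 directory and the "not cleanly closed" nsd.db.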