Re: [nsd-users] Repeated crashes of NSD, without a clear explanation

2020-01-07 Thread Anand Buddhdev via nsd-users
On 07/01/2020 19:04, Stephane Bortzmeyer wrote:

> Apparently, it was a lack of memory:

Aah, ok!

>> database: ""
> 
> Currently under test and no problem yet (anyway, I'll add RAM).

The "no-database" mode uses less memory, so that explains the stability.

Regards,
Anand

___
nsd-users mailing list
nsd-users@lists.nlnetlabs.nl
https://lists.nlnetlabs.nl/mailman/listinfo/nsd-users


Re: [nsd-users] Repeated crashes of NSD, without a clear explanation

2020-01-07 Thread Stephane Bortzmeyer via nsd-users
On Tue, Jan 07, 2020 at 09:11:56AM +0300,
 Anand Buddhdev via nsd-users wrote
 a message of 36 lines which said:

> This suggests that an incoming XFR is triggering a bug. Have you saved
> the contents of the nsd-xfr-1974 directory? If not, perhaps you can save
> it the next time it happens. This may help the developers in figuring
> out what causes the crash.

Apparently, it was a lack of memory:

[8374219.385014] Out of memory: Kill process 10677 (nsd) score 66 or sacrifice child
[8374219.385758] Killed process 10678 (nsd) total-vm:37552kB, anon-rss:676kB, file-rss:0kB, shmem-rss:27344kB
[8374219.386779] oom_reaper: reaped process 10678 (nsd), now anon-rss:0kB, file-rss:0kB, shmem-rss:27344kB
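
For anyone who wants to check a saved kernel log for the same symptom, here is a minimal Python sketch that extracts OOM kills of a given process name. It assumes the message format matches the Linux 4.19 lines quoted above (other kernel versions may word these lines differently). The function name find_oom_kills and the returned field names are invented for this illustration and are not part of NSD or any kernel tool.

```python
import re

# Matches the kernel's "Killed process <pid> (<name>) ..." report and
# captures the anon-rss and shmem-rss figures that follow it.
# Assumption: the format is the one shown in the kernel log above.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?anon-rss:(?P<anon>\d+)kB"
    r".*?shmem-rss:(?P<shmem>\d+)kB"
)


def find_oom_kills(kernel_log, name="nsd"):
    """Return one dict per OOM kill of a process called `name`."""
    # Collapse all whitespace so that lines wrapped by a mail client
    # are rejoined before matching.
    text = " ".join(kernel_log.split())
    kills = []
    for match in KILLED_RE.finditer(text):
        if match.group("name") == name:
            kills.append({
                "pid": int(match.group("pid")),
                "name": match.group("name"),
                "anon_rss_kb": int(match.group("anon")),
                "shmem_rss_kb": int(match.group("shmem")),
            })
    return kills


# The three kernel lines from this message, wrapped as they arrived.
SAMPLE = """
[8374219.385014] Out of memory: Kill process 10677 (nsd) score 66 or sacrifice
child
[8374219.385758] Killed process 10678 (nsd) total-vm:37552kB, anon-rss:676kB,
file-rss:0kB, shmem-rss:27344kB
[8374219.386779] oom_reaper: reaped process 10678 (nsd), now anon-rss:0kB,
file-rss:0kB, shmem-rss:27344kB
"""

kills = find_oom_kills(SAMPLE)
print(kills)
```

In practice the input would come from something like the output of dmesg saved to a file; only the "Killed process" line is matched, so the "Kill process ... or sacrifice child" and oom_reaper lines are ignored.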

> Finally, the database mode is no longer recommended. Could you try
> running your instance of NSD with:
> 
> database: ""

Currently under test and no problem yet (anyway, I'll add RAM).



Re: [nsd-users] Repeated crashes of NSD, without a clear explanation

2020-01-06 Thread Anand Buddhdev via nsd-users
On 06/01/2020 22:39, Stephane Bortzmeyer via nsd-users wrote:

Hi Stephane,

> For a week now, NSD has been crashing on one machine after a few
> hours of running, corrupting nsd.db.
> 
> The log (verbosity 4) says:
> 
> Jan 06 20:31:30 ada nsd[1974]: process 1975 exited with status 9
> Jan 06 20:31:30 ada nsd[1974]: [2020-01-06 19:31:30.892] nsd[1974]: error: process 1975 exited with status 9
> Jan 06 20:31:30 ada nsd[1974]: rmdir /tmp/nsd-xfr-1974 failed: Directory not empty

This suggests that an incoming XFR is triggering a bug. Have you saved
the contents of the nsd-xfr-1974 directory? If not, perhaps you can save
it the next time it happens. This may help the developers in figuring
out what causes the crash.

Also, is there any log above this, to indicate which zone it might be?

Note that there are several newer versions of NSD since 4.1.26, so this
bug may also have been fixed in a newer version. If you can upgrade, you
may want to do that.

Finally, the database mode is no longer recommended. Could you try
running your instance of NSD with:

database: ""

Regards,
Anand Buddhdev



[nsd-users] Repeated crashes of NSD, without a clear explanation

2020-01-06 Thread Stephane Bortzmeyer via nsd-users
For a week now, NSD has been crashing on one machine after a few
hours of running, corrupting nsd.db.

The log (verbosity 4) says:

Jan 06 20:31:30 ada nsd[1974]: process 1975 exited with status 9
Jan 06 20:31:30 ada nsd[1974]: [2020-01-06 19:31:30.892] nsd[1974]: error: process 1975 exited with status 9
Jan 06 20:31:30 ada nsd[1974]: rmdir /tmp/nsd-xfr-1974 failed: Directory not empty
Jan 06 20:31:30 ada nsd[1974]: [2020-01-06 19:31:30.909] nsd[1974]: warning: rmdir /tmp/nsd-xfr-1974 failed: Directory not empty
Jan 06 20:31:31 ada nsd[2195]: nsd starting (NSD 4.1.26)
Jan 06 20:31:31 ada nsd[2195]: [2020-01-06 19:31:31.418] nsd[2195]: notice: nsd starting (NSD 4.1.26)
Jan 06 20:31:31 ada nsd[2195]: setup SSL certificates
Jan 06 20:31:31 ada nsd[2195]: [2020-01-06 19:31:31.421] nsd[2195]: info: setup SSL certificates
Jan 06 20:31:31 ada nsd[2196]: /var/lib/nsd/nsd.db: not cleanly closed 0
Jan 06 20:31:31 ada nsd[2195]: [2020-01-06 19:31:31.798] nsd[2196]: warning: /var/lib/nsd/nsd.db: not cleanly closed 0
Jan 06 20:31:31 ada nsd[2195]: [2020-01-06 19:31:31.798] nsd[2196]: warning: can not use /var/lib/nsd/nsd.db, will create anew

And then NSD stops. I have to start it manually, making it work for a
few more hours.
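
A note on reading "exited with status 9" in the log above: if NSD prints the raw status returned by waitpid() (an assumption, not confirmed in this thread), then on Linux the value 9 does not mean "exit code 9" but "terminated by signal 9", which is SIGKILL. That reading is consistent with the kernel out-of-memory kill reported in the follow-up. The small Python sketch below decodes such a status; the helper name describe_wait_status is made up for this example.

```python
import os
import signal


def describe_wait_status(status):
    """Decode a raw waitpid()-style status into a readable string."""
    if os.WIFSIGNALED(status):
        # The child was terminated by a signal.
        sig = os.WTERMSIG(status)
        return f"killed by signal {sig} ({signal.Signals(sig).name})"
    if os.WIFEXITED(status):
        # The child called exit() or returned from main().
        return f"exited normally with code {os.WEXITSTATUS(status)}"
    return f"unrecognised wait status {status}"


# Status value taken from the NSD log line, assuming it is a raw wait status.
print(describe_wait_status(9))
```

A process killed with SIGKILL gets no chance to clean up, which would also fit the leftover /tmp/nsd-xfr-1974 directory and the "not cleanly closed" nsd.db.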

This machine worked fine, with the same set of zones, for several
years. Yes, of course, software was upgraded, but another machine
with the same Debian version, the same NSD, and almost the same set
of zones has no problem.

Debian "stable" 10.2, Linux kernel 4.19.0, NSD 4.1.26. As I said, a
very similar machine works fine.

% ls -alt /var/lib/nsd
total 552
-rw-------  1 nsd  nsd  589824 Jan  6 20:33 nsd.db
-rw-r--r--  1 nsd  nsd    6605 Jan  6 20:31 xfrd.state
drwxr-xr-x  2 nsd  nsd    4096 Jan  6 20:31 .
drwxr-xr-x 70 root root   4096 Jan  6 20:18 ..

Deleting all /var/lib/nsd and starting from a fresh directory changes
nothing.

What can I investigate?
