nsd Using 100% CPU On Recent amd64 Snapshots

2014-09-09 Thread Scott Vanderbilt
Starting with the 18 Aug. amd64 snapshot (and continuing with the 8 
Sept. as well), my nsd server immediately pegs 3 of my CPU's 4 cores 
within seconds after starting. It won't even respond to nsd-control 
commands. Running on the 3 Aug. snapshot and for many versions prior to 
that, CPU usage was barely perceptible. So, it appears to be something 
new to recent snapshots. But frustratingly, I see no recent changes in 
nsd's source in cvs.


This instance of nsd is running in a slave configuration, and hosting 42 
zones. The master host also runs nsd with a recent amd64 snapshot, and 
it runs normally. There have been no changes in the nsd.conf for months 
prior to the latest snapshots. There are no errors in my nsd log file 
apart from some 'NOT IMPL errors' at start-up. I don't think these are 
relevant, since my understanding is that they are caused by nsd's 
initial IXFR attempts failing, but are then followed by successful AXFR 
transfers. At least that's what I gathered from a post in the nsd-users 
archives. [1]


My nsd.conf and dmesg follow.

Any ideas on how I can start to track down whether this is due to my 
configuration?


Many thanks.


[1] http://open.nlnetlabs.nl/pipermail/nsd-users/2013-January/001588.html

###

# $OpenBSD: nsd.conf,v 1.6 2013/11/26 12:54:42 sthen Exp $

server:
	hide-version: yes
	database: /var/nsd/db/nsd.db
	username: _nsd
	zonesdir: /var/nsd/zones
	logfile: /var/log/nsd.log
	pidfile: /var/nsd/run/nsd.pid
	difffile: /var/nsd/run/ixfr.db
	xfrdfile: /var/nsd/run/xfrd.state
	statistics: 3600

remote-control:
	control-enable: yes

key:
	name: ns.datagenic.com.
	algorithm: hmac-md5
	secret: ...

zone:
	name: datagenic.com
	zonefile: db.datagenic.com
	allow-notify: 163.228.162.199 ns.datagenic.com.
	request-xfr: 163.228.162.199 ns.datagenic.com.


#   41 more identically configured zones omitted

###

OpenBSD 5.6-current (GENERIC.MP) #366: Mon Sep  8 17:13:38 MDT 2014
t...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
RTC BIOS diagnostic error 80<clock_battery>
real mem = 1038864384 (990MB)
avail mem = 1002532864 (956MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xe4410 (25 entries)
bios0: vendor Intel Corp. version MOPNV10J.86A.0154.2009.1117.1624 date 11/17/2009
bios0: Intel Corporation D510MO
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S3 S4 S5
acpi0: tables DSDT FACP APIC MCFG HPET SSDT
acpi0: wakeup devices SLPB(S4) PS2M(S4) PS2K(S4) UAR1(S4) UAR2(S4) P32_(S4) ILAN(S4) PEX0(S4) PEX1(S4) PEX2(S4) PEX3(S4) UHC1(S3) UHC2(S3) UHC3(S3) UHC4(S3) EHCI(S3) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Atom(TM) CPU D510 @ 1.66GHz, 1666.98 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF
cpu0: 512KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 7 var ranges, 88 fixed ranges
cpu0: apic clock running at 166MHz
cpu0: mwait min=64, max=64, C-substates=0.1.0.0.0, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Atom(TM) CPU D510 @ 1.66GHz, 1666.69 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF
cpu1: 512KB 64b/line 8-way L2 cache
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Atom(TM) CPU D510 @ 1.66GHz, 1718.97 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF
cpu2: 512KB 64b/line 8-way L2 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Atom(TM) CPU D510 @ 1.66GHz, 1666.70 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF
cpu3: 512KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 8 pa 0xfec0, version 20, 24 pins
ioapic0: misconfigured as apic 0, remapped to apid 8
acpimcfg0 at acpi0 addr 0xf800, bus 0-63
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 5 (P32_)
acpiprt1 at acpi0: bus 0 (PCI0)
acpiprt2 at acpi0: bus 1 (PEX0)
acpiprt3 at acpi0: bus 2 (PEX1)
acpiprt4 at acpi0: bus 3 (PEX2)
acpiprt5 

Re: nsd Using 100% CPU On Recent amd64 Snapshots

2014-09-09 Thread Philip Guenther
On Tue, Sep 9, 2014 at 9:34 AM, Scott Vanderbilt li...@datagenic.com wrote:
> Starting with the 18 Aug. amd64 snapshot (and continuing with the 8 Sept. as
> well), my nsd server immediately pegs 3 of my CPU's 4 cores within seconds
> after starting.

Hmm, crank NSD's log level and see if there's a debug log that shows
something looping?  Lacking that, ktrace a spinning process for a few
seconds then stop it, and combine that with fstat output to see what
syscalls (if any...) are involved in the loop.
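
One way to read the resulting trace, as a rough sketch only: after attaching
with something like "ktrace -p <pid>", waiting a few seconds, and stopping
with "ktrace -C", the kdump output can be saved to a text file and tallied
per syscall. The script below is hypothetical (the name count_syscalls.py and
the function count_syscalls are made up for illustration) and it assumes each
syscall entry in the kdump output looks like "<pid> <command> CALL
<name>(<args>)"; adjust the pattern if your kdump prints something different.

```python
#!/usr/bin/env python3
"""Tally syscalls per process from saved kdump output (illustrative sketch)."""
import re
import sys
from collections import Counter

# Assumed line shape: "<pid> <command> CALL  <syscall>(<args>)".
# An optional "/<tid>" suffix after the pid is tolerated.
CALL_RE = re.compile(r"^\s*(\d+)(?:/\d+)?\s+(\S+)\s+CALL\s+(\w+)")


def count_syscalls(lines):
    """Return a Counter keyed by (pid, syscall name) for every CALL line."""
    counts = Counter()
    for line in lines:
        match = CALL_RE.match(line)
        if match:
            counts[(int(match.group(1)), match.group(3))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kdump > trace.txt; python3 count_syscalls.py trace.txt
    with open(sys.argv[1], errors="replace") as trace:
        totals = count_syscalls(trace)
    # Print the ten most frequent (pid, syscall) pairs, busiest first.
    for (pid, name), n in totals.most_common(10):
        print(f"{n:8d}  pid {pid}  {name}")
```

A syscall that dominates the tally for one pid is a candidate for the loop;
its descriptor arguments can then be matched against "fstat -p <pid>" output.
If the trace contains almost no CALL lines, the process is more likely
spinning in userland.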


Philip Guenther