Hello,

I am currently trying to set up two caching nameservers and noticed an interesting behaviour.

The setup is the following:
two FreeBSD/amd64 6-CURRENT machines, each with a single Opteron processor.

BIND was compiled from ports, without threading, using gcc34 (also from ports) with -O2 -static. It runs in a jail that contains nothing more than the config and a nearly empty devfs mount.

Machine A has the following simple config:
options {
        directory "/etc/bind";
        tcp-clients 256;
        recursive-clients 8192;
        max-cache-size 600M;
        minimal-responses yes;
        pid-file "/tmp/named.pid";
        forwarders { MACHINE_B_IP; };
};

Machine B runs the same BIND, but as an authoritative NS with a wildcard record of:
*       IN      TXT     "256xA"
in the . zone (so it answers every query with 256 "A"s).

The test:
on machine B I start queryperf this way:
queryperf -d list -s MACHINE_A_IP

where list contains lines of the following form:
www.RANDOMNUMBER.hu TXT
[...] repeated 9000000 times, each with a random number.
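
For anyone who wants to reproduce this, a query file of that form could be generated with something like the sketch below. This is not the script I used; the function names, the seed parameter and the upper bound on the random number are assumptions made only for illustration.

```python
import random

def make_queries(count, seed=None, max_label=10**9):
    """Yield `count` queryperf input lines of the form 'www.<random>.hu TXT'."""
    rng = random.Random(seed)  # a fixed seed makes the list repeatable
    for _ in range(count):
        yield f"www.{rng.randrange(max_label)}.hu TXT"

def write_query_file(path, count, seed=None):
    """Write the generated lines to `path`, one query per line."""
    with open(path, "w") as fh:
        for line in make_queries(count, seed):
            fh.write(line + "\n")

# Example (writes a large file, so it is left commented out):
# write_query_file("list", 9_000_000)
```

Random names matter here because nearly every query then misses the cache on machine A, is forwarded to machine B, and adds a new cache entry.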

During the test, machine A keeps filling its cache until named reaches about 860 MB. Up to that point I see this in top: CPU states: 27.7% user, 0.0% nice, 58.1% system, 14.2% interrupt, 0.0% idle

On machine B, queryperf receives answers within the default timeout (5 seconds).

Once BIND reaches about 860 MB, it starts to eat CPU: user time goes to 100%, with nearly 0% system and interrupt usage.

queryperf then starts to time out with the following:
[Timeout] Query timed out: msg id 64837
Warning: Received a response with an unexpected (maybe timed out) id: 64837

The server effectively dies: it can answer only a very small number of queries, and with very low performance. If I stop queryperf, BIND remains in the CPU-eating state:
76423 bind        1 129    0   861M   862M RUN      8:30 97.71% named

Because the machine has much more RAM, I first tried with max-cache-size set to 1200M. In that case the server reached its "zombie" state at around 1600 MB of usage and was very unresponsive.
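
To put the two runs side by side, here is a quick calculation of how far the process size exceeds the configured max-cache-size in each case. It uses only the approximate figures above; the helper name is made up for this sketch.

```python
def overshoot(limit_mb, observed_mb):
    """Ratio of the observed process size to the configured max-cache-size."""
    return observed_mb / limit_mb

# (configured max-cache-size in MB, approximate size when the stall begins)
for limit, observed in [(600, 860), (1200, 1600)]:
    print(f"{limit}M limit: ~{observed} MB used, ratio {overshoot(limit, observed):.2f}")
```

In both runs the stall begins when the process is roughly a third or more above the configured limit, so the trigger seems to scale with max-cache-size rather than with a fixed amount of memory.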

On another (real) server I noticed similar behaviour this week: BIND started to eat all CPU resources and there were only "recursive quota reached" messages in the logs, yet rndc status reported very low usage (for example 60/1024 on that server).

I can reproduce this both with and without patch-lib_dns_resolver.c.

If I stop the queries, the server starts answering again after a few minutes, once it has finished its strange "CPU eating" loop.

ktrace shows it doing this many, many times between two successful queries:
 76423 named    CALL  gettimeofday(0x7fffffffe450,0)
 76423 named    RET   gettimeofday 0

Any ideas?

Thanks,
--
Attila Nagy                                   e-mail: [EMAIL PROTECTED]
Free Software Network (FSN.HU)           phone @work: +361 371 3536
ISOs: http://www.fsn.hu/?f=download            cell.: +3630 306 6758