--On Tuesday, May 13, 2008 02:45:06 AM -0700 Howard Chu <[EMAIL PROTECTED]> wrote:
> Bill MacAllister wrote:
>> Attached is the output of db4.2_stat -CA of the database.
>>
>> Thanks for looking at this.
>>
> So far it just looks like a very busy server. Can you turn off the
> network access to it and see if it settles down when the query traffic
> stops?

Last night the server tried to do a log rotation. When I look at the
log now it is zero length and nothing is getting written to it. An
ldapsearch on the server just hangs. I logged into the console and shut
the network interface down, and the CPU is still pinned.

> It's a bit odd that a single transaction has so many pages of the
> suPrivilegeGroup index locked.
>
> The backtrace is somewhat suspicious, there are several
> <value optimized out> items in the trace. In thread 8, frames 5 and 6
> the locker value is odd; usually in BDB the locker ID associated with
> a transaction has bit 31 set, yielding a very large 32 bit number.
> Also there is no locker with that ID in the db_stat output you
> provided.
>
> It looks like you'll have to try this again with a non-optimized binary
> to get a reliable backtrace.

Yes, we were afraid of that. I will build a debug version of bdb. The
real rub is that we don't seem to be able to make this happen on
demand. I tried taking the log from the pinned server, turned the log
into a shell script of ldapsearch commands, and pointed it at another
server. I could not make the second server go CPU bound. So, we will
just have to deploy the debug bdb support on our test servers and wait.

Bill

>> Bill
>>
>> --On Tuesday, May 13, 2008 01:20:49 AM -0700 Howard Chu<[EMAIL PROTECTED]>
>> wrote:
>>
>>> [EMAIL PROTECTED] wrote:
>>>> Full_Name: Bill MacAllister
>>>> Version: 2.3.41-1su2
>>>> OS: debian etch kernel 2.6.18-4-amd64
>>>> URL: http://www.stanford.edu/~whm/ldap-test1-bt.txt
>>>> Submission from: (NULL) (171.64.19.165)
>>>>
>>>>
>>>> The slapd process will sometimes consume all of available CPU. We
>>>> observed this when we upgraded our production servers from 2.3.35-2su2
>>>> to 2.3.41-1su2. The problem was bad enough that we downgraded the
>>>> production servers to 2.3.35-2su2. We have been trying to provoke the
>>>> problem in our test environment and have not been successful in
>>>> making it happen on demand. Today, we noticed that one of our test
>>>> servers went completely CPU bound. I took a backtrace. It is
>>>> available at the URL below. The interesting thing about the problem
>>>> is that although top shows a pinned CPU and a high load the server is
>>>> still responsive and continues to answer LDAP searches. The test
>>>> server that exhibits the problem is still CPU bound and has been for
>>>> 2-3 hours now. We will leave this server in this state in case there
>>>> is other information that we should harvest in resolving the problem.
>>> Please also provide the output from db_stat -CA on the database in
>>> question, thanks.

--
Bill MacAllister <[EMAIL PROTECTED]>
Systems Programmer, ITS Unix Systems, Stanford University
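
For anyone wanting to try the same log-replay approach described above, here is a minimal sketch of one way it might be done. It is not the script that was actually used. It assumes the slapd log was written at the "stats" log level, where each search appears as a line containing `SRCH base="..." scope=N deref=N filter="..."`. The function name `log_to_commands` and the target URI are made up for illustration, and the `-x` flag assumes an anonymous simple bind is acceptable on the test server.

```python
import re
import shlex

# Matches the search line slapd writes at the "stats" log level, e.g.
#   conn=1001 op=1 SRCH base="dc=example,dc=org" scope=2 deref=0 filter="(uid=whm)"
SRCH_RE = re.compile(
    r'SRCH base="(?P<base>[^"]*)" scope=(?P<scope>\d) deref=\d '
    r'filter="(?P<filter>.*)"\s*$'
)

# Numeric scopes as logged by slapd, mapped to ldapsearch -s keywords.
SCOPES = {"0": "base", "1": "one", "2": "sub"}


def log_to_commands(lines, uri):
    """Turn slapd SRCH log lines into shell-quoted ldapsearch commands.

    `uri` is the server to replay against (hypothetical, supplied by caller).
    Lines that are not searches, or that use a scope not in SCOPES, are skipped.
    """
    commands = []
    for line in lines:
        match = SRCH_RE.search(line)
        if match is None:
            continue  # BIND, RESULT, "SRCH attr=" and other lines
        scope = SCOPES.get(match.group("scope"))
        if scope is None:
            continue
        argv = [
            "ldapsearch", "-x",
            "-H", uri,
            "-b", match.group("base"),
            "-s", scope,
            match.group("filter"),
        ]
        # Quote each argument so parentheses and '*' in filters survive the shell.
        commands.append(" ".join(shlex.quote(arg) for arg in argv))
    return commands


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python replay.py ldap://test-server < slapd.log > replay.sh
    print("#!/bin/sh")
    for command in log_to_commands(sys.stdin, sys.argv[1]):
        print(command)
```

Note that this sketch ignores the requested attribute lists (the separate `SRCH attr=` lines), the original bind identities, and the timing and concurrency of the original traffic. A script produced this way runs the searches one after another, so it exercises the same queries but not the same load pattern.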
