Daniel, all,
Just wanted to post the solution to this issue. I wanted to wait a significant
amount of time to make sure we had this solved.
The root cause was the LDAP caching mechanism. I am guessing there is a bug in
that code that causes the server to go haywire after some number of items have
been cached or looked up. Or perhaps there is a memory leak.
With LDAP caching disabled, the server has been stable for 60+ days.
The last changes I made to httpd.conf were these:
< LDAPSharedCacheSize 500000
< LDAPCacheEntries 1024
---
> LDAPSharedCacheSize 0
> LDAPCacheEntries 16
352c352
< LDAPOpCacheEntries 1024
---
> LDAPOpCacheEntries 0
386c386
Hope this helps some other poor souls out there.
MJ
On Thursday, January 29, 2015 6:35 PM, Daniel <[email protected]> wrote:
2015-01-30 1:03 GMT+01:00 Mark Jacquet <[email protected]>:
Problem: The Apache server stays up for a random amount of time, usually days,
but eventually enters a hung state. When hung, the CPU load on the machine
gradually climbs and new web server requests get no response.
Error logs typically contain lots of these:
[Wed Jan 28 16:06:58.667188 2015] [mpm_event:error] [pid 25336:tid 1]
AH00485: scoreboard is full, not at MaxRequestWorkers
I have done a lot of web research on this topic and have found many cases where
others have had the same or a similar issue, but no real solutions. It seems
very close to this bug report:
https://issues.apache.org/bugzilla/show_bug.cgi?id=53555
Environment:
LDOM (VM) SunOS myhostname 5.10 Generic_118833-36 sun4v sparc
SUNW,Sun-Fire-T200
8G RAM
http Conf:
StartServers 8
MinSpareServers Not set
MaxSpareServers Not set
ServerLimit 256
MaxRequestWorkers 100
MaxConnectionsPerChild 1000
KeepAlive On
TimeOut 3000
MaxKeepAliveRequests 50
KeepAliveTimeout 2
Current non-hung Score Board:
Server Version: Apache/2.4.10 (Unix)
Server MPM: event
Server Built: Oct 30 2014 16:29:03
Current Time: Wednesday, 28-Jan-2015 10:59:39 PST
Restart Time: Wednesday, 28-Jan-2015 09:49:21 PST
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 1 hour 10 minutes 17 seconds
Server load: 0.60 0.46 0.41
Total accesses: 1134 - Total Traffic: 2.2 GB
CPU Usage: u9.07 s16.94 cu609.51 cs69.31 - 16.7% CPU load
.269 requests/sec - 0.5 MB/second - 2.0 MB/request
1 requests currently being processed, 99 idle workers
PID    Connections        Threads      Async connections
       total  accepting   busy  idle   writing  keep-alive  closing
25337  0      yes         1     24     0        0           0
25338  1      yes         0     25     1        0           0
25339  1      yes         0     25     0        0           1
25340  1      yes         0     25     0        0           1
Sum    3                  1     99     1        0           2
Any thoughts/comments on http conf tuning, OS patches, apache bug fixes
appreciated.
This is a production server, so you can imagine, having it go down at random
times (usually when I am asleep) is not fun!
Thanks.
MJ
Hello,
you have some odd values.
First, you don't specify ThreadsPerChild, which defaults to 25 for the event
MPM (64 is the mpm_winnt default). Yet you do specify MaxRequestWorkers, which
is the total number of threads across all child processes together, and you
allow a maximum of 256 processes with ServerLimit.
By simple math, 256 processes * 25 threads per process would yield 6400 threads
in total, yet you allow a maximum of only 100, so effectively your server can
run just 4 processes (100 / 25, the 4 PIDs in your scoreboard). With no "spare"
processes available, you will get the "scoreboard is full" message every time
you restart.
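The process-count arithmetic above can be sketched in Python. This is only an illustration: the function name process_budget is made up here, and it assumes that only whole child processes count, so the division rounds down. ThreadsPerChild is left as a parameter: 25 is the event MPM default and matches the 25 threads per process (busy plus idle) in the scoreboard earlier in the thread, while 64 is the mpm_winnt default.

```python
def process_budget(server_limit, threads_per_child, max_request_workers):
    """Return (threads ServerLimit would permit, whole child processes
    that fit within MaxRequestWorkers)."""
    # Upper bound on threads if every process allowed by ServerLimit ran.
    potential_threads = server_limit * threads_per_child
    # Assumption: only whole processes count, so round down.
    usable_processes = max_request_workers // threads_per_child
    return potential_threads, usable_processes


# ServerLimit 256 and MaxRequestWorkers 100 come from the posted config;
# ThreadsPerChild was not set there, so both candidate values are shown.
for threads_per_child in (25, 64):
    potential, usable = process_budget(256, threads_per_child, 100)
    print(f"ThreadsPerChild {threads_per_child}: "
          f"{potential} potential threads, {usable} usable process(es)")
```

Either way, ServerLimit 256 permits far more threads than MaxRequestWorkers 100 will ever use.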
Consider something more logical like this for starters:
StartServers 1                   <-- starts with 1 process
ServerLimit 5                    <-- 4 more processes available, 5 x 200 max threads = 1000 (as you can see below, the math matches MaxRequestWorkers)
MinSpareThreads 25
MaxSpareThreads 100
ThreadsPerChild 200              <-- threads per child process
ThreadLimit 200                  <-- max threads per child process
MaxRequestWorkers 1000           <-- a total of 1000 threads
MaxConnectionsPerChild 10000000
This is an example, adjust to your needs.
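Before adapting these numbers, it may help to check that they still agree with each other. The sketch below is a hypothetical helper (check_mpm_settings and the dictionary layout are not part of Apache); it only encodes three relationships between the directives: ThreadLimit caps ThreadsPerChild, ServerLimit x ThreadsPerChild caps MaxRequestWorkers, and MaxRequestWorkers should be a whole multiple of ThreadsPerChild.

```python
def check_mpm_settings(cfg):
    """Return a list of inconsistencies; an empty list means the numbers agree."""
    problems = []
    threads_per_child = cfg["ThreadsPerChild"]
    # ThreadLimit is the ceiling for ThreadsPerChild.
    if threads_per_child > cfg["ThreadLimit"]:
        problems.append("ThreadsPerChild exceeds ThreadLimit")
    # ServerLimit x ThreadsPerChild is the most threads that can ever exist.
    if cfg["MaxRequestWorkers"] > cfg["ServerLimit"] * threads_per_child:
        problems.append("MaxRequestWorkers exceeds ServerLimit x ThreadsPerChild")
    # Each child runs a fixed number of threads, so the total should divide evenly.
    if cfg["MaxRequestWorkers"] % threads_per_child != 0:
        problems.append("MaxRequestWorkers is not a multiple of ThreadsPerChild")
    return problems


# The values suggested in this reply.
suggested = {
    "ServerLimit": 5,
    "ThreadsPerChild": 200,
    "ThreadLimit": 200,
    "MaxRequestWorkers": 1000,
}
print(check_mpm_settings(suggested) or "numbers agree")
```

Passing these checks only means the values are internally consistent, not that they suit a particular workload.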
--
Daniel Ferradal
IT Specialist
email [email protected] es.linkedin.com/in/danielferradal