We use Apache to serve software updates to hundreds of client
machines (Ubuntu Linux), which connect to our server automatically.
These clients connect both from inside our network and from our
customers' networks.
I started getting problem reports that clients were timing out when
connecting -- from our client logs I could see that it sometimes took
between 5 and 15 minutes to download a 200-byte file. (Normally, of
course, the file downloads in less than 1 second.) These long
downloads happen on client machines on customer networks AND on
client machines on our internal network.
I've tweaked Apache configs to make the problem less severe, but my
main problem is that I still have NO IDEA what's going on here --
what's causing the problem or how I can fix it permanently. (If it's
a problem with a customer network/firewall, I would love to know what
to tell them -- and since we can't count on having properly
configured customer networks, we need to be able to minimize this
problem going forward.)
The Apache processes appear to be getting stuck in the "Reading
Request" state ("R" on the server-status scoreboard), and are not
released. At our heaviest load, I'm seeing somewhere around 100-150
connections per MINUTE in access.log -- which should not be
overwhelming Apache.
I'm looking for an answer as to what's going on here, or suggestions
for further investigation and diagnosis.
The Apache server's internal ip is 192.168.61.10. It's on a DMZ;
internal network is 192.168.60.x. Firewall/router is a Juniper
Netscreen 25.
I hacked up a script to download a test file every five seconds, and
log server status when it failed the download. This is that script:
#!/bin/bash
# Fetch a test file in a loop; on failure, save netstat and server-status.
ETERNITY="TRUE"
while [ "$ETERNITY" = "TRUE" ]
do
    echo
    echo "About to test stark at $(date)."
    curl --connect-timeout 10 --max-time 25 -s -S \
        --stderr /Users/schof/Desktop/problems.log \
        http://example.dakim.com/testfile > /dev/null
    if [ $? -ne 0 ]; then
        echo "Test FAILED at $(date)"
        ssh [EMAIL PROTECTED] netstat -n -p TCP >> "/Users/schof/Desktop/NetstatOutput$(date +%s).txt"
        curl http://example.dakim.com/server-status >> "/Users/schof/Desktop/ServerStatus$(date +%s).html"
    else
        echo "Test succeeded at $(date)."
        sleep 5
    fi
done
Here's a section of the netstat output. (I've replaced customer IP
addresses with obviously incorrect ones. All 444.444.444.444 entries
were from the same IP address.)
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address     Foreign Address        State        PID/Program name
tcp        0      0 192.168.61.10:80  444.444.444.444:49576  SYN_RECV     -
tcp        0      0 192.168.61.10:80  444.444.444.444:49577  SYN_RECV     -
tcp        0      0 192.168.61.10:80  192.168.61.1:27486     SYN_RECV     -
tcp        0      0 192.168.61.10:80  555.555.555.555:49952  SYN_RECV     -
tcp        0      0 192.168.61.10:80  444.444.444.444:49497  ESTABLISHED  -
tcp        0      0 192.168.61.10:80  444.444.444.444:52825  ESTABLISHED  22939/apache2
tcp        0      0 192.168.61.10:80  444.444.444.444:49496  ESTABLISHED  -
tcp        0      0 192.168.61.10:80  444.444.444.444:52824  ESTABLISHED  22968/apache2
tcp        0      0 192.168.61.10:80  444.444.444.444:49499  ESTABLISHED  -
tcp        0      0 192.168.61.10:80  444.444.444.444:52827  ESTABLISHED  22939/apache2
tcp        0      0 192.168.61.10:80  444.444.444.444:49498  ESTABLISHED  -
tcp        0      0 192.168.61.10:80  444.444.444.444:52826  ESTABLISHED  22997/apache2
tcp        0      0 192.168.61.10:80  444.444.444.444:49501  ESTABLISHED  -
tcp        0      0 192.168.61.10:80  444.444.444.444:52829  ESTABLISHED  22997/apache2
tcp        0      0 192.168.61.10:80  444.444.444.444:49500  ESTABLISHED  -
tcp        0      0 192.168.61.10:80  444.444.444.444:52828  ESTABLISHED  22968/apache2
tcp        0      0 192.168.61.10:80  444.444.444.444:49503  ESTABLISHED  -
tcp        0      0 192.168.61.10:80  444.444.444.444:52831  ESTABLISHED  22968/apache2
tcp        0      0 192.168.61.10:80  444.444.444.444:49502  ESTABLISHED  -
tcp        0      0 192.168.61.10:80  444.444.444.444:52830  ESTABLISHED  22939/apache2
tcp        0      0 192.168.61.10:80  444.444.444.444:52817  ESTABLISHED  22968/apache2
tcp        0      0 192.168.61.10:80  444.444.444.444:49489  ESTABLISHED  -
tcp        0      0 192.168.61.10:80  444.444.444.444:52816  ESTABLISHED  22910/apache2
tcp        0      0 192.168.61.10:80  444.444.444.444:49488  ESTABLISHED  -
tcp        0      0 192.168.61.10:80  444.444.444.444:52819  ESTABLISHED  22910/apache2
tcp        0      0 192.168.61.10:80  444.444.444.444:49491  ESTABLISHED  -
tcp        0      0 192.168.61.10:80  444.444.444.444:52818  ESTABLISHED  22939/apache2
tcp        0      0 192.168.61.10:80  444.444.444.444:49490  ESTABLISHED  -
tcp        0      0 192.168.61.10:80  444.444.444.444:52821  ESTABLISHED  22939/apache2
tcp        0      0 192.168.61.10:80  444.444.444.444:49493  ESTABLISHED  -
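With many saved netstat captures, a per-state, per-client summary may be easier to read than the raw listing. The following is only a rough sketch: summarize_netstat is a hypothetical helper name, and it assumes each connection sits on one line with the foreign address in column 5 and the state in column 6.

```shell
#!/bin/bash
# Summarize `netstat -n -p` output: connections to local port 80,
# counted per TCP state and foreign IP address.
# Assumption: one connection per line (not wrapped by a mail client).
summarize_netstat() {
    awk '
        $1 == "tcp" && $4 ~ /:80$/ {
            ip = $5
            sub(/:[0-9]+$/, "", ip)          # drop the foreign port
            state = $6
            sub(/[0-9]+\/.*$/, "", state)    # drop a PID/program fused onto the state
            count[state " " ip]++
        }
        END {
            for (key in count) print count[key], key
        }
    ' "$@" | sort -k1,1nr -k2
}
```

It could be run as `summarize_netstat NetstatOutput*.txt`. It prints one line per state and client IP, busiest first, which would show whether one client holds most of the connections.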
And here's a section of the Apache Extended Server-Status page:
Apache Server Status for example.dakim.com
Server Version: Apache/2.0.54 (Debian GNU/Linux) DAV/2 SVN/1.1.4
Server Built: Jul 28 2006 08:55:39
Current Time: Monday, 25-Jun-2007 20:21:27 PDT
Restart Time: Sunday, 24-Jun-2007 06:01:46 PDT
Parent Server Generation: 0
Server uptime: 1 day 14 hours 19 minutes 41 seconds
Total accesses: 156162 - Total Traffic: 1.4 GB
CPU Usage: u.03 s.02 cu0 cs0 - 3.62e-5% CPU load
1.13 requests/sec - 10.5 kB/second - 9.3 kB/request
150 requests currently being processed, 0 idle workers
RRRRRRRRRRRRRRRRRRRRRRRRR.......................................
RRRRRRRRRRRRRRRRRRRRRRRRR.......................................
RRRRRRRRRRRRRRRRRRRRWRRRR.......................................
RRRRRRRRRRRRRRRRRRRRRRRRR.......................................
RRRRRRRRRRRRRRRRRRRRRRRRR.......................................
RRRRRRRRRRRRRRRRRRRRRRRRR.......................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process
Srv PID   Acc      M CPU  SS Req Conn Child Slot      Client VHost Request
0-0 23058 0/0/3122 R 0.00 11 0   0.0  0.00  3.75      ?      ?     ..reading..
0-0 23058 0/0/2853 R 0.00 11 0   0.0  0.00  0.16      ?      ?     ..reading..
0-0 23058 0/0/2826 R 0.00 10 0   0.0  0.00  -1652.90  ?      ?     ..reading..
0-0 23058 0/0/2841 R 0.00 10 0   0.0  0.00  92.31     ?      ?     ..reading..
0-0 23058 0/0/2722 R 0.00 9  0   0.0  0.00  0.31      ?      ?     ..reading..
0-0 23058 0/3/2777 R 0.00 16 0   0.0  0.00  0.32      ?      ?     ..reading..
0-0 23058 0/0/2707 R 0.00 15 0   0.0  0.00  32.64     ?      ?     ..reading..
0-0 23058 0/1/2731 R 0.00 15 0   0.0  0.00  0.11      ?      ?     ..reading..
0-0 23058 0/0/2736 R 0.00 14 0   0.0  0.00  1.93      ?      ?     ..reading..
0-0 23058 0/0/2760 R 0.00 9  0   0.0  0.00  0.31      ?      ?     ..reading..
0-0 23058 0/0/2748 R 0.00 7  0   0.0  0.00  0.15      ?      ?     ..reading..
0-0 23058 0/0/2733 R 0.00 6  0   0.0  0.00  0.11      ?      ?     ..reading..
0-0 23058 0/0/2756 R 0.00 14 0   0.0  0.00  6.89      ?      ?     ..reading..
0-0 23058 0/2/2810 R 0.00 13 0   0.0  0.00  12.00     ?      ?     ..reading..
0-0 23058 0/0/2751 R 0.00 13 0   0.0  0.00  0.12      ?      ?     ..reading..
0-0 23058 0/0/2770 R 0.00 6  0   0.0  0.00  46.61     ?      ?     ..reading..
0-0 23058 0/0/2781 R 0.00 13 0   0.0  0.00  0.33      ?      ?     ..reading..
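To track over time how many workers are stuck in "R", the scoreboard string can be tallied per character. This is a minimal sketch: tally_scoreboard is a hypothetical helper name, and it simply counts every character it reads on stdin, so the input should contain only scoreboard characters.

```shell
#!/bin/bash
# Count how often each scoreboard character (R, W, K, ".", ...) appears
# in the text read from stdin, most frequent first.
tally_scoreboard() {
    awk '
        {
            for (i = 1; i <= length($0); i++) count[substr($0, i, 1)]++
        }
        END {
            for (c in count) print count[c], c
        }
    ' | sort -k1,1nr
}
```

One possible use, assuming the machine-readable mod_status view is reachable at the same URL with `?auto` appended: `curl -s 'http://example.dakim.com/server-status?auto' | sed -n 's/^Scoreboard: //p' | tally_scoreboard`. Logging this every few seconds would show how quickly the "R" count climbs before downloads start to fail.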
I can put the complete netstat and server-status files up for
download if anyone requests them.
Thanks very much!
--John Schofield
---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.