I guess I'll start out with my setup.

Four Xserves running Mac OS X Server 10.4.6. All are dual-processor G5s,
and all are completely patched with all software updates (including the
latest WO and Java updates).

Two of these servers are running as app servers. The other two are
running OpenBase and are strictly database servers.

This system runs 3 WebObjects applications. One is a backend
management application for the system. It only has one instance
running and isn't used very often.

The second app is essentially a spider. It searches the web collecting
data relevant to our customers' needs.

The last app is a service provider. It is basically just a direct
action that returns data based on a customer's request.
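
In case it helps to picture it, that service app boils down to little
more than a standard WODirectAction subclass, something like the sketch
below (the class, method, and parameter names here are made up for
illustration, not our actual code):

    import com.webobjects.appserver.WOActionResults;
    import com.webobjects.appserver.WODirectAction;
    import com.webobjects.appserver.WORequest;
    import com.webobjects.appserver.WOResponse;

    public class ServiceAction extends WODirectAction {

        public ServiceAction(WORequest request) {
            super(request);
        }

        // Hit via .../wa/lookup?customerID=...; builds the response
        // directly instead of going through a component page.
        public WOActionResults lookupAction() {
            String customerID = request().stringFormValueForKey("customerID");
            WOResponse response = new WOResponse();
            response.appendContentString(dataForCustomer(customerID));
            return response;
        }

        // Placeholder: in reality this fetches from OpenBase via EOF
        // and formats the payload.
        private String dataForCustomer(String customerID) {
            return "";
        }
    }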

The first app has one instance running and may get 10 requests a day.

The second app has one instance running. It feeds off a queue in the
database that tells it what information it needs but does not have. A
cron job kicks off a direct action that runs through this queue, looks
for the required data, and stores it in the system. It has a 55-minute
timer that shuts it off if it doesn't empty the queue within 55
minutes. The action is fired again 5 minutes later and picks up where
the last one left off. This app currently runs for the full 55 minutes
because we've recently cleared a lot of data out of the system that we
need to refresh with new data.
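
In rough Java terms, the work loop looks like this (QueueEntry,
nextQueueEntry, and fetchAndStoreData are placeholders of mine, not our
literal code):

    // Sketch of app #2's direct action: cron fires it, it drains the
    // queue, and it bails out after 55 minutes so the next hourly
    // invocation can pick up where it left off.
    private static final long TIME_LIMIT_MS = 55L * 60L * 1000L;

    public void processQueue() {
        long deadline = System.currentTimeMillis() + TIME_LIMIT_MS;
        QueueEntry entry;
        while ((entry = nextQueueEntry()) != null) {
            if (System.currentTimeMillis() >= deadline) {
                break;  // out of time; cron kicks us off again later
            }
            fetchAndStoreData(entry);  // spider the web, save via EOF
        }
    }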

The third app does get a substantial amount of traffic. For instance,
we got 2,941,750 requests to that app in the last 24 hours alone
(roughly 34 requests per second on average).

About 3 months ago we started having performance issues. I pretty much
pinned the issues down to the fact that one database server wasn't
adequate for the system. So about 3 weeks ago server #4 arrived; I set
it up as a database server and dropped it into the system. We
redesigned the system to split the data up so we could run
approximately half the data on one server and half on the other.

That system was deployed and ran fairly well for about 2 weeks. Last
week I started getting hanging requests and the system was grinding to
a halt. It wasn't crashing, mind you, just taking way too long to
service requests and occasionally dropping requests here and there.

I made a change to application #2, which is the one that runs pretty
much constantly for 55 minutes out of each hour: I added a sleep of 10
milliseconds. I had come to the conclusion that this long-running app
was monopolizing the CPU and causing the requests for the other apps to
stack up and bog things down. Adding a 10-millisecond sleep in the
main loop seemed to fix our problems for about 4 days.
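
Concretely, the change was nothing fancier than a Thread.sleep() at the
bottom of the loop shown earlier (again, a sketch rather than our
literal code):

    while ((entry = nextQueueEntry()) != null) {
        if (System.currentTimeMillis() >= deadline) {
            break;
        }
        fetchAndStoreData(entry);
        try {
            Thread.sleep(10);  // yield the CPU briefly between entries
        } catch (InterruptedException e) {
            break;  // treat interruption as a shutdown request
        }
    }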

I must admit that this was a guess. I didn't have any real proof that
this app was monopolizing resources. I also didn't know if sleeping
the long-running thread for 10 milliseconds would have the desired
result. But it seemed to work. Of course, it could have just been a
coincidence.

A little over a week ago, the app servers were running at an average
of about 15%-25% CPU all day. Today, they are running at a constant
50%-60%. Looking in Server Admin at the number of requests per second
for Apache, there doesn't appear to be a large increase in web
traffic. Although I'm not entirely confident in Server Admin's Apache
statistics, since it seems to measure the total number of requests
active in the system rather than simply the total number of requests
that have come in at that point.

The database servers (running OpenBase) don't show any weird activity.
They run at a constant 5%-10%, which leads me to believe they aren't
running anywhere near capacity.

Looking at the detail view of app #3 in the web monitor doesn't seem
to reveal any statistics that look different from any other day. The
total number of requests doesn't seem much larger than normal. The
average request time doesn't seem abnormal, and neither does the
average idle time.

The application logs don't indicate a bunch of exceptions being thrown.

Outside of Server Admin, the web monitor, and the logs, I can't think
of where else to look for the problem.

The latest version of this software has been running steadily for about
a month now with about the same amount of traffic I'm getting today.
But it seems to be running really high CPU percentages for some
reason.

App #3 is running with 128 MB of RAM, which is the same amount it's
been running with for over a year and a half now. The application is
set for a minimum of 2 threads and a maximum of 20 threads, with a
listening queue size of 1.
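
For reference, those numbers come straight from the launch arguments
(set through Monitor's additional arguments in our case); if I have the
property names right, something like:

    -Xmx128m -WOWorkerThreadCountMin 2 -WOWorkerThreadCountMax 20 -WOListenQueueSize 1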

The adaptor has the following settings:
Load balance scheme: round robin
Retries: 1
Dormant: 1
Send timeout: 10
Receive timeout: 10
Connect timeout: 5
Send buffer size: 32768
Receive buffer size: 98304 (the response pages are between 60K and 100K)
Connection pool size: 1

I guess my question is: where, outside of what I've already stated,
should I start looking for my bottleneck? Are there some tools in WO
or on the Xserves that I haven't mentioned yet that I should be
looking at?

I think I (and Robert Walker, who has been assisting me with this
project) have done a pretty good job with things so far. But I guess
I've hit the edge of my knowledge, because I can't seem to locate the
problem I'm having.

Any help would be greatly appreciated.

- Eric Stewart