On Wed, 2008-04-23 at 23:33 +1000, Sonia Hamilton wrote:
> A question about uptime. My understanding of the load average figures is
> that a figure less than 2.0 on a 2 CPU machine means the CPUs don't have
> more work than they can keep up with (on average).

As an aside, it is interesting to note that "maxing out" the load
average (1.0 on one CPU, 2.0 on two, etc.) does not necessarily imply
the best use of a machine.
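
For anyone wanting to check the "load average versus number of CPUs"
rule of thumb on their own box, here is a minimal sketch. The helper
name per_cpu_load is made up for illustration, and it assumes a
Unix-like system where Python's os.getloadavg() is available. Bear in
mind that on Linux the load average also counts tasks in
uninterruptible sleep (usually waiting on I/O), so it is only a rough
proxy for CPU demand.

```python
import os


def per_cpu_load(load_avg, cpu_count):
    """Divide a load average by the CPU count (hypothetical helper).

    A result near 1.0 is the "maxed out" point from the rule of thumb;
    a result well below 1.0 means there is idle capacity on average.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count


if __name__ == "__main__":
    # os.cpu_count() can return None if the count is unknown; assume 1.
    cpus = os.cpu_count() or 1
    try:
        one_min, five_min, fifteen_min = os.getloadavg()
    except (AttributeError, OSError):
        # Not available on this platform, nothing to report.
        print("load average not available here")
    else:
        for label, value in (("1 min", one_min),
                             ("5 min", five_min),
                             ("15 min", fifteen_min)):
            print(f"{label}: load {value:.2f}, "
                  f"per CPU {per_cpu_load(value, cpus):.2f}")
```

On a 2 CPU machine a load average of 2.0 gives a per-CPU figure of 1.0,
which is the "keeping up, but only just" point in the original
question.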

I know the gent who runs the image farm at Amazon. Unsurprisingly,
Amazon's high-level IT people wanted all the systems to be fully
utilized, but the point of the image servers is *not* to run with no
CPU idle time; it is to serve images with extremely low latency.

So in this case, they needed a very beefy set of machines that were
running at as close to idle as they could so that any new request coming
in could be acted on immediately.
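
To put rough numbers on why idle capacity buys latency, here is a
sketch using the textbook M/M/1 queueing model, where mean response
time is service time divided by (1 - utilization). This model is my
assumption for illustration only; nothing here says the image farm
behaved like an M/M/1 queue, and the function name and the 10 ms
service time are made up.

```python
def mm1_response_time(service_time, utilization):
    """Mean response time for an M/M/1 queue (illustrative model only).

    service_time: mean time to serve one request with no queueing.
    utilization:  fraction of time the server is busy, 0 <= u < 1.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)


if __name__ == "__main__":
    service_ms = 10.0  # assumed service time, purely for illustration
    for u in (0.1, 0.5, 0.9, 0.99):
        print(f"utilization {u:.2f}: "
              f"mean response {mm1_response_time(service_ms, u):.1f} ms")
```

Under this model the mean response time doubles at 50% utilization and
grows without bound as utilization approaches 100%, which is the
trade-off the "fully utilized" view misses.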

He had such a bad time trying to explain latency to senior IT that he
finally gave up and hacked the kernel to report artificially high load
averages. That tricked the external monitoring agents, so the IT
managers wouldn't feel they could lump more services onto the box,
increasing their "utilization" at the cost of completely destroying
responsiveness.

AfC
Sydney

-- 
Andrew Frederick Cowie

Operational Dynamics is an operations and engineering consultancy
focusing on IT strategy, organizational architecture, systems
review, and effective procedures for change management. We actively
carry out research and development in these areas on behalf of our
clients, and enable successful use of open source in their mission
critical enterprises, worldwide.

http://www.operationaldynamics.com/

Sydney   New York   Toronto   London

-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
