Wietse Venema put forth on 10/22/2009 12:04 PM:
> Stan Hoeppner:
>> I think you've demonstrated it's not slower.  I'm wondering why it's not
>> faster, vs what you described as about equal, in performance.  Granted,
> 
> More than 25 years ago people discovered that it is incredibly hard
> to spread one program over multiple CPUs such that it keeps every
> CPU busy all the time.

And people rediscover it every day.  Given your employer, I'll use the
example of Roadrunner.  Ask that system's users how many of those
129,600 cores they are able to keep busy.  Granted, batch jobs probably
use a max of only a few thousand cores, say 12K or so, and I bet on
average they're each busy only 10-15% of the time due to MPI overhead
across that many nodes.

> This is the main reason why doubling the number CPUs does not always
> halve the execution time.

Depends on the application, but I heartily agree.  Obviously for SMTP
the bottlenecks are traditionally disk and network, and rarely, if
ever, CPU/memory.
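
For anyone who wants numbers, the usual back-of-the-envelope model here
is Amdahl's law.  Below is a minimal sketch, not a measurement of
Postfix or anything else: the function name and the parallel fractions
are my own illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Ideal speedup under Amdahl's law.

    parallel_fraction is the share of the work that scales with CPU
    count (an assumed value here, not measured on any real system).
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


# Hypothetical workloads: 50%, 90% and 99% of the work parallelizes.
for fraction in (0.5, 0.9, 0.99):
    speedups = [round(amdahl_speedup(fraction, n), 2) for n in (1, 2, 4, 8)]
    print(fraction, speedups)
```

Doubling the CPUs only halves the run time when parallel_fraction is
exactly 1.0.  Anything less and the serial part sets the ceiling, and
this model does not even account for communication overhead.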

> There are also hardware-level issues but their effect usually pales
> in comparison.

Agreed, but they can still have serious implications for performance.
Device driver issues are more likely than actual hardware faults.
Here's a good example, caused by an LSI Logic Linux SCSI driver change
between kernels 2.6.8 and 2.6.9:

From http://bugs.gentoo.org/77334

"The 2.6.10-gentoo-r4 kernel still has the LSI logic SCSI regression
that has been present since 2.6.9

A run of hdparm -t of all my scsi disks shows that the LVD devices (on
channel 1) all are running at their max speed but the SE devices are
running at 1/4 speed (I'm only getting 3MB/sec whereas with the
2.6.8-gentoo-r10 kernel they are getting 14-18MB/sec)"


Imagine doing a distro "security update only" (which included a point
kernel upgrade) on your university Postfix server, with 20,000 mailboxes
and pretty heavy user load, late on a Saturday night, rebooting, and all
comes up fine.  However, you're unaware that the new kernel has dropped your SCSI
throughput from ~18MB/s down to 3MB/s.  Think anyone will notice a
performance difference come Monday morning?  Never, ever, rule out even
the "remotest" of possibilities when troubleshooting.  Gremlins have a
nasty habit of hiding in extremely obscure locations.
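
One cheap way to catch this kind of gremlin is to record a throughput
baseline before the upgrade and compare after the reboot.  The bug
report above used hdparm -t for that.  The following is only a rough
sketch of the same idea in Python: the function names, the 50%
tolerance, and the 256 MB read limit are my own assumptions, and reads
may be served from the page cache, so treat the numbers as a smoke
test rather than a benchmark.

```python
import time


def read_throughput_mb_s(path, block_size=1024 * 1024,
                         max_bytes=256 * 1024 * 1024):
    """Sequentially read up to max_bytes from path and return MB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while total < max_bytes:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")
    return (total / (1024 * 1024)) / elapsed


def regressed(current_mb_s, baseline_mb_s, tolerance=0.5):
    """True if current throughput fell below tolerance * baseline."""
    return current_mb_s < baseline_mb_s * tolerance


# Hypothetical usage (path is a placeholder; reading a raw device
# normally requires root):
#   now = read_throughput_mb_s("/path/to/large/file/or/device")
#   if regressed(now, baseline_mb_s=18.0):
#       print("throughput dropped to %.1f MB/s" % now)
```

With the numbers from the bug report, regressed(3.0, 18.0) flags the
problem, while a normal run at 14-18 MB/s does not.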

In this hypothetical situation, how long would it take Wietse or Victor
to find this university's gremlin and vanquish it?  Pretend you don't
know the answer already, and start with the symptoms described to you in
a frantic phone call or email from your friend, the uni's Postfix op.

--
Stan
