Wietse Venema put forth on 10/22/2009 12:04 PM:
> Stan Hoeppner:
>> I think you've demonstrated it's not slower. I'm wondering why it's not
>> faster, vs what you described as about equal, in performance. Granted,
>
> More than 25 years ago people discovered that it is incredibly hard
> to spread one program over multiple CPUs such that it keeps every
> CPU busy all the time.

And people rediscover it every day. Given your employer, I'll use the
example of Roadrunner. Ask that system's users how many of those 129,600
cores they are able to keep busy. Granted, batch jobs probably use a max
of only a few thousand cores, say 12K or so, and I bet on average they're
each busy only 10-15% of the time due to MPI overhead across that many
nodes.

> This is the main reason why doubling the number CPUs does not always
> halve the execution time.

Depends on the application, but I heartily agree. Obviously for SMTP the
bottlenecks are traditionally disk and network, and rarely, if ever,
CPU/memory.

> There are also hardware-level issues but their effect usually pales
> in comparison.

Agreed, but they can still have serious implications for performance.
Device driver issues are more likely than actual hardware issues. Here's
a good example, caused by an LSI Logic Linux SCSI driver change between
kernels 2.6.8 and 2.6.9:

From http://bugs.gentoo.org/77334:

"The 2.6.10-gentoo-r4 kernel still has the LSI logic SCSI regression that
has been present since 2.6.9 A run of hdparm -t of all my scsi disks
shows that the LVD devices (on channel 1) all are running at their max
speed but the SE devices are running at 1/4 speed (I'm only getting
3MB/sec whereas with the 2.6.8-gentoo-r10 kernel they are getting
14-18MB/sec)"

Imagine doing a distro "security update only" (which included a point
kernel upgrade) on your university Postfix server, with 20,000 mailboxes
and pretty heavy user load, late on a Saturday night, rebooting, and
everything comes up fine. However, you're unaware that it has now dropped
your SCSI throughput from ~18MB/s down to 3MB/s. Think anyone will notice
a performance difference come Monday morning?

Never, ever, rule out even the "remotest" of possibilities when
troubleshooting. Gremlins have a nasty habit of hiding in extremely
obscure locations.

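To put numbers on that scenario, here is a minimal sketch of the sort of
sanity check that would flag it, assuming you had written down a baseline
read rate before the upgrade. The function names and the 25% tolerance
are my own illustration, not part of hdparm or Postfix; the rates are the
ones from the bug report above, and you would still have to collect the
measurements yourself (for example with hdparm -t).

```python
# Hypothetical helpers: compare a measured sequential-read rate against
# a baseline recorded before the kernel upgrade. Names and the default
# tolerance are illustrative assumptions, not any tool's API.

def throughput_regressed(baseline_mb_s: float, measured_mb_s: float,
                         tolerance: float = 0.25) -> bool:
    """Return True if the measured rate is more than `tolerance`
    (a fraction of the baseline) below the baseline."""
    if baseline_mb_s <= 0:
        raise ValueError("baseline must be positive")
    return measured_mb_s < baseline_mb_s * (1.0 - tolerance)


def slowdown_factor(baseline_mb_s: float, measured_mb_s: float) -> float:
    """Return how many times slower the device is than at baseline."""
    if measured_mb_s <= 0:
        raise ValueError("measured rate must be positive")
    return baseline_mb_s / measured_mb_s


if __name__ == "__main__":
    # Rates from the Gentoo bug report: 14-18 MB/s before, 3 MB/s after.
    for baseline in (14.0, 18.0):
        flagged = throughput_regressed(baseline, 3.0)
        factor = slowdown_factor(baseline, 3.0)
        print(f"baseline {baseline} MB/s: regressed={flagged}, "
              f"slowdown={factor:.1f}x")
```

The point is not the code but the habit: without a recorded baseline,
3MB/s is just a number, and nothing tells you it used to be 18.
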
In this hypothetical situation, how long would it take Wietse or Victor
to find this university's gremlin and vanquish it? Pretend you don't
know the answer already, and start with the symptoms described to you in
a frantic phone call or email from your friend, the uni's Postfix op.

--
Stan