Thanks Barry - very informative, and gave me a chuckle :-)

Randy
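P.S. One quick way to tell whether the bottleneck is intra-node memory
bandwidth rather than the Infiniband fabric: run the same 4-process job
twice, once spread one rank per node and once packed onto a single
quad-core node. A rough sketch with a generic MPICH-style launcher (the
host names and application name are hypothetical, and the exact flags
differ between MPI stacks, including mvapich):

# machinefile.spread -- one rank on each of four nodes
node01
node02
node03
node04

# machinefile.packed -- all four ranks on one quad-core node
node01
node01
node01
node01

mpirun -np 4 -machinefile machinefile.spread ./my_petsc_app
mpirun -np 4 -machinefile machinefile.packed ./my_petsc_app

If the spread run is markedly faster at the same process count, the
cores are contending for the memory bus, not the network.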
Barry Smith wrote:
>
> Randy,
>
>    Please see
> http://www-unix.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#computers
>
>    Essentially what has happened is that chip hardware designers
> (Intel, IBM, AMD) hit a wall on how high they could push their clock
> speeds. They then needed some other way to increase the "performance"
> of their chips; since they could still make smaller circuits, they
> came up with putting multiple cores on a single chip, so they could
> "double" or "quadruple" the claimed performance very easily.
> Unfortunately the whole multicore "solution" is really half-assed,
> since it is difficult to use all the cores effectively, especially
> because memory bandwidth did not improve as fast.
>
>    Now, when a company comes out with a half-assed product, do they
> say "this is a half-assed product"? Did Microsoft say Vista was
> "half-assed"? No, they emphasize the positive parts of their product
> and hide the limitations. This has been true since Grog made his
> first stone wheel in front of his cave. So Intel misled everyone on
> how great multicores are.
>
>    When you buy earlier dual- or quad-core products you are NOT
> getting a parallel system (even though it has 2 cores), because the
> memory is NOT parallel.
>
>    Things are getting a bit better; Intel now has systems with
> higher memory bandwidth. The thing you have to look for is MEMORY
> BANDWIDTH PER CORE: the higher that is, the better the performance
> you get.
>
>    Note this doesn't have anything to do with PETSc; any sparse
> solver has the exact same issues.
>
>    Barry
>
>
> On Apr 15, 2008, at 7:19 PM, Randall Mackie wrote:
>> I'm running my PETSc code on a cluster of quad-core Xeons connected
>> by Infiniband. I hadn't much worried about the performance, because
>> everything seemed to be working quite well, but today I was actually
>> comparing performance (wall clock time) for the same problem on
>> different combinations of CPUs.
>>
>> I find that my PETSc code is quite scalable until I start to use
>> multiple cores per CPU.
>>
>> For example, the run time doesn't improve in going from 1 core/cpu
>> to 4 cores/cpu, and I find this very strange, especially since,
>> looking at top or Ganglia, all 4 cores on each node are running at
>> 100% almost all of the time. I would have thought that if the cores
>> were going all out, I would still be getting much more scalable
>> results.
>>
>> We are using mvapich-0.9.9 with Infiniband. So I don't know if this
>> is a cluster/Xeon issue, or something else.
>>
>> Anybody with experience on this?
>>
>> Thanks, Randy M.
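To put a number on Barry's MEMORY BANDWIDTH PER CORE point, below is a
minimal STREAM-style triad sketch (an illustration, not part of the
original thread; it assumes gcc with OpenMP, e.g. "gcc -O2 -fopenmp
triad.c -o triad", and the file name and array size are arbitrary).
Run it with OMP_NUM_THREADS=1, 2 and 4:

/* triad.c: measure aggregate memory bandwidth as threads are added.
   Illustrative sketch only; sizes chosen so the arrays spill any cache. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 20000000        /* 3 arrays x 8 bytes x 20M = 480 MB total */
#define NTRIES 10

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    if (!a || !b || !c) { fprintf(stderr, "malloc failed\n"); return 1; }

    for (int i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double best = 1e30;
    for (int t = 0; t < NTRIES; t++) {
        double t0 = omp_get_wtime();
#pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];          /* the STREAM triad */
        double dt = omp_get_wtime() - t0;
        if (dt < best) best = dt;              /* keep the fastest pass */
    }

    printf("%d threads: %.2f GB/s\n", omp_get_max_threads(),
           3.0 * 8.0 * N / best / 1e9);        /* bytes moved per pass */
    free(a); free(b); free(c);
    return 0;
}

On quad-core Xeons of this era the reported GB/s typically stops
growing well before 4 threads, which is exactly why all four cores can
show 100% in top while the wall-clock time barely improves.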
