Then what's the point of having 4 and 8 cores per CPU for parallel computations? I mean, I think I've done all I can to make my code as efficient as possible.
I'm not quite sure I understand your comment about using blocks or unassembled structures.

Randy


Matthew Knepley wrote:
> On Tue, Apr 15, 2008 at 7:19 PM, Randall Mackie <rlmackie862 at gmail.com> wrote:
>> I'm running my PETSc code on a cluster of quad-core Xeons connected
>> by InfiniBand. I hadn't much worried about the performance, because
>> everything seemed to be working quite well, but today I was actually
>> comparing performance (wall-clock time) for the same problem on
>> different combinations of CPUs.
>>
>> I find that my PETSc code is quite scalable until I start to use
>> multiple cores per CPU.
>>
>> For example, the run time doesn't improve by going from 1 core/cpu
>> to 4 cores/cpu, and I find this to be very strange, especially since,
>> looking at top or Ganglia, all 4 cpus on each node are running at 100%
>> almost all of the time. I would have thought that if the cpus were
>> going all out, I would still be getting much more scalable results.
>
> Those are really coarse measures. There is absolutely no way that all cores
> are doing useful work 100% of the time. It's easy to show by hand: take the
> peak flop rate, and this gives you the bandwidth needed to sustain that
> computation (if everything is perfect, like axpy). You will find that the
> chip bandwidth is far below this. A nice analysis is in
>
>   http://www.mcs.anl.gov/~kaushik/Papers/pcfd99_gkks.pdf
>
>> We are using mvapich-0.9.9 with InfiniBand. So I don't know if
>> this is a cluster/Xeon issue, or something else.
>
> This is actually mathematics! How satisfying. The only way to improve
> this is to change the data structure (e.g. use blocks) or change the
> algorithm (e.g. use spectral elements and unassembled structures).
>
>   Matt
>
>> Anybody with experience on this?
>>
>> Thanks, Randy M.
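To make the bandwidth argument concrete, here is a minimal back-of-the-envelope sketch in C. The clock rate, flops per cycle, and per-socket memory bandwidth below are illustrative assumptions for a quad-core Xeon of that era, not measured values; only the bytes-per-flop accounting for axpy comes from the argument itself.

  /* Rough estimate: bandwidth needed to keep an axpy at peak flop rate.
     All hardware numbers are assumptions for illustration only. */
  #include <stdio.h>

  int main(void)
  {
      /* y[i] = a*x[i] + y[i]: 2 flops per element,
         read x (8 B) + read y (8 B) + write y (8 B) = 24 bytes */
      double flops_per_elem  = 2.0;
      double bytes_per_elem  = 24.0;

      double peak_gflops     = 4 * 2.5 * 4; /* assumed: 4 cores * 2.5 GHz * 4 flops/cycle */
      double needed_gbytes_s = peak_gflops * bytes_per_elem / flops_per_elem;
      double socket_gbytes_s = 10.0;        /* assumed memory bandwidth per socket */

      printf("bandwidth needed to feed peak: %.0f GB/s\n", needed_gbytes_s);
      printf("bandwidth available:           %.0f GB/s\n", socket_gbytes_s);
      printf("achievable fraction of peak:   %.1f %%\n",
             100.0 * socket_gbytes_s / needed_gbytes_s);
      return 0;
  }

With these assumed numbers, a bandwidth-bound kernel can sustain only a few percent of peak, and adding more cores on the same socket does not add memory bandwidth, which is why top and Ganglia can show 100% busy cores while the wall-clock time barely improves.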
