On Tue, Apr 15, 2008 at 9:03 PM, Randall Mackie <rlmackie862 at gmail.com> wrote:
> Okay, but if I'm stuck with a big 3D finite difference code, written in PETSc
> using Distributed Arrays, with 3 dof per node, then you're saying there is
> really nothing I can do, except using blocking, to improve things on quad
> core cpus?

Yes, just about.

> They talk about blocking using BAIJ format, and so is this the
> same thing as creating MPIBAIJ matrices in PETSc?

Yes.

> And is creating MPIBAIJ matrices in PETSc going to make a substantial
> difference in the speed?

That is the hope. You can just give MPIBAIJ as the argument to DAGetMatrix().
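For instance, a minimal untested sketch (the type name constant is MATMPIBAIJ;
error checking is omitted, and the grid sizes are placeholders for whatever
your code uses):

  #include "petscda.h"

  int main(int argc, char **argv)
  {
    DA  da;
    Mat J;

    PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);
    /* 3D distributed array, 3 dof per node, stencil width 1 */
    DACreate3d(PETSC_COMM_WORLD, DA_NONPERIODIC, DA_STENCIL_STAR,
               32, 32, 32, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
               3, 1, PETSC_NULL, PETSC_NULL, PETSC_NULL, &da);
    /* ask for blocked storage instead of the default AIJ */
    DAGetMatrix(da, MATMPIBAIJ, &J);
    /* ... assemble and solve as before ... */
    MatDestroy(J);
    DADestroy(da);
    PetscFinalize();
    return 0;
  }

With block size 3, BAIJ stores one column index per 3x3 block instead of one
per nonzero, so the matvec reads far less index data, and that is exactly the
memory traffic you are trying to cut.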
> I'm sorry if I'm being dense, I'm just trying to understand if there is some
> simple way I can utilize those extra cores on each cpu easily, and since
> I'm not a computer scientist, some of these concepts are difficult.

I really believe extra cores are currently a con for scientific computing.
There are real mathematical barriers to their effective use.
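A back-of-envelope example (the hardware numbers here are illustrative, not
measured on your cluster):

  axpy (y = a*x + y) does 2 flops per entry and moves 3 doubles
  (load x[i], load y[i], store y[i]) = 24 bytes, i.e. 12 bytes/flop.
  A 3 GHz quad-core Xeon peaks around 48 Gflop/s per socket, which
  would need 48 x 12 = 576 GB/s of memory bandwidth to sustain. The
  front-side bus delivers on the order of 10 GB/s, so all four cores
  together top out near 10/12 = 0.8 Gflop/s on axpy, under 2% of
  peak, no matter how busy top says they are.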
   Matt

> Thanks, Randy
>
> Matthew Knepley wrote:
> > On Tue, Apr 15, 2008 at 7:41 PM, Randall Mackie <rlmackie862 at gmail.com> wrote:
> > > Then what's the point of having 4 and 8 cores per cpu for parallel
> > > computations then? I mean, I think I've done all I can to make
> > > my code as efficient as possible.
> >
> > I really advise reading the paper. It explicitly treats the case of
> > blocking, and uses a simple model to demonstrate all the points I made.
> >
> > With a single, scalar sparse matrix, there is definitely no point at
> > all in having multiple cores. However, they will speed up things like
> > finite element integration. So, for instance, making this integration
> > dominate your cost (like spectral element codes do) will show nice
> > speedup. Ulrich Ruede has a great talk about this on his website.
> >
> >   Matt
> >
> > > I'm not quite sure I understand your comment about using blocks
> > > or unassembled structures.
> > >
> > > Randy
> > >
> > > Matthew Knepley wrote:
> > > > On Tue, Apr 15, 2008 at 7:19 PM, Randall Mackie <rlmackie862 at gmail.com> wrote:
> > > > > I'm running my PETSc code on a cluster of quad-core Xeons connected
> > > > > by Infiniband. I hadn't much worried about the performance, because
> > > > > everything seemed to be working quite well, but today I was actually
> > > > > comparing performance (wall clock time) for the same problem on
> > > > > different combinations of CPUs.
> > > > >
> > > > > I find that my PETSc code is quite scalable until I start to use
> > > > > multiple cores/cpu.
> > > > >
> > > > > For example, the run time doesn't improve by going from 1 core/cpu
> > > > > to 4 cores/cpu, and I find this very strange, especially since,
> > > > > looking at top or Ganglia, all 4 cpus on each node are running at
> > > > > 100% almost all of the time. I would have thought that if the cpus
> > > > > were going all out, I would still be getting much more scalable
> > > > > results.
> > > >
> > > > Those are really coarse measures. There is absolutely no way that all
> > > > cores are going at 100%. It's easy to show by hand: take the peak flop
> > > > rate; this gives you the bandwidth needed to sustain that computation
> > > > (if everything is perfect, like axpy). You will find that the chip
> > > > bandwidth is far below this. A nice analysis is in
> > > >
> > > > http://www.mcs.anl.gov/~kaushik/Papers/pcfd99_gkks.pdf
> > > >
> > > > > We are using mvapich-0.9.9 with infiniband. So, I don't know if
> > > > > this is a cluster/Xeon issue, or something else.
> > > >
> > > > This is actually mathematics! How satisfying. The only way to improve
> > > > this is to change the data structure (e.g. use blocks) or change the
> > > > algorithm (e.g. use spectral elements and unassembled structures).
> > > >
> > > >   Matt
> > > >
> > > > > Anybody with experience on this?
> > > > >
> > > > > Thanks, Randy M.

-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
