On Tue, Apr 15, 2008 at 9:03 PM, Randall Mackie <rlmackie862 at gmail.com> wrote:
> Okay, but if I'm stuck with a big 3D finite difference code, written in PETSc
> using Distributed Arrays, with 3 dof per node, then you're saying there is
> really nothing I can do, except using blocking, to improve things on quad
> core cpus?

Yes, just about.

> They talk about blocking using BAIJ format, and so is this the
> same thing as creating MPIBAIJ matrices in PETSc?

Yes.

> And is creating MPIBAIJ matrices in PETSc going to make a substantial
> difference in the speed?

That is the hope. You can just give MPIBAIJ as the argument to DAGetMatrix().
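For instance, a minimal untested sketch (the type name constant is MATMPIBAIJ;
error checking is omitted, and the grid sizes are placeholders for whatever
your code uses):

  #include "petscda.h"

  int main(int argc, char **argv)
  {
    DA  da;
    Mat J;

    PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);
    /* 3D distributed array, 3 dof per node, stencil width 1 */
    DACreate3d(PETSC_COMM_WORLD, DA_NONPERIODIC, DA_STENCIL_STAR,
               32, 32, 32, PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
               3, 1, PETSC_NULL, PETSC_NULL, PETSC_NULL, &da);
    /* ask for blocked storage instead of the default AIJ */
    DAGetMatrix(da, MATMPIBAIJ, &J);
    /* ... assemble and solve as before ... */
    MatDestroy(J);
    DADestroy(da);
    PetscFinalize();
    return 0;
  }

With block size 3, BAIJ stores one column index per 3x3 block instead of one
per nonzero, so the matvec reads far less index data, and that is exactly the
memory traffic you are trying to cut.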
> I'm sorry if I'm being dense, I'm just trying to understand if there is some
> simple way I can utilize those extra cores on each cpu easily, and since
> I'm not a computer scientist, some of these concepts are difficult.

I really believe extra cores are currently a con for scientific computing.
There are real mathematical barriers to their effective use.
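A back-of-envelope example (the hardware numbers here are illustrative, not
measured on your cluster):

  axpy (y = a*x + y) does 2 flops per entry and moves 3 doubles
  (load x[i], load y[i], store y[i]) = 24 bytes, i.e. 12 bytes/flop.
  A 3 GHz quad-core Xeon peaks around 48 Gflop/s per socket, which
  would need 48 x 12 = 576 GB/s of memory bandwidth to sustain. The
  front-side bus delivers on the order of 10 GB/s, so all four cores
  together top out near 10/12 = 0.8 Gflop/s on axpy, under 2% of
  peak, no matter how busy top says they are.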
   Matt

> Thanks, Randy
>
> Matthew Knepley wrote:
> > On Tue, Apr 15, 2008 at 7:41 PM, Randall Mackie <rlmackie862 at gmail.com> wrote:
> > > Then what's the point of having 4 and 8 cores per cpu for parallel
> > > computations then? I mean, I think I've done all I can to make
> > > my code as efficient as possible.
> >
> > I really advise reading the paper. It explicitly treats the case of
> > blocking, and uses a simple model to demonstrate all the points I made.
> >
> > With a single, scalar sparse matrix, there is definitely no point at
> > all in having multiple cores. However, they will speed up things like
> > finite element integration. So, for instance, making this integration
> > dominate your cost (like spectral element codes do) will show nice
> > speedup. Ulrich Ruede has a great talk about this on his website.
> >
> >   Matt
> >
> > > I'm not quite sure I understand your comment about using blocks
> > > or unassembled structures.
> > >
> > > Randy
> > >
> > > Matthew Knepley wrote:
> > > > On Tue, Apr 15, 2008 at 7:19 PM, Randall Mackie <rlmackie862 at gmail.com> wrote:
> > > > > I'm running my PETSc code on a cluster of quad-core Xeons connected
> > > > > by Infiniband. I hadn't much worried about the performance, because
> > > > > everything seemed to be working quite well, but today I was actually
> > > > > comparing performance (wall clock time) for the same problem on
> > > > > different combinations of CPUs.
> > > > >
> > > > > I find that my PETSc code is quite scalable until I start to use
> > > > > multiple cores/cpu.
> > > > >
> > > > > For example, the run time doesn't improve by going from 1 core/cpu
> > > > > to 4 cores/cpu, and I find this very strange, especially since,
> > > > > looking at top or Ganglia, all 4 cpus on each node are running at
> > > > > 100% almost all of the time. I would have thought that if the cpus
> > > > > were going all out, I would still be getting much more scalable
> > > > > results.
> > > >
> > > > Those are really coarse measures. There is absolutely no way that all
> > > > cores are going at 100%. It's easy to show by hand: take the peak flop
> > > > rate; this gives you the bandwidth needed to sustain that computation
> > > > (if everything is perfect, like axpy). You will find that the chip
> > > > bandwidth is far below this. A nice analysis is in
> > > >
> > > > http://www.mcs.anl.gov/~kaushik/Papers/pcfd99_gkks.pdf
> > > >
> > > > > We are using mvapich-0.9.9 with infiniband. So, I don't know if
> > > > > this is a cluster/Xeon issue, or something else.
> > > >
> > > > This is actually mathematics! How satisfying. The only way to improve
> > > > this is to change the data structure (e.g. use blocks) or change the
> > > > algorithm (e.g. use spectral elements and unassembled structures).
> > > >
> > > >   Matt
> > > >
> > > > > Anybody with experience on this?
> > > > >
> > > > > Thanks, Randy M.

-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
