On Wed, Apr 16, 2008 at 8:44 AM, Ben Tay <zonexo at gmail.com> wrote:
> Hi,
>
> Am I right to say that despite all the hype about multi-core processors,
> they can't speed up the solving of linear eqns? It's not possible to get
> a 2x speedup when using 2 cores. And is this true for all types of linear
> equation solvers besides PETSc? What about parallel direct solvers (e.g.
> MUMPS) or those which use OpenMP instead of MPICH? Well, I just can't
> help feeling disappointed if that's the case...
Notice that Satish got much, much better scaling than you did on our box
here. I think something is really wrong, either with the installation of
MPI on that box or something hardware-wise.

   Matt

> Also, with a smart enough LSF scheduler, I will be assured of getting
> separate processors, i.e. 1 core from each different processor instead of
> 2-4 cores from just 1 processor. In that case, if I use 1 core from
> processor A and 1 core from processor B, I should be able to get a decent
> speedup of more than 1, is that so? This option is also better than using
> 2 or even 4 cores from the same processor.
>
> Thank you very much.
>
> Satish Balay wrote:
>
> > On Wed, 16 Apr 2008, Ben Tay wrote:
> >
> > > Hi Satish, thank you very much for helping me run the ex2f.F code.
> > >
> > > I think I've a clearer picture now. I believe I'm running on a
> > > Dual-Core Intel Xeon 5160. The quad-core is only on atlas3-01 to 04,
> > > and there are only 4 of them. I guess the lower peak is because I'm
> > > using a Xeon 5160, while you are using a Xeon X5355.
> >
> > I'm still a bit puzzled. I just ran the same binary on a 2-dual-core
> > Xeon 5130 machine [which should be similar to your 5160 machine] and
> > get the following:
> >
> > [balay at n001 ~]$ grep MatMult log*
> > log.1:MatMult 1192 1.0 1.0591e+01 1.0 3.86e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 11  0  0  0  14 11  0  0  0  364
> > log.2:MatMult 1217 1.0 6.3982e+00 1.0 1.97e+09 1.0 2.4e+03 4.8e+03 0.0e+00 14 11 100 100 0  14 11 100 100 0  615
> > log.4:MatMult  969 1.0 4.7780e+00 1.0 7.84e+08 1.0 5.8e+03 4.8e+03 0.0e+00 14 11 100 100 0  14 11 100 100 0  656
> > [balay at n001 ~]$
> >
> > > You mention the speedups for MatMult and compare between KSPSolve.
> > > Are these the only things we have to look at? Because I see that some
> > > other events such as VecMAXPY also take up a sizable % of the time.
> > > To get an accurate speedup, do I just compare the time taken by
> > > KSPSolve between different no. of processors, or do I have to look at
> > > other events such as MatMult as well?
> >
> > Sometimes we look at individual components like MatMult() and
> > VecMAXPY() to understand what's happening in each stage - and at
> > KSPSolve() to look at the aggregate performance for the whole solve
> > [which includes MatMult, VecMAXPY, etc.]. Perhaps I should have also
> > looked at VecMDot() as well - at 48% of runtime, it's the biggest
> > contributor to KSPSolve() for your run.
> >
> > It's easy to get lost in the details of -log_summary. Looking for
> > anomalies is one thing. Plotting scalability charts for the solver is
> > something else.
> >
> > > In summary, due to load imbalance, my speedup is quite bad. So maybe
> > > I'll just send your results to my school's engineers and see if they
> > > can do anything. For my part, I guess I'll just have to wait?
> >
> > Yes - load imbalance at the MatMult level is bad. On your 4-proc run
> > you have ratio = 3.6. This implies that one of the MPI tasks is 3.6
> > times slower than the other tasks [so all speedup is lost here].
> >
> > You could try the latest mpich2 [1.0.7] - just for this SMP
> > experiment - and see if it makes a difference. I've built mpich2 with
> > [default gcc/gfortran and]:
> >
> > ./configure --with-device=ch3:nemesis:newtcp -with-pm=gforker
> >
> > There could be something else going on on this machine that's messing
> > up the load balance for a basic PETSc example.
> >
> > Satish

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener
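[As a rough sketch, not part of the original thread: the scaling Satish observed can be quantified from the aggregate MatMult flop rates (the last column of the grep lines above, in MFlop/s), which are more comparable across runs than raw wall times since the MatMult call counts differ. The numbers below are copied from the quoted logs.]

```python
# Aggregate MatMult flop rates (MFlop/s) from Satish's -log_summary
# output quoted above: 364 on 1 proc, 615 on 2 procs, 656 on 4 procs.
mflops = {1: 364, 2: 615, 4: 656}

base = mflops[1]
for nprocs in sorted(mflops):
    speedup = mflops[nprocs] / base          # relative to the 1-proc run
    efficiency = speedup / nprocs            # fraction of ideal scaling
    print(f"{nprocs} proc(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

[The rate barely improves from 2 to 4 processes (615 to 656 MFlop/s, roughly 45% parallel efficiency at 4 procs), which is consistent with sparse matrix-vector products being memory-bandwidth-bound on an SMP node: adding cores that share the same memory bus does not add bandwidth.]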