Hi, am I right to say that, despite all the hype about multi-core processors, they can't speed up the solution of linear equations? That is, it's not possible to get a 2x speedup when using 2 cores? And is this true for all types of linear equation solvers, not just PETSc? What about parallel direct solvers (e.g. MUMPS), or those which use OpenMP instead of MPICH? Well, I just can't help feeling disappointed if that's the case...
Also, with a smart enough LSF scheduler, I can be assured of getting separate processors, i.e. 1 core from each of several processors instead of 2-4 cores from just 1 processor. In that case, if I use 1 core from processor A and 1 core from processor B, I should be able to get a decent speedup of more than 1, is that so? This option would also be better than using 2 or even 4 cores from the same processor.

Thank you very much.

Satish Balay wrote:
> On Wed, 16 Apr 2008, Ben Tay wrote:
>
>> Hi Satish, thank you very much for helping me run the ex2f.F code.
>>
>> I think I have a clearer picture now. I believe I'm running on Dual-Core Intel
>> Xeon 5160. The quad core is only on atlas3-01 to 04 and there are only 4 of
>> them. I guess that the lower peak is because I'm using Xeon 5160, while you
>> are using Xeon X5355.
>>
>
> I'm still a bit puzzled. I just ran the same binary on a 2 dualcore
> xeon 5130 machine [which should be similar to your 5160 machine] and
> get the following:
>
> [balay at n001 ~]$ grep MatMult log*
> log.1:MatMult 1192 1.0 1.0591e+01 1.0 3.86e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 11 0 0 0 14 11 0 0 0 364
> log.2:MatMult 1217 1.0 6.3982e+00 1.0 1.97e+09 1.0 2.4e+03 4.8e+03 0.0e+00 14 11100100 0 14 11100100 0 615
> log.4:MatMult 969 1.0 4.7780e+00 1.0 7.84e+08 1.0 5.8e+03 4.8e+03 0.0e+00 14 11100100 0 14 11100100 0 656
> [balay at n001 ~]$
>
>> You mentioned the speedups for MatMult and compared them with KSPSolve. Are
>> these the only things we have to look at? Because I see that some other events
>> such as VecMAXPY also take up a sizable % of the time. To get an accurate
>> speedup, do I just compare the time taken by KSPSolve between different no. of
>> processors or do I have to look at other events such as MatMult as well?
>>
>
> Sometimes we look at individual components like MatMult() and VecMAXPY()
> to understand what's happening in each stage - and at KSPSolve() to
> look at the aggregate performance for the whole solve [which includes
> MatMult, VecMAXPY etc.]. Perhaps I should have also looked at
> VecMDot() - at 48% of runtime - it's the biggest contributor to
> KSPSolve() for your run.
>
> It's easy to get lost in the details of log_summary. Looking for
> anomalies is one thing. Plotting scalability charts for the solver is
> something else.
>
>> In summary, due to load imbalance, my speedup is quite bad. So maybe I'll just
>> send your results to my school's engineer and see if they could do anything.
>> For my part, I guess I'll just have to wait?
>>
>
> Yes - load imbalance at the MatMult level is bad. On the 4 proc run you have
> ratio = 3.6. This implies that one of the MPI tasks is 3.6
> times slower than another task [so all speedup is lost here].
>
> You could try the latest mpich2 [1.0.7] - just for this SMP
> experiment, and see if it makes a difference. I've built mpich2 with
> [default gcc/gfortran and]:
>
> ./configure --with-device=ch3:nemesis:newtcp -with-pm=gforker
>
> There could be something else going on on this machine that's messing
> up load-balance for a basic petsc example.
>
> Satish
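
For anyone reading along, here is a rough sketch of how the MatMult speedup could be worked out from the three log lines quoted above. It assumes the usual log_summary column order (call count first, then max time in seconds), and it divides time by call count because the three runs performed different numbers of MatMult calls. The names `matmult_runs` and `speedup_table` are made up for this illustration and are not part of PETSc.

```python
# Numbers copied from the quoted "grep MatMult log*" output:
# process count -> (MatMult call count, max time in seconds).
# Assumption: columns follow the usual log_summary layout.
matmult_runs = {
    1: (1192, 1.0591e+01),
    2: (1217, 6.3982e+00),
    4: (969, 4.7780e+00),
}


def speedup_table(runs):
    """Return (nproc, speedup, efficiency) rows relative to the 1-process run.

    Time is normalised per MatMult call so that runs with different
    iteration counts can be compared fairly.
    """
    base_count, base_seconds = runs[1]
    base_per_call = base_seconds / base_count
    rows = []
    for nproc in sorted(runs):
        count, seconds = runs[nproc]
        speedup = base_per_call / (seconds / count)
        rows.append((nproc, speedup, speedup / nproc))
    return rows


for nproc, speedup, efficiency in speedup_table(matmult_runs):
    print(f"{nproc} procs: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

On these numbers the per-call speedup comes out at roughly 1.7x on 2 processes and 1.8x on 4, which agrees with the Mflop/s column in the same lines (364, 615, 656). The same calculation could be applied to the KSPSolve or VecMDot lines of a log_summary output.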
