On Thu, Mar 30, 2017 at 3:05 AM, Jose E. Roman <[email protected]> wrote:
>
> > On 30 Mar 2017, at 9:27, Toon Weyens <[email protected]> wrote:
> >
> > Hi, thanks for the answer.
> >
> > I use MUMPS as a PC. The options -ksp_converged_reason,
> > -ksp_monitor_true_residual and -ksp_view are not used.
> >
> > The difference between the log_view outputs of running a simple solution
> > with 1, 2, 3 or 4 MPI procs is attached (debug version).
> >
> > I can see that with 2 procs it takes about 22 seconds, versus 7 seconds
> > for 1 proc. For 3 and 4 the situation is worse: 29 and 37 seconds.
> >
> > Looks like the difference is mainly in the BVmult and especially in the
> > BVorthogonalize routines:
> >
> > BVmult takes 1, 6.5, 10 or even a whopping 17 seconds for the different
> > numbers of processes.
> > BVorthogonalize takes 1, 4, 6, 10.
> >
> > Calculating the preconditioner does not take more time for different
> > numbers of processes, and applying it takes only slightly longer. So it
> > cannot be MUMPS' fault...
> >
> > Does this make sense? Is there any way to improve this?
> >
> > Thanks!
>
> Cannot trust performance data in a debug build:
>

Yes, you should definitely make another build configured using
--with-debugging=no. What do you get for STREAMS on this machine?

  make streams NP=4

From this data, it looks like you have already saturated the bandwidth at
2 procs.

  Thanks,

     Matt

>       ##########################################################
>       #                                                        #
>       #                       WARNING!!!                       #
>       #                                                        #
>       #   This code was compiled with a debugging option,      #
>       #   To get timing results run ./configure                #
>       #   using --with-debugging=no, the performance will      #
>       #   be generally two or three times faster.              #
>       #                                                        #
>       ##########################################################
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
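
For reference, the total run times quoted in the message can be turned into speedup and parallel efficiency figures. This is only a minimal sketch: the `times` table holds the rounded numbers from the message (7, 22, 29 and 37 seconds for 1 to 4 procs), and the helper name `scaling_table` is hypothetical, not part of PETSc or SLEPc. Since those timings come from a debug build, the results are illustrative at best.

```python
# Wall-clock times in seconds, keyed by number of MPI procs.
# Values are the rounded figures quoted in the message (debug build).
times = {1: 7.0, 2: 22.0, 3: 29.0, 4: 37.0}


def scaling_table(times):
    """Return (procs, time, speedup, efficiency) rows relative to the 1-proc run.

    speedup    = T(1) / T(p)
    efficiency = speedup / p
    """
    t1 = times[1]
    rows = []
    for p in sorted(times):
        speedup = t1 / times[p]
        efficiency = speedup / p
        rows.append((p, times[p], speedup, efficiency))
    return rows


for p, t, s, e in scaling_table(times):
    print(f"{p} procs: {t:5.1f} s  speedup {s:4.2f}  efficiency {e:4.2f}")
```

A speedup below 1 means the parallel run is slower than the serial one, which is the case for every multi-process run reported here. The same calculation applied to timings from an optimized build, together with the STREAMS numbers, would show whether memory bandwidth is the limit.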
