Hi, thanks for the answer. I use MUMPS as a PC. The options -ksp_converged_reason, -ksp_monitor_true_residual and -ksp_view are not used.
The difference between the log_view outputs of running a simple solution with 1, 2, 3 or 4 MPI procs is attached (debug version). I can see that with 2 procs it takes about 22 seconds, versus 7 seconds for 1 proc. For 3 and 4 the situation is worse: 29 and 37 seconds.

It looks like the difference is mainly in the BVMult and especially in the BVOrthogonalize routines: BVMult takes 1, 6.5, 10 or even a whopping 17 seconds for the different numbers of processes, and BVOrthogonalize takes 1, 4, 6 and 10 seconds.

Calculating the preconditioner does not take more time for different numbers of processes, and the time to apply it increases only slightly. So it cannot be MUMPS' fault...

Does this make sense? Is there any way to improve this?

Thanks!

On Wed, Mar 29, 2017 at 3:20 PM Matthew Knepley <[email protected]> wrote:

> On Wed, Mar 29, 2017 at 6:58 AM, Toon Weyens <[email protected]> wrote:
>
> > Dear Jose,
> >
> > Thanks for the answer. I am looking for the smallest real, indeed.
> >
> > I have, just now, accidentally figured out that I can get correct convergence by increasing NCV to higher values, so that's covered! I thought I had checked this before, but apparently not. It's converging well now, and rather fast (still about 8 times faster than Krylov-Schur).
> >
> > The issue now is that it scales rather badly: if I use 2 or more MPI processes, the time required to solve it goes up drastically. A small test case, on my Ubuntu 16.04 laptop, takes 10 seconds (blazing fast) for 1 MPI process, 25 for 2, 33 for 3, 59 for 4, etc... It is a machine with 8 cores, so I don't really understand why this is.
>
> For any scalability question, we need to see the output of
>
>   -log_view -ksp_view -ksp_monitor_true_residual -ksp_converged_reason
>
> and other EPS options which I forget unfortunately. What seems likely here is that you are using a PC which is not scalable, so the number of iterations would be going up.
>
>   Thanks,
>
>      Matt
>
> > Are there other methods that can actually maintain the time required to solve for multiple MPI processes? Or, preferably, decrease it (why else would I use multiple processes if not for memory restrictions)?
> >
> > I will never have to do something bigger than a generalized non-Hermitian eigenvalue problem of, let's say, 5000 blocks of 200x200 complex values per block, and a band size of about 11 blocks wide (so a few GB per matrix max).
> >
> > Thanks so much!
> >
> > On Wed, Mar 29, 2017 at 9:54 AM Jose E. Roman <[email protected]> wrote:
> >
> > > > On 29 Mar 2017, at 9:08, Toon Weyens <[email protected]> wrote:
> > > >
> > > > I started looking for alternatives from the standard Krylov-Schur method to solve the generalized eigenvalue problem Ax = kBx in my code. These matrices have a block-band structure (typically 5, 7 or 9 blocks wide, with block sizes of the order 20) of size typically 1000 blocks. This eigenvalue problem results from the minimization of the energy of a perturbed plasma-vacuum system in order to investigate its stability. So far, I've not taken advantage of the Hermiticity of the problem.
> > > >
> > > > For "easier" problems, especially the Generalized Davidson method converges like lightning, sometimes up to 100 times faster than Krylov-Schur.
> > > >
> > > > However, for slightly more complicated problems, GD converges to the wrong eigenpair: there is certainly an eigenpair with an eigenvalue lower than 0 (i.e. unstable), but the solver never gets below some small, positive value, to which it wrongly converges.
> > >
> > > I would need to know the settings you are using. Are you doing smallest_real? Maybe you can try target_magnitude with harmonic extraction.
> > >
> > > > Is it possible to improve this behavior? I tried changing the preconditioner, but it did not work.
> > > >
> > > > Might it be possible to use Krylov-Schur until reaching some precision, and then switch to JD to quickly converge?
> > >
> > > Yes, you can do this, using EPSSetInitialSpace() in the second solve. But, depending on the settings, this may not buy you much.
> > >
> > > Jose
> > >
> > > > Thanks!
-- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener
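
As a rough way to read the timings quoted in the latest message, the sketch below turns them into speedup, parallel efficiency, and the share of run time spent in BVMult and BVOrthogonalize. The numbers are the approximate values stated in the message, not figures taken from the attached logs, and the helper names (speedup, efficiency, share) are made up for this illustration.

```python
# Approximate wall-clock times in seconds, as quoted in the message,
# keyed by number of MPI processes. These are not read from the logs.
timings = {1: 7.0, 2: 22.0, 3: 29.0, 4: 37.0}
bv_mult = {1: 1.0, 2: 6.5, 3: 10.0, 4: 17.0}
bv_orth = {1: 1.0, 2: 4.0, 3: 6.0, 4: 10.0}


def speedup(times):
    """Speedup relative to the 1-process run: T(1) / T(p)."""
    base = times[1]
    return {p: base / t for p, t in times.items()}


def efficiency(times):
    """Parallel efficiency: speedup divided by the process count."""
    return {p: s / p for p, s in speedup(times).items()}


def share(part, total):
    """Fraction of the total run time spent in one event."""
    return {p: part[p] / total[p] for p in total}


if __name__ == "__main__":
    s = speedup(timings)
    e = efficiency(timings)
    m = share(bv_mult, timings)
    o = share(bv_orth, timings)
    print("procs  speedup  efficiency  BVMult  BVOrthogonalize")
    for p in sorted(timings):
        print(f"{p:5d}  {s[p]:7.2f}  {e[p]:10.2f}  {m[p]:6.2f}  {o[p]:15.2f}")
```

A speedup below 1 for every run with more than one process means the parallel runs are slower than the serial one, and the last two columns show how much of each run the two BV events account for.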
Attachments (binary data): 1_procs, 2_procs, 3_procs, 4_procs
