On Tue, Aug 16, 2011 at 9:35 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> On Aug 16, 2011, at 5:14 PM, Jack Poulson wrote:
>
> > Hello all,
> >
> > I am working on a project that requires very fast sparse direct solves,
> > and MUMPS and SuperLU_Dist haven't been cutting it. From what I've read,
> > when properly tuned, WSMP is significantly faster, particularly with
> > multiple right-hand sides on large machines. The obvious drawback is
> > that it's not open source, but the binaries seem to be readily available
> > for most platforms.
> >
> > Before I reinvent the wheel, I would like to check whether anyone has
> > already done some work on adding it to PETSc. If not, its interface is
> > quite similar to that of MUMPS, so I should be able to mirror most of
> > that code. On the other hand, there are a large number of
> > platform-specific details that need to be handled, so keeping things
> > both portable and fast might be a challenge. It seems that the CSC
> > storage format should be used, since it is required for Hermitian
> > matrices.
> >
> > Thanks,
> > Jack
>
> Jack,
>
> By all means do it. That would be a nice thing to have. But be aware that
> the WSMP folks have a reputation for exaggerating how much better their
> software is, so don't be surprised if, after all that work, it is not much
> better.

Good to know. I was somewhat worried about that, but perhaps it is a matter
of getting all of the tuning parameters right. The manual does mention that
performance is significantly degraded without tuning. I would sincerely hope
no one would outright lie in their publications, e.g., this one:
http://portal.acm.org/citation.cfm?id=1654061

> BTW: are you solving with many right-hand sides? Maybe before you muck
> with WSMP we should figure out how to get you access to the multiple
> right-hand side support of MUMPS (I don't know if SuperLU_Dist has it) so
> you can speed up your current computations a good amount? Currently,
> PETSc's MatMatSolve() calls a separate solve for each right-hand side with
> MUMPS.
> Barry

I will eventually need to solve against many right-hand sides, but for now
I am solving against one and it is still taking too long; in fact, not only
does it take too long, but the memory per core increases for a fixed problem
size as I increase the number of MPI processes (for both SuperLU_Dist and
MUMPS). This was occurring for quasi-2D Helmholtz problems over a couple
hundred cores. My only logical explanation for this behavior is that the
communication buffers on each process grow in proportion to the number of
processes, but I stress that this is just a guess. I tried reading through
the MUMPS code and quickly gave up.

Another problem with MUMPS is that it requires the entire set of right-hand
sides to reside on the root process...that will clearly not work for a
billion degrees of freedom with several hundred RHSs. WSMP gets this part
right and actually distributes those vectors.

Jack
