Thomas, I'm missing one point... Do you run N sequential factorizations (i.e. each has its own matrix to work with and no need to communicate?) independently within ONE node? Or are there N factorizations that run on N nodes?
Jed,

> MUMPS uses MPI_Iprobe on MPI_COMM_WORLD (hard-coded).

Any reason they do it that way? Which part of the code is that (i.e. analysis/factorization/solution)?

Regards,
Alexander

On 21.12.2012 16:51, Thomas Witkowski wrote:
> I use a modified MPICH version. On the system I use for these
> benchmarks I cannot use another MPI library.
>
> I'm not fixed to MUMPS. SuperLU_DIST, for example, also works
> perfectly for this. But there is still the following problem I cannot
> solve: when I increase the number of coarse space matrices, no direct
> solver seems to scale for this. Just to summarize:
> - one coarse space matrix is always created by one "cluster"
>   consisting of four subdomains/MPI tasks
> - the four tasks are always local to one node, thus inter-node network
>   communication is not required for computing the factorization and solves
> - independent of the number of clusters, the coarse space matrices are
>   the same: they have the same number of rows and the same nnz
>   structure, but possibly different values
> - there is NO load imbalance
> - the matrices must be factorized and there are a lot of solves (>100)
>   with them
>
> It should be pretty clear that computing the LU factorization and
> solving with it should scale perfectly. But at the moment, none of the
> direct solvers I tried (MUMPS, SuperLU_DIST, PaStiX) is able to scale.
> The loss of scaling is really bad, as you can see from the numbers I
> sent before.
>
> Any ideas? Suggestions? Without a scaling solver method for this kind
> of system, my multilevel FETI-DP code is more or less a joke, only
> some orders of magnitude slower than the standard FETI-DP method :)
>
> Thomas
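
For reference, a minimal C sketch of the per-cluster communicator setup described above, assuming each group of four consecutive world ranks forms one cluster; identifiers such as cluster_comm are illustrative and not taken from Thomas' code:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int world_rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        /* Assumption: ranks 0-3 form cluster 0, ranks 4-7 cluster 1, etc.,
         * matching the "four node-local tasks per coarse matrix" layout. */
        int cluster = world_rank / 4;

        MPI_Comm cluster_comm;
        MPI_Comm_split(MPI_COMM_WORLD, cluster, world_rank, &cluster_comm);

        /* The factorization and the >100 solves would then be run with
         * cluster_comm handed to the direct solver instead of
         * MPI_COMM_WORLD (e.g. MUMPS takes a communicator through its
         * comm_fortran field via MPI_Comm_c2f(cluster_comm)), so each
         * cluster works independently of the others. */

        MPI_Comm_free(&cluster_comm);
        MPI_Finalize();
        return 0;
    }

If the solver nonetheless probes or synchronizes on MPI_COMM_WORLD internally, as reported above for MUMPS, this split alone would not remove the cross-cluster coupling.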
