http://www.mcs.anl.gov/petsc/documentation/faq.html#computers
On Dec 3, 2012, at 1:08 PM, Jelena Slivka <slivkaje at gmail.com> wrote: > Thank you very much! > However, I have another question. I have a cluster of 4 nodes and each node > has 6 cores. If I run my code using 6 cores on one node (using the command > "mpiexec -n 6") it is much faster than running it on just one process (which > is expected). However, if I try running the code on multiple nodes (using > "mpiexec -f machinefile -ppn 4", where machinefile is the file which contains > the node names), it runs much slower than on just one process. This also > happens with tutorial examples. I have checked the number of iteration for > KSP solver when spread on multiple processors and it doesn't seem to be the > problem. Do you have any suggestions on what am I doing wrong? Are the > commands I am using wrong? > > > On Sat, Dec 1, 2012 at 6:03 PM, Barry Smith <bsmith at mcs.anl.gov> wrote: > > We recommend following the directions > http://www.mcs.anl.gov/petsc/documentation/faq.html#schurcomplement for > computing a Schur complement; just skip the unneeded step. MUMPS supports a > parallel Cholesky but you can also use a parallel LU with MUMPS, PaSTIX or > SuperLU_Dist and those will work fine also. With current software Cholesky in > parallel is not tons better than LU so generally not worth monkeying with. > > Barry > > > On Dec 1, 2012, at 12:05 PM, Jelena Slivka <slivkaje at gmail.com> wrote: > > > Hello! > > I am trying to solve A*X = B where A and B are matrices, and then find > > trace of the resulting matrix X. My approach has been to partition matrix B > > in column vectors bi and then solve each system A*xi = bi. Then, for all > > vectors xi I would extract i-th element xi(i) and sum those elements in > > order to get Trace(X). > > Pseudo-code: > > 1) load matrices A and B > > 2) transpose matrix B (so that each right-hand side bi is in the row, as > > operation MatGetColumnVector is slow) > > 3) set up KSPSolve > > 4) create vector diagonal (in which xi(i) elements will be stored) > > 5) for each row i of matrix B owned by current process: > > - create vector bi by extracting row i from matrix B > > - apply KSPsolve to get xi > > - insert value xi(i) in diagonal vector (only the process which > > holds the ith value of vector x(i) should do so) > > 6) sum vector diagonal to get the trace. > > However, my code (attached, along with the test case) runs fine on one > > process, but hangs if started on multiple processes. Could you please help > > me figure out what am I doing wrong? > > Also, could you please tell me is it possible to use Cholesky factorization > > when running on multiple processes (I see that I cannot use it when I set > > the format of matrix A to MPIAIJ)? > > > > <Experiment.c><Abin><Bbin> > >
