Wen:
Do you use SuperLU_DIST as your parallel direct solver? I suggest also installing MUMPS (it needs an F90 compiler; configure PETSc with '--download-blacs --download-scalapack --download-mumps'). When SuperLU_DIST fails, switch to MUMPS with the runtime option '-pc_factor_mat_solver_package mumps'. If both solvers fail, something might be wrong with your model or code.
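For reference, here is a minimal sketch of selecting MUMPS in code rather than on the command line. The helper name SolveWithMUMPS is made up for illustration, and the calls assume a 2012-era PETSc (3.2/3.3); later releases renamed PCFactorSetMatSolverPackage to PCFactorSetMatSolverType and dropped the flag argument from KSPSetOperators. The runtime option above works without any code change, so this is only an alternative.

/* Sketch: solve A x = b with a parallel LU factorization done by MUMPS.
   Assumes a 2012-era PETSc API; the helper name is hypothetical. */
#include <petscksp.h>

PetscErrorCode SolveWithMUMPS(Mat A, Vec b, Vec x)
{
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);      /* direct solve: no Krylov iterations */
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);               /* LU factorization ...              */
  ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERMUMPS);CHKERRQ(ierr); /* ... provided by MUMPS */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);            /* still honors -pc_factor_mat_solver_package */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  return 0;
}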
> Hong,
> I reported this several days ago: my code hung inside the SuperLU_DIST solve.
> For testing purposes, I let my code keep solving the same linear system many
> times. The code still hangs at the solve step, but not at the same stage every
> time. My code was distributed over 4 nodes, each node running 4 processes
> (16 processes in total). Before it gets stuck, one process disappears, meaning
> I can no longer see it with the top command. The other 15 processes are still
> running; I think they may not know that one has been lost and just keep
> waiting for it. It looks like the cluster system kills that process without
> giving me any error information. I am fairly sure the memory is large enough
> for my calculation (each core has 6 GB), so I cannot figure out the cause.
> I have very little knowledge of the cluster system; could you give me any
> hints on this issue? Is this a problem with PETSc, SuperLU_DIST, or the
> cluster? Thanks.
>
> Regards,
>
> Wen
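A minimal sketch of the kind of repeated-solve stress test Wen describes, assuming the KSP has already been set up as in the earlier snippet; the helper name, the print statements, and the loop count are placeholders, not taken from Wen's actual code:

/* Repeatedly solve the same system to see whether the solve hangs at a
   reproducible step; the synchronized prints bracket each solve so a hang
   can be localized in the output. */
#include <petscksp.h>

PetscErrorCode RepeatSolve(KSP ksp, Vec b, Vec x, PetscInt nrepeat)
{
  PetscErrorCode ierr;
  PetscInt       i;

  for (i = 0; i < nrepeat; i++) {
    ierr = PetscPrintf(PETSC_COMM_WORLD, "solve %D starting\n", i);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);   /* any hang shows up here */
    ierr = PetscPrintf(PETSC_COMM_WORLD, "solve %D finished\n", i);CHKERRQ(ierr);
  }
  return 0;
}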
