I can now checkpoint NPB after reducing the number of processes, like this:
before: mpirun -am ft-enable-cr -np 128 is.C.128
after: mpirun -am ft-enable-cr -np 2 is.C.2
However, when I checkpoint HPL, the checkpoint still hangs (I waited more
than 12 hours, and it was still hanging).
The command is: mpirun -am ft-enable-cr -np 4 hpl
There is plenty of free memory available while this command is running.
I can't tell what the problem is.
I would appreciate it very much if you would help me with it.