Hi, I used:
 mpirun -np 200 -rf  --output-filename /mypath/myapplication
But no files were written.
Can the "--debug" option help me here?
When I tried it:
-bash-3.2$ mpirun -debug
--------------------------------------------------------------------------
A suitable debugger could not be found in your PATH.  Check the values
specified in the orte_base_user_debugger MCA parameter for the list of
debuggers that was searched.
--------------------------------------------------------------------------
Any help is really appreciated.
thanks

From: r...@open-mpi.org
List-Post: users@lists.open-mpi.org
Date: Sat, 26 Mar 2011 15:45:39 -0600
To: us...@open-mpi.org
Subject: Re: [OMPI users] OMPI error terminate w/o reasons



If you use that mpirun option, mpirun will place the output from each rank into 
a -separate- file for you. Give it:
mpirun --output-filename /myhome/debug/run01
and in /myhome/debug, you will find files:
run01.0
run01.1
...
each with the output from the indicated rank.


On Mar 26, 2011, at 3:41 PM, Jack Bryan wrote:
The cluster can print all output into one file.
But checking it for bugs is very hard.
The cluster also prints possible error messages into one file.

But sometimes the error file is empty, and sometimes it reports signal 9.
If I only run dummy tasks on the worker nodes, there are no errors.
If I run real tasks, sometimes processes are terminated without any errors
before the program exits normally. Sometimes the program gets signal 9 but
no other error messages.
It is weird. 
Any help is really appreciated. 
Jack
From: r...@open-mpi.org
List-Post: users@lists.open-mpi.org
Date: Sat, 26 Mar 2011 15:18:53 -0600
To: us...@open-mpi.org
Subject: Re: [OMPI users] OMPI error terminate w/o reasons

I don't know, but Ashley may be able to help - or you can see his web site for 
instructions.
Alternatively, since you can put print statements into your code, have you 
considered using mpirun's option to direct output from each rank into its own 
file? Look at "mpirun -h" for the options.
   -output-filename|--output-filename <arg0>
                         Redirect output from application processes into
                         filename.rank

On Mar 26, 2011, at 2:48 PM, Jack Bryan wrote:
Is it possible to make padb print the stack trace and other program
execution information into a file?
I can run the program in gdb like this:
mpirun -np 200 -e gdb ./myapplication
How can I make gdb print the debug information to a file, so that I can check
it after the program is terminated?
thanks
Jack

From: r...@open-mpi.org
List-Post: users@lists.open-mpi.org
Date: Sat, 26 Mar 2011 13:56:13 -0600
To: us...@open-mpi.org
Subject: Re: [OMPI users] OMPI error terminate w/o reasons

You don't need to install anything on a system folder - you can just install it 
in your home directory, assuming that is accessible on the remote nodes.
As for the script - unless you can somehow modify it to allow you to run under 
a debugger, I am afraid you are completely out of luck.

On Mar 26, 2011, at 12:54 PM, Jack Bryan wrote:
Hi,
I am working on a cluster where I am not allowed to install software in system
folders.
My Open MPI is 1.3.4.
I have had a very quick look at padb at http://padb.pittman.org.uk/ .
Does it require some software to be installed on the cluster in order to use it?
I cannot run jobs on the cluster from the command line, only through a script.
thanks

From: r...@open-mpi.org
List-Post: users@lists.open-mpi.org
Date: Sat, 26 Mar 2011 12:12:11 -0600
To: us...@open-mpi.org
Subject: Re: [OMPI users] OMPI error terminate w/o reasons

Have you tried a parallel debugger such as padb?
On Mar 26, 2011, at 10:34 AM, Jack Bryan wrote:
Hi,
I have tried this. But the printout from 200 parallel processes makes it very
hard to locate the possible bug.
They may not stop at the same point when the program gets signal 9.
So, even though I can sort out the print statements from all 200
processes, the many different locations where the processes are stopped make it
harder to find hints about the bug.
Are there other programming tricks that can help me narrow down the
suspect points quickly?
Any help is appreciated.
Jack
From: r...@open-mpi.org
List-Post: users@lists.open-mpi.org
Date: Sat, 26 Mar 2011 07:53:40 -0600
To: us...@open-mpi.org
Subject: Re: [OMPI users] OMPI error terminate w/o reasons

Try adding some print statements so you can see where the error occurs.
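As an illustration of the print-statement approach, here is a minimal sketch in C. It is not taken from the poster's program: the function names, the "prefix.rank" file naming, and the message format are all hypothetical. In a real MPI program the rank would come from MPI_Comm_rank; it is passed as a plain int here so the sketch does not depend on MPI. Each line is flushed immediately, so the last checkpoint a rank reached should still be in its file if the process is later killed.

```c
#include <stdio.h>

/* Open a per-rank log file named "<prefix>.<rank>" (hypothetical naming).
 * Returns NULL if the path is too long or the file cannot be opened. */
FILE *open_rank_log(const char *prefix, int rank)
{
    char path[256];
    int n = snprintf(path, sizeof path, "%s.%d", prefix, rank);
    if (n < 0 || (size_t)n >= sizeof path)
        return NULL;
    return fopen(path, "w");
}

/* Write one checkpoint line and flush it right away, so the text has
 * left the stdio buffer before any later SIGKILL. */
void log_checkpoint(FILE *log, int rank, int iteration, const char *where)
{
    if (log == NULL)
        return;
    fprintf(log, "rank %d iter %d: %s\n", rank, iteration, where);
    fflush(log);
}
```

Calling log_checkpoint() at the top of the loop and before and after each communication call, then reading the last line of every rank's file, would show roughly where each process stopped.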
On Mar 25, 2011, at 11:49 PM, Jack Bryan wrote:
Hi, All:
I am running an Open MPI (1.3.4) program with 200 parallel processes.
But the program is terminated with
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 77967 on node n342 exited on
signal 9 (Killed).
--------------------------------------------------------------------------
After searching, the signal 9 means: 
the process is currently in an unworkable state and should be terminated with 
extreme prejudice
 If a process does not respond to any other termination signals, sending it a 
SIGKILL signal will almost always cause it to go away.
 The system will generate SIGKILL for a process itself under some unusual 
conditions where the program cannot possibly continue to run (even to run a 
signal handler). 
But, the error message does not indicate any possible reasons for the 
termination. 
There is a FOR loop in the main() program. If the loop count is small (< 200),
the program works well, but as it becomes larger and larger, the program
gets SIGKILL.
The cluster where I am running the MPI program does not allow running debug
tools.
If I run it on a workstation, it takes a very long time (for > 200
loops) for the error to occur again.
What can I do to find the possible bugs?
Any help is really appreciated. 
thanks
Jack
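A failure that appears only as the loop count grows is sometimes caused by memory use growing on each iteration until the operating system or the batch scheduler kills the process with SIGKILL. Nothing in this thread confirms that this is the cause here; it is only one hypothesis that is cheap to check. The sketch below, in C, prints the peak resident memory of the process once per iteration. The function names and message format are hypothetical, and the rank is passed as a plain int rather than obtained from MPI.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Peak resident set size of this process, as reported by getrusage().
 * On Linux the value is in kilobytes; other systems may use other units.
 * Returns -1 if the call fails. */
long peak_rss_kb(void)
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    return usage.ru_maxrss;
}

/* Print the peak memory seen so far for one loop iteration and flush
 * immediately, so the line is not lost if the process is killed. */
void report_memory(FILE *out, int rank, int iteration)
{
    fprintf(out, "rank %d iter %d: peak RSS %ld kB\n",
            rank, iteration, peak_rss_kb());
    fflush(out);
}
```

If the reported value climbs steadily from one iteration to the next, a leak or an unbounded buffer would be worth looking for; if it stays flat, this hypothesis can probably be set aside.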




_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
