Re: [OMPI users] Signal 13

2007-03-20 Thread Jeff Squyres
un command line.

2. execute or access the various binaries and/or libraries. This is usually
caused when someone installs OpenMPI as root, and then tries to execute as a
non-root user. Best thing here is to either run through the installation
directory and add the correct permissions (assuming it is a system-level
install), or reinstall as the non-root user (if the install is solely for
you anyway).

You can also set --debug-daemons on the mpirun command line to get more
diagnostic output from the daemons and then send that along.

BTW: if possible, it helps us to advise you if we know which version of
OpenMPI you are using. ;-)

Hope that helps.
Ralph




On 3/15/07 1:51 PM, "David Bronke" <whitel...@gmail.com> wrote:

Ok, now that I've figured out what the signal means, I'm wondering
exactly what is running into permission problems... the program I'm
running doesn't use any functions except printf, sprintf, and MPI_*...
I was thinking that possibly changes to permissions on certain /dev
entries in newer distros might cause this, but I'm not even sure what
/dev entries would be used by MPI.

On 3/15/07, McCalla, Mac <macmcca...@hess.com> wrote:

Hi,
If the perror command is available on your system it will tell
you what the message is associated with the signal value.  On my system
RHEL4U3, it is permission denied.

HTH,

mac mccalla

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of David Bronke
Sent: Thursday, March 15, 2007 12:25 PM
To: us...@open-mpi.org
Subject: [OMPI users] Signal 13

I've been trying to get OpenMPI working on two of the computers at a lab
I help administer, and I'm running into a rather large issue. When
running anything using mpirun as a normal user, I get the following
output:


$ mpirun --no-daemonize --host
localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost
/workspace/bronke/mpi/hello
mpirun noticed that job rank 0 with PID 0 on node "localhost" exited on
signal 13.
[trixie:18104] ERROR: A daemon on node localhost failed to start as
expected.
[trixie:18104] ERROR: There may be more information available from
[trixie:18104] ERROR: the remote shell (see above).
[trixie:18104] The daemon received a signal 13.
8 additional processes aborted (not shown)


However, running the same exact command line as root works fine:


$ sudo mpirun --no-daemonize --host
localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost
/workspace/bronke/mpi/hello
Password:
p is 8, my_rank is 0
p is 8, my_rank is 1
p is 8, my_rank is 2
p is 8, my_rank is 3
p is 8, my_rank is 6
p is 8, my_rank is 7
Greetings from process 1!

Greetings from process 2!

Greetings from process 3!

p is 8, my_rank is 5
p is 8, my_rank is 4
Greetings from process 4!

Greetings from process 5!

Greetings from process 6!

Greetings from process 7!


I've looked up signal 13, and have found that it is apparently SIGPIPE;
I also found a thread on the LAM-MPI site:
http://www.lam-mpi.org/MailArchives/lam/2004/08/8486.php
However, this thread seems to indicate that the problem would be in the
application, (/workspace/bronke/mpi/hello in this case) but there are no
pipes in use in this app, and the fact that it works as expected as root
doesn't seem to fit either. I have tried running mpirun with --verbose
and it doesn't show any more output than without it, so I've run into a
sort of dead-end on this issue. Does anyone know of any way I can figure
out what's going wrong or how I can fix it?

Thanks!
--
David H. Bronke
Lead Programmer
G33X Nexus Entertainment
http://games.g33xnexus.com/precursors/


v3sw5/7Hhw5/6ln4pr6Ock3ma7u7+8Lw3/7Tm3l6+7Gi2e4t4Mb7Hen5g8+9ORPa22s6MSr7p6
hackerkey.com
Support Web Standards! http://www.webstandards.org/
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
David H. Bronke
Lead Programmer
G33X Nexus Entertainment
http://games.g33xnexus.com/precursors/

v3sw5/7Hhw5/6ln4pr6Ock3ma7u7+8Lw3/7Tm3l6+7Gi2e4t4Mb7Hen5g8+9ORPa22s6MSr7p6
hackerkey.com
Support Web Standards! http://www.webstandards.org/



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Signal 13

2007-03-18 Thread David Bronke
gnal 13.
>>>>> mpirun noticed that job rank 6 with PID 0 on node "localhost" exited
>>>>> on signal 13.
>>>>> [trixie:25228] ERROR: A daemon on node localhost failed to start as
>>>>> expected.
>>>>> [trixie:25228] ERROR: There may be more information available from
>>>>> [trixie:25228] ERROR: the remote shell (see above).
>>>>> [trixie:25228] The daemon received a signal 13.
>>>>> 1 additional process aborted (not shown)
>>>>> [trixie:25228] sess_dir_finalize: found proc session dir empty - deleting
>>>>> [trixie:25228] sess_dir_finalize: found job session dir empty - deleting
>>>>> [trixie:25228] sess_dir_finalize: found univ session dir empty - deleting
>>>>> [trixie:25228] sess_dir_finalize: found top session dir empty - deleting
>>>>>
>>>>> On 3/15/07, Ralph H Castain <r...@lanl.gov> wrote:
>>>>>> It isn't a /dev issue. The problem is likely that the system lacks
>>>>>> sufficient permissions to either:
>>>>>>
>>>>>> 1. create the Open MPI session directory tree. We create a hierarchy of
>>>>>> subdirectories for temporary storage used for things like your shared
>>>>>> memory
>>>>>> file - the location of the head of that tree can be specified at run
>>>>>> time,
>>>>>> but has a series of built-in defaults it can search if you don't specify
>>>>>> it
>>>>>> (we look at your environmental variables - e.g., TMP or TMPDIR - as well
>>>>>> as
>>>>>> the typical Linux/Unix places). You might check to see what your tmp
>>>>>> directory is, and that you have write permission into it. Alternatively,
>>>>>> you
>>>>>> can specify your own location (where you know you have permissions!) by
>>>>>> setting --tmpdir your-dir on the mpirun command line.
>>>>>>
>>>>>> 2. execute or access the various binaries and/or libraries. This is
>>>>>> usually
>>>>>> caused when someone installs OpenMPI as root, and then tries to execute
>>>>>> as
>>>>>> a
>>>>>> non-root user. Best thing here is to either run through the installation
>>>>>> directory and add the correct permissions (assuming it is a system-level
>>>>>> install), or reinstall as the non-root user (if the install is solely for
>>>>>> you anyway).
>>>>>>
>>>>>> You can also set --debug-daemons on the mpirun command line to get more
>>>>>> diagnostic output from the daemons and then send that along.
>>>>>>
>>>>>> BTW: if possible, it helps us to advise you if we know which version of
>>>>>> OpenMPI you are using. ;-)
>>>>>>
>>>>>> Hope that helps.
>>>>>> Ralph
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 3/15/07 1:51 PM, "David Bronke" <whitel...@gmail.com> wrote:
>>>>>>
>>>>>>> Ok, now that I've figured out what the signal means, I'm wondering
>>>>>>> exactly what is running into permission problems... the program I'm
>>>>>>> running doesn't use any functions except printf, sprintf, and MPI_*...
>>>>>>> I was thinking that possibly changes to permissions on certain /dev
>>>>>>> entries in newer distros might cause this, but I'm not even sure what
>>>>>>> /dev entries would be used by MPI.
>>>>>>>
>>>>>>> On 3/15/07, McCalla, Mac <macmcca...@hess.com> wrote:
>>>>>>>> Hi,
>>>>>>>> If the perror command is available on your system it will tell
>>>>>>>> you what the message is associated with the signal value.  On my system
>>>>>>>> RHEL4U3, it is permission denied.
>>>>>>>>
>>>>>>>> HTH,
>>>>>>>>
>>>>>>>> mac mccalla
>>>>>>>>
>>>>>>>> -Original Message-
>>>>>>>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
>>>>>>>> Behalf Of David Bronke
>>>>>>>> Sent: Thursday, March 15, 

Re: [OMPI users] Signal 13

2007-03-18 Thread Ralph Castain
>>>>> expected.
>>>>> [trixie:25228] ERROR: There may be more information available from
>>>>> [trixie:25228] ERROR: the remote shell (see above).
>>>>> [trixie:25228] The daemon received a signal 13.
>>>>> 1 additional process aborted (not shown)
>>>>> [trixie:25228] sess_dir_finalize: found proc session dir empty - deleting
>>>>> [trixie:25228] sess_dir_finalize: found job session dir empty - deleting
>>>>> [trixie:25228] sess_dir_finalize: found univ session dir empty - deleting
>>>>> [trixie:25228] sess_dir_finalize: found top session dir empty - deleting
>>>>> 
>>>>> On 3/15/07, Ralph H Castain <r...@lanl.gov> wrote:
>>>>>> It isn't a /dev issue. The problem is likely that the system lacks
>>>>>> sufficient permissions to either:
>>>>>> 
>>>>>> 1. create the Open MPI session directory tree. We create a hierarchy of
>>>>>> subdirectories for temporary storage used for things like your shared
>>>>>> memory
>>>>>> file - the location of the head of that tree can be specified at run
>>>>>> time,
>>>>>> but has a series of built-in defaults it can search if you don't specify
>>>>>> it
>>>>>> (we look at your environmental variables - e.g., TMP or TMPDIR - as well
>>>>>> as
>>>>>> the typical Linux/Unix places). You might check to see what your tmp
>>>>>> directory is, and that you have write permission into it. Alternatively,
>>>>>> you
>>>>>> can specify your own location (where you know you have permissions!) by
>>>>>> setting --tmpdir your-dir on the mpirun command line.
>>>>>> 
>>>>>> 2. execute or access the various binaries and/or libraries. This is
>>>>>> usually
>>>>>> caused when someone installs OpenMPI as root, and then tries to execute
>>>>>> as
>>>>>> a
>>>>>> non-root user. Best thing here is to either run through the installation
>>>>>> directory and add the correct permissions (assuming it is a system-level
>>>>>> install), or reinstall as the non-root user (if the install is solely for
>>>>>> you anyway).
>>>>>> 
>>>>>> You can also set --debug-daemons on the mpirun command line to get more
>>>>>> diagnostic output from the daemons and then send that along.
>>>>>> 
>>>>>> BTW: if possible, it helps us to advise you if we know which version of
>>>>>> OpenMPI you are using. ;-)
>>>>>> 
>>>>>> Hope that helps.
>>>>>> Ralph
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 3/15/07 1:51 PM, "David Bronke" <whitel...@gmail.com> wrote:
>>>>>> 
>>>>>>> Ok, now that I've figured out what the signal means, I'm wondering
>>>>>>> exactly what is running into permission problems... the program I'm
>>>>>>> running doesn't use any functions except printf, sprintf, and MPI_*...
>>>>>>> I was thinking that possibly changes to permissions on certain /dev
>>>>>>> entries in newer distros might cause this, but I'm not even sure what
>>>>>>> /dev entries would be used by MPI.
>>>>>>> 
>>>>>>> On 3/15/07, McCalla, Mac <macmcca...@hess.com> wrote:
>>>>>>>> Hi,
>>>>>>>> If the perror command is available on your system it will tell
>>>>>>>> you what the message is associated with the signal value.  On my system
>>>>>>>> RHEL4U3, it is permission denied.
>>>>>>>> 
>>>>>>>> HTH,
>>>>>>>> 
>>>>>>>> mac mccalla
>>>>>>>> 
>>>>>>>> -Original Message-
>>>>>>>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
>>>>>>>> Behalf Of David Bronke
>>>>>>>> Sent: Thursday, March 15, 2007 12:25 PM
>>>>>>>> To: us...@open-mpi.org
>>>>>>>> Subject: [OMPI users] Signal 13
>>>>>>>> 
>>>>>>>> I've been trying to get Open

Re: [OMPI users] Signal 13

2007-03-18 Thread David Bronke
so set --debug-daemons on the mpirun command line to get more
>>>> diagnostic output from the daemons and then send that along.
>>>>
>>>> BTW: if possible, it helps us to advise you if we know which version of
>>>> OpenMPI you are using. ;-)
>>>>
>>>> Hope that helps.
>>>> Ralph
>>>>
>>>>
>>>>
>>>>
>>>> On 3/15/07 1:51 PM, "David Bronke" <whitel...@gmail.com> wrote:
>>>>
>>>>> Ok, now that I've figured out what the signal means, I'm wondering
>>>>> exactly what is running into permission problems... the program I'm
>>>>> running doesn't use any functions except printf, sprintf, and MPI_*...
>>>>> I was thinking that possibly changes to permissions on certain /dev
>>>>> entries in newer distros might cause this, but I'm not even sure what
>>>>> /dev entries would be used by MPI.
>>>>>
>>>>> On 3/15/07, McCalla, Mac <macmcca...@hess.com> wrote:
>>>>>> Hi,
>>>>>> If the perror command is available on your system it will tell
>>>>>> you what the message is associated with the signal value.  On my system
>>>>>> RHEL4U3, it is permission denied.
>>>>>>
>>>>>> HTH,
>>>>>>
>>>>>> mac mccalla
>>>>>>
>>>>>> -Original Message-
>>>>>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
>>>>>> Behalf Of David Bronke
>>>>>> Sent: Thursday, March 15, 2007 12:25 PM
>>>>>> To: us...@open-mpi.org
>>>>>> Subject: [OMPI users] Signal 13
>>>>>>
>>>>>> I've been trying to get OpenMPI working on two of the computers at a lab
>>>>>> I help administer, and I'm running into a rather large issue. When
>>>>>> running anything using mpirun as a normal user, I get the following
>>>>>> output:
>>>>>>
>>>>>>
>>>>>> $ mpirun --no-daemonize --host
>>>>>> localhost,localhost,localhost,localhost,localhost,localhost,localhost,lo
>>>>>> calhost
>>>>>> /workspace/bronke/mpi/hello
>>>>>> mpirun noticed that job rank 0 with PID 0 on node "localhost" exited on
>>>>>> signal 13.
>>>>>> [trixie:18104] ERROR: A daemon on node localhost failed to start as
>>>>>> expected.
>>>>>> [trixie:18104] ERROR: There may be more information available from
>>>>>> [trixie:18104] ERROR: the remote shell (see above).
>>>>>> [trixie:18104] The daemon received a signal 13.
>>>>>> 8 additional processes aborted (not shown)
>>>>>>
>>>>>>
>>>>>> However, running the same exact command line as root works fine:
>>>>>>
>>>>>>
>>>>>> $ sudo mpirun --no-daemonize --host
>>>>>> localhost,localhost,localhost,localhost,localhost,localhost,localhost,lo
>>>>>> calhost
>>>>>> /workspace/bronke/mpi/hello
>>>>>> Password:
>>>>>> p is 8, my_rank is 0
>>>>>> p is 8, my_rank is 1
>>>>>> p is 8, my_rank is 2
>>>>>> p is 8, my_rank is 3
>>>>>> p is 8, my_rank is 6
>>>>>> p is 8, my_rank is 7
>>>>>> Greetings from process 1!
>>>>>>
>>>>>> Greetings from process 2!
>>>>>>
>>>>>> Greetings from process 3!
>>>>>>
>>>>>> p is 8, my_rank is 5
>>>>>> p is 8, my_rank is 4
>>>>>> Greetings from process 4!
>>>>>>
>>>>>> Greetings from process 5!
>>>>>>
>>>>>> Greetings from process 6!
>>>>>>
>>>>>> Greetings from process 7!
>>>>>>
>>>>>>
>>>>>> I've looked up signal 13, and have found that it is apparently SIGPIPE;
>>>>>> I also found a thread on the LAM-MPI site:
>>>>>> http://www.lam-mpi.org/MailArchives/lam/2004/08/8486.php
>>>>>> However, this thread seems to indicate that the problem would be in the
>>>>>> application, (/workspace/bronke/mpi/hello in this case) but there are no
>>>>>> pipes in use in this app, and the fact that it works as expected as root
>>>>>> doesn't seem to fit either. I have tried running mpirun with --verbose
>>>>>> and it doesn't show any more output than without it, so I've run into a
>>>>>> sort of dead-end on this issue. Does anyone know of any way I can figure
>>>>>> out what's going wrong or how I can fix it?
>>>>>>
>>>>>> Thanks!
>>>>>> --
>>>>>> David H. Bronke
>>>>>> Lead Programmer
>>>>>> G33X Nexus Entertainment
>>>>>> http://games.g33xnexus.com/precursors/
>>>>>>
>>>>>> v3sw5/7Hhw5/6ln4pr6Ock3ma7u7+8Lw3/7Tm3l6+7Gi2e4t4Mb7Hen5g8+9ORPa22s6MSr7
>>>>>> p6
>>>>>> hackerkey.com
>>>>>> Support Web Standards! http://www.webstandards.org/




--
David H. Bronke
Lead Programmer
G33X Nexus Entertainment
http://games.g33xnexus.com/precursors/

v3sw5/7Hhw5/6ln4pr6Ock3ma7u7+8Lw3/7Tm3l6+7Gi2e4t4Mb7Hen5g8+9ORPa22s6MSr7p6
hackerkey.com
Support Web Standards! http://www.webstandards.org/


Re: [OMPI users] Signal 13

2007-03-15 Thread Ralph Castain
n)
> [trixie:25228] sess_dir_finalize: found proc session dir empty - deleting
> [trixie:25228] sess_dir_finalize: found job session dir empty - deleting
> [trixie:25228] sess_dir_finalize: found univ session dir empty - deleting
> [trixie:25228] sess_dir_finalize: found top session dir empty - deleting
> 
> On 3/15/07, Ralph H Castain <r...@lanl.gov> wrote:
>> It isn't a /dev issue. The problem is likely that the system lacks
>> sufficient permissions to either:
>> 
>> 1. create the Open MPI session directory tree. We create a hierarchy of
>> subdirectories for temporary storage used for things like your shared memory
>> file - the location of the head of that tree can be specified at run time,
>> but has a series of built-in defaults it can search if you don't specify it
>> (we look at your environmental variables - e.g., TMP or TMPDIR - as well as
>> the typical Linux/Unix places). You might check to see what your tmp
>> directory is, and that you have write permission into it. Alternatively, you
>> can specify your own location (where you know you have permissions!) by
>> setting --tmpdir your-dir on the mpirun command line.
>> 
>> 2. execute or access the various binaries and/or libraries. This is usually
>> caused when someone installs OpenMPI as root, and then tries to execute as a
>> non-root user. Best thing here is to either run through the installation
>> directory and add the correct permissions (assuming it is a system-level
>> install), or reinstall as the non-root user (if the install is solely for
>> you anyway).
>> 
>> You can also set --debug-daemons on the mpirun command line to get more
>> diagnostic output from the daemons and then send that along.
>> 
>> BTW: if possible, it helps us to advise you if we know which version of
>> OpenMPI you are using. ;-)
>> 
>> Hope that helps.
>> Ralph
>> 
>> 
>> 
>> 
>> On 3/15/07 1:51 PM, "David Bronke" <whitel...@gmail.com> wrote:
>> 
>>> Ok, now that I've figured out what the signal means, I'm wondering
>>> exactly what is running into permission problems... the program I'm
>>> running doesn't use any functions except printf, sprintf, and MPI_*...
>>> I was thinking that possibly changes to permissions on certain /dev
>>> entries in newer distros might cause this, but I'm not even sure what
>>> /dev entries would be used by MPI.
>>> 
>>> On 3/15/07, McCalla, Mac <macmcca...@hess.com> wrote:
>>>> Hi,
>>>> If the perror command is available on your system it will tell
>>>> you what the message is associated with the signal value.  On my system
>>>> RHEL4U3, it is permission denied.
>>>> 
>>>> HTH,
>>>> 
>>>> mac mccalla
>>>> 
>>>> -Original Message-
>>>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
>>>> Behalf Of David Bronke
>>>> Sent: Thursday, March 15, 2007 12:25 PM
>>>> To: us...@open-mpi.org
>>>> Subject: [OMPI users] Signal 13
>>>> 
>>>> I've been trying to get OpenMPI working on two of the computers at a lab
>>>> I help administer, and I'm running into a rather large issue. When
>>>> running anything using mpirun as a normal user, I get the following
>>>> output:
>>>> 
>>>> 
>>>> $ mpirun --no-daemonize --host
>>>> localhost,localhost,localhost,localhost,localhost,localhost,localhost,lo
>>>> calhost
>>>> /workspace/bronke/mpi/hello
>>>> mpirun noticed that job rank 0 with PID 0 on node "localhost" exited on
>>>> signal 13.
>>>> [trixie:18104] ERROR: A daemon on node localhost failed to start as
>>>> expected.
>>>> [trixie:18104] ERROR: There may be more information available from
>>>> [trixie:18104] ERROR: the remote shell (see above).
>>>> [trixie:18104] The daemon received a signal 13.
>>>> 8 additional processes aborted (not shown)
>>>> 
>>>> 
>>>> However, running the same exact command line as root works fine:
>>>> 
>>>> 
>>>> $ sudo mpirun --no-daemonize --host
>>>> localhost,localhost,localhost,localhost,localhost,localhost,localhost,lo
>>>> calhost
>>>> /workspace/bronke/mpi/hello
>>>> Password:
>>>> p is 8, my_rank is 0
>>>> p is 8, my_rank is 1
>>>> p is 8, my_rank is 2
>>>> p is 8, my_rank is 3
>

Re: [OMPI users] Signal 13

2007-03-15 Thread David Bronke
e non-root user (if the install is solely for
you anyway).

You can also set --debug-daemons on the mpirun command line to get more
diagnostic output from the daemons and then send that along.

BTW: if possible, it helps us to advise you if we know which version of
OpenMPI you are using. ;-)

Hope that helps.
Ralph




On 3/15/07 1:51 PM, "David Bronke" <whitel...@gmail.com> wrote:

> Ok, now that I've figured out what the signal means, I'm wondering
> exactly what is running into permission problems... the program I'm
> running doesn't use any functions except printf, sprintf, and MPI_*...
> I was thinking that possibly changes to permissions on certain /dev
> entries in newer distros might cause this, but I'm not even sure what
> /dev entries would be used by MPI.
>
> On 3/15/07, McCalla, Mac <macmcca...@hess.com> wrote:
>> Hi,
>> If the perror command is available on your system it will tell
>> you what the message is associated with the signal value.  On my system
>> RHEL4U3, it is permission denied.
>>
>> HTH,
>>
>> mac mccalla
>>
>> -----Original Message-
>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
>> Behalf Of David Bronke
>> Sent: Thursday, March 15, 2007 12:25 PM
>> To: us...@open-mpi.org
>> Subject: [OMPI users] Signal 13
>>
>> I've been trying to get OpenMPI working on two of the computers at a lab
>> I help administer, and I'm running into a rather large issue. When
>> running anything using mpirun as a normal user, I get the following
>> output:
>>
>>
>> $ mpirun --no-daemonize --host
>> localhost,localhost,localhost,localhost,localhost,localhost,localhost,lo
>> calhost
>> /workspace/bronke/mpi/hello
>> mpirun noticed that job rank 0 with PID 0 on node "localhost" exited on
>> signal 13.
>> [trixie:18104] ERROR: A daemon on node localhost failed to start as
>> expected.
>> [trixie:18104] ERROR: There may be more information available from
>> [trixie:18104] ERROR: the remote shell (see above).
>> [trixie:18104] The daemon received a signal 13.
>> 8 additional processes aborted (not shown)
>>
>>
>> However, running the same exact command line as root works fine:
>>
>>
>> $ sudo mpirun --no-daemonize --host
>> localhost,localhost,localhost,localhost,localhost,localhost,localhost,lo
>> calhost
>> /workspace/bronke/mpi/hello
>> Password:
>> p is 8, my_rank is 0
>> p is 8, my_rank is 1
>> p is 8, my_rank is 2
>> p is 8, my_rank is 3
>> p is 8, my_rank is 6
>> p is 8, my_rank is 7
>> Greetings from process 1!
>>
>> Greetings from process 2!
>>
>> Greetings from process 3!
>>
>> p is 8, my_rank is 5
>> p is 8, my_rank is 4
>> Greetings from process 4!
>>
>> Greetings from process 5!
>>
>> Greetings from process 6!
>>
>> Greetings from process 7!
>>
>>
>> I've looked up signal 13, and have found that it is apparently SIGPIPE;
>> I also found a thread on the LAM-MPI site:
>> http://www.lam-mpi.org/MailArchives/lam/2004/08/8486.php
>> However, this thread seems to indicate that the problem would be in the
>> application, (/workspace/bronke/mpi/hello in this case) but there are no
>> pipes in use in this app, and the fact that it works as expected as root
>> doesn't seem to fit either. I have tried running mpirun with --verbose
>> and it doesn't show any more output than without it, so I've run into a
>> sort of dead-end on this issue. Does anyone know of any way I can figure
>> out what's going wrong or how I can fix it?
>>
>> Thanks!
>> --
>> David H. Bronke
>> Lead Programmer
>> G33X Nexus Entertainment
>> http://games.g33xnexus.com/precursors/
>>
>> v3sw5/7Hhw5/6ln4pr6Ock3ma7u7+8Lw3/7Tm3l6+7Gi2e4t4Mb7Hen5g8+9ORPa22s6MSr7
>> p6
>> hackerkey.com
>> Support Web Standards! http://www.webstandards.org/




--
David H. Bronke
Lead Programmer
G33X Nexus Entertainment
http://games.g33xnexus.com/precursors/

v3sw5/7Hhw5/6ln4pr6Ock3ma7u7+8Lw3/7Tm3l6+7Gi2e4t4Mb7Hen5g8+9ORPa22s6MSr7p6
hackerkey.com
Support Web Standards! http://www.webstandards.org/


Re: [OMPI users] Signal 13

2007-03-15 Thread Ralph H Castain
It isn't a /dev issue. The problem is likely that the system lacks
sufficient permissions to either:

1. create the Open MPI session directory tree. We create a hierarchy of
subdirectories for temporary storage used for things like your shared memory
file - the location of the head of that tree can be specified at run time,
but has a series of built-in defaults it can search if you don't specify it
(we look at your environment variables - e.g., TMP or TMPDIR - as well as
the typical Linux/Unix places). You might check to see what your tmp
directory is, and that you have write permission into it. Alternatively, you
can specify your own location (where you know you have permissions!) by
setting --tmpdir your-dir on the mpirun command line.

2. execute or access the various binaries and/or libraries. This is usually
caused when someone installs OpenMPI as root, and then tries to execute as a
non-root user. Best thing here is to either run through the installation
directory and add the correct permissions (assuming it is a system-level
install), or reinstall as the non-root user (if the install is solely for
you anyway).
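
The two checks above can be sketched as a small shell script. This is only a rough illustration, not Open MPI's own logic: the helper names check_tmp_writable and check_install_access are made up here, the candidate list (TMPDIR, TMP, /tmp) is an assumption about where the session directory might end up, and /usr/local is just a placeholder for the real install prefix.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): can the current user create
# entries in the given directory?
check_tmp_writable() {
    dir=$1
    if [ -d "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ]; then
        echo "writable: $dir"
    else
        echo "NOT writable: $dir"
        return 1
    fi
}

# Hypothetical helper: report files under PREFIX/bin and PREFIX/lib that the
# current user cannot read. Returns non-zero if any are found.
check_install_access() {
    prefix=$1
    rc=0
    for f in "$prefix"/bin/* "$prefix"/lib/*; do
        [ -e "$f" ] || continue          # skip globs that matched nothing
        if [ ! -r "$f" ]; then
            echo "cannot read: $f"
            rc=1
        fi
    done
    return "$rc"
}

# Assumed candidate list; the real search order may differ.
for candidate in "${TMPDIR:-}" "${TMP:-}" /tmp; do
    if [ -n "$candidate" ]; then
        check_tmp_writable "$candidate" || true
    fi
done

# /usr/local is a placeholder; substitute the actual install prefix.
check_install_access /usr/local || true
```

Run it as the same non-root user that sees the failure; running it as root will typically report everything as accessible and hide the problem.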

You can also set --debug-daemons on the mpirun command line to get more
diagnostic output from the daemons and then send that along.
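
As a sketch of how the two options mentioned in this message might be combined, the script below only prints a command line instead of running it. The function name build_debug_cmd, the directory $HOME/ompi-tmp, and the program ./hello are placeholders, and the option spellings are taken from this message as written; they may differ in other Open MPI versions.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an mpirun command line that adds
# --debug-daemons and --tmpdir in front of the caller's own arguments.
build_debug_cmd() {
    session_dir=$1
    shift
    printf 'mpirun --debug-daemons --tmpdir %s' "$session_dir"
    printf ' %s' "$@"
    printf '\n'
}

# Placeholder directory and program; pick a directory you know you can write to.
build_debug_cmd "$HOME/ompi-tmp" --host localhost ./hello
```

Printing the command first makes it easy to paste the exact invocation into a reply along with the daemon output.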

BTW: if possible, it helps us to advise you if we know which version of
OpenMPI you are using. ;-)

Hope that helps.
Ralph




On 3/15/07 1:51 PM, "David Bronke" <whitel...@gmail.com> wrote:

> Ok, now that I've figured out what the signal means, I'm wondering
> exactly what is running into permission problems... the program I'm
> running doesn't use any functions except printf, sprintf, and MPI_*...
> I was thinking that possibly changes to permissions on certain /dev
> entries in newer distros might cause this, but I'm not even sure what
> /dev entries would be used by MPI.
> 
> On 3/15/07, McCalla, Mac <macmcca...@hess.com> wrote:
>> Hi,
>> If the perror command is available on your system it will tell
>> you what the message is associated with the signal value.  On my system
>> RHEL4U3, it is permission denied.
>> 
>> HTH,
>> 
>> mac mccalla
>> 
>> -Original Message-
>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
>> Behalf Of David Bronke
>> Sent: Thursday, March 15, 2007 12:25 PM
>> To: us...@open-mpi.org
>> Subject: [OMPI users] Signal 13
>> 
>> I've been trying to get OpenMPI working on two of the computers at a lab
>> I help administer, and I'm running into a rather large issue. When
>> running anything using mpirun as a normal user, I get the following
>> output:
>> 
>> 
>> $ mpirun --no-daemonize --host
>> localhost,localhost,localhost,localhost,localhost,localhost,localhost,lo
>> calhost
>> /workspace/bronke/mpi/hello
>> mpirun noticed that job rank 0 with PID 0 on node "localhost" exited on
>> signal 13.
>> [trixie:18104] ERROR: A daemon on node localhost failed to start as
>> expected.
>> [trixie:18104] ERROR: There may be more information available from
>> [trixie:18104] ERROR: the remote shell (see above).
>> [trixie:18104] The daemon received a signal 13.
>> 8 additional processes aborted (not shown)
>> 
>> 
>> However, running the same exact command line as root works fine:
>> 
>> 
>> $ sudo mpirun --no-daemonize --host
>> localhost,localhost,localhost,localhost,localhost,localhost,localhost,lo
>> calhost
>> /workspace/bronke/mpi/hello
>> Password:
>> p is 8, my_rank is 0
>> p is 8, my_rank is 1
>> p is 8, my_rank is 2
>> p is 8, my_rank is 3
>> p is 8, my_rank is 6
>> p is 8, my_rank is 7
>> Greetings from process 1!
>> 
>> Greetings from process 2!
>> 
>> Greetings from process 3!
>> 
>> p is 8, my_rank is 5
>> p is 8, my_rank is 4
>> Greetings from process 4!
>> 
>> Greetings from process 5!
>> 
>> Greetings from process 6!
>> 
>> Greetings from process 7!
>> 
>> 
>> I've looked up signal 13, and have found that it is apparently SIGPIPE;
>> I also found a thread on the LAM-MPI site:
>> http://www.lam-mpi.org/MailArchives/lam/2004/08/8486.php
>> However, this thread seems to indicate that the problem would be in the
>> application, (/workspace/bronke/mpi/hello in this case) but there are no
>> pipes in use in this app, and the fact that it works as expected as root
>> doesn't seem to fit either. I have tried running mpirun with --verbose
>> and it doesn't show any more output than without it, so I've run into a
>> sort of dead-end on this issue. Does anyone

Re: [OMPI users] Signal 13

2007-03-15 Thread David Bronke

Ok, now that I've figured out what the signal means, I'm wondering
exactly what is running into permission problems... the program I'm
running doesn't use any functions except printf, sprintf, and MPI_*...
I was thinking that possibly changes to permissions on certain /dev
entries in newer distros might cause this, but I'm not even sure what
/dev entries would be used by MPI.
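
One caveat on the permission reading: perror describes errno values, and 13 as an errno is EACCES (permission denied), whereas 13 as a signal number is SIGPIPE on Linux, so the two are not necessarily the same event. If a permission failure is suspected anyway, one possible way to find out what is being refused is to trace system calls and keep only those that failed with EACCES. The sketch below assumes strace is available; denied_calls is a made-up helper name and trace.log is a placeholder file name.

```shell
#!/bin/sh
# Hypothetical helper: keep only trace lines for calls that failed with
# EACCES ("Permission denied"). Reads the named files, or stdin if none.
denied_calls() {
    grep -F -e '= -1 EACCES' "$@"
}
```

A trace could be captured as the failing user with something like `strace -f -o trace.log mpirun ...` and then filtered with `denied_calls trace.log`; the path in each remaining line is the file or directory that was refused.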

On 3/15/07, McCalla, Mac <macmcca...@hess.com> wrote:

Hi,
If the perror command is available on your system it will tell
you what the message is associated with the signal value.  On my system
RHEL4U3, it is permission denied.

HTH,

mac mccalla

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of David Bronke
Sent: Thursday, March 15, 2007 12:25 PM
To: us...@open-mpi.org
Subject: [OMPI users] Signal 13

I've been trying to get OpenMPI working on two of the computers at a lab
I help administer, and I'm running into a rather large issue. When
running anything using mpirun as a normal user, I get the following
output:


$ mpirun --no-daemonize --host
localhost,localhost,localhost,localhost,localhost,localhost,localhost,lo
calhost
/workspace/bronke/mpi/hello
mpirun noticed that job rank 0 with PID 0 on node "localhost" exited on
signal 13.
[trixie:18104] ERROR: A daemon on node localhost failed to start as
expected.
[trixie:18104] ERROR: There may be more information available from
[trixie:18104] ERROR: the remote shell (see above).
[trixie:18104] The daemon received a signal 13.
8 additional processes aborted (not shown)


However, running the same exact command line as root works fine:


$ sudo mpirun --no-daemonize --host
localhost,localhost,localhost,localhost,localhost,localhost,localhost,lo
calhost
/workspace/bronke/mpi/hello
Password:
p is 8, my_rank is 0
p is 8, my_rank is 1
p is 8, my_rank is 2
p is 8, my_rank is 3
p is 8, my_rank is 6
p is 8, my_rank is 7
Greetings from process 1!

Greetings from process 2!

Greetings from process 3!

p is 8, my_rank is 5
p is 8, my_rank is 4
Greetings from process 4!

Greetings from process 5!

Greetings from process 6!

Greetings from process 7!


I've looked up signal 13, and have found that it is apparently SIGPIPE;
I also found a thread on the LAM-MPI site:
http://www.lam-mpi.org/MailArchives/lam/2004/08/8486.php
However, this thread seems to indicate that the problem would be in the
application (/workspace/bronke/mpi/hello in this case), but there are no
pipes in use in this app, and the fact that it works as expected as root
doesn't seem to fit either. I have tried running mpirun with --verbose
and it doesn't show any more output than without it, so I've run into a
sort of dead-end on this issue. Does anyone know of any way I can figure
out what's going wrong or how I can fix it?

Thanks!
--
David H. Bronke
Lead Programmer
G33X Nexus Entertainment
http://games.g33xnexus.com/precursors/

v3sw5/7Hhw5/6ln4pr6Ock3ma7u7+8Lw3/7Tm3l6+7Gi2e4t4Mb7Hen5g8+9ORPa22s6MSr7p6
hackerkey.com
Support Web Standards! http://www.webstandards.org/
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
David H. Bronke
Lead Programmer
G33X Nexus Entertainment
http://games.g33xnexus.com/precursors/

v3sw5/7Hhw5/6ln4pr6Ock3ma7u7+8Lw3/7Tm3l6+7Gi2e4t4Mb7Hen5g8+9ORPa22s6MSr7p6
hackerkey.com
Support Web Standards! http://www.webstandards.org/


Re: [OMPI users] Signal 13

2007-03-15 Thread Mike Houston

I've been having similar issues with brand-new FC5/6 and RHEL5 machines,
but our FC4/RHEL4 machines are just fine.  On the FC5/6 and RHEL5
machines, I can only get things to run as root.  There must be some ACL
or security setting that's enabled by default on the newer distros.  If
I figure it out this weekend, I'll let you know.  If anyone else knows
the solution, please post to the list.


-Mike

David Bronke wrote:

I've been trying to get OpenMPI working on two of the computers at a
lab I help administer, and I'm running into a rather large issue. When
running anything using mpirun as a normal user, I get the following
output:


$ mpirun --no-daemonize --host
localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost
/workspace/bronke/mpi/hello
mpirun noticed that job rank 0 with PID 0 on node "localhost" exited
on signal 13.
[trixie:18104] ERROR: A daemon on node localhost failed to start as expected.
[trixie:18104] ERROR: There may be more information available from
[trixie:18104] ERROR: the remote shell (see above).
[trixie:18104] The daemon received a signal 13.
8 additional processes aborted (not shown)


However, running the same exact command line as root works fine:


$ sudo mpirun --no-daemonize --host
localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost
/workspace/bronke/mpi/hello
Password:
p is 8, my_rank is 0
p is 8, my_rank is 1
p is 8, my_rank is 2
p is 8, my_rank is 3
p is 8, my_rank is 6
p is 8, my_rank is 7
Greetings from process 1!

Greetings from process 2!

Greetings from process 3!

p is 8, my_rank is 5
p is 8, my_rank is 4
Greetings from process 4!

Greetings from process 5!

Greetings from process 6!

Greetings from process 7!


I've looked up signal 13, and have found that it is apparently
SIGPIPE; I also found a thread on the LAM-MPI site:
http://www.lam-mpi.org/MailArchives/lam/2004/08/8486.php
However, this thread seems to indicate that the problem would be in
the application (/workspace/bronke/mpi/hello in this case), but there
are no pipes in use in this app, and the fact that it works as
expected as root doesn't seem to fit either. I have tried running
mpirun with --verbose and it doesn't show any more output than without
it, so I've run into a sort of dead-end on this issue. Does anyone
know of any way I can figure out what's going wrong or how I can fix
it?

Thanks!
  


[OMPI users] Signal 13

2007-03-15 Thread David Bronke

I've been trying to get OpenMPI working on two of the computers at a
lab I help administer, and I'm running into a rather large issue. When
running anything using mpirun as a normal user, I get the following
output:


$ mpirun --no-daemonize --host
localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost
/workspace/bronke/mpi/hello
mpirun noticed that job rank 0 with PID 0 on node "localhost" exited
on signal 13.
[trixie:18104] ERROR: A daemon on node localhost failed to start as expected.
[trixie:18104] ERROR: There may be more information available from
[trixie:18104] ERROR: the remote shell (see above).
[trixie:18104] The daemon received a signal 13.
8 additional processes aborted (not shown)


However, running the same exact command line as root works fine:


$ sudo mpirun --no-daemonize --host
localhost,localhost,localhost,localhost,localhost,localhost,localhost,localhost
/workspace/bronke/mpi/hello
Password:
p is 8, my_rank is 0
p is 8, my_rank is 1
p is 8, my_rank is 2
p is 8, my_rank is 3
p is 8, my_rank is 6
p is 8, my_rank is 7
Greetings from process 1!

Greetings from process 2!

Greetings from process 3!

p is 8, my_rank is 5
p is 8, my_rank is 4
Greetings from process 4!

Greetings from process 5!

Greetings from process 6!

Greetings from process 7!


I've looked up signal 13, and have found that it is apparently
SIGPIPE; I also found a thread on the LAM-MPI site:
http://www.lam-mpi.org/MailArchives/lam/2004/08/8486.php
However, this thread seems to indicate that the problem would be in
the application (/workspace/bronke/mpi/hello in this case), but there
are no pipes in use in this app, and the fact that it works as
expected as root doesn't seem to fit either. I have tried running
mpirun with --verbose and it doesn't show any more output than without
it, so I've run into a sort of dead-end on this issue. Does anyone
know of any way I can figure out what's going wrong or how I can fix
it?
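
For reference, SIGPIPE is not limited to explicit pipes in the application: any write to a pipe or socket whose other end has gone away raises it, so it can originate in a runtime's own connections rather than in the user program. The following is a minimal Python sketch of that behavior, assuming a POSIX system; it is an illustration only and not how Open MPI itself is implemented (the names CHILD and run_child are hypothetical).

```python
import signal
import subprocess
import sys

# Child program: no shell pipes involved, only a local socket pair.
CHILD = r"""
import signal, socket
signal.signal(signal.SIGPIPE, signal.SIG_DFL)  # restore the default a C program has
a, b = socket.socketpair()
b.close()            # the peer goes away
a.send(b"hello")     # writing with no reader left delivers SIGPIPE
"""


def run_child():
    """Run the child and return its exit status as reported by subprocess."""
    proc = subprocess.run([sys.executable, "-c", CHILD])
    return proc.returncode


rc = run_child()
# subprocess reports death by signal N as a return code of -N.
print("child return code:", rc, "| SIGPIPE is", int(signal.SIGPIPE))
```

A negative return code equal to minus the SIGPIPE number indicates that the child was killed by that signal, which corresponds to an "exited on signal 13" style of report.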

Thanks!
--
David H. Bronke
Lead Programmer
G33X Nexus Entertainment
http://games.g33xnexus.com/precursors/

v3sw5/7Hhw5/6ln4pr6Ock3ma7u7+8Lw3/7Tm3l6+7Gi2e4t4Mb7Hen5g8+9ORPa22s6MSr7p6
hackerkey.com
Support Web Standards! http://www.webstandards.org/