Sorry - I forgot that you built from a tarball, so debug isn't enabled by 
default. You need to configure with --enable-debug.
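
For a tarball build that is usually something like this (the --prefix here is 
an assumption, taken from the mpirun path in your examples):

  # reconfigure with debug enabled, then rebuild and reinstall
  ./configure --prefix=/home/apps/openmpi-1.6.3 --enable-debug
  make all install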

On Dec 14, 2012, at 1:52 PM, Daniel Davidson <dani...@igb.uiuc.edu> wrote:

> Oddly enough, adding this debugging info lowered the number of processes 
> that can be used from 46 down to 42.  When I run the MPI job, it fails, 
> giving only the information that follows:
> 
> [root@compute-2-1 ssh]# /home/apps/openmpi-1.6.3/bin/mpirun -host compute-2-0,compute-2-1 -v -np 44 --leave-session-attached -mca odls_base_verbose 5 hostname
> [compute-2-1.local:44374] mca:base:select:( odls) Querying component [default]
> [compute-2-1.local:44374] mca:base:select:( odls) Query of component [default] set priority to 1
> [compute-2-1.local:44374] mca:base:select:( odls) Selected component [default]
> [compute-2-0.local:28950] mca:base:select:( odls) Querying component [default]
> [compute-2-0.local:28950] mca:base:select:( odls) Query of component [default] set priority to 1
> [compute-2-0.local:28950] mca:base:select:( odls) Selected component [default]
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> compute-2-1.local
> 
> 
> On 12/14/2012 03:18 PM, Ralph Castain wrote:
>> It wouldn't be ssh - in both cases, only one ssh is done to each node (to 
>> start the local daemon). The only difference is the number of fork/execs 
>> done on each node, and the number of file descriptors opened to support 
>> those fork/execs.
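>>
>> If you want to see the limits the daemon itself inherited, one quick check 
>> - a sketch, assuming the usual 1.6.x daemon name "orted" and a single 
>> daemon per node - is to run this on the remote node while a job is up:
>>
>>   # show the kernel-enforced limits of the newest orted process
>>   cat /proc/$(pgrep -n orted)/limits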
>> 
>> It certainly looks like your limits are high enough. When you say it 
>> "fails", what do you mean - what error does it report? Try adding:
>> 
>> --leave-session-attached -mca odls_base_verbose 5
>> 
>> to your command line - this will report all of the local process-launch 
>> debug output and hopefully show you a more detailed error report.
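>>
>> For example, combined with your earlier hostname test it would look 
>> something like this (host list and -np taken from your mail):
>>
>>   mpirun -host compute-2-0,compute-2-1 -np 48 --leave-session-attached -mca odls_base_verbose 5 hostname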
>> 
>> 
>> On Dec 14, 2012, at 12:29 PM, Daniel Davidson <dani...@igb.uiuc.edu> wrote:
>> 
>>> I have had to cobble together two machines in our Rocks cluster without 
>>> using the standard installation: they have EFI-only BIOS, which Rocks 
>>> doesn't like, so this was the only workaround.
>>> 
>>> Everything works great now, except for one thing.  MPI jobs (Open MPI or 
>>> MPICH) fail when started from one of these nodes (via qsub, or by logging 
>>> in and running the command) if 24 or more processes are needed on another 
>>> system.  However, if the originator of the MPI job is the head node or any 
>>> of the preexisting compute nodes, it works fine.  Right now I am guessing 
>>> at an ssh client or ulimit problem, but I cannot find any difference 
>>> between the nodes.  Any help would be greatly appreciated.
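>>>
>>> For reference, this is roughly how I compared the limits each node hands 
>>> out over a non-interactive ssh session (a sketch, using the two new nodes 
>>> named below):
>>>
>>>   ssh compute-2-0 'ulimit -a' > /tmp/limits.2-0
>>>   ssh compute-2-1 'ulimit -a' > /tmp/limits.2-1
>>>   diff /tmp/limits.2-0 /tmp/limits.2-1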
>>> 
>>> compute-2-1 and compute-2-0 are the new nodes.
>>> 
>>> Examples:
>>> 
>>> This works and prints 23 hostnames from each machine:
>>> [root@compute-2-1 ~]# /home/apps/openmpi-1.6.3/bin/mpirun -host compute-2-0,compute-2-1 -np 46 hostname
>>> 
>>> This does not work; it prints only the 24 hostnames from compute-2-1:
>>> [root@compute-2-1 ~]# /home/apps/openmpi-1.6.3/bin/mpirun -host compute-2-0,compute-2-1 -np 48 hostname
>>> 
>>> These both work and print 64 hostnames from each node:
>>> [root@biocluster ~]# /home/apps/openmpi-1.6.3/bin/mpirun -host compute-2-0,compute-2-1 -np 128 hostname
>>> [root@compute-0-2 ~]# /home/apps/openmpi-1.6.3/bin/mpirun -host compute-2-0,compute-2-1 -np 128 hostname
>>> 
>>> [root@compute-2-1 ~]# ulimit -a
>>> core file size          (blocks, -c) 0
>>> data seg size           (kbytes, -d) unlimited
>>> scheduling priority             (-e) 0
>>> file size               (blocks, -f) unlimited
>>> pending signals                 (-i) 16410016
>>> max locked memory       (kbytes, -l) unlimited
>>> max memory size         (kbytes, -m) unlimited
>>> open files                      (-n) 4096
>>> pipe size            (512 bytes, -p) 8
>>> POSIX message queues     (bytes, -q) 819200
>>> real-time priority              (-r) 0
>>> stack size              (kbytes, -s) unlimited
>>> cpu time               (seconds, -t) unlimited
>>> max user processes              (-u) 1024
>>> virtual memory          (kbytes, -v) unlimited
>>> file locks                      (-x) unlimited
>>> 
>>> [root@compute-2-1 ~]# more /etc/ssh/ssh_config
>>> Host *
>>>        CheckHostIP             no
>>>        ForwardX11              yes
>>>        ForwardAgent            yes
>>>        StrictHostKeyChecking   no
>>>        UsePrivilegedPort       no
>>>        Protocol                2,1
>>> 