Sorry - I forgot that you built from a tarball, so debug isn't enabled by default. You need to configure with --enable-debug and rebuild.
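
For what it's worth, the rebuild could look roughly like the sketch below. This is only an outline under stated assumptions: `OMPI_SRC_DIR` is a hypothetical location for the unpacked 1.6.3 tarball, `OMPI_PREFIX` is taken from the mpirun path in your commands, and `rebuild_with_debug` is just an illustrative helper name, not anything Open MPI ships.

```shell
#!/bin/sh
# Sketch only: rebuild Open MPI from an unpacked tarball with debug enabled.
# Assumptions (not from the thread): OMPI_SRC_DIR is a hypothetical path to
# the unpacked source tree. OMPI_PREFIX mirrors the mpirun path used above.
OMPI_SRC_DIR="${OMPI_SRC_DIR:-$HOME/openmpi-1.6.3}"
OMPI_PREFIX="${OMPI_PREFIX:-/home/apps/openmpi-1.6.3}"

rebuild_with_debug() {
    # Refuse to run if there is no configure script where we expect one.
    if [ ! -x "$OMPI_SRC_DIR/configure" ]; then
        echo "no configure script in $OMPI_SRC_DIR" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is left alone.
    (
        cd "$OMPI_SRC_DIR" &&
        ./configure --prefix="$OMPI_PREFIX" --enable-debug &&
        make all install
    )
}
```

Calling `rebuild_with_debug` from a shell that has sourced this file runs the three steps in order and stops at the first failure. Once the install finishes on the nodes involved, rerunning the same mpirun command with `-mca odls_base_verbose 5` should produce the more detailed launch output.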
On Dec 14, 2012, at 1:52 PM, Daniel Davidson <dani...@igb.uiuc.edu> wrote: > Oddly enough, adding this debugging info, lowered the number of processes > that can be used down to 42 from 46. When I run the MPI, it fails giving > only the information that follows: > > [root@compute-2-1 ssh]# /home/apps/openmpi-1.6.3/bin/mpirun -host > compute-2-0,compute-2-1 -v -np 44 --leave-session-attached -mca > odls_base_verbose 5 hostname > [compute-2-1.local:44374] mca:base:select:( odls) Querying component [default] > [compute-2-1.local:44374] mca:base:select:( odls) Query of component > [default] set priority to 1 > [compute-2-1.local:44374] mca:base:select:( odls) Selected component [default] > [compute-2-0.local:28950] mca:base:select:( odls) Querying component [default] > [compute-2-0.local:28950] mca:base:select:( odls) Query of component > [default] set priority to 1 > [compute-2-0.local:28950] mca:base:select:( odls) Selected component [default] > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > compute-2-1.local > > > On 12/14/2012 03:18 PM, Ralph Castain wrote: >> It wouldn't be ssh - in both cases, only one ssh is being done to each node >> (to start the local daemon). The only difference is the number of >> fork/exec's being done on each node, and the number of file descriptors >> being opened to support those fork/exec's. >> >> It certainly looks like your limits are high enough. When you say it >> "fails", what do you mean - what error does it report? 
Try adding: >> >> --leave-session-attached -mca odls_base_verbose 5 >> >> to your cmd line - this will report all the local proc launch debug and >> hopefully show you a more detailed error report. >> >> >> On Dec 14, 2012, at 12:29 PM, Daniel Davidson <dani...@igb.uiuc.edu> wrote: >> >>> I have had to cobble together two machines in our rocks cluster without >>> using the standard installation, they have efi only bios on them and rocks >>> doesnt like that, so it is the only workaround. >>> >>> Everything works great now, except for one thing. MPI jobs (openmpi or >>> mpich) fail when started from one of these nodes (via qsub or by logging in >>> and running the command) if 24 or more processors are needed on another >>> system. However if the originator of the MPI job is the headnode or any of >>> the preexisting compute nodes, it works fine. Right now I am guessing ssh >>> client or ulimit problems, but I cannot find any difference. Any help >>> would be greatly appreciated. >>> >>> compute-2-1 and compute-2-0 are the new nodes >>> >>> Examples: >>> >>> This works, prints 23 hostnames from each machine: >>> [root@compute-2-1 ~]# /home/apps/openmpi-1.6.3/bin/mpirun -host >>> compute-2-0,compute-2-1 -np 46 hostname >>> >>> This does not work, prints 24 hostnames for compute-2-1 >>> [root@compute-2-1 ~]# /home/apps/openmpi-1.6.3/bin/mpirun -host >>> compute-2-0,compute-2-1 -np 48 hostname >>> >>> These both work, print 64 hostnames from each node >>> [root@biocluster ~]# /home/apps/openmpi-1.6.3/bin/mpirun -host >>> compute-2-0,compute-2-1 -np 128 hostname >>> [root@compute-0-2 ~]# /home/apps/openmpi-1.6.3/bin/mpirun -host >>> compute-2-0,compute-2-1 -np 128 hostname >>> >>> [root@compute-2-1 ~]# ulimit -a >>> core file size (blocks, -c) 0 >>> data seg size (kbytes, -d) unlimited >>> scheduling priority (-e) 0 >>> file size (blocks, -f) unlimited >>> pending signals (-i) 16410016 >>> max locked memory (kbytes, -l) unlimited >>> max memory size (kbytes, -m) 
unlimited >>> open files (-n) 4096 >>> pipe size (512 bytes, -p) 8 >>> POSIX message queues (bytes, -q) 819200 >>> real-time priority (-r) 0 >>> stack size (kbytes, -s) unlimited >>> cpu time (seconds, -t) unlimited >>> max user processes (-u) 1024 >>> virtual memory (kbytes, -v) unlimited >>> file locks (-x) unlimited >>> >>> [root@compute-2-1 ~]# more /etc/ssh/ssh_config >>> Host * >>> CheckHostIP no >>> ForwardX11 yes >>> ForwardAgent yes >>> StrictHostKeyChecking no >>> UsePrivilegedPort no >>> Protocol 2,1 >>> >>> _______________________________________________ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >> >> _______________________________________________ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> > > _______________________________________________ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users