Weird - it looks like it has done a comm_spawn and is having trouble connecting 
between the jobs. I can check the basic code and make sure it is working. I 
seem to recall someone else recently talking about Rmpi changes causing 
problems (different ones than this, IIRC), so you might want to search our user 
archives for "rmpi" to see what they ran into. I'm not sure what Rmpi changed, or why.
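
A rough sketch of how the route_callback lines in the quoted log can be read, in case it helps when comparing against other reports. It assumes the usual ORTE process-name layout [[job_family, local_job], vpid], with local job 0 being the orted daemons, 1 the initial job, and higher numbers being jobs started later (for example by comm_spawn). The helper name failed_routes is made up for this illustration and is not part of Open MPI or Rmpi.

```python
import re
from collections import Counter

# Assumption: ORTE prints process names as [[job_family, local_job], vpid].
# local_job 0 = daemons (orted), 1 = initial job, 2+ = later (spawned) jobs.
ROUTE_RE = re.compile(
    r"tried routing\s+message from \[\[(\d+),(\d+)\],(\d+)\] "
    r"to \[\[(\d+),(\d+)\],(\d+)\]"
)


def failed_routes(log_text):
    """Return (src_job, src_vpid, dst_job, dst_vpid) for each routing failure."""
    # Drop the "> " quote markers and rejoin lines wrapped by the mail client.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    return [
        (int(m.group(2)), int(m.group(3)), int(m.group(5)), int(m.group(6)))
        for m in ROUTE_RE.finditer(flat)
    ]


# Two of the failures from the quoted log, wrapped as they appear in the mail.
sample = """\
> [nyx0400.engin.umich.edu:11927] [[48116,0],4]:route_callback tried routing
> message from [[48116,2],16] to [[48116,1],0]:16, can't find route
> [nyx0405.engin.umich.edu:07707] [[48116,0],8]:route_callback tried routing
> message from [[48116,2],32] to [[48116,1],0]:16, can't find route
"""

routes = failed_routes(sample)
pairs = Counter((src_job, dst_job) for src_job, _, dst_job, _ in routes)
for (src_job, dst_job), count in pairs.items():
    print(f"{count} message(s) from job {src_job} to job {dst_job} could not be routed")
```

Under that assumption, every failure in the log is a process in job 2 trying to reach rank 0 of job 1, which fits a spawned set of workers that cannot find a route back to the parent. A plain hello world never creates a second job, so it would not exercise this path.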

On Jul 26, 2012, at 2:41 PM, Brock Palen wrote:

> I have run into a problem using Rmpi with OpenMPI (trying to get snow 
> running).
> 
> I built OpenMPI following another post, with static libraries enabled:
> 
> ./configure --prefix=$INSTALL/gcc-4.4.6-static \
>   --mandir=$INSTALL/gcc-4.4.6-static/man --with-tm=/usr/local/torque/ \
>   --with-openib --with-psm --enable-static CC=gcc CXX=g++ FC=gfortran \
>   F77=gfortran
> 
> Rmpi/snow work fine when I run on a single node.  When I span more than one 
> node I get nasty errors (pasted below).
> 
> I tested this MPI install with a simple hello world and that works.  Any 
> thoughts on what is different about Rmpi/snow that could cause this?
> 
> [nyx0400.engin.umich.edu:11927] [[48116,0],4] ORTE_ERROR_LOG: Not found in 
> file routed_binomial.c at line 386
> [nyx0400.engin.umich.edu:11927] [[48116,0],4]:route_callback tried routing 
> message from [[48116,2],16] to [[48116,1],0]:16, can't find route
> [nyx0405.engin.umich.edu:07707] [[48116,0],8] ORTE_ERROR_LOG: Not found in 
> file routed_binomial.c at line 386
> [nyx0405.engin.umich.edu:07707] [[48116,0],8]:route_callback tried routing 
> message from [[48116,2],32] to [[48116,1],0]:16, can't find route
> [0] 
> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_backtrace_print+0x1f)
>  [0x2b7e9209e0df]
> [1] 
> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(+0x9f77a)
>  [0x2b7e9206577a]
> [2] 
> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(mca_oob_tcp_msg_recv_complete+0x27f)
>  [0x2b7e920404af]
> [3] 
> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(+0x7bed2)
>  [0x2b7e92041ed2]
> [4] 
> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_event_base_loop+0x238)
>  [0x2b7e92087e38]
> [5] 
> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(orte_daemon+0x8d8)
>  [0x2b7e92016768]
> [6] func:orted(main+0x66) [0x400966]
> [7] func:/lib64/libc.so.6(__libc_start_main+0xfd) [0x3d39c1ecdd]
> [8] func:orted() [0x400839]
> [nyx0397.engin.umich.edu:06959] [[48116,0],1] ORTE_ERROR_LOG: Not found in 
> file routed_binomial.c at line 386
> [nyx0397.engin.umich.edu:06959] [[48116,0],1]:route_callback tried routing 
> message from [[48116,2],7] to [[48116,1],0]:16, can't find route
> [nyx0401.engin.umich.edu:07782] [[48116,0],5] ORTE_ERROR_LOG: Not found in 
> file routed_binomial.c at line 386
> [nyx0401.engin.umich.edu:07782] [[48116,0],5]:route_callback tried routing 
> message from [[48116,2],23] to [[48116,1],0]:16, can't find route
> [nyx0406.engin.umich.edu:07743] [[48116,0],9] ORTE_ERROR_LOG: Not found in 
> file routed_binomial.c at line 386
> [nyx0406.engin.umich.edu:07743] [[48116,0],9]:route_callback tried routing 
> message from [[48116,2],39] to [[48116,1],0]:16, can't find route
> [0] 
> func:/home/software/rhel6/openmpi-1.6.0/gcc-4.4.6-static/lib/libopen-rte.so.4(opal_backtrace_print+0x1f)
>  [0x2ae2ad17d0df]
> 
> 
> 
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

