Works like a charm, thanks Tim!

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Tim Prins
Sent: 01 April 2007 14:50
To: Open MPI Users
Subject: Re: [OMPI users] Torque/OpenMPI[Scanned]
Hi Barry,

The problem is the line:

    ncpus=`wc -l $PBS_NODEFILE`

When wc is given a file name as an argument, it prints that name after the count. So ncpus gets "16 /var/spool/torque/aux//350.wc01" and your mpirun command will look like:

    mpirun -np 16 /var/spool/torque/aux//350.wc01 /home/test/hpcc-1.0.0/hpcc

So mpirun will try to execute /var/spool/torque/aux//350.wc01.

One solution relies on the fact that Open MPI will run on every available slot if -np is not passed. So you could just use the script:

    HPCC_HOME=/home/test/hpcc-1.0.0
    mpirun $HPCC_HOME/hpcc

This will launch one process on every CPU reported by Torque.

Alternatively, you could have wc read from stdin instead of from a file:

    ncpus=`wc -l < $PBS_NODEFILE`

This avoids the file name being printed.

Hope this helps,

Tim

On Apr 1, 2007, at 9:16 AM, Barry Evans wrote:

> Hello,
>
> Having a bit of trouble running Open MPI 1.2 under Torque 2.1.8.
>
> My Script contains the following:
>
> -----------------------------------------------
> HPCC_HOME=/home/test/hpcc-1.0.0
> ncpus=`wc -l $PBS_NODEFILE`
> mpirun -np $ncpus $HPCC_HOME/hpcc
> -----------------------------------------------
>
> When I try to run on 4 nodes, 4 cpus each I receive the following in my err file:
>
> [node003:04409] [0,0,4] ORTE_ERROR_LOG: Not found in file odls_default_module.c at line 1188
> [node008:06691] [0,0,1] ORTE_ERROR_LOG: Not found in file odls_default_module.c at line 1188
> [node007:04352] [0,0,2] ORTE_ERROR_LOG: Not found in file odls_default_module.c at line 1188
> --------------------------------------------------------------------------
> Failed to find or execute the following executable:
>
> Host: node007
> Executable: /var/spool/torque/aux//350.wc01
>
> Cannot continue.
> --------------------------------------------------------------------------
> [no--------------------------------------------------------------------------
> Failed to find or execute the following executable:
>
> Host: node004
> Executable: /var/spool/torque/aux//350.wc01
>
> Cannot continue.
> --------------------------------------------------------------------------
> de004:04364] [0,0,3] ORTE_ERROR_LOG: Not found in file odls_default_module.c at line 1188
> --------------------------------------------------------------------------
> Failed to find or execute the following executable:
>
> Host: node003
> Executable: /var/spool/torque/aux//350.wc01
>
> Cannot continue.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> Failed to find or execute the following executable:
>
> Host: node008
> Executable: /var/spool/torque/aux//350.wc01
>
> Cannot continue.
> --------------------------------------------------------------------------
> [node007:04352] [0,0,2] ORTE_ERROR_LOG: Not found in file orted.c at line 588
> [node008:06691] [0,0,1] ORTE_ERROR_LOG: Not found in file orted.c at line 588
> [node004:04364] [0,0,3] ORTE_ERROR_LOG: Not found in file orted.c at line 588
> [node003:04409] [0,0,4] ORTE_ERROR_LOG: Not found in file orted.c at line 588
>
> Has anyone seen this before? It seems odd that openmpi would be trying to execute what is effectively the host file. I stuck a sleep in to make sure the file was being distributed, and sure enough, it was there. I am able to run mvapich through torque without issue and openmpi from the command line.
>
> Cheers,
>
> Barry Evans
> Technical Manager
> OCF plc
> +44 (0)7970 148 121
> bev...@ocf.co.uk
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
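
For anyone landing on this thread later, here is a minimal sketch of the wc behaviour Tim describes. It runs outside Torque, so it fakes PBS_NODEFILE with a temporary file of 16 lines (4 made-up node names, 4 slots each); that file and the loop that builds it are assumptions for illustration, not part of Barry's job script. mpirun is only echoed, never executed, so the sketch just shows what each command line would expand to.

```shell
#!/bin/sh
# Stand-in for the Torque node file (assumption: in a real job Torque sets PBS_NODEFILE).
PBS_NODEFILE=$(mktemp)
for node in node003 node004 node007 node008; do
    for slot in 1 2 3 4; do
        echo "$node"
    done
done > "$PBS_NODEFILE"

# Given a file argument, wc prints the count followed by the file name.
with_name=$(wc -l "$PBS_NODEFILE")

# Reading from stdin, wc has no file name to print, so only the count comes back.
ncpus=$(wc -l < "$PBS_NODEFILE")

echo "wc -l FILE   -> $with_name"
echo "wc -l < FILE -> $ncpus"

# Show the command line each variant would produce (echo only; mpirun is not run here).
HPCC_HOME=/home/test/hpcc-1.0.0
echo mpirun -np $with_name "$HPCC_HOME/hpcc"
echo mpirun -np $ncpus "$HPCC_HOME/hpcc"

# Clean up the temporary stand-in file.
rm -f "$PBS_NODEFILE"
```

The first echoed mpirun line carries the node file path as an extra argument, which is what mpirun then tried to execute in the original job; the second carries only the count. Some wc implementations pad the count with leading spaces, which is harmless when the variable is expanded unquoted as above.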