Re: [OMPI users] automatically creating a machinefile
No idea about Rocks, but with PBS and SLURM I always do this directly in the job submission script. Below is an example of an admittedly spaghetti-code script that does this -- assuming proper (un)commenting -- for PBS and SLURM with Open MPI and MPICH2, on one particular machine that I have been toying around with lately.

Dominik

#!/bin/bash

### PBS
#PBS -N feast
#PBS -l nodes=25:ppn=2
#PBS -q batch
#PBS -l walltime=2:00:00
# job should not rerun if it fails
#PBS -r n

### SLURM
# @ job_name = feaststrong1
# @ initialdir = .
# @ output = feaststrong1_%j.out
# @ error = feaststrong1_%j.err
# @ total_tasks = 50
# @ cpus_per_task = 1
# @ wall_clock_limit = 2:00:00

# modules
module purge
module load gcc/4.6.2
module load openmpi/1.5.4
#module load mpich2/1.4.1

# cd into wdir
cd $HOME/feast/feast/feast/applications/poisson_coproc

### PBS with MPICH2
# create machine files to isolate the master process
#cat $PBS_NODEFILE > nodes.txt
## extract slaves
#sort -u nodes.txt > temp.txt
#lines=`wc -l temp.txt | awk '{print $1}'`
#((lines=$lines - 1))
#tail -n $lines temp.txt > slavetemp.txt
#cat slavetemp.txt | awk '{print $0 ":2"}' > slaves.txt
## extract master
#head -n 1 temp.txt > mastertemp.txt
#cat mastertemp.txt | awk '{print $0 ":1"}' > master.txt
## merge into one dual nodefile
#cat master.txt > dual.hostfile
#cat slaves.txt >> dual.hostfile
## same for single hostfile
#tail -n $lines temp.txt > slavetemp.txt
#cat slavetemp.txt | awk '{print $0 ":1"}' > slaves.txt
## extract master
#head -n 1 temp.txt > mastertemp.txt
#cat mastertemp.txt | awk '{print $0 ":1"}' > master.txt
## merge into one single nodefile
#cat master.txt > single.hostfile
#cat slaves.txt >> single.hostfile
## and clean up
#rm -f slavetemp.txt mastertemp.txt master.txt slaves.txt temp.txt nodes.txt

# 4 nodes
#mpiexec -n 7 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np007.dat
#mkdir arm-strongscaling-series1-L8-nodes04
#mv feastlog.* arm-strongscaling-series1-L8-nodes04
# 7 nodes
#mpiexec -n 13 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np013.dat
#mkdir arm-strongscaling-series1-L8-nodes07
#mv feastlog.* arm-strongscaling-series1-L8-nodes07
# 13 nodes
#mpiexec -n 25 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np025.dat
#mkdir arm-strongscaling-series1-L8-nodes13
#mv feastlog.* arm-strongscaling-series1-L8-nodes13
# 25 nodes
#mpiexec -n 49 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np049.dat
#mkdir arm-strongscaling-series1-L8-nodes25
#mv feastlog.* arm-strongscaling-series1-L8-nodes25

### SLURM
# figure out which nodes we got
srun /bin/hostname | sort > availhosts3.txt
lines=`wc -l availhosts3.txt | awk '{print $1}'`
((lines=$lines - 2))
tail -n $lines availhosts3.txt > slaves3.txt
head -n 1 availhosts3.txt > master3.txt
cat master3.txt > hostfile3.txt
cat slaves3.txt >> hostfile3.txt
# DGDG: SLURM -m arbitrary not supported by OpenMPI
#export SLURM_HOSTFILE=./hostfile3.txt

# 4 nodes
#mpirun -np 7 --hostfile hostfile3.txt ./trace.sh ./feastgpu-ompi master.dat.strongscaling.m6.L8.np007.dat
mpirun -np 7 --hostfile hostfile3.txt ./feastgpu-ompi master.dat.strongscaling.m6.L8.np007.dat
#mpiexec -n 7 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np007.dat
#srun -n 7 -m arbitrary ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np007.dat
mkdir arm-strongscaling-series1-L8-nodes04
mv feastlog.* arm-strongscaling-series1-L8-nodes04

# 7 nodes
#mpirun -np 13 --hostfile hostfile3.txt ./trace.sh ./feastgpu-ompi master.dat.strongscaling.m6.L8.np013.dat
mpirun -np 13 --hostfile hostfile3.txt ./feastgpu-ompi master.dat.strongscaling.m6.L8.np013.dat
#mpiexec -n 13 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np013.dat
#srun -n 13 -m arbitrary ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np013.dat
mkdir arm-strongscaling-series1-L8-nodes07
mv feastlog.* arm-strongscaling-series1-L8-nodes07

# 13 nodes
#mpirun -np 25 --hostfile hostfile3.txt ./trace.sh ./feastgpu-ompi master.dat.strongscaling.m6.L8.np025.dat
mpirun -np 25 --hostfile hostfile3.txt ./feastgpu-ompi master.dat.strongscaling.m6.L8.np025.dat
#mpiexec -n 25 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np025.dat
#srun -n 25 -m arbitrary ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np025.dat
mkdir arm-strongscaling-series1-L8-nodes13
mv feastlog.* arm-strongscaling-series1-L8-nodes13

# 25 nodes
#mpirun -np 49 --hostfile hostfile3.txt ./trace.sh ./feastgpu-ompi master.dat.strongscaling.m6.L8.np049.dat
mpirun -np 49 --hostfile hostfile3.txt ./feastgpu-ompi master.dat.strongscaling.m6.L8.np049.dat
#mpiexec -n 49 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np049.dat
#srun -n 49 -m arbitrary ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np049.dat
mkdir arm-strongscaling-series1-L8-nodes25
mv feastlog.* arm-strongscaling-series1-L8-nodes25
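
The commented PBS/MPICH2 branch boils down to a few lines; here is a minimal sketch of the same idea (untested here, file names arbitrary) that builds the master/slaves machinefile straight from $PBS_NODEFILE:

# sketch only: master node gets 1 slot, every other node gets 2 slots
sort -u "$PBS_NODEFILE" > unique_nodes.txt                              # one line per allocated node
head -n 1  unique_nodes.txt | awk '{print $0 ":1"}' >  dual.hostfile    # first node = master
tail -n +2 unique_nodes.txt | awk '{print $0 ":2"}' >> dual.hostfile    # remaining nodes = slaves
mpiexec -n 7 -f dual.hostfile ./feastgpu-mpich2 master.dat.strongscaling.m6.L8.np007.dat

The active SLURM branch does the same job, except that it harvests the node names with srun /bin/hostname instead of reading a nodefile.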
Re: [OMPI users] Getting MPI to access processes on a 2nd computer.
Hi,

On Windows, Open MPI uses WMI to launch remote processes, so WMI has to be configured correctly. Two links in the README.WINDOWS file describe how to set it up:

http://msdn.microsoft.com/en-us/library/aa393266(VS.85).aspx
http://community.spiceworks.com/topic/578

To test whether it works, you can use the following command:

wmic /node:remote_node_ip process call create notepad.exe

then log onto the other Windows machine and check in Task Manager that a notepad.exe process was created (don't forget to kill it afterwards). If that works, this command should also work (note the comma-separated host list):

mpirun -np 2 -host host1,host2 notepad.exe

Please try the two test commands above; if they both work, your application should work as well. Just let me know if you have any questions or trouble with that.

Shiqing

On 2012-07-03 8:53 PM, vimalmat...@eaton.com wrote:

Hi,

I'm trying to run an MPI code using processes on a remote machine. I've connected the two machines with a crossover cable and they are communicating with each other (I get ping replies and each can access the other's drives). When I run mpiexec --host <system_name> MPI_Test.exe, I get the following error:

C:\OpenMPI\openmpi-1.6\build\Debug>mpiexec -host SOUMIWHP4500449 MPI_Test.exe
connecting to SOUMIWHP4500449
username:C9995799
password:**
Save Credential?(Y/N) N
[SOUMIWHP5003567:01728] Could not connect to namespace cimv2 on node SOUMIWHP4500449. Error code = -2147023174
--------------------------------------------------------------------------
mpiexec was unable to start the specified application as it encountered an error. More information may be available above.
--------------------------------------------------------------------------
[SOUMIWHP5003567:01728] [[38316,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ..\..\..\openmpi-1.6\orte\mca\rml\oob\rml_oob_send.c at line 145
[SOUMIWHP5003567:01728] [[38316,0],0] attempted to send to [[38316,0],1]: tag 1
[SOUMIWHP5003567:01728] [[38316,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ..\..\..\openmpi-1.6\orte\orted\orted_comm.c at line 126

Could anyone tell me what I'm missing? I've configured MPI in VS Express 2010 and I can run MPI programs on one system. On the other computer I placed MPI_Test.exe in the same location as on the calling computer.

Thanks,
Vimal

--
Shiqing Fan
High Performance Computing Center Stuttgart (HLRS)
Tel: ++49(0)711-685-87234
Fax: ++49(0)711-685-65832
Nobelstrasse 19
70569 Stuttgart
http://www.hlrs.de/organization/people/shiqing-fan/
email: f...@hlrs.de
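
Put together, the test sequence from the reply above, using the hostnames from the error output, looks like this (a sketch only; substitute your actual node names):

rem 1) verify WMI can start a process on the remote node (kill the spawned notepad.exe afterwards)
wmic /node:SOUMIWHP4500449 process call create notepad.exe
rem 2) verify Open MPI can launch on both machines (comma-separated host list)
mpirun -np 2 -host SOUMIWHP5003567,SOUMIWHP4500449 notepad.exe
rem 3) run the application itself; MPI_Test.exe must exist at the same path on both machines
mpirun -np 2 -host SOUMIWHP5003567,SOUMIWHP4500449 MPI_Test.exe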