I saw that the right options for mpiexec are -n $NSLOTS and -host 'file'. But now when I submit a simulation, slots get allocated by SGE (I suppose), yet the simulation doesn't start. When I log onto mnode01 I see only one process running, 'mpirun', but I expected to see the ls-dyna_mpp* processes.
[petar@rocks test]$ ssh mnode01 ps -e f -o pid,ppid,pgrp,command
21989  1777 21989  \_ sge_shepherd-32746 -bg
22026 21989 22026      \_ /bin/bash /opt/gridengine/default/spool/mnode01/job_scripts/32746
22027 22026 22026          \_ /bin/sh /export/apps/platform_mpi/bin/mpiexec -n 16 -host /home/tmp/32746.1.test.q/machines /export/apps/lsdyna/ls-dyna_mpp_s_r6_1_2_85274_x64_redhat54_ifort120_sse2_platformmpi.exe i=/home/petar/mmew/test/main.k
22030 22027 22026              \_ /export/apps/platform_mpi/bin/mpirun -f /home/tmp/32746.1.test.q/mpiexec.22027

[petar@rocks test]$ ssh mnode01 cat /home/tmp/32746.1.test.q/mpiexec.22027
-np 16 -h /home/tmp/32746.1.test.q/machines /export/apps/lsdyna/ls-dyna_mpp_s_r6_1_2_85274_x64_redhat54_ifort120_sse2_platformmpi.exe i=/home/petar/mmew/test/main.k

Does anyone know why the job doesn't run?

Thanks,
Petar

On 03/07/2014 01:20 PM, Petar Penchev wrote:
> Hi Reuti,
>
> thanks for the quick reply.
>
> I have added the -catch_rsh to the PE, and now when I start a simulation
> (mpiexec -np $NSLOTS ...) I see 'Error: Unknown option -np' in the
> lsdyna.out file. When I use 'mpirun -np $NSLOTS ...' I see 'mpirun: rsh:
> Command not found' in lsdyna.err.
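[Editor's note: the machines file consumed above is what SGE's startmpi.sh -catch_rsh generates from $pe_hostfile. A minimal sketch of that host-expansion step follows, assuming the standard four-column pe_hostfile format; `pe2machines` is a hypothetical helper name, and the real startmpi.sh additionally installs an rsh wrapper in $TMPDIR so remote startup goes through qrsh -inherit.]

```shell
# Expand an SGE $pe_hostfile into an MPICH(1)-style machines file:
# one output line per slot, i.e. each host name repeated slot-count times.
pe2machines() {
    # Each pe_hostfile line: <hostname> <slots> <queue> <processor-range>
    while read -r host slots _rest; do
        i=0
        while [ "$i" -lt "$slots" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done < "$1"
}
```

With a pe_hostfile containing `mnode01 8 ...` and `mnode02 8 ...`, this yields the eight `mnode01` lines followed by eight `mnode02` lines seen in lsdyna.out below.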
>
> Petar
>
> [petar@rocks test]$ cat lsdyna.err
> mpirun: rsh: Command not found
>
> [petar@rocks test]$ cat lsdyna.out
> -catch_rsh /opt/gridengine/default/spool/mnode01/active_jobs/32738.1/pe_hostfile
> mnode01
> mnode01
> mnode01
> mnode01
> mnode01
> mnode01
> mnode01
> mnode01
> mnode02
> mnode02
> mnode02
> mnode02
> mnode02
> mnode02
> mnode02
> mnode02
> Error: Unknown option -np
>
> [root@rocks test]# qconf -mp pmpi
> pe_name            pmpi
> slots              9999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /opt/gridengine/mpi/pmpi/startpmpi.sh -catch_rsh $pe_hostfile
> stop_proc_args     /opt/gridengine/mpi/pmpi/stoppmpi.sh
> allocation_rule    $fill_up
> control_slaves     FALSE
> job_is_first_task  TRUE
> urgency_slots      min
> accounting_summary TRUE
>
>
> On 03/07/2014 12:49 PM, Reuti wrote:
>> Hi,
>>
>> Am 07.03.2014 um 12:28 schrieb Petar Penchev:
>>
>>> I have a Rocks cluster 6.1 using OGS 2011.11p1 and I am trying to use
>>> the Platform MPI parallel libraries. My problem is that when I submit
>>> a job using qsub test.sh, the job starts only on one node with 16
>>> processes and not on both nodes. The PE pmpi, which I am using for
>>> now, is only a copy of mpi.
>> Does the definition of the PE pmpi also include the -catch_rsh? The
>> recent IBM/Platform-MPI can cope with a machine file in the MPICH(1)
>> format, which is created by /usr/sge/mpi/startmpi.sh.
>>
>> In addition you need the following settings for a tight integration.
>> Please try:
>>
>> ...
>> export MPI_REMSH=rsh
>> export MPI_TMPDIR=$TMPDIR
>> mpiexec -np $NSLOTS -machinefile $TMPDIR/machines $BIN $ARGS
>>
>> -- Reuti
>>
>>
>>> What am I missing? Does anyone have a working -pe submit script, or
>>> some hints on how to make this work?
>>>
>>> Thanks in advance,
>>> Petar
>>>
>>> [root@rocks mpi]# test.sh
>>> #!/bin/bash
>>> #$ -N lsdyna
>>> #$ -S /bin/bash
>>> #$ -pe pmpi 16
>>> #$ -cwd
>>> #$ -o lsdyna.out
>>> #$ -e lsdyna.err
>>> ###
>>> #$ -q test.q
>>> ### -notify
>>> export MPI_ROOT=/export/apps/platform_mpi
>>> export LD_LIBRARY_PATH=/export/apps/platform_mpi/lib/linux_amd64
>>> export PATH=/export/apps/platform_mpi/bin
>>> BIN="/export/apps/lsdyna/ls-dyna_mpp_s_r6_1_2_85274_x64_redhat54_ifort120_sse2_platformmpi.exe"
>>> ARGS="i=test.k"
>>> mpirun -np $NSLOTS $BIN $ARGS
>>>
>>>
>>> [root@rocks mpi]# qconf -sq test.q
>>> qname                 test.q
>>> hostlist              mnode01 mnode02
>>> seq_no                0
>>> load_thresholds       np_load_avg=1.75
>>> suspend_thresholds    NONE
>>> nsuspend              1
>>> suspend_interval      00:05:00
>>> priority              0
>>> min_cpu_interval      00:05:00
>>> processors            UNDEFINED
>>> qtype                 BATCH INTERACTIVE
>>> ckpt_list             NONE
>>> pe_list               pmpi
>>> rerun                 FALSE
>>> slots                 8
>>> tmpdir                /tmp
>>> shell                 /bin/bash
>>> prolog                NONE
>>> epilog                NONE
>>> shell_start_mode      unix_behavior
>>> starter_method        NONE
>>> suspend_method        NONE
>>> resume_method         NONE
>>> terminate_method      NONE
>>> notify                00:00:60
>>> owner_list            NONE
>>> user_lists            NONE
>>> xuser_lists           NONE
>>> subordinate_list      NONE
>>> complex_values        NONE
>>> projects              NONE
>>> xprojects             NONE
>>> calendar              NONE
>>> initial_state         default
>>> s_rt                  INFINITY
>>> h_rt                  INFINITY
>>> s_cpu                 INFINITY
>>> h_cpu                 INFINITY
>>> s_fsize               INFINITY
>>> h_fsize               INFINITY
>>> s_data                INFINITY
>>> h_data                INFINITY
>>> s_stack               INFINITY
>>> h_stack               INFINITY
>>> s_core                INFINITY
>>> h_core                INFINITY
>>> s_rss                 INFINITY
>>> h_rss                 INFINITY
>>> s_vmem                INFINITY
>>> h_vmem                INFINITY
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
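[Editor's note: pulling the hints in this thread together, a revised test.sh might look like the sketch below. It is untested and uses the paths from the thread. Two points it reflects: the original script overwrites PATH with only the Platform MPI bin directory, which would explain "mpirun: rsh: Command not found", since the rsh wrapper created by the -catch_rsh start script lives in $TMPDIR; and tight integration also requires control_slaves TRUE in the PE (the pmpi PE above has FALSE), so that qrsh -inherit may spawn the remote ranks. Whether the correct option is -machinefile or -hostfile may depend on the Platform MPI version; the line below follows Reuti's suggestion.]

```shell
#!/bin/bash
#$ -N lsdyna
#$ -S /bin/bash
#$ -pe pmpi 16
#$ -cwd
#$ -o lsdyna.out
#$ -e lsdyna.err
#$ -q test.q

export MPI_ROOT=/export/apps/platform_mpi
export LD_LIBRARY_PATH=$MPI_ROOT/lib/linux_amd64
# Prepend instead of overwrite: $TMPDIR holds the rsh wrapper created by
# the -catch_rsh start script, and the rest of PATH must stay intact.
export PATH=$MPI_ROOT/bin:$TMPDIR:$PATH

# Tight-integration settings suggested by Reuti:
export MPI_REMSH=rsh
export MPI_TMPDIR=$TMPDIR

BIN="/export/apps/lsdyna/ls-dyna_mpp_s_r6_1_2_85274_x64_redhat54_ifort120_sse2_platformmpi.exe"
ARGS="i=test.k"

# Note: Platform MPI's mpirun takes -np, while the mpiexec wrapper
# takes -n (hence the "Unknown option -np" error reported above).
mpirun -np $NSLOTS -machinefile $TMPDIR/machines $BIN $ARGS
```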
