On 07.03.2014, at 13:20, Petar Penchev wrote:

> I have added the -catch_rsh to the PE and now when I start a sim

Good.

> (mpiexec -np $NSLOTS...) in the lsdyna.out file I see 'Error: Unknown
> option -np'. When I use 'mpirun -np $NSLOTS...' I see this 'mpirun: rsh:
> Command not found' in the lsdyna.err.

Aha, indeed. This MPI variant provides only `mpirun` in my installation. But I wonder: do you have a second MPI library installed? What does `which mpiexec` return?

The path to `rsh` is set up by the wrapper, so it should be accessible when your job starts. Can you please add to your jobscript:

echo $PATH

The $TMPDIR of the job on the node should be included there, and the `rsh` wrapper should exist therein.
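Something along these lines would do as a quick check (only a sketch, assuming the usual -catch_rsh behaviour of the start script, i.e. that the rsh wrapper and the machines file are placed in the job's private $TMPDIR and that $TMPDIR is put on the PATH):

echo $PATH        # is $TMPDIR still in here after the exports in the script?
echo $TMPDIR      # per-job scratch directory below the queue's tmpdir (/tmp here)
ls -l $TMPDIR     # the rsh wrapper and the machines file should show up here
which rsh         # should resolve to $TMPDIR/rsh and not to a system rsh

If `which rsh` does not point into $TMPDIR, the wrapper is bypassed and the remote startup will not go through Grid Engine.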
BTW: I'm not sure about your application, but several need all environment variables from the master node of the parallel job to be set for the slaves as well. This can be achieved by adding "-V" to the `qrsh -inherit ...` call near the end of /opt/gridengine/mpi/pmpi/rsh (you copied rsh and hostname to pmpi too?). There is a sketch of this line further below.

> Petar
>
> [petar@rocks test]$ cat lsdyna.err
> mpirun: rsh: Command not found
>
> [petar@rocks test]$ cat lsdyna.out
> -catch_rsh
> /opt/gridengine/default/spool/mnode01/active_jobs/32738.1/pe_hostfile
> mnode01
> mnode01
> mnode01
> mnode01
> mnode01
> mnode01
> mnode01
> mnode01
> mnode02
> mnode02
> mnode02
> mnode02
> mnode02
> mnode02
> mnode02
> mnode02
> Error: Unknown option -np
>
> [root@rocks test]# qconf -mp pmpi
> pe_name            pmpi
> slots              9999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /opt/gridengine/mpi/pmpi/startpmpi.sh -catch_rsh $pe_hostfile
> stop_proc_args     /opt/gridengine/mpi/pmpi/stoppmpi.sh
> allocation_rule    $fill_up
> control_slaves     FALSE

control_slaves TRUE

Otherwise the `qrsh -inherit ...` will fail.
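Just to illustrate what the wrapper does: the rsh placed into $TMPDIR by -catch_rsh turns the rsh calls of the MPI library into `qrsh -inherit ...` calls back to Grid Engine, and the execd on the slave nodes only accepts these when control_slaves is TRUE. The relevant line near the end of the stock wrapper looks roughly like this (a sketch from memory, and the place where the "-V" mentioned above would go; please check your copy in /opt/gridengine/mpi/pmpi/rsh, the variable names there may differ):

exec $SGE_ROOT/bin/$ARC/qrsh -V -inherit -nostdin $rhost $cmd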
-- Reuti

> job_is_first_task  TRUE
> urgency_slots      min
> accounting_summary TRUE
>
>
> On 03/07/2014 12:49 PM, Reuti wrote:
>> Hi,
>>
>> On 07.03.2014, at 12:28, Petar Penchev wrote:
>>
>>> I have a rocks-cluster 6.1 using OGS2011.11p1 and I am trying to use the
>>> PlatformMPI parallel libraries. My problem is that when I submit a job
>>> using qsub test.sh, the job starts only on one node with 16 processes
>>> and not on both nodes. The -pe pmpi, which I am using for now, is only a
>>> copy of mpi.
>>
>> Does the definition of the PE pmpi also include the -catch_rsh? The recent
>> IBM/Platform-MPI can cope with a machine file in the MPICH(1) format, which
>> is created by /usr/sge/mpi/startmpi.sh.
>>
>> In addition you need the following settings for a tight integration. Please
>> try:
>>
>> ...
>> export MPI_REMSH=rsh
>> export MPI_TMPDIR=$TMPDIR
>> mpiexec -np $NSLOTS -machinefile $TMPDIR/machines $BIN $ARGS
>>
>> -- Reuti
>>
>>> What am I missing? Does anyone have a working -pe submit script, or some
>>> hints how to make this work?
>>>
>>> Thanks in advance,
>>> Petar
>>>
>>> [root@rocks mpi]# test.sh
>>> #!/bin/bash
>>> #$ -N lsdyna
>>> #$ -S /bin/bash
>>> #$ -pe pmpi 16
>>> #$ -cwd
>>> #$ -o lsdyna.out
>>> #$ -e lsdyna.err
>>> ###
>>> #$ -q test.q
>>> ### -notify
>>> export MPI_ROOT=/export/apps/platform_mpi
>>> export LD_LIBRARY_PATH=/export/apps/platform_mpi/lib/linux_amd64
>>> export PATH=/export/apps/platform_mpi/bin
>>> BIN="/export/apps/lsdyna/ls-dyna_mpp_s_r6_1_2_85274_x64_redhat54_ifort120_sse2_platformmpi.exe"
>>> ARGS="i=test.k"
>>> mpirun -np $NSLOTS $BIN $ARGS
>>>
>>>
>>> [root@rocks mpi]# qconf -sq test.q
>>> qname                 test.q
>>> hostlist              mnode01 mnode02
>>> seq_no                0
>>> load_thresholds       np_load_avg=1.75
>>> suspend_thresholds    NONE
>>> nsuspend              1
>>> suspend_interval      00:05:00
>>> priority              0
>>> min_cpu_interval      00:05:00
>>> processors            UNDEFINED
>>> qtype                 BATCH INTERACTIVE
>>> ckpt_list             NONE
>>> pe_list               pmpi
>>> rerun                 FALSE
>>> slots                 8
>>> tmpdir                /tmp
>>> shell                 /bin/bash
>>> prolog                NONE
>>> epilog                NONE
>>> shell_start_mode      unix_behavior
>>> starter_method        NONE
>>> suspend_method        NONE
>>> resume_method         NONE
>>> terminate_method      NONE
>>> notify                00:00:60
>>> owner_list            NONE
>>> user_lists            NONE
>>> xuser_lists           NONE
>>> subordinate_list      NONE
>>> complex_values        NONE
>>> projects              NONE
>>> xprojects             NONE
>>> calendar              NONE
>>> initial_state         default
>>> s_rt                  INFINITY
>>> h_rt                  INFINITY
>>> s_cpu                 INFINITY
>>> h_cpu                 INFINITY
>>> s_fsize               INFINITY
>>> h_fsize               INFINITY
>>> s_data                INFINITY
>>> h_data                INFINITY
>>> s_stack               INFINITY
>>> h_stack               INFINITY
>>> s_core                INFINITY
>>> h_core                INFINITY
>>> s_rss                 INFINITY
>>> h_rss                 INFINITY
>>> s_vmem                INFINITY
>>> h_vmem                INFINITY
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
