In fact I also have other MPI libraries (openMPI, PlatformMPI and HP-MPI) and I am controlling which one to use through modules. `which mpiexec` returns: /export/apps/platform_mpi/bin/mpiexec

> (You copied rsh/hostname to pmpi too?)

Yes, both are there.

> control_slaves TRUE

Now this is also set.

> so it should be accessible when your job starts.

As you suggested, I have added `export PATH=/export/apps/platform_mpi/bin:$PATH` to my submit script, and the rsh error has now disappeared. Adding only the job tmp dir didn't work (export PATH=/export/apps/platform_mpi/bin:$TMPDIR). The output is now:

echo $PATH
/export/apps/platform_mpi/bin:/home/tmp/33108.1.test.q:/usr/local/bin:/bin:/usr/bin

But I have another problem. After I submit a simulation, the log file shows the error "10.197.9.32: Connection refused" (this is the IP of mnode02), and the error log shows: "mpirun: Warning one or more remote shell commands exited with non-zero status, which may indicate a remote access problem."

Which protocol is mpirun using to communicate between the nodes? I checked, and I can log in via ssh without a password from the head node to the compute nodes and between the nodes.

Thanks,
Petar
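An aside on the protocol question: classic rsh contacts the rshd service on TCP port 514, so "Connection refused" from mnode02 usually means nothing is listening there. Under the tight integration, though, the `rsh` found in the job's $TMPDIR is the Grid Engine wrapper, which routes the call through `qrsh -inherit` instead of opening a real rsh connection. Below is a minimal, untested sketch of a submit script combining the suggestions from the quoted thread; the $TMPDIR-first PATH ordering and the -machinefile option for this particular mpirun are assumptions to verify (e.g. with `mpirun -help`):

#!/bin/bash
#$ -N lsdyna
#$ -S /bin/bash
#$ -pe pmpi 16
#$ -cwd
#$ -o lsdyna.out
#$ -e lsdyna.err
#$ -q test.q
export MPI_ROOT=/export/apps/platform_mpi
export LD_LIBRARY_PATH=$MPI_ROOT/lib/linux_amd64
# Put the job's $TMPDIR first so mpirun finds the SGE rsh wrapper
# (copied there by startpmpi.sh -catch_rsh) before any system rsh.
export PATH=$TMPDIR:$MPI_ROOT/bin:$PATH
export MPI_REMSH=rsh    # as suggested earlier in the thread
export MPI_TMPDIR=$TMPDIR
BIN="/export/apps/lsdyna/ls-dyna_mpp_s_r6_1_2_85274_x64_redhat54_ifort120_sse2_platformmpi.exe"
ARGS="i=test.k"
mpirun -np $NSLOTS -machinefile $TMPDIR/machines $BIN $ARGS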
On 03/07/2014 02:39 PM, Reuti wrote:
> Am 07.03.2014 um 13:20 schrieb Petar Penchev:
>
>> I have added the -catch_rsh to the PE and now when I start a sim
> Good.
>
>> (mpiexec -np $NSLOTS...) in the lsdyna.out file I see 'Error: Unknown
>> option -np'. When I use 'mpirun -np $NSLOTS...' I see this 'mpirun: rsh:
>> Command not found' in the lsdyna.err.
> Aha, indeed. This MPI variant provides only `mpirun` in my installation. But
> I wonder: do you have a second MPI library installed: `which mpiexec`?
>
> The path to `rsh` is set up by the wrapper, so it should be accessible when
> your job starts. Can you please add to your jobscript:
>
> echo $PATH
>
> The $TMPDIR of the job on the node should be included there, and therein the
> `rsh` should exist.
>
> BTW: I'm not sure about your application, but several need all environment
> variables from the master node of the parallel job also to be set for the
> slaves. This can be achieved by including "-V" for `qrsh -inherit ...` near
> the end in /opt/gridengine/mpi/pmpi/rsh
>
> (You copied rsh/hostname to pmpi too?)
>
>
>> Petar
>>
>> [petar@rocks test]$ cat lsdyna.err
>> mpirun: rsh: Command not found
>>
>> [petar@rocks test]$ cat lsdyna.out
>> -catch_rsh
>> /opt/gridengine/default/spool/mnode01/active_jobs/32738.1/pe_hostfile
>> mnode01
>> mnode01
>> mnode01
>> mnode01
>> mnode01
>> mnode01
>> mnode01
>> mnode01
>> mnode02
>> mnode02
>> mnode02
>> mnode02
>> mnode02
>> mnode02
>> mnode02
>> mnode02
>> Error: Unknown option -np
>>
>> [root@rocks test]# qconf -mp pmpi
>> pe_name            pmpi
>> slots              9999
>> user_lists         NONE
>> xuser_lists        NONE
>> start_proc_args    /opt/gridengine/mpi/pmpi/startpmpi.sh -catch_rsh $pe_hostfile
>> stop_proc_args     /opt/gridengine/mpi/pmpi/stoppmpi.sh
>> allocation_rule    $fill_up
>> control_slaves     FALSE
> control_slaves TRUE
>
> Otherwise the `qrsh -inherit ...` will fail.
>
> -- Reuti
>
>
>> job_is_first_task  TRUE
>> urgency_slots      min
>> accounting_summary TRUE
>>
>>
>> On 03/07/2014 12:49 PM, Reuti wrote:
>>> Hi,
>>>
>>> Am 07.03.2014 um 12:28 schrieb Petar Penchev:
>>>
>>>> I have a rocks-cluster 6.1 using OGS2011.11p1 and I am trying to use
>>>> the PlatformMPI parallel libraries. My problem is that when I submit a
>>>> job using qsub test.sh, the job starts only on one node with 16
>>>> processes and not on both nodes. The -pe pmpi, which I am using for
>>>> now, is only a copy of mpi.
>>> Does the definition of the PE pmpi also include the -catch_rsh? The
>>> recent IBM/Platform-MPI can cope with a machine file in the MPICH(1)
>>> format, which is created by the /usr/sge/mpi/startmpi.sh
>>>
>>> In addition you need the following settings for a tight integration.
>>> Please try:
>>>
>>> ...
>>> export MPI_REMSH=rsh
>>> export MPI_TMPDIR=$TMPDIR
>>> mpiexec -np $NSLOTS -machinefile $TMPDIR/machines $BIN $ARGS
>>>
>>> -- Reuti
>>>
>>>
>>>> What am I missing? Does anyone have a working -pe submit script, or
>>>> some hints on how to make this work?
>>>>
>>>> Thanks in advance,
>>>> Petar
>>>>
>>>> [root@rocks mpi]# cat test.sh
>>>> #!/bin/bash
>>>> #$ -N lsdyna
>>>> #$ -S /bin/bash
>>>> #$ -pe pmpi 16
>>>> #$ -cwd
>>>> #$ -o lsdyna.out
>>>> #$ -e lsdyna.err
>>>> ###
>>>> #$ -q test.q
>>>> ### -notify
>>>> export MPI_ROOT=/export/apps/platform_mpi
>>>> export LD_LIBRARY_PATH=/export/apps/platform_mpi/lib/linux_amd64
>>>> export PATH=/export/apps/platform_mpi/bin
>>>> BIN="/export/apps/lsdyna/ls-dyna_mpp_s_r6_1_2_85274_x64_redhat54_ifort120_sse2_platformmpi.exe"
>>>> ARGS="i=test.k"
>>>> mpirun -np $NSLOTS $BIN $ARGS
>>>>
>>>>
>>>> [root@rocks mpi]# qconf -sq test.q
>>>> qname                 test.q
>>>> hostlist              mnode01 mnode02
>>>> seq_no                0
>>>> load_thresholds       np_load_avg=1.75
>>>> suspend_thresholds    NONE
>>>> nsuspend              1
>>>> suspend_interval      00:05:00
>>>> priority              0
>>>> min_cpu_interval      00:05:00
>>>> processors            UNDEFINED
>>>> qtype                 BATCH INTERACTIVE
>>>> ckpt_list             NONE
>>>> pe_list               pmpi
>>>> rerun                 FALSE
>>>> slots                 8
>>>> tmpdir                /tmp
>>>> shell                 /bin/bash
>>>> prolog                NONE
>>>> epilog                NONE
>>>> shell_start_mode      unix_behavior
>>>> starter_method        NONE
>>>> suspend_method        NONE
>>>> resume_method         NONE
>>>> terminate_method      NONE
>>>> notify                00:00:60
>>>> owner_list            NONE
>>>> user_lists            NONE
>>>> xuser_lists           NONE
>>>> subordinate_list      NONE
>>>> complex_values        NONE
>>>> projects              NONE
>>>> xprojects             NONE
>>>> calendar              NONE
>>>> initial_state         default
>>>> s_rt                  INFINITY
>>>> h_rt                  INFINITY
>>>> s_cpu                 INFINITY
>>>> h_cpu                 INFINITY
>>>> s_fsize               INFINITY
>>>> h_fsize               INFINITY
>>>> s_data                INFINITY
>>>> h_data                INFINITY
>>>> s_stack               INFINITY
>>>> h_stack               INFINITY
>>>> s_core                INFINITY
>>>> h_core                INFINITY
>>>> s_rss                 INFINITY
>>>> h_rss                 INFINITY
>>>> s_vmem                INFINITY
>>>> h_vmem                INFINITY
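To make the -catch_rsh discussion above concrete: here is a simplified, hypothetical sketch of what the rsh wrapper that startpmpi.sh copies into the job's $TMPDIR effectively does. The shipped /opt/gridengine/mpi/pmpi/rsh also parses rsh's own options, and near its end is where the "-V" mentioned in the thread would be added:

#!/bin/sh
# Simplified sketch only: every rsh call that mpirun makes is rewritten
# into a `qrsh -inherit`, so the slave processes start under Grid Engine's
# control. This is why control_slaves must be TRUE in the PE.
host="$1"; shift
# -V propagates the master node's environment to the slaves.
exec qrsh -inherit -V "$host" "$@"

With this wrapper first in $PATH and MPI_REMSH=rsh, Platform MPI never opens a real rsh connection, so a running rshd on the compute nodes should not be required.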
