Hi all
I defined the parallel environment (PE) mpifillamd:
----------------
pe_name            mpifillamd
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /opt/gridengine/mpi/startmpi.sh $pe_hostfile
stop_proc_args     /opt/gridengine/mpi/stopmpi.sh
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary TRUE
----------------
and mpi48amd:
----------------
pe_name            mpi48amd
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /opt/gridengine/mpi/startmpi.sh $pe_hostfile
stop_proc_args     /opt/gridengine/mpi/stopmpi.sh
allocation_rule    48
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary TRUE
-----------------
The hosts have 48 cores each.
My job script is:
-----
#!/bin/sh
#$ -S /bin/bash
#$ -N AMD_NS_100
#$ -cwd
# excl=1 requests an exclusive host
#$ -l h_vmem=1.4G,excl=1
#$ -j y
#$ -pe mpi48amd 96
mpirun --mca btl ^openib,sm -np $NSLOTS ./a.out
------------------
1) When I submit the job with the PE mpifillamd, everything works. With the
PE mpi48amd the job fails and multiple ".btr" files are created. Why?

One of the .btr files is:
----------
a.out:19402 terminated with signal 11 at PC=4636f1 SP=7ffff6536678.
Backtrace:
./a.out(initial_comm_cell_+0x611)[0x4636f1]
./a.out(input_+0xfe7)[0x41a657]
-----------
The job output is:
....
 TASK WIRH RANK          48 HASICMAXP =       96000
 TASK WIRH RANK          49 HASICMAXP =       96000
 TASK WIRH RANK          50 HASICMAXP =       96000
 TASK WIRH RANK          51 HASICMAXP =       96000

a.out:19391 terminated with signal 11 at PC=4636f1 SP=7ffff9274878.
Backtrace:
 TASK WIRH RANK          52 HASICMAXP =       96000
 TASK WIRH RANK          53 HASICMAXP =       96000
./a.out(initial_comm_cell_+0x611)[0x4636f1]
./a.out(input_+0xfe7)[0x41a657]
./a.out(MAIN__+0x313)[0x40b013]
./a.out(main+0x3c)[0x40acec]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x31aee1d994]
./a.out[0x40abf9]

...
--------------
2) In this case (each host has 48 slots, and the job requests 96 slots with
exclusive host access), do mpifillamd and mpi48amd behave differently?
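
One way to compare what the two PEs actually grant is to print the allocation from inside the job. The following is only a minimal sketch: it assumes the file named by $PE_HOSTFILE (set by Grid Engine inside a parallel job) has one line per host with the slot count in the second column. The function name summarize_pe_hostfile is made up for illustration and is not part of Grid Engine.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): summarize a PE hostfile.
# Assumed line format: <hostname> <slots> <queue> <processor range>
summarize_pe_hostfile() {
    awk '{ printf "%s: %d slots\n", $1, $2; hosts++; slots += $2 }
         END { printf "total: %d hosts, %d slots\n", hosts, slots }' "$1"
}

# PE_HOSTFILE is only set inside a running parallel job,
# so do nothing when the script is run elsewhere.
if [ -n "$PE_HOSTFILE" ]; then
    summarize_pe_hostfile "$PE_HOSTFILE"
fi
```

Placed in the job script before the mpirun line, this would show in the job output how the 96 slots were spread over the hosts for each PE, so the two runs can be compared directly.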


Thanks