Hi,
I am not sure how this is related to SLURM, but it only occurs when
running as a batch job.
My code uses both MPI and OpenMP. In the case below I only consider
OpenMP, so it runs on a single socket.
When I run the code after ssh'ing directly to the hosts, I clearly see
the speed-up going from one thread to four threads (about 20 seconds
down to about 5 seconds):
gualleon583a@bndligpu06:~$ export OMP_NUM_THREADS=1; date; time ./a.out 1000000000; date
Fri Aug 19 17:55:23 IST 2011
ID[ 0/ 1] Thread 0/1
# intervals: 1000000000
pi is approximately 3.1415926535899708, Error is 0.0000000000001776
Fri Aug 19 17:55:43 IST 2011
gualleon583a@bndligpu01:~$ export OMP_NUM_THREADS=4; date; time ./a.out 1000000000; date
Mon Aug 22 09:58:16 IST 2011
ID[ 0/ 1] Thread 1/4
ID[ 0/ 1] Thread 0/4
ID[ 0/ 1] Thread 3/4
ID[ 0/ 1] Thread 2/4
# intervals: 1000000000
pi is approximately 3.1415926535898211, Error is 0.0000000000000280
real 0m5.056s
user 0m19.910s
sys 0m0.020s
Mon Aug 22 09:58:21 IST 2011
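As a rough cross-check of the timings above, the ratio of CPU time
(user + sys) to wall-clock time (real) estimates how many cores were
busy on average. The sketch below is only illustrative: the helper name
effective_cores is hypothetical, and the numbers are copied from the
time output of the 4-thread run above.

```python
def effective_cores(user_s: float, sys_s: float, real_s: float) -> float:
    """Average number of busy cores: CPU time divided by wall-clock time."""
    if real_s <= 0:
        raise ValueError("real_s must be positive")
    return (user_s + sys_s) / real_s


# Values from the 4-thread ssh run: real 0m5.056s, user 0m19.910s, sys 0m0.020s
ssh_4_threads = effective_cores(19.910, 0.020, 5.056)
print(f"ssh, 4 threads: {ssh_4_threads:.2f} cores busy on average")
```

A value close to 4 indicates that the four threads really did run on
separate cores in the ssh case.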
When I run the same executable through srun, it takes about 20 seconds
with one thread and with four threads alike, at 99% CPU in both cases.
Do you have any idea what could be causing this behavior? Could it be
due to CPU affinity forcing all threads onto a single core?
gualleon583a@gpumaster:~$ export OMP_NUM_THREADS=1
gualleon583a@gpumaster:~$ srun -E --ntasks=1 --ntasks-per-node=1 time ./a.out 1000000000
--------------------------------------------------------------------------
While trying to determine what resources are available, the SLURM
resource allocator expects to find the following environment variables:
SLURM_NODELIST
SLURM_TASKS_PER_NODE
However, it was unable to find the following environment variable:
SLURM_TASKS_PER_NODE
--------------------------------------------------------------------------
[bndligpu01:01946] [[39780,0],0] ORTE_ERROR_LOG: Not found in file
../../../../../orte/mca/ras/base/ras_base_allocate.c at line 133
[bndligpu01:01946] [[39780,0],0] ORTE_ERROR_LOG: Not found in file
../../../orte/orted/orted_main.c at line 531
--------------------------------------------------------------------------
[[39780,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: bndligpu01
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
ID[ 0/ 1] Thread 0/1
# intervals: 1000000000
pi is approximately 3.1415926535899708, Error is 0.0000000000001776
19.92user 0.00system 0:19.99elapsed 99%CPU (0avgtext+0avgdata 14048maxresident)k
16inputs+48outputs (0major+1236minor)pagefaults 0swaps
gualleon583a@gpumaster:~$ export OMP_NUM_THREADS=4
gualleon583a@gpumaster:~$ srun -E --ntasks=1 --ntasks-per-node=1 time ./a.out 1000000000
--------------------------------------------------------------------------
While trying to determine what resources are available, the SLURM
resource allocator expects to find the following environment variables:
SLURM_NODELIST
SLURM_TASKS_PER_NODE
However, it was unable to find the following environment variable:
SLURM_TASKS_PER_NODE
--------------------------------------------------------------------------
[bndligpu01:01968] [[39758,0],0] ORTE_ERROR_LOG: Not found in file
../../../../../orte/mca/ras/base/ras_base_allocate.c at line 133
[bndligpu01:01968] [[39758,0],0] ORTE_ERROR_LOG: Not found in file
../../../orte/orted/orted_main.c at line 531
--------------------------------------------------------------------------
[[39758,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: bndligpu01
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
ID[ 0/ 1] Thread 0/4
ID[ 0/ 1] Thread 3/4
ID[ 0/ 1] Thread 2/4
ID[ 0/ 1] Thread 1/4
# intervals: 1000000000
pi is approximately 3.1415926535898211, Error is 0.0000000000000280
19.99user 0.00system 0:20.06elapsed 99%CPU (0avgtext+0avgdata 14128maxresident)k
48inputs+48outputs (0major+1242minor)pagefaults 0swaps
gualleon583a@gpumaster:~$
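One way to test the affinity hypothesis would be to print the set of
CPUs the process is allowed to run on, once over ssh and once through
srun. The following is a minimal sketch, assuming Linux and a python3
interpreter on the compute nodes; the function names and the file name
check_affinity.py used below are hypothetical.

```python
import os
from typing import List


def allowed_cpus() -> List[int]:
    """Return the sorted CPU ids this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: affinity mask of the calling process (pid 0 means "self").
        return sorted(os.sched_getaffinity(0))
    # Fallback where the call is unavailable: assume no restriction.
    return list(range(os.cpu_count() or 1))


def report() -> str:
    """Summarize the thread count requested and the CPUs actually usable."""
    cpus = allowed_cpus()
    threads = os.environ.get("OMP_NUM_THREADS", "unset")
    return f"OMP_NUM_THREADS={threads} allowed_cpus={cpus} (count={len(cpus)})"


if __name__ == "__main__":
    print(report())
```

Running "python3 check_affinity.py" over ssh and
"srun --ntasks=1 python3 check_affinity.py" from the master should show
whether the two environments differ. If the srun case reports a count
of 1 while the ssh case reports 4 or more, the threads are confined to
a single core, and requesting more CPUs for the task (for example with
srun's --cpus-per-task option) would be the next thing to try.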
--
PGP KeyID: 2048R/EA31CFC9 subkeys.pgp.net