Dear all,
I tried to run a simple MPI job via qsub and received the errors
below, after which the job nevertheless ran correctly.
I received no errors when I ran the same script directly, without qsub.
Is there a way to fix these error messages?
Thanks in advance,
madel
Job script:
#$ -cwd
#$ -j y
#$ -N hello-mpi
#$ -o $JOB_NAME.o$JOB_ID
#$ -pe impi 16
mpirun --rsh=ssh -np 16 ./hello.bin
============================================
Error Message:
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
This will severely limit memory registrations.
comp040.local:5827: create_cq Cannot allocate memory
comp040.local:5826: create_cq Cannot allocate memory
comp040.local:5829: create_cq Cannot allocate memory
comp040.local:5833: create_cq Cannot allocate memory
comp040.local:5828: create_cq Cannot allocate memory
comp040.local:5831: create_cq Cannot allocate memory
comp040.local:5832: create_cq Cannot allocate memory
comp040.local:5830: create_cq Cannot allocate memory
comp040.local:5832: open_hca: getaddr_netdev ERROR: No such file or directory.
Is ib1 configured?
comp040.local:5832: dapls_ib_open_hca failed 120000
comp040.local:5827: open_hca: getaddr_netdev ERROR: No such file or directory.
Is ib1 configured?
comp040.local:5827: dapls_ib_open_hca failed 120000
comp040.local:5832: open_hca: getaddr_netdev ERROR: No such device. Is ib2
configured?
comp040.local:5832: dapls_ib_open_hca failed 120000
comp040.local:5829: open_hca: getaddr_netdev ERROR: No such file or directory.
Is ib1 configured?
comp040.local:5829: dapls_ib_open_hca failed 120000
comp040.local:5827: open_hca: getaddr_netdev ERROR: No such device. Is ib2
configured?
comp040.local:5827: dapls_ib_open_hca failed 120000
comp040.local:5832: open_hca: getaddr_netdev ERROR: No such device. Is ib3
configured?
comp040.local:5832: dapls_ib_open_hca failed 120000
comp040.local:5829: open_hca: getaddr_netdev ERROR: No such device. Is ib2
configured?
comp040.local:5829: dapls_ib_open_hca failed 120000
comp040.local:5826: open_hca: getaddr_netdev ERROR: No such file or directory.
Is ib1 configured?
comp040.local:5826: dapls_ib_open_hca failed 120000
comp040.local:5827: open_hca: getaddr_netdev ERROR: No such device. Is ib3
configured?
comp040.local:5827: dapls_ib_open_hca failed 120000
comp040.local:5828: open_hca: getaddr_netdev ERROR: No such file or directory.
Is ib1 configured?
comp040.local:5828: comp040.local:5832: open_hca: getaddr_netdev ERROR: No
such device. Is bond0 configured?
comp040.local:5832: dapls_ib_open_hca failed 120000
dapls_ib_open_hca failed 120000
comp040.local:5829: open_hca: getaddr_netdev ERROR: No such device. Is ib3
configured?
comp040.local:5829: dapls_ib_open_hca failed 120000
comp040.local:5826: open_hca: getaddr_netdev ERROR: No such device. Is ib2
configured?
comp040.local:5826: dapls_ib_open_hca failed 120000
comp040.local:5833: open_hca: getaddr_netdev ERROR: No such file or directory.
Is ib1 configured?
comp040.local:5833: dapls_ib_open_hca failed 120000
comp040.local:5831: open_hca: getaddr_netdev ERROR: No such file or directory.
Is ib1 configured?
comp040.local:5831: dapls_ib_open_hca failed 120000
comp040.local:5827: open_hca: getaddr_netdev ERROR: No such device. Is bond0
configured?
comp040.local:5827: dapls_ib_open_hca failed 120000
comp040.local:5830: open_hca: getaddr_netdev ERROR: No such file or directory.
Is ib1 configured?
comp040.local:5830: dapls_ib_open_hca failed 120000
comp040.local:5833: open_hca: getaddr_netdev ERROR: No such device. Is ib2
configured?
comp040.local:5833: dapls_ib_open_hca failed 120000
comp040.local:5829: open_hca: getaddr_netdev ERROR: No such device. Is bond0
configured?
comp040.local:5829: dapls_ib_open_hca failed 120000
comp040.local:5826: open_hca: getaddr_netdev ERROR: No such device. Is ib3
configured?
comp040.local:5826: dapls_ib_open_hca failed 120000
comp040.local:5828: open_hca: getaddr_netdev ERROR: No such device. Is ib2
configured?
comp040.local:5828: dapls_ib_open_hca failed 120000
comp040.local:5831: open_hca: getaddr_netdev ERROR: No such device. Is ib2
configured?
comp040.local:5831: dapls_ib_open_hca failed 120000
comp040.local:5833: open_hca: getaddr_netdev ERROR: No such device. Is ib3
configured?
comp040.local:5833: dapls_ib_open_hca failed 120000
comp040.local:5830: open_hca: getaddr_netdev ERROR: No such device. Is ib2
configured?
comp040.local:5830: dapls_ib_open_hca failed 120000
comp040.local:5826: open_hca: getaddr_netdev ERROR: No such device. Is bond0
configured?
comp040.local:5826: dapls_ib_open_hca failed 120000
comp040.local:5828: open_hca: getaddr_netdev ERROR: No such device. Is ib3
configured?
comp040.local:5828: dapls_ib_open_hca failed 120000
comp040.local:5831: open_hca: getaddr_netdev ERROR: No such device. Is ib3
configured?
comp040.local:5831: dapls_ib_open_hca failed 120000
comp040.local:5833: open_hca: getaddr_netdev ERROR: No such device. Is bond0
configured?
comp040.local:5833: dapls_ib_open_hca failed 120000
comp040.local:5828: open_hca: getaddr_netdev ERROR: No such device. Is bond0
configured?
comp040.local:5828: dapls_ib_open_hca failed 120000
comp040.local:5831: open_hca: getaddr_netdev ERROR: No such device. Is bond0
configured?
comp040.local:5831: dapls_ib_open_hca failed 120000
comp040.local:5830: open_hca: getaddr_netdev ERROR: No such device. Is ib3
configured?
comp040.local:5830: dapls_ib_open_hca failed 120000
comp040.local:5830: open_hca: getaddr_netdev ERROR: No such device. Is bond0
configured?
comp040.local:5830: dapls_ib_open_hca failed 120000
Hello world: rank 8 of 16 running on comp047.local
Hello world: rank 13 of 16 running on comp047.local
Hello world: rank 10 of 16 running on comp047.local
Hello world: rank 11 of 16 running on comp047.local
Hello world: rank 15 of 16 running on comp047.local
Hello world: rank 9 of 16 running on comp047.local
Hello world: rank 12 of 16 running on comp047.local
Hello world: rank 7 of 16 running on comp040.local
Hello world: rank 14 of 16 running on comp047.local
Hello world: rank 6 of 16 running on comp040.local
Hello world: rank 0 of 16 running on comp040.local
Hello world: rank 4 of 16 running on comp040.local
Hello world: rank 2 of 16 running on comp040.local
Hello world: rank 5 of 16 running on comp040.local
Hello world: rank 1 of 16 running on comp040.local
Hello world: rank 3 of 16 running on comp040.local
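
To narrow this down, a check along the following lines could be placed in the job script before the mpirun line. It is only a sketch: it assumes the job is run by bash, and the function name report_memlock is made up for illustration. In bash, `ulimit -l` reports the locked-memory limit in units of 1024 bytes, so the 32768 bytes from the libibverbs warning should show up as 32.

```shell
#!/bin/bash
# Sketch only: report the locked-memory limit (RLIMIT_MEMLOCK) seen by
# this shell. report_memlock is a hypothetical helper name.
report_memlock() {
    # bash prints the soft limit in 1024-byte units, or "unlimited".
    local limit
    limit=$(ulimit -l)
    # uname -n prints the node name, so output from different hosts
    # can be told apart in the job's output file.
    echo "memlock_kb=${limit} host=$(uname -n)"
}

report_memlock
```

Running the same lines from an interactive shell on the compute node and comparing the two values would show whether the limit differs between the qsub environment and a direct run.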