I have not seen this before. I assume that, for some reason, the shared memory transport layer cannot create the file it uses for communicating within a node. Open MPI then selects some other transport (TCP, openib) to communicate within the node, so the program runs fine.

The code has not changed that much from 1.2 to 1.3, but it is a little different. Let me see if I can reproduce the problem.
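
For background, what the sm setup boils down to is roughly the following (a minimal sketch of an mmap-backed shared memory segment, not the actual Open MPI source; the path and size here are placeholders):

/* Sketch of an mmap-backed shared memory setup along the lines of
 * what mca_common_sm_mmap_init does; not the real Open MPI code.
 * The path and size are placeholders. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tmp/session-dir/shared_mem_pool"; /* placeholder */
    size_t size = 4096;

    /* Even with O_CREAT, open() fails with errno=2 (ENOENT) when a
     * parent directory is missing -- the error shown in the logs. */
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        fprintf(stderr, "open %s failed with errno=%d (%s)\n",
                path, errno, strerror(errno));
        return 1;
    }

    /* Size the file, then map it; other processes on the same node
     * map the same file to get a shared segment. */
    if (ftruncate(fd, (off_t)size) != 0) {
        perror("ftruncate");
        return 1;
    }
    void *seg = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (seg == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    close(fd); /* the mapping remains valid after close */
    return 0;
}

If the session directory the backing file should live in is missing, the open() fails with ENOENT and the sm transport is dropped.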

Rolf

Mostyn Lewis wrote:
Sort of ditto, but with SVN revision 20123 (and earlier):

e.g.

[r2250_46:30018] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_46_0/25682/1/shared_mem_pool.r2250_46 failed with errno=2
[r2250_63:05292] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_63_0/25682/1/shared_mem_pool.r2250_63 failed with errno=2
[r2250_57:17527] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_57_0/25682/1/shared_mem_pool.r2250_57 failed with errno=2
[r2250_68:13553] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_68_0/25682/1/shared_mem_pool.r2250_68 failed with errno=2
[r2250_50:06541] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_50_0/25682/1/shared_mem_pool.r2250_50 failed with errno=2
[r2250_49:29237] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_49_0/25682/1/shared_mem_pool.r2250_49 failed with errno=2
[r2250_66:19066] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_66_0/25682/1/shared_mem_pool.r2250_66 failed with errno=2
[r2250_58:24902] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_58_0/25682/1/shared_mem_pool.r2250_58 failed with errno=2
[r2250_69:27426] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_69_0/25682/1/shared_mem_pool.r2250_69 failed with errno=2
[r2250_60:30560] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_60_0/25682/1/shared_mem_pool.r2250_60 failed with errno=2

File not found in sm: errno=2 is ENOENT, so the file the sm component tried to open isn't there.

10 of them across 32 nodes (8 cores per node: 2 sockets x quad-core).
"Apparently harmless"?

DM

On Tue, 27 Jan 2009, Prentice Bisbal wrote:

I just installed Open MPI 1.3 with tight integration for SGE. Version
1.2.8 had been working just fine for several months in the same arrangement.

Now that I've upgraded to 1.3, I get the following errors in my standard
error file:

mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node09.aurora_0/21400/1/shared_mem_pool.node09.aurora failed with errno=2
[node23.aurora:20601] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node23.aurora_0/21400/1/shared_mem_pool.node23.aurora failed with errno=2
[node46.aurora:12118] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node46.aurora_0/21400/1/shared_mem_pool.node46.aurora failed with errno=2
[node15.aurora:12421] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node15.aurora_0/21400/1/shared_mem_pool.node15.aurora failed with errno=2
[node20.aurora:12534] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node20.aurora_0/21400/1/shared_mem_pool.node20.aurora failed with errno=2
[node16.aurora:12573] mca_common_sm_mmap_init: open /tmp/968.1.all.q/openmpi-sessions-prentice@node16.aurora_0/21400/1/shared_mem_pool.node16.aurora failed with errno=2

I've tested 3-4 different times; the number of hosts that produce
this error varies, as does which hosts produce it. My program
seems to run fine, but it's just a simple "Hello, World!" program. Any
ideas? Is this a bug in 1.3?
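
In case it helps with diagnosis: the paths above sit under the per-job TMPDIR that SGE creates (/tmp/968.1.all.q here), which Open MPI uses for its session directory. Here is a quick standalone check I can run under the job to see whether that directory actually exists on each node (my own sketch, nothing from Open MPI):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    /* SGE exports a per-job scratch directory as TMPDIR; Open MPI
     * places its session directory (and the shared_mem_pool file)
     * under it. */
    const char *tmpdir = getenv("TMPDIR");
    if (tmpdir == NULL) {
        fprintf(stderr, "TMPDIR is not set\n");
        return 1;
    }
    if (access(tmpdir, R_OK | W_OK | X_OK) != 0) {
        perror(tmpdir); /* e.g. "No such file or directory" */
        return 1;
    }
    printf("%s exists and is usable\n", tmpdir);
    return 0;
}

I can also rerun with the sm BTL excluded (mpirun --mca btl ^sm ...) to see whether the warnings disappear.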


-- Prentice
--
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ