Re: [OMPI devel] RFC: System V Shared Memory for Open MPI

2010-06-02 Thread Samuel K. Gutierrez
Good point - I forgot about that. -- Samuel K. Gutierrez Los Alamos National Laboratory On Jun 2, 2010, at 11:40 AM, Jeff Squyres wrote: Don't forget that the RML is also used to broadcast the success/failure of the creation of the shared memory segment. If the RML goes away, be sure that

Re: [hwloc-devel] 1.0.1rc1

2010-06-02 Thread Jeff Squyres
I posted it: http://www.open-mpi.org/software/hwloc/v1.0/ If I hear nothing else, I'll release 1.0.1 tomorrow morning. On Jun 2, 2010, at 12:52 PM, Samuel Thibault wrote: > Jeff Squyres wrote on Tue 01 Jun 2010 14:03:47 -0400: > > So do we like 1.0.1rc1? > > Seems all good to me. I

Re: [OMPI devel] RFC: System V Shared Memory for Open MPI

2010-06-02 Thread Samuel K. Gutierrez
Hi George, That may work - I'll try it. Thanks! -- Samuel K. Gutierrez Los Alamos National Laboratory On Jun 2, 2010, at 10:59 AM, George Bosilca wrote: How about ftok ? The init function takes a file_name as argument, and this file name is unique per instance of the shared memory region

Re: [OMPI devel] RFC: System V Shared Memory for Open MPI

2010-06-02 Thread George Bosilca
How about ftok ? The init function takes a file_name as argument, and this file name is unique per instance of the shared memory region we want to create. We can use this file name with ftok to create a unique key_t that can be used by shmget to retrieve the shared memory identifier. george.
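
For readers following the thread, the ftok suggestion above can be sketched roughly as follows. This is an illustration only, not code from the Open MPI tree: the helper name segment_id_from_file, the project byte 'O', and the 0600 permissions are assumptions made for the example. Note also that ftok() does not strictly guarantee distinct keys for distinct files, so a real implementation would probably need to handle collisions.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper, not part of Open MPI: map a backing file name to a
 * System V shared memory identifier, as suggested in the message above. */
static int segment_id_from_file(const char *file_name, size_t size)
{
    /* ftok() derives a key from the file's identity plus a project byte.
     * The file must already exist; 'O' is an arbitrary choice here. */
    key_t key = ftok(file_name, 'O');
    if (key == (key_t)-1) {
        return -1;
    }

    /* Create the segment if it is missing, otherwise look up the existing
     * one. Every process that passes the same file gets the same id. */
    return shmget(key, size, IPC_CREAT | 0600);
}
```

Each process would call the helper with the same file name and then attach with shmat(); the last user is expected to remove the segment with shmctl(shmid, IPC_RMID, NULL).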

Re: [OMPI devel] BTL add procs errors

2010-06-02 Thread George Bosilca
On Jun 2, 2010, at 12:18 , Jeff Squyres wrote: > On Jun 2, 2010, at 12:02 PM, Ashley Pittman wrote: > >>> Ah, this is the key. If I have one process (out of many) fail the >>> create_cq() function, I get a segv during finalize. I'll dig. >> >> Is there an assumption that if process A claims

Re: [OMPI devel] BTL add procs errors

2010-06-02 Thread Jeff Squyres
On Jun 2, 2010, at 12:02 PM, Ashley Pittman wrote: > > Ah, this is the key. If I have one process (out of many) fail the > > create_cq() function, I get a segv during finalize. I'll dig. > > Is there an assumption that if process A claims to be able to communicate > with process B that

Re: [OMPI devel] BTL add procs errors

2010-06-02 Thread Ashley Pittman
On 2 Jun 2010, at 16:49, Jeff Squyres wrote: > On Jun 2, 2010, at 11:29 AM, Sylvain Jeaugey wrote: > >> But it made me progress on why I'm crashing : in my case, only a subset of >> processes have their create_cq fail. > > Ah, this is the key. If I have one process (out of many) fail the >

Re: [OMPI devel] BTL add procs errors

2010-06-02 Thread Jeff Squyres
On Jun 2, 2010, at 11:29 AM, Sylvain Jeaugey wrote: > But it made me progress on why I'm crashing : in my case, only a subset of > processes have their create_cq fail. Ah, this is the key. If I have one process (out of many) fail the create_cq() function, I get a segv during finalize. I'll

Re: [OMPI devel] BTL add procs errors

2010-06-02 Thread Sylvain Jeaugey
On Wed, 2 Jun 2010, Jeff Squyres wrote: Don't you mean return NULL? This function is supposed to return a (struct ibv_cq *). Oops. My bad. Yes, it should return NULL. And it seems that if I make ibv_create_cq always return NULL, the scenario described by George works smoothly : returned

Re: [OMPI devel] RFC: System V Shared Memory for Open MPI

2010-06-02 Thread Jeff Squyres
On Jun 2, 2010, at 10:44 AM, George Bosilca wrote: > > Not sure what you mean here. common/sm may create new shmem segments at > > any time (e.g., during coll sm). The RML message exchange is to ensure > > that only 1 process creates and initializes the segment and then all the > > others

Re: [OMPI devel] RFC: System V Shared Memory for Open MPI

2010-06-02 Thread George Bosilca
On Jun 2, 2010, at 09:28 , Jeff Squyres wrote: > On Jun 2, 2010, at 5:38 AM, George Bosilca wrote: > >> I think adding support for sysv shared memory is a good thing. However, I >> have some strong objections over the implementation in the hg tree. Here are >> 2 of the major ones: >> >> 1)

Re: [OMPI devel] RFC: System V Shared Memory for Open MPI

2010-06-02 Thread Samuel K. Gutierrez
On Jun 2, 2010, at 7:28 AM, Jeff Squyres wrote: On Jun 2, 2010, at 5:38 AM, George Bosilca wrote: I think adding support for sysv shared memory is a good thing. However, I have some strong objections over the implementation in the hg tree. Here are 2 of the major ones: 1) the sysv shared

Re: [OMPI devel] RFC: Remove all other paffinity components

2010-06-02 Thread Jeff Squyres
To follow up on this RFC... This RFC also got discussed on the weekly call (and in several other discussions). Again, no one seemed to hate it. That being said, hwloc still needs a bit more soak time; I just committed the 32 bit fix the other day. So this one will happen eventually (i.e.,

Re: [OMPI devel] RFC: move hwloc code base to opal/hwloc

2010-06-02 Thread Jeff Squyres
To follow up on this RFC... We discussed this RFC on the weekly call and no one seemed to hate it. But there was a desire to: a) be able to compile out hwloc for environments that don't want/need it (e.g., embedded environments) b) have some degree of isolation in case hwloc ever dies c) have

Re: [OMPI devel] RFC: System V Shared Memory for Open MPI

2010-06-02 Thread Jeff Squyres
On Jun 2, 2010, at 5:38 AM, George Bosilca wrote: > I think adding support for sysv shared memory is a good thing. However, I > have some strong objections over the implementation in the hg tree. Here are > 2 of the major ones: > > 1) the sysv shared memory creation is __atomic__ based on the

Re: [OMPI devel] BTL add procs errors

2010-06-02 Thread Jeff Squyres
On Jun 2, 2010, at 5:08 AM, Sylvain Jeaugey wrote: > It must be because create_cq actually creates cqs. Try to apply this > patch which makes create_cq_compat() *not* creates the cqs and return an > error instead : > > diff

Re: [OMPI devel] Wrong documentation for MPI_Comm_size manual page

2010-06-02 Thread Jeff Squyres
Absolutely correct. I've fixed it on the dev trunk and filed tickets to get the fix moved into the release branches. Thanks! On Jun 2, 2010, at 4:41 AM, Number Cruncher wrote: > I'm working on some intercommunicator stuff at the moment. According to > MPI-2.2 standard: > "An

Re: [OMPI devel] RFC: System V Shared Memory for Open MPI

2010-06-02 Thread George Bosilca
I think adding support for sysv shared memory is a good thing. However, I have some strong objections over the implementation in the hg tree. Here are 2 of the major ones: 1) the sysv shared memory creation is __atomic__ based on the flags used. Therefore, all the RML messages exchange is
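
A minimal sketch of what "atomic based on the flags used" could look like in practice. The message does not name the flags; this example assumes IPC_CREAT | IPC_EXCL, and the helper name create_or_lookup is hypothetical. With those flags the kernel lets exactly one caller create the segment for a given key, and every other caller sees EEXIST and falls back to a plain lookup.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper, not part of Open MPI. On success returns the segment
 * id and sets *created to 1 for the single creator, 0 for everyone else. */
static int create_or_lookup(key_t key, size_t size, int *created)
{
    /* IPC_EXCL makes creation fail with EEXIST if the key is already in
     * use, so only one process can win the race. */
    int shmid = shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);
    if (shmid >= 0) {
        *created = 1;
        return shmid;
    }
    if (errno != EEXIST) {
        return -1;
    }

    /* Someone else created it first: look up the existing segment.
     * A size of 0 is accepted when the segment already exists. */
    *created = 0;
    return shmget(key, 0, 0600);
}
```

This only settles which process creates the segment. It says nothing about when the creator has finished initializing it, so the other processes still need some way to learn that the segment is ready.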

Re: [OMPI devel] BTL add procs errors

2010-06-02 Thread George Bosilca
I don't have any IB nodes, but I'm interested to see how this happens. What I would like to understand here is how do we get back in the OpenIB code if the add_procs failed for the BTL ... george. On Jun 2, 2010, at 05:08 , Sylvain Jeaugey wrote: > On Tue, 1 Jun 2010, Jeff Squyres wrote: >

Re: [OMPI devel] BTL add procs errors

2010-06-02 Thread Sylvain Jeaugey
On Tue, 1 Jun 2010, Jeff Squyres wrote: On May 31, 2010, at 5:10 AM, Sylvain Jeaugey wrote: In my case, the error happens in : mca_btl_openib_add_procs() mca_btl_openib_size_queues() adjust_cq() ibv_create_cq_compat() ibv_create_cq() Can you nail this down

[OMPI devel] Wrong documentation for MPI_Comm_size manual page

2010-06-02 Thread Number Cruncher
I'm working on some intercommunicator stuff at the moment. According to MPI-2.2 standard: "An inter-communication is a point-to-point communication between processes in different groups" [Section 6.6] yet the "man" page for MPI_Comm_size reads: "If the communicator is an

Re: [OMPI devel] BTL add procs errors

2010-06-02 Thread Sylvain Jeaugey
Couldn't explain it better. Thanks Jeff for the summary ! On Tue, 1 Jun 2010, Jeff Squyres wrote: On May 31, 2010, at 10:27 AM, Ralph Castain wrote: Just curious - your proposed fix sounds exactly like what was done in the OPAL SOS work. Are you therefore proposing to use SOS to provide a