Good point - I forgot about that.
--
Samuel K. Gutierrez
Los Alamos National Laboratory
On Jun 2, 2010, at 11:40 AM, Jeff Squyres wrote:
Don't forget that the RML is also used to broadcast the success/failure
of the creation of the shared memory segment.
If the RML goes away, be sure that
I posted it:
http://www.open-mpi.org/software/hwloc/v1.0/
If I hear nothing else, I'll release 1.0.1 tomorrow morning.
On Jun 2, 2010, at 12:52 PM, Samuel Thibault wrote:
> Jeff Squyres wrote on Tue 01 Jun 2010 14:03:47 -0400:
> > So do we like 1.0.1rc1?
>
> Seems all good to me. I
Hi George,
That may work - I'll try it.
Thanks!
--
Samuel K. Gutierrez
Los Alamos National Laboratory
On Jun 2, 2010, at 10:59 AM, George Bosilca wrote:
How about ftok? The init function takes a file_name as argument,
and this file name is unique per instance of the shared memory
region
How about ftok? The init function takes a file_name as argument, and this file
name is unique per instance of the shared memory region we want to create. We
can use this file name with ftok to create a unique key_t that can be used by
shmget to retrieve the shared memory identifier.
george.
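A minimal sketch of the ftok/shmget idea described above, assuming plain System V IPC rather than any actual Open MPI code. The helper names (segment_key, create_or_attach, destroy_segment), the project id 'S', and the 0600 permissions are hypothetical choices made for illustration. Note that ftok() requires the file to exist already and does not strictly guarantee a unique key, so a real implementation would still need to handle collisions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: derive a System V key from the per-segment file name.
 * The file must already exist, because ftok() calls stat() on it. */
static key_t segment_key(const char *file_name)
{
    assert(file_name != NULL);
    return ftok(file_name, 'S');   /* project id 'S' is an arbitrary choice */
}

/* Create the segment if it does not exist yet, otherwise look it up.
 * IPC_CREAT | IPC_EXCL makes creation exclusive: one caller creates the
 * segment, and any later caller gets EEXIST and falls back to a lookup.
 * Returns the segment id (and sets *created), or -1 on error. */
static int create_or_attach(const char *file_name, size_t size, int *created)
{
    key_t key = segment_key(file_name);
    if (key == (key_t)-1) {
        return -1;
    }

    int id = shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);
    if (id != -1) {
        *created = 1;              /* this caller created the segment */
        return id;
    }
    if (errno != EEXIST) {
        return -1;                 /* a real failure, not "already there" */
    }

    *created = 0;                  /* someone else created it first */
    return shmget(key, 0, 0600);   /* retrieve the existing identifier */
}

/* Mark the segment for removal once it is no longer needed. */
static int destroy_segment(int id)
{
    return shmctl(id, IPC_RMID, NULL);
}
```

Every process would call create_or_attach() with the same file name; the value of created tells each caller whether it is the one responsible for initializing the segment contents.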
On Jun 2, 2010, at 12:18 , Jeff Squyres wrote:
> On Jun 2, 2010, at 12:02 PM, Ashley Pittman wrote:
>
>>> Ah, this is the key. If I have one process (out of many) fail the
>>> create_cq() function, I get a segv during finalize. I'll dig.
>>
>> Is there an assumption that if process A claims
On Jun 2, 2010, at 12:02 PM, Ashley Pittman wrote:
> > Ah, this is the key. If I have one process (out of many) fail the
> > create_cq() function, I get a segv during finalize. I'll dig.
>
> Is there an assumption that if process A claims to be able to communicate
> with process B that
On 2 Jun 2010, at 16:49, Jeff Squyres wrote:
> On Jun 2, 2010, at 11:29 AM, Sylvain Jeaugey wrote:
>
>> But it helped me make progress on why I'm crashing: in my case, only a subset of
>> processes have their create_cq fail.
>
> Ah, this is the key. If I have one process (out of many) fail the
>
On Jun 2, 2010, at 11:29 AM, Sylvain Jeaugey wrote:
> But it helped me make progress on why I'm crashing: in my case, only a subset of
> processes have their create_cq fail.
Ah, this is the key. If I have one process (out of many) fail the create_cq()
function, I get a segv during finalize. I'll
On Wed, 2 Jun 2010, Jeff Squyres wrote:
Don't you mean return NULL? This function is supposed to return a (struct
ibv_cq *).
Oops. My bad. Yes, it should return NULL. And it seems that if I make
ibv_create_cq always return NULL, the scenario described by George works
smoothly: returned
On Jun 2, 2010, at 10:44 AM, George Bosilca wrote:
> > Not sure what you mean here. common/sm may create new shmem segments at
> > any time (e.g., during coll sm). The RML message exchange is to ensure
> > that only 1 process creates and initializes the segment and then all the
> > others
On Jun 2, 2010, at 09:28 , Jeff Squyres wrote:
> On Jun 2, 2010, at 5:38 AM, George Bosilca wrote:
>
>> I think adding support for sysv shared memory is a good thing. However, I
>> have some strong objections over the implementation in the hg tree. Here are
>> 2 of the major ones:
>>
>> 1)
On Jun 2, 2010, at 7:28 AM, Jeff Squyres wrote:
On Jun 2, 2010, at 5:38 AM, George Bosilca wrote:
I think adding support for sysv shared memory is a good thing.
However, I have some strong objections over the implementation in
the hg tree. Here are 2 of the major ones:
1) the sysv shared
To follow up on this RFC...
This RFC also got discussed on the weekly call (and in several other
discussions). Again, no one seemed to hate it. That being said, hwloc still
needs a bit more soak time; I just committed the 32 bit fix the other day.
So this one will happen eventually (i.e.,
To follow up on this RFC...
We discussed this RFC on the weekly call and no one seemed to hate it. But
there was a desire to:
a) be able to compile out hwloc for environments that don't want/need it (e.g.,
embedded environments)
b) have some degree of isolation in case hwloc ever dies
c) have
On Jun 2, 2010, at 5:38 AM, George Bosilca wrote:
> I think adding support for sysv shared memory is a good thing. However, I
> have some strong objections over the implementation in the hg tree. Here are
> 2 of the major ones:
>
> 1) the sysv shared memory creation is __atomic__ based on the
On Jun 2, 2010, at 5:08 AM, Sylvain Jeaugey wrote:
> It must be because create_cq actually creates CQs. Try applying this
> patch, which makes create_cq_compat() *not* create the CQs and return an
> error instead:
>
> diff
Absolutely correct. I've fixed it on the dev trunk and filed tickets to get
the fix moved into the release branches.
Thanks!
On Jun 2, 2010, at 4:41 AM, Number Cruncher wrote:
> I'm working on some intercommunicator stuff at the moment. According to
> MPI-2.2 standard:
> "An
I think adding support for sysv shared memory is a good thing. However, I have
some strong objections over the implementation in the hg tree. Here are 2 of
the major ones:
1) the sysv shared memory creation is __atomic__ based on the flags used.
Therefore, all the RML messages exchange is
I don't have any IB nodes, but I'm interested to see how this happens. What I
would like to understand here is how we get back into the OpenIB code if
add_procs failed for the BTL ...
george.
On Jun 2, 2010, at 05:08 , Sylvain Jeaugey wrote:
> On Tue, 1 Jun 2010, Jeff Squyres wrote:
>
On Tue, 1 Jun 2010, Jeff Squyres wrote:
On May 31, 2010, at 5:10 AM, Sylvain Jeaugey wrote:
In my case, the error happens in:
mca_btl_openib_add_procs()
mca_btl_openib_size_queues()
adjust_cq()
ibv_create_cq_compat()
ibv_create_cq()
Can you nail this down
I'm working on some intercommunicator stuff at the moment. According to
MPI-2.2 standard:
"An inter-communication is a point-to-point communication between
processes in different groups" [Section 6.6]
yet the "man" page for MPI_Comm_size reads:
"If the communicator is an
Couldn't explain it better. Thanks, Jeff, for the summary!
On Tue, 1 Jun 2010, Jeff Squyres wrote:
On May 31, 2010, at 10:27 AM, Ralph Castain wrote:
Just curious - your proposed fix sounds exactly like what was done in
the OPAL SOS work. Are you therefore proposing to use SOS to provide a