Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Ralph Castain
The design is supposed to be that each node knows precisely how many daemons 
are involved in each collective, and who is going to talk to them. The 
signature contains the info required to ensure the receiver knows which 
collective this message relates to, and just happens to also allow them to 
look up the number of daemons involved (the base function takes care of that for 
them).

So there is no need for a "pending" list - if you receive a message about a 
collective you don't yet know about, you just put it on the ongoing collective 
list. You should only receive it if you are going to be involved - i.e., you 
have local procs that are going to participate. So you wait until your local 
procs participate, and then pass your collected bucket along.
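
For illustration, here is a minimal sketch of that lookup-or-create pattern; all names (coll_tracker_t, find_tracker, ongoing_list) are hypothetical and this is not the actual grpcomm base API:

/* Hedged sketch of the lookup-or-create pattern described above; the
 * names (coll_tracker_t, find_tracker, ongoing_list) are hypothetical,
 * not the actual grpcomm base API. */
#include <stdlib.h>
#include <string.h>

typedef struct coll_tracker {
    struct coll_tracker *next;
    char   signature[64];   /* identifies which collective this belongs to */
    size_t ndmns;           /* number of daemons expected to report */
    size_t nreported;       /* how many have reported so far */
} coll_tracker_t;

static coll_tracker_t *ongoing_list = NULL;

/* Return the tracker for this signature, creating it on first sight -
 * there is no separate "pending" list. */
static coll_tracker_t *find_tracker(const char *signature, size_t ndmns)
{
    coll_tracker_t *t;
    for (t = ongoing_list; NULL != t; t = t->next) {
        if (0 == strcmp(t->signature, signature)) {
            return t;
        }
    }
    t = calloc(1, sizeof(*t));
    strncpy(t->signature, signature, sizeof(t->signature) - 1);
    t->ndmns = ndmns;
    t->next = ongoing_list;
    ongoing_list = t;
    return t;
}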

I suspect the link to the local procs isn't being correctly dealt with, or else 
you wouldn't be hanging. Or the rcd isn't correctly passing incoming messages 
to the base functions to register the collective.

I'll look at it over the weekend and can resolve it then.


On Sep 11, 2014, at 5:23 PM, Gilles Gouaillardet 
 wrote:

> Ralph,
> 
> you are right, this was definitely not the right fix (at least with 4
> nodes or more)
> 
> i finally understood what is going wrong here :
> to make it simple, the allgather recursive doubling algo is not
> implemented with
> MPI_Recv(...,peer,...) like functions but with
> MPI_Recv(...,MPI_ANY_SOURCE,...) like functions
> and that makes things slightly more complicated :
> right now :
> - with two nodes : if node 1 is late, it gets stuck in the allgather
> - with four nodes : if node 0 is first, then nodes 2 and 3 enter while node 1
> is still late, then node 0
> will likely leave the allgather even though it did not receive anything
> from node 1
> - and so on
> 
> i think i can fix that from here
> 
> Cheers,
> 
> Gilles
> 
> On 2014/09/11 23:47, Ralph Castain wrote:
>> Yeah, that's not the right fix, I'm afraid. I've made the direct component 
>> the default again until I have time to dig into this deeper.
>> 
>> On Sep 11, 2014, at 4:02 AM, Gilles Gouaillardet 
>>  wrote:
>> 
>>> Ralph,
>>> 
>>> the root cause is that when the second orted/mpirun runs rcd_finalize_coll,
>>> it does not invoke pmix_server_release
>>> because allgather_stub was not previously invoked since the fence
>>> was not yet entered.
>>> /* in rcd_finalize_coll, coll->cbfunc is NULL */
>>> 
>>> the attached patch is likely not the right fix, it was very lightly
>>> tested, but so far, it works for me ...
>>> 
>>> Cheers,
>>> 
>>> Gilles
>>> 
>>> On 2014/09/11 16:11, Gilles Gouaillardet wrote:
 Ralph,
 
 things got worse indeed :-(
 
 now a simple hello world involving two hosts hangs in mpi_init.
 there is still a race condition : if task a calls fence long after task b,
 then task b will never leave the fence
 
 i'll try to debug this ...
 
 Cheers,
 
 Gilles
 
 On 2014/09/11 2:36, Ralph Castain wrote:
> I think I now have this fixed - let me know what you see.
> 
> 
> On Sep 9, 2014, at 6:15 AM, Ralph Castain  wrote:
> 
>> Yeah, that's not the correct fix. The right way to fix it is for all 
>> three components to have their own RML tag, and for each of them to 
>> establish a persistent receive. They then can use the signature to tell 
>> which collective the incoming message belongs to.
>> 
>> I'll fix it, but it won't be until tomorrow I'm afraid as today is shot.
>> 
>> 
>> On Sep 9, 2014, at 3:10 AM, Gilles Gouaillardet 
>>  wrote:
>> 
>>> Folks,
>>> 
>>> Since r32672 (trunk), grpcomm/rcd is the default module.
>>> the attached spawn.c test program is a trimmed version of the
>>> spawn_with_env_vars.c test case
>>> from the ibm test suite.
>>> 
>>> when invoked on two nodes :
>>> - the program hangs with -np 2
>>> - the program can crash with np > 2
>>> error message is
>>> [node0:30701] [[42913,0],0] TWO RECEIVES WITH SAME PEER [[42913,0],1]
>>> AND TAG -33 - ABORTING
>>> 
>>> here is my full command line (from node0) :
>>> 
>>> mpirun -host node0,node1 -np 2 --oversubscribe --mca btl tcp,self --mca
>>> coll ^ml ./spawn
>>> 
>>> a simple workaround is to add the following extra parameter to the
>>> mpirun command line :
>>> --mca grpcomm_rcd_priority 0
>>> 
>>> my understanding is that the race condition occurs when all the
>>> processes call MPI_Finalize()
>>> internally, the pmix module will have mpirun/orted issue two ALLGATHER
>>> involving mpirun and orted
>>> (one for job 1, aka the parent, and one for job 2, aka the spawned tasks)
>>> the error message is very explicit : this is not (currently) supported
>>> 
>>> i wrote the attached rml.patch which is really a workaround and not a 

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Gilles Gouaillardet
Ralph,

you are right, this was definitely not the right fix (at least with 4
nodes or more)

i finally understood what is going wrong here :
to make it simple, the allgather recursive doubling algo is not
implemented with
MPI_Recv(...,peer,...) like functions but with
MPI_Recv(...,MPI_ANY_SOURCE,...) like functions
and that makes things slightly more complicated :
right now :
- with two nodes : if node 1 is late, it gets stuck in the allgather
- with four nodes : if node 0 is first, then nodes 2 and 3 enter while node 1
is still late, then node 0
will likely leave the allgather even though it did not receive anything
from node 1
- and so on

i think i can fix that from here
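
For reference, a minimal sketch of the recursive-doubling exchange pattern being described, written with explicitly addressed receives (the MPI_Recv(...,peer,...) style) and assuming a power-of-two number of ranks; the rcd component instead matches incoming fragments the way an MPI_ANY_SOURCE receive would, which is what makes the bookkeeping harder. This is an illustration only, not the rcd code:

/* Hedged illustration: recursive-doubling allgather over a power-of-two
 * number of ranks, with the partner explicitly named in each round.
 * Not the rcd component code. */
#include <mpi.h>

static void rd_allgather(char *buf, int blocksize, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);          /* assumed to be a power of two */

    for (int dist = 1; dist < size; dist <<= 1) {
        int peer = rank ^ dist;          /* partner for this round */
        /* each side currently owns a contiguous, aligned run of 'dist'
         * blocks; swap it with the partner's run */
        int my_start   = (rank / dist) * dist;
        int peer_start = (peer / dist) * dist;
        MPI_Sendrecv(buf + my_start * blocksize,   dist * blocksize, MPI_BYTE,
                     peer, 0,
                     buf + peer_start * blocksize, dist * blocksize, MPI_BYTE,
                     peer, 0, comm, MPI_STATUS_IGNORE);
    }
}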

Cheers,

Gilles

On 2014/09/11 23:47, Ralph Castain wrote:
> Yeah, that's not the right fix, I'm afraid. I've made the direct component 
> the default again until I have time to dig into this deeper.
>
> On Sep 11, 2014, at 4:02 AM, Gilles Gouaillardet 
>  wrote:
>
>> Ralph,
>>
>> the root cause is that when the second orted/mpirun runs rcd_finalize_coll,
>> it does not invoke pmix_server_release
>> because allgather_stub was not previously invoked since the fence
>> was not yet entered.
>> /* in rcd_finalize_coll, coll->cbfunc is NULL */
>>
>> the attached patch is likely not the right fix, it was very lightly
>> tested, but so far, it works for me ...
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/09/11 16:11, Gilles Gouaillardet wrote:
>>> Ralph,
>>>
>>> things got worse indeed :-(
>>>
>>> now a simple hello world involving two hosts hangs in mpi_init.
>>> there is still a race condition : if task a calls fence long after task b,
>>> then task b will never leave the fence
>>>
>>> i'll try to debug this ...
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 2014/09/11 2:36, Ralph Castain wrote:
 I think I now have this fixed - let me know what you see.


 On Sep 9, 2014, at 6:15 AM, Ralph Castain  wrote:

> Yeah, that's not the correct fix. The right way to fix it is for all 
> three components to have their own RML tag, and for each of them to 
> establish a persistent receive. They then can use the signature to tell 
> which collective the incoming message belongs to.
>
> I'll fix it, but it won't be until tomorrow I'm afraid as today is shot.
>
>
> On Sep 9, 2014, at 3:10 AM, Gilles Gouaillardet 
>  wrote:
>
>> Folks,
>>
>> Since r32672 (trunk), grpcomm/rcd is the default module.
>> the attached spawn.c test program is a trimmed version of the
>> spawn_with_env_vars.c test case
>> from the ibm test suite.
>>
>> when invoked on two nodes :
>> - the program hangs with -np 2
>> - the program can crash with np > 2
>> error message is
>> [node0:30701] [[42913,0],0] TWO RECEIVES WITH SAME PEER [[42913,0],1]
>> AND TAG -33 - ABORTING
>>
>> here is my full command line (from node0) :
>>
>> mpirun -host node0,node1 -np 2 --oversubscribe --mca btl tcp,self --mca
>> coll ^ml ./spawn
>>
>> a simple workaround is to add the following extra parameter to the
>> mpirun command line :
>> --mca grpcomm_rcd_priority 0
>>
>> my understanding is that the race condition occurs when all the
>> processes call MPI_Finalize()
>> internally, the pmix module will have mpirun/orted issue two ALLGATHER
>> involving mpirun and orted
>> (one for job 1, aka the parent, and one for job 2, aka the spawned tasks)
>> the error message is very explicit : this is not (currently) supported
>>
>> i wrote the attached rml.patch which is really a workaround and not a 
>> fix :
>> in this case, each job will invoke an ALLGATHER but with a different tag
>> /* that works for a limited number of jobs only */
>>
>> i did not commit this patch since this is not a fix, could someone
>> (Ralph ?) please review the issue and comment ?
>>
>>
>> Cheers,
>>
>> Gilles
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15780.php
 ___
 devel mailing list
 de...@open-mpi.org
 Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
 Link to this post: 
 http://www.open-mpi.org/community/lists/devel/2014/09/15794.php
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/09/15804.php
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: 

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r32711 - trunk/opal/mca/pmix/cray

2014-09-11 Thread Pritchard Jr., Howard
thanks, it was a bad cut/paste

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Tim Mattox
Sent: Thursday, September 11, 2014 2:54 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r32711 - 
trunk/opal/mca/pmix/cray

I'm sure that is not what you meant to do...
the assignment to NULL should occur AFTER the free()...

On Thu, Sep 11, 2014 at 4:30 PM, 
 wrote:
Author: hppritcha (Howard Pritchard)
Date: 2014-09-11 16:30:40 EDT (Thu, 11 Sep 2014)
New Revision: 32711
URL: https://svn.open-mpi.org/trac/ompi/changeset/32711

Log:
Fix potential double free in cray pmi cray_fini

Text files modified:
   trunk/opal/mca/pmix/cray/pmix_cray.c | 1 +
   1 files changed, 1 insertions(+), 0 deletions(-)

Modified: trunk/opal/mca/pmix/cray/pmix_cray.c
==
--- trunk/opal/mca/pmix/cray/pmix_cray.c    Thu Sep 11 10:51:30 2014    (r32710)
+++ trunk/opal/mca/pmix/cray/pmix_cray.c    2014-09-11 16:30:40 EDT (Thu, 11 Sep 2014)    (r32711)
@@ -257,6 +257,7 @@
 }

 if (NULL != pmix_lranks) {
+pmix_lranks = NULL;
 free(pmix_lranks);
 }

___
svn-full mailing list
svn-f...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn-full



--
Tim Mattox, Ph.D. - tmat...@gmail.com


Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r32711 - trunk/opal/mca/pmix/cray

2014-09-11 Thread Tim Mattox
I'm sure that is not what you meant to do...
the assignment to NULL should occur AFTER the free()...
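
i.e., presumably the intended ordering is something along these lines (a sketch of the intent, not the committed code):

if (NULL != pmix_lranks) {
    free(pmix_lranks);      /* release the buffer first...             */
    pmix_lranks = NULL;     /* ...then clear the pointer so a second
                             * pass through here cannot free it again  */
}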

On Thu, Sep 11, 2014 at 4:30 PM,  wrote:

> Author: hppritcha (Howard Pritchard)
> Date: 2014-09-11 16:30:40 EDT (Thu, 11 Sep 2014)
> New Revision: 32711
> URL: https://svn.open-mpi.org/trac/ompi/changeset/32711
>
> Log:
> Fix potential double free in cray pmi cray_fini
>
> Text files modified:
>trunk/opal/mca/pmix/cray/pmix_cray.c | 1 +
>1 files changed, 1 insertions(+), 0 deletions(-)
>
> Modified: trunk/opal/mca/pmix/cray/pmix_cray.c
>
> ==
> --- trunk/opal/mca/pmix/cray/pmix_cray.c    Thu Sep 11 10:51:30 2014    (r32710)
> +++ trunk/opal/mca/pmix/cray/pmix_cray.c    2014-09-11 16:30:40 EDT (Thu, 11 Sep 2014)    (r32711)
> @@ -257,6 +257,7 @@
>  }
>
>  if (NULL != pmix_lranks) {
> +pmix_lranks = NULL;
>  free(pmix_lranks);
>  }
>
> ___
> svn-full mailing list
> svn-f...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
>



-- 
Tim Mattox, Ph.D. - tmat...@gmail.com


Re: [OMPI devel] Need to know your Github ID

2014-09-11 Thread Thomas Naughton

naughtont -> naughtont3

Thanks,
--tjn

 _
  Thomas Naughton  naught...@ornl.gov
  Research Associate   (865) 576-4184


On Wed, 10 Sep 2014, Jeff Squyres (jsquyres) wrote:


As the next step of the planned migration to Github, I need to know:

- Your Github ID (so that you can be added to the new OMPI git repo)
- Your SVN ID (so that I can map SVN->Github IDs, and therefore map Trac 
tickets to appropriate owners)

Here's the list of SVN IDs who have committed over the past year -- I'm 
guessing that most of these people will need Github IDs:

adrian
alekseys
alex
alinas
amikheev
bbenton
bosilca (done)
bouteill
brbarret
bwesarg
devendar
dgoodell (done)
edgar
eugene
ggouaillardet
hadi
hjelmn
hpcchris
hppritcha
igoru
jjhursey (done)
jladd
jroman
jsquyres (done)
jurenz
kliteyn
manjugv
miked (done)
mjbhaskar
mpiteam (done)
naughtont
osvegis
pasha
regrant
rfaucett
rhc (done)
rolfv (done)
samuel
shiqing
swise
tkordenbrock
vasily
vvenkates
vvenkatesan
yaeld
yosefe

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/09/15788.php



Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Ralph Castain
Yeah, that's not the right fix, I'm afraid. I've made the direct component the 
default again until I have time to dig into this deeper.

On Sep 11, 2014, at 4:02 AM, Gilles Gouaillardet 
 wrote:

> Ralph,
> 
> the root cause is that when the second orted/mpirun runs rcd_finalize_coll,
> it does not invoke pmix_server_release
> because allgather_stub was not previously invoked since the fence
> was not yet entered.
> /* in rcd_finalize_coll, coll->cbfunc is NULL */
> 
> the attached patch is likely not the right fix, it was very lightly
> tested, but so far, it works for me ...
> 
> Cheers,
> 
> Gilles
> 
> On 2014/09/11 16:11, Gilles Gouaillardet wrote:
>> Ralph,
>> 
>> things got worse indeed :-(
>> 
>> now a simple hello world involving two hosts hangs in mpi_init.
>> there is still a race condition : if task a calls fence long after task b,
>> then task b will never leave the fence
>> 
>> i'll try to debug this ...
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> On 2014/09/11 2:36, Ralph Castain wrote:
>>> I think I now have this fixed - let me know what you see.
>>> 
>>> 
>>> On Sep 9, 2014, at 6:15 AM, Ralph Castain  wrote:
>>> 
 Yeah, that's not the correct fix. The right way to fix it is for all three 
 components to have their own RML tag, and for each of them to establish a 
 persistent receive. They then can use the signature to tell which 
 collective the incoming message belongs to.
 
 I'll fix it, but it won't be until tomorrow I'm afraid as today is shot.
 
 
 On Sep 9, 2014, at 3:10 AM, Gilles Gouaillardet 
  wrote:
 
> Folks,
> 
> Since r32672 (trunk), grpcomm/rcd is the default module.
> the attached spawn.c test program is a trimmed version of the
> spawn_with_env_vars.c test case
> from the ibm test suite.
> 
> when invoked on two nodes :
> - the program hangs with -np 2
> - the program can crash with np > 2
> error message is
> [node0:30701] [[42913,0],0] TWO RECEIVES WITH SAME PEER [[42913,0],1]
> AND TAG -33 - ABORTING
> 
> here is my full command line (from node0) :
> 
> mpirun -host node0,node1 -np 2 --oversubscribe --mca btl tcp,self --mca
> coll ^ml ./spawn
> 
> a simple workaround is to add the following extra parameter to the
> mpirun command line :
> --mca grpcomm_rcd_priority 0
> 
> my understanding is that the race condition occurs when all the
> processes call MPI_Finalize()
> internally, the pmix module will have mpirun/orted issue two ALLGATHER
> involving mpirun and orted
> (one for job 1, aka the parent, and one for job 2, aka the spawned tasks)
> the error message is very explicit : this is not (currently) supported
> 
> i wrote the attached rml.patch which is really a workaround and not a fix 
> :
> in this case, each job will invoke an ALLGATHER but with a different tag
> /* that works for a limited number of jobs only */
> 
> i did not commit this patch since this is not a fix, could someone
> (Ralph ?) please review the issue and comment ?
> 
> 
> Cheers,
> 
> Gilles
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15780.php
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/09/15794.php
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15804.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15805.php



Re: [OMPI devel] clang alignment warnings

2014-09-11 Thread Ralph Castain
I'm not convinced it is a bug in clang, Jeff - we know that Siegmar has been 
getting segfaults in the mca var code, though it isn't clear if those are 
alignment issues or not (looked like them, but can't say with certainty). May 
just need to ask him to run the current trunk and see if the problems persist 
now that I fixed the one in opal/dss.


On Sep 11, 2014, at 5:34 AM, Jeff Squyres (jsquyres)  wrote:

> I re-ran the test, just to ensure I had the line numbers right (I have some 
> local edits in my SVN copy):
> 
> -
> mca_base_var.c:681:18: runtime error: member access within misaligned address 
> 0x2b338409 for type 'mca_base_var_storage_t', which requires 8 byte 
> alignment
> -
> 
> This is referring to the bool conversion.  According to opal_config.h, bool 
> has an alignment of 1.  The ...8409 address is definitely 1-byte aligned.  :-)
> 
> And this one:
> 
> -
> mca_base_var.c:668:18: runtime error: member access within misaligned address 
> 0x2bc90ccc for type 'mca_base_var_storage_t', which requires 8 byte 
> alignment
> -
> 
> is referring to the int conversion.  According to opal_config, ints are 
> 4-byte aligned.  The ...0ccc address is 4-byte aligned.
> 
> Note that I also get similar warnings about OB1:
> 
> -
> pml_ob1_hdr.h:462:17: runtime error: member access within misaligned address 
> 0x2aaabb35f2cc for type 'mca_pml_ob1_hdr_t' (aka 'union mca_pml_ob1_hdr_t'), 
> which requires 8 byte alignment
> -
> 
> mca_pml_ob1_hdr_t is also a union.
> 
> Is this a bug in the clang alignment sanitizer?
> 
> 
> 
> 
> On Sep 10, 2014, at 4:41 PM, George Bosilca  wrote:
> 
>> It complains about 0x2b1b1ed9 being misaligned, which seems like a valid 
>> complaint. What is the dst value when this triggers? What is var->mbv_storage?
>> 
>>  George.
>> 
>> 
>> On Thu, Sep 11, 2014 at 5:29 AM, Jeff Squyres (jsquyres) 
>>  wrote:
>> On Sep 10, 2014, at 4:23 PM, Jeff Squyres (jsquyres)  
>> wrote:
>> 
>>> Regardless, what do we do about this?
>> 
>> To answer my own question, I guess we can replace:
>> 
>>   dst->intval = int_value
>> 
>> with
>> 
>>   int *bogus = (int*) dst;
>>   *bogus = int_value;
>> 
>> and do a similar thing for the bool.
>> 
>> Seems kludgey, and kinda defeats the point of having a union, though.
>> 
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15799.php
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15800.php
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15808.php



Re: [OMPI devel] clang alignment warnings

2014-09-11 Thread Jeff Squyres (jsquyres)
I re-ran the test, just to ensure I had the line numbers right (I have some 
local edits in my SVN copy):

-
mca_base_var.c:681:18: runtime error: member access within misaligned address 
0x2b338409 for type 'mca_base_var_storage_t', which requires 8 byte 
alignment
-

This is referring to the bool conversion.  According to opal_config.h, bool has 
an alignment of 1.  The ...8409 address is definitely 1-byte aligned.  :-)

And this one:

-
mca_base_var.c:668:18: runtime error: member access within misaligned address 
0x2bc90ccc for type 'mca_base_var_storage_t', which requires 8 byte 
alignment
-

is referring to the int conversion.  According to opal_config, ints are 4-byte 
aligned.  The ...0ccc address is 4-byte aligned.

Note that I also get similar warnings about OB1:

-
pml_ob1_hdr.h:462:17: runtime error: member access within misaligned address 
0x2aaabb35f2cc for type 'mca_pml_ob1_hdr_t' (aka 'union mca_pml_ob1_hdr_t'), 
which requires 8 byte alignment
-

mca_pml_ob1_hdr_t is also a union.

Is this a bug in the clang alignment sanitizer?
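
Independent of that question, one alternative to the int* cast proposed further down the thread would be to store through memcpy, which imposes no alignment requirement on its destination. A minimal sketch, assuming the goal is only to avoid the misaligned typed access; the union below merely mimics the shape of the quoted code and is not the real mca_base_var_storage_t:

/* Hedged sketch: storing through memcpy avoids any misaligned typed
 * access, so -fsanitize=alignment has nothing to flag.  The union only
 * mimics the shape of the code quoted in this thread; it is not the
 * real mca_base_var_storage_t. */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef union {
    bool      boolval;
    int       intval;
    long long llval;   /* gives the union an 8-byte alignment requirement
                        * on typical LP64 targets */
} storage_t;

static void store_int(void *storage, int int_value)
{
    /* 'storage' may be only 1- or 4-byte aligned; copy byte-wise instead
     * of assigning through a storage_t lvalue */
    memcpy((char *)storage + offsetof(storage_t, intval),
           &int_value, sizeof(int_value));
}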




On Sep 10, 2014, at 4:41 PM, George Bosilca  wrote:

> It complains about 0x2b1b1ed9 being misaligned, which seems like a valid 
> complaint. What is the dst value when this triggers? What is var->mbv_storage?
> 
>   George.
> 
> 
> On Thu, Sep 11, 2014 at 5:29 AM, Jeff Squyres (jsquyres)  
> wrote:
> On Sep 10, 2014, at 4:23 PM, Jeff Squyres (jsquyres)  
> wrote:
> 
> > Regardless, what do we do about this?
> 
> To answer my own question, I guess we can replace:
> 
>dst->intval = int_value
> 
> with
> 
>int *bogus = (int*) dst;
>*bogus = int_value;
> 
> and do a similar thing for the bool.
> 
> Seems kludgey, and kinda defeats the point of having a union, though.
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15799.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15800.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] Github migration plan

2014-09-11 Thread Jeff Squyres (jsquyres)
[inline image]


On Sep 11, 2014, at 8:15 AM, Jeff Squyres (jsquyres) 
 wrote:

Rolf --

I'll be ready to discuss a concrete plan and timeline to migrate to Github next 
Tuesday.

Can you please add me to Tuesday's agenda?

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/09/15806.php


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI devel] Github migration plan

2014-09-11 Thread Jeff Squyres (jsquyres)
Rolf --

I'll be ready to discuss a concrete plan and timeline to migrate to Github next 
Tuesday.

Can you please add me to Tuesday's agenda?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Gilles Gouaillardet
Ralph,

the root cause is that when the second orted/mpirun runs rcd_finalize_coll,
it does not invoke pmix_server_release
because allgather_stub was not previously invoked since the fence
was not yet entered.
/* in rcd_finalize_coll, coll->cbfunc is NULL */
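
In other words, the completion path only delivers the release if a callback has already been registered. A hedged sketch of that symptom, with hypothetical names (not the actual rcd_finalize_coll):

/* Hedged sketch of the symptom, with hypothetical names: the callback is
 * only set once the local fence is entered (allgather_stub), so a
 * collective that completes before that is silently dropped. */
typedef void (*release_cbfunc_t)(void *cbdata);

typedef struct {
    release_cbfunc_t cbfunc;   /* set when the local fence is entered;
                                * NULL before that */
    void *cbdata;
} coll_t;

static void finalize_coll(coll_t *coll)
{
    if (NULL != coll->cbfunc) {
        coll->cbfunc(coll->cbdata);   /* the pmix_server_release path */
    }
    /* else: the result arrived before the local fence, nobody is ever
     * released, and the job hangs - which matches what is seen here */
}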

the attached patch is likely not the right fix, it was very lightly
tested, but so far, it works for me ...

Cheers,

Gilles

On 2014/09/11 16:11, Gilles Gouaillardet wrote:
> Ralph,
>
> things got worse indeed :-(
>
> now a simple hello world involving two hosts hangs in mpi_init.
> there is still a race condition : if task a calls fence long after task b,
> then task b will never leave the fence
>
> i'll try to debug this ...
>
> Cheers,
>
> Gilles
>
> On 2014/09/11 2:36, Ralph Castain wrote:
>> I think I now have this fixed - let me know what you see.
>>
>>
>> On Sep 9, 2014, at 6:15 AM, Ralph Castain  wrote:
>>
>>> Yeah, that's not the correct fix. The right way to fix it is for all three 
>>> components to have their own RML tag, and for each of them to establish a 
>>> persistent receive. They then can use the signature to tell which 
>>> collective the incoming message belongs to.
>>>
>>> I'll fix it, but it won't be until tomorrow I'm afraid as today is shot.
>>>
>>>
>>> On Sep 9, 2014, at 3:10 AM, Gilles Gouaillardet 
>>>  wrote:
>>>
 Folks,

 Since r32672 (trunk), grpcomm/rcd is the default module.
 the attached spawn.c test program is a trimmed version of the
 spawn_with_env_vars.c test case
 from the ibm test suite.

 when invoked on two nodes :
 - the program hangs with -np 2
 - the program can crash with np > 2
 error message is
 [node0:30701] [[42913,0],0] TWO RECEIVES WITH SAME PEER [[42913,0],1]
 AND TAG -33 - ABORTING

 here is my full command line (from node0) :

 mpirun -host node0,node1 -np 2 --oversubscribe --mca btl tcp,self --mca
 coll ^ml ./spawn

 a simple workaround is to add the following extra parameter to the
 mpirun command line :
 --mca grpcomm_rcd_priority 0

 my understanding is that the race condition occurs when all the
 processes call MPI_Finalize()
 internally, the pmix module will have mpirun/orted issue two ALLGATHER
 involving mpirun and orted
 (one for job 1, aka the parent, and one for job 2, aka the spawned tasks)
 the error message is very explicit : this is not (currently) supported

 i wrote the attached rml.patch which is really a workaround and not a fix :
 in this case, each job will invoke an ALLGATHER but with a different tag
 /* that works for a limited number of jobs only */

 i did not commit this patch since this is not a fix, could someone
 (Ralph ?) please review the issue and comment ?


 Cheers,

 Gilles

 ___
 devel mailing list
 de...@open-mpi.org
 Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
 Link to this post: 
 http://www.open-mpi.org/community/lists/devel/2014/09/15780.php
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15794.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15804.php

Index: orte/mca/grpcomm/rcd/grpcomm_rcd.c
===
--- orte/mca/grpcomm/rcd/grpcomm_rcd.c  (revision 32706)
+++ orte/mca/grpcomm/rcd/grpcomm_rcd.c  (working copy)
@@ -6,6 +6,8 @@
  * Copyright (c) 2011-2013 Los Alamos National Security, LLC. All
  * rights reserved.
  * Copyright (c) 2014  Intel, Inc.  All rights reserved.
+ * Copyright (c) 2014  Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -85,6 +87,9 @@
 static int allgather(orte_grpcomm_coll_t *coll,
  opal_buffer_t *sendbuf)
 {
+orte_grpcomm_base_pending_coll_t *pc;
+bool pending = false;
+
 OPAL_OUTPUT_VERBOSE((5, orte_grpcomm_base_framework.framework_output,
  "%s grpcomm:coll:recdub algo employed for %d 
processes",
  ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), 
(int)coll->ndmns));
@@ -106,6 +111,33 @@
  */
 rcd_allgather_send_dist(coll, 1);

+OPAL_LIST_FOREACH(pc, &orte_grpcomm_base.pending, 
orte_grpcomm_base_pending_coll_t) {
+if (NULL == coll->sig->signature) {
+if (NULL == pc->coll->sig->signature) {
+/* only one collective can operate 

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Gilles Gouaillardet
Ralph,

things got worse indeed :-(

now a simple hello world involving two hosts hangs in mpi_init.
there is still a race condition : if task a calls fence long after task b,
then task b will never leave the fence

i'll try to debug this ...

Cheers,

Gilles

On 2014/09/11 2:36, Ralph Castain wrote:
> I think I now have this fixed - let me know what you see.
>
>
> On Sep 9, 2014, at 6:15 AM, Ralph Castain  wrote:
>
>> Yeah, that's not the correct fix. The right way to fix it is for all three 
>> components to have their own RML tag, and for each of them to establish a 
>> persistent receive. They then can use the signature to tell which collective 
>> the incoming message belongs to.
>>
>> I'll fix it, but it won't be until tomorrow I'm afraid as today is shot.
>>
>>
>> On Sep 9, 2014, at 3:10 AM, Gilles Gouaillardet 
>>  wrote:
>>
>>> Folks,
>>>
>>> Since r32672 (trunk), grpcomm/rcd is the default module.
>>> the attached spawn.c test program is a trimmed version of the
>>> spawn_with_env_vars.c test case
>>> from the ibm test suite.
>>>
>>> when invoked on two nodes :
>>> - the program hangs with -np 2
>>> - the program can crash with np > 2
>>> error message is
>>> [node0:30701] [[42913,0],0] TWO RECEIVES WITH SAME PEER [[42913,0],1]
>>> AND TAG -33 - ABORTING
>>>
>>> here is my full command line (from node0) :
>>>
>>> mpirun -host node0,node1 -np 2 --oversubscribe --mca btl tcp,self --mca
>>> coll ^ml ./spawn
>>>
>>> a simple workaround is to add the following extra parameter to the
>>> mpirun command line :
>>> --mca grpcomm_rcd_priority 0
>>>
>>> my understanding is that the race condition occurs when all the
>>> processes call MPI_Finalize()
>>> internally, the pmix module will have mpirun/orted issue two ALLGATHER
>>> involving mpirun and orted
>>> (one for job 1, aka the parent, and one for job 2, aka the spawned tasks)
>>> the error message is very explicit : this is not (currently) supported
>>>
>>> i wrote the attached rml.patch which is really a workaround and not a fix :
>>> in this case, each job will invoke an ALLGATHER but with a different tag
>>> /* that works for a limited number of jobs only */
>>>
>>> i did not commit this patch since this is not a fix, could someone
>>> (Ralph ?) please review the issue and comment ?
>>>
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/09/15780.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15794.php



Re: [OMPI devel] Need to know your Github ID

2014-09-11 Thread Gilles Gouaillardet
ggouaillardet -> ggouaillardet

On 2014/09/10 19:46, Jeff Squyres (jsquyres) wrote:
> As the next step of the planned migration to Github, I need to know:
>
> - Your Github ID (so that you can be added to the new OMPI git repo)
> - Your SVN ID (so that I can map SVN->Github IDs, and therefore map Trac 
> tickets to appropriate owners)
>
> Here's the list of SVN IDs who have committed over the past year -- I'm 
> guessing that most of these people will need Github IDs:
>
>  adrian 
>  alekseys 
>  alex 
>  alinas 
>  amikheev 
>  bbenton 
>  bosilca (done)
>  bouteill 
>  brbarret 
>  bwesarg 
>  devendar 
>  dgoodell (done)
>  edgar 
>  eugene 
>  ggouaillardet 
>  hadi 
>  hjelmn 
>  hpcchris 
>  hppritcha 
>  igoru 
>  jjhursey (done)
>  jladd 
>  jroman 
>  jsquyres (done)
>  jurenz 
>  kliteyn 
>  manjugv 
>  miked (done)
>  mjbhaskar 
>  mpiteam (done)
>  naughtont 
>  osvegis 
>  pasha 
>  regrant 
>  rfaucett 
>  rhc (done)
>  rolfv (done)
>  samuel 
>  shiqing 
>  swise 
>  tkordenbrock 
>  vasily 
>  vvenkates 
>  vvenkatesan 
>  yaeld 
>  yosefe 
>