Re: [OMPI devel] Broken abort backtrace functionality

2014-09-30 Thread Ralph Castain
Should be fixed in r32821, scheduled for 1.8.4


On Sep 29, 2014, at 2:00 PM, Deva  wrote:

> It looks like OMPI_MCA_mpi_abort_print_stack=1 is broken.  I'm seeing the
> following warning with it.
> 
> --
> $mpirun -np 2  -x OMPI_MCA_mpi_abort_print_stack=1 ./hello_c
> --
> WARNING: A user-supplied value attempted to override the default-only MCA
> variable named "mpi_abort_print_stack".
> 
> The user-supplied value was ignored.
> --
> --
> WARNING: A user-supplied value attempted to override the default-only MCA
> variable named "mpi_abort_print_stack".
> 
> The user-supplied value was ignored.
> --
> Hello, world, I am 1 of 2, 
> Hello, world, I am 0 of 2, 
> --
> 
> 
> It seems HAVE_BACKTRACE is not defined by any configuration, but the
> relevant code below is guarded by it.
> 
> 
> #if OPAL_WANT_PRETTY_PRINT_STACKTRACE && defined(HAVE_BACKTRACE)
>  0,
>  OPAL_INFO_LVL_9,
>  MCA_BASE_VAR_SCOPE_READONLY,
> #else
>  MCA_BASE_VAR_FLAG_DEFAULT_ONLY,
>  OPAL_INFO_LVL_9,
>  MCA_BASE_VAR_SCOPE_CONSTANT,
> #endif
> 
> $git grep HAVE_BACKTRACE
> ompi/runtime/ompi_mpi_params.c:#if OPAL_WANT_PRETTY_PRINT_STACKTRACE && 
> defined(HAVE_BACKTRACE)
> $
> 
> 
> -- 
> -Devendar
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15933.php



Re: [OMPI devel] Github migration: tomorrow

2014-09-30 Thread Jeff Squyres (jsquyres)
8am US Eastern tomorrow morning is the cutoff.  Anything you do tonight will be 
fine.

Note, however, that open CMRs will NOT be moved over to Github -- you'll have 
to re-file them as pull requests after the migration.


On Sep 30, 2014, at 2:09 PM, Pritchard Jr., Howard  wrote:

> Hi Jeff,
> 
> When's the latest today that we can do checkins without causing problems?
> 
> Howard
> 
> 
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
> (jsquyres)
> Sent: Tuesday, September 30, 2014 9:57 AM
> To: Open MPI Developers List
> Subject: [OMPI devel] Github migration: tomorrow
> 
> It's happening tomorrow, October 1, 2014, starting at 8am US Eastern time.
> 
> There was discussion about Bitbucket vs. Github, and all things being equal 
> (except the cost!), we're going with the original plan of hosting the main 
> OMPI repo at Github.
> 
> The plan tomorrow is the same as it was last week:
> 
> - SVN and Trac get frozen at 8am US Eastern time.
> - The migration is likely to take all day.
> - I'll be moving all open, non-CMR tickets from Trac to Github.
> - The Trac wiki has already been moved.
> 
> I'll send an "all clear" email when it is safe to start using the main Github 
> OMPI repo (tickets and git) tomorrow.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15947.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15951.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] Github migration: tomorrow

2014-09-30 Thread Pritchard Jr., Howard
Hi Jeff,

When's the latest today that we can do checkins without causing problems?

Howard


-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
(jsquyres)
Sent: Tuesday, September 30, 2014 9:57 AM
To: Open MPI Developers List
Subject: [OMPI devel] Github migration: tomorrow

It's happening tomorrow, October 1, 2014, starting at 8am US Eastern time.

There was discussion about Bitbucket vs. Github, and all things being equal 
(except the cost!), we're going with the original plan of hosting the main OMPI 
repo at Github.

The plan tomorrow is the same as it was last week:

- SVN and Trac get frozen at 8am US Eastern time.
- The migration is likely to take all day.
- I'll be moving all open, non-CMR tickets from Trac to Github.
- The Trac wiki has already been moved.

I'll send an "all clear" email when it is safe to start using the main Github 
OMPI repo (tickets and git) tomorrow.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/09/15947.php


Re: [OMPI devel] MPI_Comm_spawn crashes with the openib btl

2014-09-30 Thread Ralph Castain
I fixed this in r32818 - the components shouldn't be passing back success if 
the requested info isn't found. Hope that fixes the problem.


On Sep 30, 2014, at 1:54 AM, Gilles Gouaillardet 
 wrote:

> Folks,
> 
> the dynamic/spawn test from the ibm test suite crashes if the openib btl
> is detected
> (the test can be run on one node with an IB port)
> 
> here is what happens :
> 
> in mca_btl_openib_proc_create,
> the macro
>    OPAL_MODEX_RECV(rc, &mca_btl_openib_component.super.btl_version,
>                    proc, &message, &msg_size);
> does not find any information *but*
> rc is OPAL_SUCCESS
> msg_size is not updated (i.e. left uninitialized)
> message is not updated (i.e. left uninitialized)
> 
> then, if msg_size happens to hold a nonzero value and message happens to
> hold an invalid address, a crash will occur when message is accessed.
> 
> /* I am not debating here the fact that there is no information returned,
> I am simply discussing the crash */
> 
> a simple workaround is to initialize msg_size to zero.
> 
> that being said, is this the correct fix ?
> 
> one possible alternate fix is to update the OPAL_MODEX_RECV_STRING macro
> like this :
> 
> /* from opal/mca/pmix/pmix.h */
> #define OPAL_MODEX_RECV_STRING(r, s, p, d, sz)                      \
>     do {                                                            \
>         opal_value_t *kv;                                           \
>         if (OPAL_SUCCESS == ((r) = opal_pmix.get(&(p)->proc_name,   \
>                                                  (s), &kv))) {      \
>             if (NULL != kv) {                                       \
>                 *(d) = kv->data.bo.bytes;                           \
>                 *(sz) = kv->data.bo.size;                           \
>                 kv->data.bo.bytes = NULL; /* protect the data */    \
>                 OBJ_RELEASE(kv);                                    \
>             } else {                                                \
>                 *(sz) = 0;                                          \
>                 (r) = OPAL_ERR_NOT_FOUND;                           \
>             }                                                       \
>         }                                                           \
>     } while(0);
> 
> /*
> *(sz) = 0; and (r) = OPAL_ERR_NOT_FOUND; can be seen as redundant, *(sz)
> *or* (r) could be set
> */
> 
> and another alternative fix is to update the end of the native_get
> function like this :
> 
> /* from opal/mca/pmix/native/pmix_native.c */
> 
>if (found) {
>return OPAL_SUCCESS;
>}
>*kv = NULL;
>if (OPAL_SUCCESS == rc) {
>if (OPAL_SUCCESS == ret) {
>rc = OPAL_ERR_NOT_FOUND;
>} else {
>rc = ret;
>}
>}
>return rc;
> 
> Could you please advise ?
> 
> Cheers,
> 
> Gilles
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15942.php



[OMPI devel] Github migration: tomorrow

2014-09-30 Thread Jeff Squyres (jsquyres)
It's happening tomorrow, October 1, 2014, starting at 8am US Eastern time.

There was discussion about Bitbucket vs. Github, and all things being equal 
(except the cost!), we're going with the original plan of hosting the main OMPI 
repo at Github.

The plan tomorrow is the same as it was last week:

- SVN and Trac get frozen at 8am US Eastern time.
- The migration is likely to take all day.
- I'll be moving all open, non-CMR tickets from Trac to Github.
- The Trac wiki has already been moved.

I'll send an "all clear" email when it is safe to start using the main Github 
OMPI repo (tickets and git) tomorrow.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] [patch] libnbc intercommunicator iallgather bug

2014-09-30 Thread Pritchard Jr., Howard
Hi Takahiro,

Thanks very much for the patch and the test!  

After the git migration we'll open an issue and patch nbc_iallgather.
This will get pushed to 1.8.4.

Howard


-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Kawashima, Takahiro
Sent: Monday, September 29, 2014 10:22 PM
To: de...@open-mpi.org
Subject: [OMPI devel] [patch] libnbc intercommunicator iallgather bug

Hi,

The attached program intercommunicator-iallgather.c prints the message "MPI 
Error in MPI_Testall() (18)" forever and never finishes. This is because libnbc 
has send/recv typos.

See attached intercommunicator-iallgather.patch for the fix.
The patch modifies iallgather_inter and iallgather_intra.
The modification of iallgather_intra is just for symmetry with
iallgather_inter; there, users guarantee the consistency of send/recv.

Both trunk and v1.8 branch have this issue.

Regards,
Takahiro Kawashima,
MPI development team,
Fujitsu


Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-09-30 Thread Nathan Hjelm

Not quite right. There still is no topology information at collective
selection time for either graph or dist graph.

-Nathan

On Tue, Sep 30, 2014 at 02:03:27PM +0900, Gilles Gouaillardet wrote:
>Nathan,
> 
>here is a revision of the previously attached patch, and that supports
>graph and dist graph.
> 
>Cheers,
> 
>Gilles
> 
>On 2014/09/30 0:05, Nathan Hjelm wrote:
> 
>  An equivalent change would need to be made for graph and dist graph as
>  well. That will take a little more work. Also, I was avoiding changing
>  anything in topo for 1.8.
> 
>  -Nathan
> 
>  On Mon, Sep 29, 2014 at 08:02:41PM +0900, Gilles Gouaillardet wrote:
> 
> Nathan,
> 
> why not just make the topology information available at that point as you
> described it ?
> 
> the attached patch does this, could you please review it ?
> 
> Cheers,
> 
> Gilles
> 
> On 2014/09/26 2:50, Nathan Hjelm wrote:
> 
>   On Tue, Aug 26, 2014 at 07:03:24PM +0300, Lisandro Dalcin wrote:
> 
>   I finally managed to track down some issues in mpi4py's test suite
>   using Open MPI 1.8+. The code below should be enough to reproduce the
>   problem. Run it under valgrind to make sense of my following
>   diagnostics.
> 
>   In this code I'm creating a 2D, periodic Cartesian topology out of
>   COMM_SELF. In this case, the process in COMM_SELF has 4 logical in/out
>   links to itself. So we have size=1 but indegree=outdegree=4. However,
>   in ompi/mca/coll/basic/coll_basic_module.c, "size * 2" requests are
>   being allocated to manage communication:
> 
>   if (OMPI_COMM_IS_INTER(comm)) {
>   size = ompi_comm_remote_size(comm);
>   } else {
>   size = ompi_comm_size(comm);
>   }
>   basic_module->mccb_num_reqs = size * 2;
>   basic_module->mccb_reqs = (ompi_request_t**)
>   malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs);
> 
>   I guess you have to also special-case for topologies and allocate
>   indegree+outdegree requests (not sure about this number, just
>   guessing).
> 
> 
>   I wish this was possible but the topology information is not available
>   at that point. We may be able to change that but I don't see the work
>   completing anytime soon. I committed an alternative fix as r32796 and
>   CMR'd it to 1.8.3. I can confirm that the attached reproducer no longer
>   produces a SEGV. Let me know if you run into any more issues.
> 
> 
>   -Nathan
> 
>   ___
>   devel mailing list
>   de...@open-mpi.org
>   Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>   Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15915.php
> 
>  Index: ompi/mca/topo/base/topo_base_cart_create.c
>  ===
>  --- ompi/mca/topo/base/topo_base_cart_create.c  (revision 32807)
>  +++ ompi/mca/topo/base/topo_base_cart_create.c  (working copy)
>  @@ -163,10 +163,18 @@
>   return MPI_ERR_INTERN;
>   }
>  
>  +assert(NULL == new_comm->c_topo);
>  +assert(!(new_comm->c_flags & OMPI_COMM_CART));
>  +new_comm->c_topo   = topo;
>  +new_comm->c_topo->mtc.cart = cart;
>  +new_comm->c_topo->reorder  = reorder;
>  +new_comm->c_flags |= OMPI_COMM_CART;
>   ret = ompi_comm_enable(old_comm, new_comm,
>  new_rank, num_procs, topo_procs);
>   if (OMPI_SUCCESS != ret) {
>   /* something wrong happened during setting the communicator */
>  +new_comm->c_topo = NULL;
>  +new_comm->c_flags &= ~OMPI_COMM_CART;
>  ompi_comm_free (&new_comm);
>   free(topo_procs);
>   if(NULL != cart->periods) free(cart->periods);
>  @@ -176,10 +184,6 @@
>   return ret;
>   }
>  
>  -new_comm->c_topo   = topo;
>  -new_comm->c_topo->mtc.cart = cart;
>  -new_comm->c_topo->reorder  = reorder;
>  -new_comm->c_flags |= OMPI_COMM_CART;
>   *comm_topo = new_comm;
>  
>   if( MPI_UNDEFINED == new_rank ) {
>  Index: ompi/mca/coll/basic/coll_basic_module.c
>  ===
>  --- ompi/mca/coll/basic/coll_basic_module.c (revision 32807)
>  +++ ompi/mca/coll/basic/coll_basic_module.c (working copy)
>  @@ -13,6 +13,8 @@
>* Copyright (c) 2012  Sandia National Laboratories. All rights 
> reserved.
>* Copyright (c) 2013  Los Alamos National Security, LLC. All rights
>* reserved.
>  + * Copyright (c) 2014  Research Organization for Information Science
>  + * and Technology (RIST). All rights reserved.
>* $COPYRIGHT$
>*
>* Additional copyrights may follow
>  @@ -28,6 +30,7 @@
>   #include "mpi.h"
>   #include "ompi/mca/coll/coll.h"
>   #include "ompi/mca/coll/base/base.h"
>  +#include "ompi/mca/topo/topo.h"
>   #include "coll_basic.h"
>  
>  
>  @@ -70,6 

Re: [OMPI devel] recomended software stack for development?

2014-09-30 Thread Ralph Castain
Kewl. FWIW: we already have the ability to migrate processes in the ORTE code. 
You can tell the system to try and restart the process in its existing location 
N number of times before requesting relocation. Of course, if a node fails, 
then we automatically relocate the procs to other nodes.

The relocation algorithm (i.e., where to put the relocating process) is in the 
"resilient" mapper component (see orte/mca/rmaps/resilient). It tries to ensure 
that we don't relocate the proc to an inappropriate place.

I can provide more details if you like.
Ralph

On Sep 30, 2014, at 3:20 AM, Manuel Rodríguez Pascual 
 wrote:

> Hi all,
> 
> I kind of broke something with my mail configuration so I haven't
> been able to properly answer this earlier, sorry.
> 
> @Jsquyres We are planning to work on fault tolerance and improved
> scheduling capabilities for HPC. To do so, we are first focusing on
> serial tasks, and in a next step we will work with parallel jobs.  In
> particular, I will be working on job migration, so tasks composing an
> MPI job can be re-allocated inside a cluster. Anyway, this is
> anticipating too much; we are now in the first steps of the
> project. Also, thanks for the videos and the environment
> recommendations, they have been really helpful.
> 
> @Ralph Castain: Of course :) Our objective is to create open software
> adopting the existing Open-MPI license, and make it available to the
> community. I am not in charge of the "paperwork", but I will make sure
> that someone relevant in my organization looks at this contributor
> agreement.
> 
> 
> Thanks again for your recommendations and warm welcome. Best regards,
> 
> 
> Manuel
> 
>> 
>> Message: 9
>> Date: Fri, 29 Aug 2014 14:40:08 +
>> From: "Jeff Squyres (jsquyres)" 
>> To: Open MPI Developers List 
>> Subject: Re: [OMPI devel] Fwd: recomended software stack for
>>development?
>> Message-ID: <632d2995-ea78-4aa2-ba94-bc77f05ae...@cisco.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>> 
>> On Aug 29, 2014, at 5:36 AM, Manuel Rodríguez Pascual 
>> wrote:
>> 
>>> We are a small development team that will soon start working in open-mpi.
>> 
>> Welcome!
>> 
>>> Being total newbies in the area (both in open-mpi and in this kind of
>>> large project), we are seeking advice on which tools to use for
>>> development. Any suggestion on IDE, compiler, regression testing software
>>> and everything else is more than welcome. Of course this is highly personal,
>>> but it would be great to know what you folks are using to help us decide and
>>> start working.
>> 
>> I think you'll find us all over the map on IDE.  I personally use
>> emacs+terminal.  I know others who use vim+terminal.  Many of us use ctags
>> and the like, but it's not quite as helpful as usual because of OMPI's heavy
>> use of pointers.  I don't think many developers use a full-blown IDE.
>> 
>> For compiler, I'm guessing most of us develop with gcc most of the time,
>> although a few may have non-gcc as the default.  We test across a wide
>> variety of compilers, so portability is important.
>> 
>> For regression testing, we use the MPI Testing Tool
>> (https://svn.open-mpi.org/trac/mtt/ and http://mtt.open-mpi.org/).  Many of
>> us have it configured to do builds of the nightly tarballs; some of us push
>> our results to the public database at mtt.open-mpi.org.
>> 
>>> Thanks for your help. We are really looking forward to cooperating with the
>>> project, so we'll hopefully see you around here for a while!
>> 
>> Just curious: what do you anticipate working on?
>> 
>> It might be a good idea to see our "intro to the OMPI code base" videos:
>> http://www.open-mpi.org/video/?category=internals
>> 
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
>> 
>> --
>> 
>> Message: 11
>> Date: Fri, 29 Aug 2014 07:53:46 -0700
>> From: Ralph Castain 
>> To: Open MPI Developers 
>> Subject: Re: [OMPI devel] Fwd: recomended software stack for
>>development?
>> Message-ID: 
>> Content-Type: text/plain; charset=iso-8859-1
>> 
>> Indeed, welcome!
>> 
>> Just to make things smoother: are you planning to contribute your work back
>> to the community? If so, we'll need a signed contributor agreement - see
>> here:
>> 
>> http://www.open-mpi.org/community/contribute/corporate.php
>> 
> 
> 
> 
> -- 
> Dr. Manuel Rodríguez-Pascual
> skype: manuel.rodriguez.pascual
> phone: (+34) 913466173 // (+34) 679925108
> 
> CIEMAT-Moncloa
> Edificio 22, desp. 1.25
> Avenida Complutense, 40
> 28040- MADRID
> SPAIN
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: 

[OMPI devel] recomended software stack for development?

2014-09-30 Thread Manuel Rodríguez Pascual
Hi all,

I kind of broke something with my mail configuration so I haven't
been able to properly answer this earlier, sorry.

@Jsquyres We are planning to work on fault tolerance and improved
scheduling capabilities for HPC. To do so, we are first focusing on
serial tasks, and in a next step we will work with parallel jobs.  In
particular, I will be working on job migration, so tasks composing an
MPI job can be re-allocated inside a cluster. Anyway, this is
anticipating too much; we are now in the first steps of the
project. Also, thanks for the videos and the environment
recommendations, they have been really helpful.

@Ralph Castain: Of course :) Our objective is to create open software
adopting the existing Open-MPI license, and make it available to the
community. I am not in charge of the "paperwork", but I will make sure
that someone relevant in my organization looks at this contributor
agreement.


Thanks again for your recommendations and warm welcome. Best regards,


Manuel

>
> Message: 9
> Date: Fri, 29 Aug 2014 14:40:08 +
> From: "Jeff Squyres (jsquyres)" 
> To: Open MPI Developers List 
> Subject: Re: [OMPI devel] Fwd: recomended software stack for
> development?
> Message-ID: <632d2995-ea78-4aa2-ba94-bc77f05ae...@cisco.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> On Aug 29, 2014, at 5:36 AM, Manuel Rodríguez Pascual 
> wrote:
>
>> We are a small development team that will soon start working in open-mpi.
>
> Welcome!
>
>> Being total newbies in the area (both in open-mpi and in this kind of
>> large project), we are seeking advice on which tools to use for
>> development. Any suggestion on IDE, compiler, regression testing software
>> and everything else is more than welcome. Of course this is highly personal,
>> but it would be great to know what you folks are using to help us decide and
>> start working.
>
> I think you'll find us all over the map on IDE.  I personally use
> emacs+terminal.  I know others who use vim+terminal.  Many of us use ctags
> and the like, but it's not quite as helpful as usual because of OMPI's heavy
> use of pointers.  I don't think many developers use a full-blown IDE.
>
> For compiler, I'm guessing most of us develop with gcc most of the time,
> although a few may have non-gcc as the default.  We test across a wide
> variety of compilers, so portability is important.
>
> For regression testing, we use the MPI Testing Tool
> (https://svn.open-mpi.org/trac/mtt/ and http://mtt.open-mpi.org/).  Many of
> us have it configured to do builds of the nightly tarballs; some of us push
> our results to the public database at mtt.open-mpi.org.
>
>> Thanks for your help. We are really looking forward to cooperating with the
>> project, so we'll hopefully see you around here for a while!
>
> Just curious: what do you anticipate working on?
>
> It might be a good idea to see our "intro to the OMPI code base" videos:
> http://www.open-mpi.org/video/?category=internals
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
> --
>
> Message: 11
> Date: Fri, 29 Aug 2014 07:53:46 -0700
> From: Ralph Castain 
> To: Open MPI Developers 
> Subject: Re: [OMPI devel] Fwd: recomended software stack for
> development?
> Message-ID: 
> Content-Type: text/plain; charset=iso-8859-1
>
> Indeed, welcome!
>
> Just to make things smoother: are you planning to contribute your work back
> to the community? If so, we'll need a signed contributor agreement - see
> here:
>
> http://www.open-mpi.org/community/contribute/corporate.php
>



-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN


[OMPI devel] MPI_Comm_spawn crashes with the openib btl

2014-09-30 Thread Gilles Gouaillardet
Folks,

the dynamic/spawn test from the ibm test suite crashes if the openib btl
is detected
(the test can be run on one node with an IB port)

here is what happens :

in mca_btl_openib_proc_create,
the macro
OPAL_MODEX_RECV(rc, &mca_btl_openib_component.super.btl_version,
                proc, &message, &msg_size);
does not find any information *but*
rc is OPAL_SUCCESS
msg_size is not updated (i.e. left uninitialized)
message is not updated (i.e. left uninitialized)

then, if msg_size happens to hold a nonzero value and message happens to
hold an invalid address, a crash will occur when message is accessed.

/* I am not debating here the fact that there is no information returned,
I am simply discussing the crash */

a simple workaround is to initialize msg_size to zero.

that being said, is this the correct fix ?

one possible alternate fix is to update the OPAL_MODEX_RECV_STRING macro
like this :

/* from opal/mca/pmix/pmix.h */
#define OPAL_MODEX_RECV_STRING(r, s, p, d, sz)                      \
    do {                                                            \
        opal_value_t *kv;                                           \
        if (OPAL_SUCCESS == ((r) = opal_pmix.get(&(p)->proc_name,   \
                                                 (s), &kv))) {      \
            if (NULL != kv) {                                       \
                *(d) = kv->data.bo.bytes;                           \
                *(sz) = kv->data.bo.size;                           \
                kv->data.bo.bytes = NULL; /* protect the data */    \
                OBJ_RELEASE(kv);                                    \
            } else {                                                \
                *(sz) = 0;                                          \
                (r) = OPAL_ERR_NOT_FOUND;                           \
            }                                                       \
        }                                                           \
    } while(0);

/*
*(sz) = 0; and (r) = OPAL_ERR_NOT_FOUND; can be seen as redundant, *(sz)
*or* (r) could be set
*/

and another alternative fix is to update the end of the native_get
function like this :

/* from opal/mca/pmix/native/pmix_native.c */

if (found) {
return OPAL_SUCCESS;
}
*kv = NULL;
if (OPAL_SUCCESS == rc) {
if (OPAL_SUCCESS == ret) {
rc = OPAL_ERR_NOT_FOUND;
} else {
rc = ret;
}
}
return rc;

Could you please advise ?

Cheers,

Gilles


Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-09-30 Thread Gilles Gouaillardet
Nathan,

here is a revision of the previously attached patch, and that supports
graph and dist graph.

Cheers,

Gilles

On 2014/09/30 0:05, Nathan Hjelm wrote:
> An equivalent change would need to be made for graph and dist graph as
> well. That will take a little more work. Also, I was avoiding changing
> anything in topo for 1.8.
>
> -Nathan
>
> On Mon, Sep 29, 2014 at 08:02:41PM +0900, Gilles Gouaillardet wrote:
>>Nathan,
>>
>>why not just make the topology information available at that point as you
>>described it ?
>>
>>the attached patch does this, could you please review it ?
>>
>>Cheers,
>>
>>Gilles
>>
>>On 2014/09/26 2:50, Nathan Hjelm wrote:
>>
>>  On Tue, Aug 26, 2014 at 07:03:24PM +0300, Lisandro Dalcin wrote:
>>
>>  I finally managed to track down some issues in mpi4py's test suite
>>  using Open MPI 1.8+. The code below should be enough to reproduce the
>>  problem. Run it under valgrind to make sense of my following
>>  diagnostics.
>>
>>  In this code I'm creating a 2D, periodic Cartesian topology out of
>>  COMM_SELF. In this case, the process in COMM_SELF has 4 logical in/out
>>  links to itself. So we have size=1 but indegree=outdegree=4. However,
>>  in ompi/mca/coll/basic/coll_basic_module.c, "size * 2" requests are
>>  being allocated to manage communication:
>>
>>  if (OMPI_COMM_IS_INTER(comm)) {
>>  size = ompi_comm_remote_size(comm);
>>  } else {
>>  size = ompi_comm_size(comm);
>>  }
>>  basic_module->mccb_num_reqs = size * 2;
>>  basic_module->mccb_reqs = (ompi_request_t**)
>>  malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs);
>>
>>  I guess you have to also special-case for topologies and allocate
>>  indegree+outdegree requests (not sure about this number, just
>>  guessing).
>>
>>
>>  I wish this was possible but the topology information is not available
>>  at that point. We may be able to change that but I don't see the work
>>  completing anytime soon. I committed an alternative fix as r32796 and
>>  CMR'd it to 1.8.3. I can confirm that the attached reproducer no longer
>>  produces a SEGV. Let me know if you run into any more issues.
>>
>>
>>  -Nathan
>>
>>  ___
>>  devel mailing list
>>  de...@open-mpi.org
>>  Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>  Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15915.php
>> Index: ompi/mca/topo/base/topo_base_cart_create.c
>> ===
>> --- ompi/mca/topo/base/topo_base_cart_create.c   (revision 32807)
>> +++ ompi/mca/topo/base/topo_base_cart_create.c   (working copy)
>> @@ -163,10 +163,18 @@
>>  return MPI_ERR_INTERN;
>>  }
>>  
>> +assert(NULL == new_comm->c_topo);
>> +assert(!(new_comm->c_flags & OMPI_COMM_CART));
>> +new_comm->c_topo   = topo;
>> +new_comm->c_topo->mtc.cart = cart;
>> +new_comm->c_topo->reorder  = reorder;
>> +new_comm->c_flags |= OMPI_COMM_CART;
>>  ret = ompi_comm_enable(old_comm, new_comm,
>> new_rank, num_procs, topo_procs);
>>  if (OMPI_SUCCESS != ret) {
>>  /* something wrong happened during setting the communicator */
>> +new_comm->c_topo = NULL;
>> +new_comm->c_flags &= ~OMPI_COMM_CART;
>>  ompi_comm_free (&new_comm);
>>  free(topo_procs);
>>  if(NULL != cart->periods) free(cart->periods);
>> @@ -176,10 +184,6 @@
>>  return ret;
>>  }
>>  
>> -new_comm->c_topo   = topo;
>> -new_comm->c_topo->mtc.cart = cart;
>> -new_comm->c_topo->reorder  = reorder;
>> -new_comm->c_flags |= OMPI_COMM_CART;
>>  *comm_topo = new_comm;
>>  
>>  if( MPI_UNDEFINED == new_rank ) {
>> Index: ompi/mca/coll/basic/coll_basic_module.c
>> ===
>> --- ompi/mca/coll/basic/coll_basic_module.c  (revision 32807)
>> +++ ompi/mca/coll/basic/coll_basic_module.c  (working copy)
>> @@ -13,6 +13,8 @@
>>   * Copyright (c) 2012  Sandia National Laboratories. All rights 
>> reserved.
>>   * Copyright (c) 2013  Los Alamos National Security, LLC. All rights
>>   * reserved.
>> + * Copyright (c) 2014  Research Organization for Information Science
>> + * and Technology (RIST). All rights reserved.
>>   * $COPYRIGHT$
>>   * 
>>   * Additional copyrights may follow
>> @@ -28,6 +30,7 @@
>>  #include "mpi.h"
>>  #include "ompi/mca/coll/coll.h"
>>  #include "ompi/mca/coll/base/base.h"
>> +#include "ompi/mca/topo/topo.h"
>>  #include "coll_basic.h"
>>  
>>  
>> @@ -70,6 +73,15 @@
>>  } else {
>>  size = ompi_comm_size(comm);
>>  }
>> +if (comm->c_flags & OMPI_COMM_CART) {
>> +int cart_size;
>> +assert (NULL != comm->c_topo);
>> +