Re: [OMPI devel] Broken abort backtrace functionality
Should be fixed in r32821, scheduled for 1.8.4.

On Sep 29, 2014, at 2:00 PM, Deva wrote:

> It looks like OMPI_MCA_mpi_abort_print_stack=1 is broken. I'm seeing the
> following warning with it:
>
> --
> $ mpirun -np 2 -x OMPI_MCA_mpi_abort_print_stack=1 ./hello_c
> --
> WARNING: A user-supplied value attempted to override the default-only MCA
> variable named "mpi_abort_print_stack".
>
> The user-supplied value was ignored.
> --
> --
> WARNING: A user-supplied value attempted to override the default-only MCA
> variable named "mpi_abort_print_stack".
>
> The user-supplied value was ignored.
> --
> Hello, world, I am 1 of 2,
> Hello, world, I am 0 of 2,
> --
>
> It seems HAVE_BACKTRACE is not defined by any configuration, but the
> relevant code below is guarded with it:
>
> #if OPAL_WANT_PRETTY_PRINT_STACKTRACE && defined(HAVE_BACKTRACE)
>         0,
>         OPAL_INFO_LVL_9,
>         MCA_BASE_VAR_SCOPE_READONLY,
> #else
>         MCA_BASE_VAR_FLAG_DEFAULT_ONLY,
>         OPAL_INFO_LVL_9,
>         MCA_BASE_VAR_SCOPE_CONSTANT,
> #endif
>
> $ git grep HAVE_BACKTRACE
> ompi/runtime/ompi_mpi_params.c:#if OPAL_WANT_PRETTY_PRINT_STACKTRACE && defined(HAVE_BACKTRACE)
> $
>
> --
> -Devendar
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15933.php
Re: [OMPI devel] Github migration: tomorrow
8am US Eastern tomorrow morning is the cutoff. Anything you do tonight will be fine.

Note, however, that open CMRs will NOT be moved over to Github -- you'll have to re-file them as pull requests after the migration.

On Sep 30, 2014, at 2:09 PM, Pritchard Jr., Howard wrote:

> Hi Jeff,
>
> When's the latest today that we can do checkins without causing problems?
>
> Howard
>
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
> Sent: Tuesday, September 30, 2014 9:57 AM
> To: Open MPI Developers List
> Subject: [OMPI devel] Github migration: tomorrow
>
> It's happening tomorrow, October 1, 2014, starting at 8am US Eastern time.
>
> There was discussion about Bitbucket vs. Github, and all things being equal
> (except the cost!), we're going with the original plan of the main OMPI repo
> at Github.
>
> The plan tomorrow is the same as it was last week:
>
> - SVN and Trac get frozen at 8am US Eastern time.
> - The migration is likely to take all day.
> - I'll be moving all open, non-CMR tickets from Trac to Github.
> - The Trac wiki has already been moved.
>
> I'll send an "all clear" email when it is safe to start using the main Github
> OMPI repo (tickets and git) tomorrow.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15947.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15951.php

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] Github migration: tomorrow
Hi Jeff,

When's the latest today that we can do checkins without causing problems?

Howard

-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
Sent: Tuesday, September 30, 2014 9:57 AM
To: Open MPI Developers List
Subject: [OMPI devel] Github migration: tomorrow

It's happening tomorrow, October 1, 2014, starting at 8am US Eastern time.

There was discussion about Bitbucket vs. Github, and all things being equal
(except the cost!), we're going with the original plan of the main OMPI repo
at Github.

The plan tomorrow is the same as it was last week:

- SVN and Trac get frozen at 8am US Eastern time.
- The migration is likely to take all day.
- I'll be moving all open, non-CMR tickets from Trac to Github.
- The Trac wiki has already been moved.

I'll send an "all clear" email when it is safe to start using the main Github
OMPI repo (tickets and git) tomorrow.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/09/15947.php
Re: [OMPI devel] MPI_Comm_spawn crashes with the openib btl
I fixed this in r32818 - the components shouldn't be passing back success if the requested info isn't found. Hope that fixes the problem.

On Sep 30, 2014, at 1:54 AM, Gilles Gouaillardet wrote:

> Folks,
>
> the dynamic/spawn test from the ibm test suite crashes if the openib btl
> is detected (the test can be run on one node with an IB port).
>
> here is what happens:
>
> in mca_btl_openib_proc_create, the macro
>     OPAL_MODEX_RECV(rc, &mca_btl_openib_component.super.btl_version,
>                     proc, &message, &msg_size);
> does not find any information *but*
>   rc is OPAL_SUCCESS
>   msg_size is not updated (e.g. left uninitialized)
>   message is not updated (e.g. left uninitialized)
>
> then, if msg_size is uninitialized with a non zero value, and if message
> is uninitialized with a non valid address, a crash will occur when
> accessing message.
>
> /* i am not debating here the fact that there is no information returned,
>    i am simply discussing the crash */
>
> a simple workaround is to initialize msg_size to zero.
>
> that being said, is this the correct fix ?
>
> one possible alternate fix is to update the OPAL_MODEX_RECV_STRING macro
> like this:
>
> /* from opal/mca/pmix/pmix.h */
> #define OPAL_MODEX_RECV_STRING(r, s, p, d, sz)                        \
>     do {                                                              \
>         opal_value_t *kv;                                             \
>         if (OPAL_SUCCESS == ((r) = opal_pmix.get(&(p)->proc_name,     \
>                                                  (s), &kv))) {        \
>             if (NULL != kv) {                                         \
>                 *(d) = kv->data.bo.bytes;                             \
>                 *(sz) = kv->data.bo.size;                             \
>                 kv->data.bo.bytes = NULL; /* protect the data */      \
>                 OBJ_RELEASE(kv);                                      \
>             } else {                                                  \
>                 *(sz) = 0;                                            \
>                 (r) = OPAL_ERR_NOT_FOUND;                             \
>             }                                                         \
>         }                                                             \
>     } while(0);
>
> /* *(sz) = 0; and (r) = OPAL_ERR_NOT_FOUND; can be seen as redundant;
>    *(sz) *or* (r) could be set */
>
> and another alternate fix is to update the end of the native_get
> function like this:
>
> /* from opal/mca/pmix/native/pmix_native.c */
>
>     if (found) {
>         return OPAL_SUCCESS;
>     }
>     *kv = NULL;
>     if (OPAL_SUCCESS == rc) {
>         if (OPAL_SUCCESS == ret) {
>             rc = OPAL_ERR_NOT_FOUND;
>         } else {
>             rc = ret;
>         }
>     }
>     return rc;
>
> Could you please advise ?
>
> Cheers,
>
> Gilles
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15942.php
[OMPI devel] Github migration: tomorrow
It's happening tomorrow, October 1, 2014, starting at 8am US Eastern time.

There was discussion about Bitbucket vs. Github, and all things being equal
(except the cost!), we're going with the original plan of the main OMPI repo
at Github.

The plan tomorrow is the same as it was last week:

- SVN and Trac get frozen at 8am US Eastern time.
- The migration is likely to take all day.
- I'll be moving all open, non-CMR tickets from Trac to Github.
- The Trac wiki has already been moved.

I'll send an "all clear" email when it is safe to start using the main Github
OMPI repo (tickets and git) tomorrow.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] [patch] libnbc intercommunicator iallgather bug
Hi Takahiro,

Thanks very much for the patch and the test! After the git migration we'll
open an issue and patch nbc_iallgather. This will get pushed to 1.8.4.

Howard

-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Kawashima, Takahiro
Sent: Monday, September 29, 2014 10:22 PM
To: de...@open-mpi.org
Subject: [OMPI devel] [patch] libnbc intercommunicator iallgather bug

Hi,

The attached program intercommunicator-iallgather.c outputs the message
"MPI Error in MPI_Testall() (18)" forever and doesn't finish. This is
because libnbc has send/recv typos. See the attached
intercommunicator-iallgather.patch for the fix.

The patch modifies iallgather_inter and iallgather_intra. The modification
of iallgather_intra is just for symmetry with iallgather_inter; users
guarantee the consistency of send/recv.

Both trunk and the v1.8 branch have this issue.

Regards,
Takahiro Kawashima,
MPI development team, Fujitsu
Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one
Not quite right. There still is no topology information at collective
selection time for either graph or dist graph.

-Nathan

On Tue, Sep 30, 2014 at 02:03:27PM +0900, Gilles Gouaillardet wrote:
> Nathan,
>
> here is a revision of the previously attached patch, and that supports
> graph and dist graph.
>
> Cheers,
>
> Gilles
>
> On 2014/09/30 0:05, Nathan Hjelm wrote:
>> An equivalent change would need to be made for graph and dist graph as
>> well. That will take a little more work. Also, I was avoiding changing
>> anything in topo for 1.8.
>>
>> -Nathan
>>
>> On Mon, Sep 29, 2014 at 08:02:41PM +0900, Gilles Gouaillardet wrote:
>>> Nathan,
>>>
>>> why not just make the topology information available at that point as you
>>> described it ?
>>>
>>> the attached patch does this, could you please review it ?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 2014/09/26 2:50, Nathan Hjelm wrote:
>>>> On Tue, Aug 26, 2014 at 07:03:24PM +0300, Lisandro Dalcin wrote:
>>>>> I finally managed to track down some issues in mpi4py's test suite
>>>>> using Open MPI 1.8+. The code below should be enough to reproduce the
>>>>> problem. Run it under valgrind to make sense of my following
>>>>> diagnostics.
>>>>>
>>>>> In this code I'm creating a 2D, periodic Cartesian topology out of
>>>>> COMM_SELF. In this case, the process in COMM_SELF has 4 logical in/out
>>>>> links to itself. So we have size=1 but indegree=outdegree=4. However,
>>>>> in ompi/mca/coll/basic/coll_basic_module.c, "size * 2" requests are
>>>>> being allocated to manage communication:
>>>>>
>>>>>     if (OMPI_COMM_IS_INTER(comm)) {
>>>>>         size = ompi_comm_remote_size(comm);
>>>>>     } else {
>>>>>         size = ompi_comm_size(comm);
>>>>>     }
>>>>>     basic_module->mccb_num_reqs = size * 2;
>>>>>     basic_module->mccb_reqs = (ompi_request_t**)
>>>>>         malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs);
>>>>>
>>>>> I guess you have to also special-case for topologies and allocate
>>>>> indegree+outdegree requests (not sure about this number, just
>>>>> guessing).
>>>>
>>>> I wish this was possible but the topology information is not available
>>>> at that point. We may be able to change that but I don't see the work
>>>> completing anytime soon. I committed an alternative fix as r32796 and
>>>> CMR'd it to 1.8.3. I can confirm that the attached reproducer no longer
>>>> produces a SEGV. Let me know if you run into any more issues.
>>>>
>>>> -Nathan
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/09/15915.php
>
> Index: ompi/mca/topo/base/topo_base_cart_create.c
> ===
> --- ompi/mca/topo/base/topo_base_cart_create.c (revision 32807)
> +++ ompi/mca/topo/base/topo_base_cart_create.c (working copy)
> @@ -163,10 +163,18 @@
>          return MPI_ERR_INTERN;
>      }
>
> +    assert(NULL == new_comm->c_topo);
> +    assert(!(new_comm->c_flags & OMPI_COMM_CART));
> +    new_comm->c_topo = topo;
> +    new_comm->c_topo->mtc.cart = cart;
> +    new_comm->c_topo->reorder = reorder;
> +    new_comm->c_flags |= OMPI_COMM_CART;
>      ret = ompi_comm_enable(old_comm, new_comm,
>                             new_rank, num_procs, topo_procs);
>      if (OMPI_SUCCESS != ret) {
>          /* something wrong happened during setting the communicator */
> +        new_comm->c_topo = NULL;
> +        new_comm->c_flags &= ~OMPI_COMM_CART;
>          ompi_comm_free (&new_comm);
>          free(topo_procs);
>          if(NULL != cart->periods) free(cart->periods);
> @@ -176,10 +184,6 @@
>          return ret;
>      }
>
> -    new_comm->c_topo = topo;
> -    new_comm->c_topo->mtc.cart = cart;
> -    new_comm->c_topo->reorder = reorder;
> -    new_comm->c_flags |= OMPI_COMM_CART;
>      *comm_topo = new_comm;
>
>      if( MPI_UNDEFINED == new_rank ) {
> Index: ompi/mca/coll/basic/coll_basic_module.c
> ===
> --- ompi/mca/coll/basic/coll_basic_module.c (revision 32807)
> +++ ompi/mca/coll/basic/coll_basic_module.c (working copy)
> @@ -13,6 +13,8 @@
>   * Copyright (c) 2012      Sandia National Laboratories. All rights
>   *                         reserved.
>   * Copyright (c) 2013      Los Alamos National Security, LLC. All rights
>   *                         reserved.
> + * Copyright (c) 2014      Research Organization for Information Science
> + *                         and Technology (RIST). All rights reserved.
>   * $COPYRIGHT$
>   *
>   * Additional copyrights may follow
> @@ -28,6 +30,7 @@
>  #include "mpi.h"
>  #include "ompi/mca/coll/coll.h"
>  #include "ompi/mca/coll/base/base.h"
> +#include "ompi/mca/topo/topo.h"
>  #include "coll_basic.h"
>
> @@ -70,6
Re: [OMPI devel] recomended software stack for development?
Kewl. FWIW: we already have the ability to migrate processes in the ORTE code. You can tell the system to try and restart the process in its existing location N times before requesting relocation. Of course, if a node fails, then we automatically relocate the procs to other nodes.

The relocation algorithm (i.e., where to put the relocating process) is in the "resilient" mapper component (see orte/mca/rmaps/resilient). It tries to ensure that we don't relocate the proc to an inappropriate place.

I can provide more details if you like.
Ralph

On Sep 30, 2014, at 3:20 AM, Manuel Rodríguez Pascual wrote:

> Hi all,
>
> I kind of broke something with my mail configuration so I haven't
> been able to properly answer to this earlier, sorry.
>
> @Jsquyres We are planning to work on fault tolerance and improved
> scheduling capabilities for HPC. To do so, we are first focusing on
> serial tasks, and in a next step we will work with parallel jobs. In
> particular, I will be working on job migration, so tasks composing an
> MPI job can be re-allocated inside a cluster. Anyway, this is
> anticipating too much; we are in the first steps of the
> project. Also, thanks for the videos and the environment
> recommendations, it has been really helpful.
>
> @Ralph Castain: Of course :) Our objective is to create open software
> adopting the existing Open-MPI license, and make it available to the
> community. I am not in charge of the "paperwork", but I will make sure
> that someone relevant in my organization looks at this contributor
> agreement.
>
> Thanks again for your recommendations and warm welcome. Best regards,
>
> Manuel
>
>> Message: 9
>> Date: Fri, 29 Aug 2014 14:40:08 +
>> From: "Jeff Squyres (jsquyres)"
>> To: Open MPI Developers List
>> Subject: Re: [OMPI devel] Fwd: recomended software stack for
>> development?
>> Message-ID: <632d2995-ea78-4aa2-ba94-bc77f05ae...@cisco.com> >> Content-Type: text/plain; charset="iso-8859-1" >> >> On Aug 29, 2014, at 5:36 AM, Manuel Rodr?guez Pascual >> wrote: >> >>> We are a small development team that will soon start working in open-mpi. >> >> Welcome! >> >>> Being total newbies on the area (both on open-mpi and in this kind of >>> large projects), we are seeking for advise in which tools to use on the >>> development. Any suggestion on IDE, compiler, regression testing software >>> and everything else is more than welcome. Of course this is highly personal, >>> but it would be great to know what you folks are using to help us decide and >>> start working. >> >> I think you'll find us all over the map on IDE. I personally use >> emacs+terminal. I know others who use vim+terminal. Many of us use ctags >> and the like, but it's not quite as helpful as usual because of OMPI's heavy >> use of pointers. I don't think many developers use a full-blown IDE. >> >> For compiler, I'm guessing most of us develop with gcc most of the time, >> although a few may have non-gcc as the default. We test across a wide >> variety of compilers, so portability is important. >> >> For regression testing, we use the MPI Testing Tool >> (https://svn.open-mpi.org/trac/mtt/ and http://mtt.open-mpi.org/). Many of >> us have it configured to do builds of the nightly tarballs; some of us push >> our results to the public database at mtt.open-mpi.org. >> >>> Thanks for your help. We are really looking to cooperate with the project, >>> so we'll hopefully see you around here for a while! >> >> Just curious: what do you anticipate working on? 
>> >> It might be a good idea to see our "intro to the OMPI code base" videos: >> http://www.open-mpi.org/video/?category=internals >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ >> >> >> >> -- >> >> Message: 11 >> Date: Fri, 29 Aug 2014 07:53:46 -0700 >> From: Ralph Castain >> To: Open MPI Developers >> Subject: Re: [OMPI devel] Fwd: recomended software stack for >>development? >> Message-ID: >> Content-Type: text/plain; charset=iso-8859-1 >> >> Indeed, welcome! >> >> Just to make things smoother: are you planning to contribute your work back >> to the community? If so, we'll need a signed contributor agreement - see >> here: >> >> http://www.open-mpi.org/community/contribute/corporate.php >> > > > > -- > Dr. Manuel Rodríguez-Pascual > skype: manuel.rodriguez.pascual > phone: (+34) 913466173 // (+34) 679925108 > > CIEMAT-Moncloa > Edificio 22, desp. 1.25 > Avenida Complutense, 40 > 28040- MADRID > SPAIN > ___ > devel mailing list > de...@open-mpi.org > Subscription:
[OMPI devel] recomended software stack for development?
Hi all,

I kind of broke something with my mail configuration so I haven't
been able to properly answer to this earlier, sorry.

@Jsquyres We are planning to work on fault tolerance and improved
scheduling capabilities for HPC. To do so, we are first focusing on
serial tasks, and in a next step we will work with parallel jobs. In
particular, I will be working on job migration, so tasks composing an
MPI job can be re-allocated inside a cluster. Anyway, this is
anticipating too much; we are in the first steps of the
project. Also, thanks for the videos and the environment
recommendations, it has been really helpful.

@Ralph Castain: Of course :) Our objective is to create open software
adopting the existing Open-MPI license, and make it available to the
community. I am not in charge of the "paperwork", but I will make sure
that someone relevant in my organization looks at this contributor
agreement.

Thanks again for your recommendations and warm welcome. Best regards,

Manuel

> Message: 9
> Date: Fri, 29 Aug 2014 14:40:08 +
> From: "Jeff Squyres (jsquyres)"
> To: Open MPI Developers List
> Subject: Re: [OMPI devel] Fwd: recomended software stack for
> development?
> Message-ID: <632d2995-ea78-4aa2-ba94-bc77f05ae...@cisco.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> On Aug 29, 2014, at 5:36 AM, Manuel Rodr?guez Pascual wrote:
>
>> We are a small development team that will soon start working in open-mpi.
>
> Welcome!
>
>> Being total newbies in the area (both on open-mpi and in this kind of
>> large project), we are seeking advice on which tools to use for
>> development. Any suggestion on IDE, compiler, regression testing software
>> and everything else is more than welcome. Of course this is highly personal,
>> but it would be great to know what you folks are using to help us decide and
>> start working.
>
> I think you'll find us all over the map on IDE. I personally use
> emacs+terminal. I know others who use vim+terminal. Many of us use ctags
> and the like, but it's not quite as helpful as usual because of OMPI's heavy
> use of pointers. I don't think many developers use a full-blown IDE.
>
> For compiler, I'm guessing most of us develop with gcc most of the time,
> although a few may have non-gcc as the default. We test across a wide
> variety of compilers, so portability is important.
>
> For regression testing, we use the MPI Testing Tool
> (https://svn.open-mpi.org/trac/mtt/ and http://mtt.open-mpi.org/). Many of
> us have it configured to do builds of the nightly tarballs; some of us push
> our results to the public database at mtt.open-mpi.org.
>
>> Thanks for your help. We are really looking to cooperate with the project,
>> so we'll hopefully see you around here for a while!
>
> Just curious: what do you anticipate working on?
>
> It might be a good idea to see our "intro to the OMPI code base" videos:
> http://www.open-mpi.org/video/?category=internals
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> --
>
> Message: 11
> Date: Fri, 29 Aug 2014 07:53:46 -0700
> From: Ralph Castain
> To: Open MPI Developers
> Subject: Re: [OMPI devel] Fwd: recomended software stack for
> development?
> Message-ID:
> Content-Type: text/plain; charset=iso-8859-1
>
> Indeed, welcome!
>
> Just to make things smoother: are you planning to contribute your work back
> to the community? If so, we'll need a signed contributor agreement - see
> here:
>
> http://www.open-mpi.org/community/contribute/corporate.php

--
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN
[OMPI devel] MPI_Comm_spawn crashes with the openib btl
Folks,

the dynamic/spawn test from the ibm test suite crashes if the openib btl
is detected (the test can be run on one node with an IB port).

here is what happens:

in mca_btl_openib_proc_create, the macro

    OPAL_MODEX_RECV(rc, &mca_btl_openib_component.super.btl_version,
                    proc, &message, &msg_size);

does not find any information *but*

  rc is OPAL_SUCCESS
  msg_size is not updated (e.g. left uninitialized)
  message is not updated (e.g. left uninitialized)

then, if msg_size is uninitialized with a non zero value, and if message
is uninitialized with a non valid address, a crash will occur when
accessing message.

/* i am not debating here the fact that there is no information returned,
   i am simply discussing the crash */

a simple workaround is to initialize msg_size to zero.

that being said, is this the correct fix ?

one possible alternate fix is to update the OPAL_MODEX_RECV_STRING macro
like this:

/* from opal/mca/pmix/pmix.h */
#define OPAL_MODEX_RECV_STRING(r, s, p, d, sz)                        \
    do {                                                              \
        opal_value_t *kv;                                             \
        if (OPAL_SUCCESS == ((r) = opal_pmix.get(&(p)->proc_name,     \
                                                 (s), &kv))) {        \
            if (NULL != kv) {                                         \
                *(d) = kv->data.bo.bytes;                             \
                *(sz) = kv->data.bo.size;                             \
                kv->data.bo.bytes = NULL; /* protect the data */      \
                OBJ_RELEASE(kv);                                      \
            } else {                                                  \
                *(sz) = 0;                                            \
                (r) = OPAL_ERR_NOT_FOUND;                             \
            }                                                         \
        }                                                             \
    } while(0);

/* *(sz) = 0; and (r) = OPAL_ERR_NOT_FOUND; can be seen as redundant;
   *(sz) *or* (r) could be set */

and another alternate fix is to update the end of the native_get
function like this:

/* from opal/mca/pmix/native/pmix_native.c */

    if (found) {
        return OPAL_SUCCESS;
    }
    *kv = NULL;
    if (OPAL_SUCCESS == rc) {
        if (OPAL_SUCCESS == ret) {
            rc = OPAL_ERR_NOT_FOUND;
        } else {
            rc = ret;
        }
    }
    return rc;

Could you please advise ?

Cheers,

Gilles
Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one
Nathan,

here is a revision of the previously attached patch, and that supports
graph and dist graph.

Cheers,

Gilles

On 2014/09/30 0:05, Nathan Hjelm wrote:
> An equivalent change would need to be made for graph and dist graph as
> well. That will take a little more work. Also, I was avoiding changing
> anything in topo for 1.8.
>
> -Nathan
>
> On Mon, Sep 29, 2014 at 08:02:41PM +0900, Gilles Gouaillardet wrote:
>> Nathan,
>>
>> why not just make the topology information available at that point as you
>> described it ?
>>
>> the attached patch does this, could you please review it ?
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/09/26 2:50, Nathan Hjelm wrote:
>>
>> On Tue, Aug 26, 2014 at 07:03:24PM +0300, Lisandro Dalcin wrote:
>>
>> I finally managed to track down some issues in mpi4py's test suite
>> using Open MPI 1.8+. The code below should be enough to reproduce the
>> problem. Run it under valgrind to make sense of my following
>> diagnostics.
>>
>> In this code I'm creating a 2D, periodic Cartesian topology out of
>> COMM_SELF. In this case, the process in COMM_SELF has 4 logical in/out
>> links to itself. So we have size=1 but indegree=outdegree=4. However,
>> in ompi/mca/coll/basic/coll_basic_module.c, "size * 2" requests are
>> being allocated to manage communication:
>>
>>     if (OMPI_COMM_IS_INTER(comm)) {
>>         size = ompi_comm_remote_size(comm);
>>     } else {
>>         size = ompi_comm_size(comm);
>>     }
>>     basic_module->mccb_num_reqs = size * 2;
>>     basic_module->mccb_reqs = (ompi_request_t**)
>>         malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs);
>>
>> I guess you have to also special-case for topologies and allocate
>> indegree+outdegree requests (not sure about this number, just
>> guessing).
>>
>> I wish this was possible but the topology information is not available
>> at that point. We may be able to change that but I don't see the work
>> completing anytime soon. I committed an alternative fix as r32796 and
>> CMR'd it to 1.8.3. I can confirm that the attached reproducer no longer
>> produces a SEGV. Let me know if you run into any more issues.
>>
>> -Nathan
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/09/15915.php

>> Index: ompi/mca/topo/base/topo_base_cart_create.c
>> ===
>> --- ompi/mca/topo/base/topo_base_cart_create.c (revision 32807)
>> +++ ompi/mca/topo/base/topo_base_cart_create.c (working copy)
>> @@ -163,10 +163,18 @@
>>          return MPI_ERR_INTERN;
>>      }
>>
>> +    assert(NULL == new_comm->c_topo);
>> +    assert(!(new_comm->c_flags & OMPI_COMM_CART));
>> +    new_comm->c_topo = topo;
>> +    new_comm->c_topo->mtc.cart = cart;
>> +    new_comm->c_topo->reorder = reorder;
>> +    new_comm->c_flags |= OMPI_COMM_CART;
>>      ret = ompi_comm_enable(old_comm, new_comm,
>>                             new_rank, num_procs, topo_procs);
>>      if (OMPI_SUCCESS != ret) {
>>          /* something wrong happened during setting the communicator */
>> +        new_comm->c_topo = NULL;
>> +        new_comm->c_flags &= ~OMPI_COMM_CART;
>>          ompi_comm_free (&new_comm);
>>          free(topo_procs);
>>          if(NULL != cart->periods) free(cart->periods);
>> @@ -176,10 +184,6 @@
>>          return ret;
>>      }
>>
>> -    new_comm->c_topo = topo;
>> -    new_comm->c_topo->mtc.cart = cart;
>> -    new_comm->c_topo->reorder = reorder;
>> -    new_comm->c_flags |= OMPI_COMM_CART;
>>      *comm_topo = new_comm;
>>
>>      if( MPI_UNDEFINED == new_rank ) {
>> Index: ompi/mca/coll/basic/coll_basic_module.c
>> ===
>> --- ompi/mca/coll/basic/coll_basic_module.c (revision 32807)
>> +++ ompi/mca/coll/basic/coll_basic_module.c (working copy)
>> @@ -13,6 +13,8 @@
>>   * Copyright (c) 2012      Sandia National Laboratories. All rights
>>   *                         reserved.
>>   * Copyright (c) 2013      Los Alamos National Security, LLC. All rights
>>   *                         reserved.
>> + * Copyright (c) 2014      Research Organization for Information Science
>> + *                         and Technology (RIST). All rights reserved.
>>   * $COPYRIGHT$
>>   *
>>   * Additional copyrights may follow
>> @@ -28,6 +30,7 @@
>>  #include "mpi.h"
>>  #include "ompi/mca/coll/coll.h"
>>  #include "ompi/mca/coll/base/base.h"
>> +#include "ompi/mca/topo/topo.h"
>>  #include "coll_basic.h"
>>
>> @@ -70,6 +73,15 @@
>>      } else {
>>          size = ompi_comm_size(comm);
>>      }
>> +    if (comm->c_flags & OMPI_COMM_CART) {
>> +        int cart_size;
>> +        assert (NULL != comm->c_topo);
>> +