[hwloc-devel] Create success (hwloc git dev-239-gfe0111e)

2014-09-29 Thread MPI Team
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc dev-239-gfe0111e
Start time: Mon Sep 29 21:02:51 EDT 2014
End time:   Mon Sep 29 21:04:18 EDT 2014

Your friendly daemon,
Cyrador


[OMPI devel] Problem on MPI_Type_create_resized and multiple BTL modules

2014-09-29 Thread Kawashima, Takahiro
Hi George,

Thank you for attending the meeting in Kyoto. As we discussed
at the meeting, my colleague has run into a datatype problem.

See the attached create_resized.c. It creates a datatype with an
LB marker using MPI_Type_create_struct and MPI_Type_create_resized.

The expected content of the output file (received_data) is:

0: t1 = 0.1, t2 = 0.2
1: t1 = 1.1, t2 = 1.2
2: t1 = 2.1, t2 = 2.2
3: t1 = 3.1, t2 = 3.2
4: t1 = 4.1, t2 = 4.2
... snip ...
1995: t1 = 1995.1, t2 = 1995.2
1996: t1 = 1996.1, t2 = 1996.2
1997: t1 = 1997.1, t2 = 1997.2
1998: t1 = 1998.1, t2 = 1998.2
1999: t1 = 1999.1, t2 = 1999.2


But if you run the program many times with multiple BTL modules,
each with a small eager_limit and a small max_send_size,
some runs produce:

0: t1 = 0.1, t2 = 0.2
1: t1 = 1.1, t2 = 1.2
2: t1 = 2.1, t2 = 2.2
3: t1 = 3.1, t2 = 3.2
4: t1 = 4.1, t2 = 4.2
... snip ...
470: t1 = 470.1, t2 = 470.2
471: t1 = 471.1, t2 = 471.2
472: t1 = 472.1, t2 = 472.2
473: t1 = 473.1, t2 = 473.2
474: t1 = 474.1, t2 = 0    <-- broken!
475: t1 = 0, t2 = 475.1
476: t1 = 0, t2 = 476.1
477: t1 = 0, t2 = 477.1
... snip ...
1995: t1 = 0, t2 = 1995.1
1996: t1 = 0, t2 = 1996.1
1997: t1 = 0, t2 = 1997.1
1998: t1 = 0, t2 = 1998.1
1999: t1 = 0, t2 = 1999.1


The array index at which the data starts to break (474 in the
above case) may change on every run.
The same result appears on both trunk and v1.8.3.

You can reproduce this with the following options if you have
multiple IB HCAs.

  -n 2
  --mca btl self,openib
  --mca btl_openib_eager_limit 256
  --mca btl_openib_max_send_size 384

Or if you don't have multiple NICs, with the following options.

  -n 2
  --host localhost
  --mca btl self,sm,vader
  --mca btl_vader_exclusivity 65536
  --mca btl_vader_eager_limit 256
  --mca btl_vader_max_send_size 384
  --mca btl_sm_exclusivity 65536
  --mca btl_sm_eager_limit 256
  --mca btl_sm_max_send_size 384

My colleague found that the OPAL convertor on the receiving process
seems to add the LB value twice when fragments arrive out of order
and it computes the write offset into the receive buffer.
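
The broken output above is consistent with that hypothesis: every value after the break lands 8 bytes (one double, which is the datatype's lower bound) too late. A toy model of the offset arithmetic makes this concrete. It is not OPAL code; it assumes 8-byte doubles, so each element packs 16 bytes starting 8 bytes into a 24-byte struct, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Layout of the datatype built in create_resized.c, assuming 8-byte doubles. */
enum {
    ELEM_SIZE   = 16,  /* packed bytes per element: transfered_1 + transfered_2 */
    ELEM_EXTENT = 24,  /* sizeof(struct structure) */
    TRUE_LB     = 8    /* offset of transfered_1 inside the struct */
};

/* Hypothetical helper: receive-buffer offset for packed byte `packed_pos`. */
size_t buffer_offset(size_t packed_pos)
{
    size_t elem   = packed_pos / ELEM_SIZE;  /* which array element */
    size_t within = packed_pos % ELEM_SIZE;  /* byte inside the packed element */
    return elem * ELEM_EXTENT + TRUE_LB + within;
}

/* The same computation with the suspected defect: lower bound added twice. */
size_t buffer_offset_lb_twice(size_t packed_pos)
{
    return buffer_offset(packed_pos) + TRUE_LB;
}

/* Checks the model against the broken output for elements 474 and 475. */
void check_against_broken_output(void)
{
    size_t t2_474 = (size_t)474 * ELEM_SIZE + 8;  /* data[474].transfered_2 */
    size_t t1_475 = (size_t)475 * ELEM_SIZE;      /* data[475].transfered_1 */

    /* Correct placement. */
    assert(buffer_offset(t2_474) == (size_t)474 * ELEM_EXTENT + 16);
    assert(buffer_offset(t1_475) == (size_t)475 * ELEM_EXTENT + 8);

    /* With the extra LB, 474's t2 falls into data[475].not_transfered
     * and 475's t1 falls into data[475].transfered_2. */
    assert(buffer_offset_lb_twice(t2_474) == (size_t)475 * ELEM_EXTENT);
    assert(buffer_offset_lb_twice(t1_475) == (size_t)475 * ELEM_EXTENT + 16);
}
```

This only shows that a uniform 8-byte shift reproduces the symptom (t2 of element 474 never written, t1 values showing up as t2 from element 475 on). It says nothing about where in the convertor the extra LB is applied.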

He created the patch below. Our program works fine with
this patch, but we don't know whether it is a correct fix.
Could you look into this issue?

Index: opal/datatype/opal_convertor.c
===
--- opal/datatype/opal_convertor.c  (revision 32807)
+++ opal/datatype/opal_convertor.c  (working copy)
@@ -362,11 +362,11 @@
 if( OPAL_LIKELY(0 == count) ) {
 pStack[1].type = pElems->elem.common.type;
 pStack[1].count= pElems->elem.count;
-pStack[1].disp = pElems->elem.disp;
+pStack[1].disp = 0;
 } else {
 pStack[1].type  = OPAL_DATATYPE_UINT1;
 pStack[1].count = pData->size - count;
-pStack[1].disp  = pData->true_lb + count;
+pStack[1].disp  = count;
 }
 pStack[1].index= 0;  /* useless */


Best regards,
Takahiro Kawashima,
MPI development team,
Fujitsu
/* np=2 */

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

struct structure {
    double not_transfered;
    double transfered_1;
    double transfered_2;
};

int main(int argc, char *argv[])
{
    int i, n = 2000, myrank;
    struct structure *data;
    MPI_Datatype struct_type, temp_type;
    MPI_Datatype types[2] = {MPI_DOUBLE, MPI_DOUBLE};
    int blocklens[2] = {1, 1};
    MPI_Aint disps[3];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    data = malloc(sizeof(data[0]) * n);

    if (myrank == 0) {
        for (i = 0; i < n; i++) {
            data[i].transfered_1 = i + 0.1;
            data[i].transfered_2 = i + 0.2;
        }
    }

    /* Displacements of the two transferred members relative to the struct. */
    MPI_Get_address(&data[0].transfered_1, &disps[0]);
    MPI_Get_address(&data[0].transfered_2, &disps[1]);
    MPI_Get_address(&data[0], &disps[2]);
    disps[1] -= disps[2]; /* 16 */
    disps[0] -= disps[2]; /*  8 */
    MPI_Type_create_struct(2, blocklens, disps, types, &temp_type);
    /* Resize so that LB is 0 and the extent covers the whole struct. */
    MPI_Type_create_resized(temp_type, 0, sizeof(data[0]), &struct_type);
    MPI_Type_commit(&struct_type);

    if (myrank == 0) {
        MPI_Send(data, n, struct_type, 1, 0, MPI_COMM_WORLD);
    } else if (myrank == 1) {
        MPI_Recv(data, n, struct_type, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Type_free(&temp_type);
    MPI_Type_free(&struct_type);

    if (myrank == 1) {
        FILE *fp;
        fp = fopen("received_data", "w");
        for (i = 0; i < n; i++) {
            fprintf(fp, "%d: t1 = %g, t2 = %g\n",
                    i, data[i].transfered_1, data[i].transfered_2);
        }
        fclose(fp);
    }

    free(data);
    MPI_Finalize();

    return 0;
}

Re: [OMPI devel] --enable-visibility ( OPAL_C_HAVE_VISIBILITY) behavior in trunk

2014-09-29 Thread Devendar Bureddy
I see a behavioral difference between 1.8.x and trunk in the
OPAL_C_HAVE_VISIBILITY definition in the same build environment. Is this expected?

-Devendar

-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
(jsquyres)
Sent: Monday, September 29, 2014 4:25 PM
To: Open MPI Developers List
Subject: Re: [OMPI devel] --enable-visibility ( OPAL_C_HAVE_VISIBILITY) 
behavior in trunk

I can't quite parse what you are saying -- do you have a specific question?


On Sep 29, 2014, at 7:18 PM, Devendar Bureddy  wrote:

> This is supposed to be enabled by default.  In trunk, I see that
> OPAL_C_HAVE_VISIBILITY is defined to 0 by default.   1.8.x looks fine
>  
> Configure : ./configure -prefix=$PWD/install 
> --enable-mpirun-prefix-by-default --disable-mpi-fortran --disable-vt 
> --enable-debug --enable-oshmem --with-pmi
> GCC : gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC)
>  
> -Devendar
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15936.php


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/09/15937.php


Re: [OMPI devel] --enable-visibility ( OPAL_C_HAVE_VISIBILITY) behavior in trunk

2014-09-29 Thread Jeff Squyres (jsquyres)
I can't quite parse what you are saying -- do you have a specific question?


On Sep 29, 2014, at 7:18 PM, Devendar Bureddy  wrote:

> This is supposed to be enabled by default.  In trunk, I see that
> OPAL_C_HAVE_VISIBILITY is defined to 0 by default.   1.8.x looks fine
>  
> Configure : ./configure -prefix=$PWD/install 
> --enable-mpirun-prefix-by-default --disable-mpi-fortran --disable-vt 
> --enable-debug --enable-oshmem --with-pmi
> GCC : gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC)
>  
> -Devendar


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI devel] --enable-visibility ( OPAL_C_HAVE_VISIBILITY) behavior in trunk

2014-09-29 Thread Devendar Bureddy
This is supposed to be enabled by default.  In trunk, I see that
OPAL_C_HAVE_VISIBILITY is defined to 0 by default.   1.8.x looks fine.

Configure : ./configure -prefix=$PWD/install --enable-mpirun-prefix-by-default 
--disable-mpi-fortran --disable-vt --enable-debug --enable-oshmem --with-pmi
GCC : gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC)

-Devendar


Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r32814 - trunk/ompi/mca/coll/ml

2014-09-29 Thread Pritchard Jr., Howard
Hi Jeff,

Sure if that's the preferred check inside ompi itself.

Howard

-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
(jsquyres)
Sent: Monday, September 29, 2014 3:59 PM
To: Open MPI Developers List
Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r32814 - 
trunk/ompi/mca/coll/ml

Howard --

Do you want to just check ompi_mpi_thread_provided (== MPI_THREAD_MULTIPLE), 
instead?


On Sep 29, 2014, at 5:02 PM,  
 wrote:

> Author: hppritcha (Howard Pritchard)
> Date: 2014-09-29 17:02:15 EDT (Mon, 29 Sep 2014)
> New Revision: 32814
> URL: https://svn.open-mpi.org/trac/ompi/changeset/32814
> 
> Log:
> disqualify coll ml for MPI_THREAD_MULTIPLE
> 
> Text files modified: 
>   trunk/ompi/mca/coll/ml/coll_ml_module.c | 7 +++ 
> 
>   1 files changed, 7 insertions(+), 0 deletions(-)
> 
> Modified: trunk/ompi/mca/coll/ml/coll_ml_module.c
> ==
> --- trunk/ompi/mca/coll/ml/coll_ml_module.c   Mon Sep 29 15:26:33 2014
> (r32813)
> +++ trunk/ompi/mca/coll/ml/coll_ml_module.c   2014-09-29 17:02:15 EDT (Mon, 
> 29 Sep 2014)  (r32814)
> @@ -2896,6 +2896,13 @@
> return NULL;
> }
> 
> +if (opal_using_threads()) {
> +ML_VERBOSE(10, ("coll:ml: MPI_THREAD_MULTIPLE not suppported; 
> skipping this component"));
> +*priority = -1;
> +return NULL;
> +}
> +
> +
> /* NTH: Disabled this check until we have a better one. */
> #if 0
> if (!ompi_rte_proc_is_bound) {
> ___
> svn-full mailing list
> svn-f...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/09/15934.php


Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r32814 - trunk/ompi/mca/coll/ml

2014-09-29 Thread Jeff Squyres (jsquyres)
Howard --

Do you want to just check ompi_mpi_thread_provided (== MPI_THREAD_MULTIPLE), 
instead?


On Sep 29, 2014, at 5:02 PM,  
 wrote:

> Author: hppritcha (Howard Pritchard)
> Date: 2014-09-29 17:02:15 EDT (Mon, 29 Sep 2014)
> New Revision: 32814
> URL: https://svn.open-mpi.org/trac/ompi/changeset/32814
> 
> Log:
> disqualify coll ml for MPI_THREAD_MULTIPLE
> 
> Text files modified: 
>   trunk/ompi/mca/coll/ml/coll_ml_module.c | 7 +++ 
> 
>   1 files changed, 7 insertions(+), 0 deletions(-)
> 
> Modified: trunk/ompi/mca/coll/ml/coll_ml_module.c
> ==
> --- trunk/ompi/mca/coll/ml/coll_ml_module.c   Mon Sep 29 15:26:33 2014
> (r32813)
> +++ trunk/ompi/mca/coll/ml/coll_ml_module.c   2014-09-29 17:02:15 EDT (Mon, 
> 29 Sep 2014)  (r32814)
> @@ -2896,6 +2896,13 @@
> return NULL;
> }
> 
> +if (opal_using_threads()) {
> +ML_VERBOSE(10, ("coll:ml: MPI_THREAD_MULTIPLE not suppported; 
> skipping this component"));
> +*priority = -1;
> +return NULL;
> +}
> +
> +
> /* NTH: Disabled this check until we have a better one. */
> #if 0
> if (!ompi_rte_proc_is_bound) {
> ___
> svn-full mailing list
> svn-f...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



[OMPI devel] Broken abort backtrace functionality

2014-09-29 Thread Deva
It looks like OMPI_MCA_mpi_abort_print_stack=1 is broken.  I'm seeing
the following warning with it.

--
$mpirun -np 2  -x OMPI_MCA_mpi_abort_print_stack=1 ./hello_c
--
WARNING: A user-supplied value attempted to override the default-only MCA
variable named "mpi_abort_print_stack".

The user-supplied value was ignored.
--
--
WARNING: A user-supplied value attempted to override the default-only MCA
variable named "mpi_abort_print_stack".

The user-supplied value was ignored.
--
Hello, world, I am 1 of 2,
Hello, world, I am 0 of 2,
--


It seems HAVE_BACKTRACE is not defined by any configuration, but the
relevant code below is guarded by it.


#if OPAL_WANT_PRETTY_PRINT_STACKTRACE && defined(HAVE_BACKTRACE)
 0,
 OPAL_INFO_LVL_9,
 MCA_BASE_VAR_SCOPE_READONLY,
#else
 MCA_BASE_VAR_FLAG_DEFAULT_ONLY,
 OPAL_INFO_LVL_9,
 MCA_BASE_VAR_SCOPE_CONSTANT,
#endif

$git grep HAVE_BACKTRACE
ompi/runtime/ompi_mpi_params.c:#if OPAL_WANT_PRETTY_PRINT_STACKTRACE &&
defined(HAVE_BACKTRACE)
$
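
The effect of the guard can be reproduced outside Open MPI. A minimal sketch, assuming nothing else defines HAVE_BACKTRACE: apart from HAVE_BACKTRACE itself, the macro and function names are made up for illustration, and the two branches only stand in for the settable and default-only registrations.

```c
#include <assert.h>

/* Stand-in for OPAL_WANT_PRETTY_PRINT_STACKTRACE being enabled. */
#define WANT_PRETTY_PRINT_STACKTRACE 1

/* HAVE_BACKTRACE is deliberately left undefined, as the grep suggests. */

#if WANT_PRETTY_PRINT_STACKTRACE && defined(HAVE_BACKTRACE)
#define ABORT_PRINT_STACK_SETTABLE 1   /* first branch: user may set it */
#else
#define ABORT_PRINT_STACK_SETTABLE 0   /* #else branch: default-only */
#endif

/* Hypothetical helper reporting which branch the preprocessor took. */
int abort_print_stack_is_settable(void)
{
    return ABORT_PRINT_STACK_SETTABLE;
}

/* With HAVE_BACKTRACE undefined, the #else branch is always selected. */
void check_guard(void)
{
    assert(abort_print_stack_is_settable() == 0);
}
```

Because `defined(HAVE_BACKTRACE)` is false whenever nothing defines the macro, the first operand of the `&&` does not matter and the default-only branch is compiled in, which would explain the warning.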


-- 
-Devendar


[OMPI devel] release 1.9

2014-09-29 Thread Pritchard Jr., Howard
Hi Folks,

The release managers for the 1.9/2.0 stream have been putting together
notes on features for this series, what sort of code pruning to do, etc.

See

https://github.com/open-mpi/ompi/wiki/Releasev19

We will be discussing the contents of the table(s) at the bottom of the wiki
at tomorrow's meeting.

Thanks,
Howard

-
Howard Pritchard
HPC-5
Los Alamos National Laboratory




Re: [OMPI devel] Valgrind warning in MPI_Win_allocate[_shared]()

2014-09-29 Thread Ralph Castain
Good catch - the problem is that ompi_info_get_bool returns "success" if the 
value isn't found, setting "flag" to false, but doesn't set the value of the 
param itself. So if you don't specify "blocking_fence" in MPI_Info, then the 
"blocking_fence" variable is never set and is read uninitialized.

Fixed in r32812 and scheduled for 1.8.4
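
For readers hitting the same pattern elsewhere, here is a minimal sketch of the pitfall and one way to guard against it. It does not show the change made in r32812. lookup_bool is a hypothetical stand-in that only mimics the behavior described above (success returned, flag cleared, value left untouched when the key is absent); it is not the ompi_info_get_bool signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup over a single stored key/value pair. When the key
 * is absent it still returns 0 (success), clears *flag, and leaves
 * *value untouched. */
static int lookup_bool(const char *stored_key, bool stored_value,
                       const char *key, bool *value, int *flag)
{
    assert(key != NULL && value != NULL && flag != NULL);
    if (stored_key != NULL && strcmp(stored_key, key) == 0) {
        *value = stored_value;
        *flag = 1;
    } else {
        *flag = 0;  /* *value is deliberately not written */
    }
    return 0;
}

/* Caller that is safe either way: the variable has a default, and the
 * result is only trusted when the key was actually found. */
bool want_blocking_fence(const char *stored_key, bool stored_value)
{
    bool blocking_fence = false;  /* default when the key is absent */
    int flag = 0;

    if (0 != lookup_bool(stored_key, stored_value,
                         "blocking_fence", &blocking_fence, &flag)) {
        return false;
    }
    return flag ? blocking_fence : false;
}
```

Either measure alone (initializing the variable, or checking flag before reading it) would avoid the uninitialized read that valgrind reports.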

Thanks!
Ralph

On Sep 28, 2014, at 2:43 AM, Lisandro Dalcin  wrote:

> Just built 1.8.3 for another round of testing with mpi4py. I'm getting
> the following valgrind warning:
> 
> ==4718== Conditional jump or move depends on uninitialised value(s)
> ==4718==at 0xD0D9F4C: component_select (osc_sm_component.c:333)
> ==4718==by 0x4CF44F6: ompi_osc_base_select (osc_base_init.c:73)
> ==4718==by 0x4C68B69: ompi_win_allocate (win.c:182)
> ==4718==by 0x4CBB8C2: PMPI_Win_allocate (pwin_allocate.c:79)
> ==4718==by 0x400898: main (in /home/dalcinl/Devel/BUGS-MPI/openmpi/a.out)
> 
> The offending code is in ompi/mca/osc/sm/osc_sm_component.c, it seems
> you forgot to initialize the "blocking_fence" to a default true or
> false value.
> 
>bool blocking_fence;
>int flag;
> 
>if (OMPI_SUCCESS != ompi_info_get_bool(info, "blocking_fence",
>   &blocking_fence, &flag)) {
>goto error;
>}
> 
>if (blocking_fence) {
> 
> 
> -- 
> Lisandro Dalcin
> 
> Research Scientist
> Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
> Numerical Porous Media Center (NumPor)
> King Abdullah University of Science and Technology (KAUST)
> http://numpor.kaust.edu.sa/
> 
> 4700 King Abdullah University of Science and Technology
> al-Khawarizmi Bldg (Bldg 1), Office # 4332
> Thuwal 23955-6900, Kingdom of Saudi Arabia
> http://www.kaust.edu.sa
> 
> Office Phone: +966 12 808-0459
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15925.php



Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-09-29 Thread Nathan Hjelm
An equivalent change would need to be made for graph and dist graph as
well. That will take a little more work. Also, I was avoiding changing
anything in topo for 1.8.

-Nathan

On Mon, Sep 29, 2014 at 08:02:41PM +0900, Gilles Gouaillardet wrote:
>Nathan,
> 
>why not just make the topology information available at that point as you
>described it ?
> 
>the attached patch does this, could you please review it ?
> 
>Cheers,
> 
>Gilles
> 
>On 2014/09/26 2:50, Nathan Hjelm wrote:
> 
>  On Tue, Aug 26, 2014 at 07:03:24PM +0300, Lisandro Dalcin wrote:
> 
>  I finally managed to track down some issues in mpi4py's test suite
>  using Open MPI 1.8+. The code below should be enough to reproduce the
>  problem. Run it under valgrind to make sense of my following
>  diagnostics.
> 
>  In this code I'm creating a 2D, periodic Cartesian topology out of
>  COMM_SELF. In this case, the process in COMM_SELF has 4 logical in/out
>  links to itself. So we have size=1 but indegree=outdegree=4. However,
>  in ompi/mca/coll/basic/coll_basic_module.c, "size * 2" request are
>  being allocated to manage communication:
> 
>  if (OMPI_COMM_IS_INTER(comm)) {
>  size = ompi_comm_remote_size(comm);
>  } else {
>  size = ompi_comm_size(comm);
>  }
>  basic_module->mccb_num_reqs = size * 2;
>  basic_module->mccb_reqs = (ompi_request_t**)
>  malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs);
> 
>  I guess you have to also special-case for topologies and allocate
>  indegree+outdegree requests (not sure about this number, just
>  guessing).
> 
> 
>  I wish this was possible but the topology information is not available
>  at that point. We may be able to change that but I don't see the work
>  completing anytime soon. I committed an alternative fix as r32796 and
>  CMR'd it to 1.8.3. I can confirm that the attached reproducer no longer
>  produces a SEGV. Let me know if you run into any more issues.
> 
> 
>  -Nathan
> 
>  ___
>  devel mailing list
>  de...@open-mpi.org
>  Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>  Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15915.php

> Index: ompi/mca/topo/base/topo_base_cart_create.c
> ===
> --- ompi/mca/topo/base/topo_base_cart_create.c(revision 32807)
> +++ ompi/mca/topo/base/topo_base_cart_create.c(working copy)
> @@ -163,10 +163,18 @@
>  return MPI_ERR_INTERN;
>  }
>  
> +assert(NULL == new_comm->c_topo);
> +assert(!(new_comm->c_flags & OMPI_COMM_CART));
> +new_comm->c_topo   = topo;
> +new_comm->c_topo->mtc.cart = cart;
> +new_comm->c_topo->reorder  = reorder;
> +new_comm->c_flags |= OMPI_COMM_CART;
>  ret = ompi_comm_enable(old_comm, new_comm,
> new_rank, num_procs, topo_procs);
>  if (OMPI_SUCCESS != ret) {
>  /* something wrong happened during setting the communicator */
> +new_comm->c_topo = NULL;
> +new_comm->c_flags &= ~OMPI_COMM_CART;
>  ompi_comm_free (&new_comm);
>  free(topo_procs);
>  if(NULL != cart->periods) free(cart->periods);
> @@ -176,10 +184,6 @@
>  return ret;
>  }
>  
> -new_comm->c_topo   = topo;
> -new_comm->c_topo->mtc.cart = cart;
> -new_comm->c_topo->reorder  = reorder;
> -new_comm->c_flags |= OMPI_COMM_CART;
>  *comm_topo = new_comm;
>  
>  if( MPI_UNDEFINED == new_rank ) {
> Index: ompi/mca/coll/basic/coll_basic_module.c
> ===
> --- ompi/mca/coll/basic/coll_basic_module.c   (revision 32807)
> +++ ompi/mca/coll/basic/coll_basic_module.c   (working copy)
> @@ -13,6 +13,8 @@
>   * Copyright (c) 2012  Sandia National Laboratories. All rights reserved.
>   * Copyright (c) 2013  Los Alamos National Security, LLC. All rights
>   * reserved.
> + * Copyright (c) 2014  Research Organization for Information Science
> + * and Technology (RIST). All rights reserved.
>   * $COPYRIGHT$
>   * 
>   * Additional copyrights may follow
> @@ -28,6 +30,7 @@
>  #include "mpi.h"
>  #include "ompi/mca/coll/coll.h"
>  #include "ompi/mca/coll/base/base.h"
> +#include "ompi/mca/topo/topo.h"
>  #include "coll_basic.h"
>  
>  
> @@ -70,6 +73,15 @@
>  } else {
>  size = ompi_comm_size(comm);
>  }
> +if (comm->c_flags & OMPI_COMM_CART) {
> +int cart_size;
> +assert (NULL != comm->c_topo);
> +comm->c_topo->topo.cart.cartdim_get(comm, &cart_size);
> +cart_size *= 2;
> +if (cart_size > size) {
> +size = cart_size;
> +}
> +}
>  basic_module->mccb_num_reqs = size * 2;
>  basic_module->mccb_reqs = (ompi_request_t**) 
>  malloc(sizeof(ompi_request_t *) * 

Re: [OMPI devel] Neighbor collectives with periodic Cartesian topologies of size one

2014-09-29 Thread Gilles Gouaillardet
Nathan,

why not just make the topology information available at that point as
you described it ?

the attached patch does this, could you please review it ?

Cheers,

Gilles

On 2014/09/26 2:50, Nathan Hjelm wrote:
> On Tue, Aug 26, 2014 at 07:03:24PM +0300, Lisandro Dalcin wrote:
>> I finally managed to track down some issues in mpi4py's test suite
>> using Open MPI 1.8+. The code below should be enough to reproduce the
>> problem. Run it under valgrind to make sense of my following
>> diagnostics.
>>
>> In this code I'm creating a 2D, periodic Cartesian topology out of
>> COMM_SELF. In this case, the process in COMM_SELF has 4 logical in/out
>> links to itself. So we have size=1 but indegree=outdegree=4. However,
>> in ompi/mca/coll/basic/coll_basic_module.c, "size * 2" request are
>> being allocated to manage communication:
>>
>> if (OMPI_COMM_IS_INTER(comm)) {
>> size = ompi_comm_remote_size(comm);
>> } else {
>> size = ompi_comm_size(comm);
>> }
>> basic_module->mccb_num_reqs = size * 2;
>> basic_module->mccb_reqs = (ompi_request_t**)
>> malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs);
>>
>> I guess you have to also special-case for topologies and allocate
>> indegree+outdegree requests (not sure about this number, just
>> guessing).
>>
> I wish this was possible but the topology information is not available
> at that point. We may be able to change that but I don't see the work
> completing anytime soon. I committed an alternative fix as r32796 and
> CMR'd it to 1.8.3. I can confirm that the attached reproducer no longer
> produces a SEGV. Let me know if you run into any more issues.
>
>
> -Nathan
>
>

Index: ompi/mca/topo/base/topo_base_cart_create.c
===
--- ompi/mca/topo/base/topo_base_cart_create.c  (revision 32807)
+++ ompi/mca/topo/base/topo_base_cart_create.c  (working copy)
@@ -163,10 +163,18 @@
 return MPI_ERR_INTERN;
 }

+assert(NULL == new_comm->c_topo);
+assert(!(new_comm->c_flags & OMPI_COMM_CART));
+new_comm->c_topo   = topo;
+new_comm->c_topo->mtc.cart = cart;
+new_comm->c_topo->reorder  = reorder;
+new_comm->c_flags |= OMPI_COMM_CART;
 ret = ompi_comm_enable(old_comm, new_comm,
new_rank, num_procs, topo_procs);
 if (OMPI_SUCCESS != ret) {
 /* something wrong happened during setting the communicator */
+new_comm->c_topo = NULL;
+new_comm->c_flags &= ~OMPI_COMM_CART;
 ompi_comm_free (&new_comm);
 free(topo_procs);
 if(NULL != cart->periods) free(cart->periods);
@@ -176,10 +184,6 @@
 return ret;
 }

-new_comm->c_topo   = topo;
-new_comm->c_topo->mtc.cart = cart;
-new_comm->c_topo->reorder  = reorder;
-new_comm->c_flags |= OMPI_COMM_CART;
 *comm_topo = new_comm;

 if( MPI_UNDEFINED == new_rank ) {
Index: ompi/mca/coll/basic/coll_basic_module.c
===
--- ompi/mca/coll/basic/coll_basic_module.c (revision 32807)
+++ ompi/mca/coll/basic/coll_basic_module.c (working copy)
@@ -13,6 +13,8 @@
  * Copyright (c) 2012  Sandia National Laboratories. All rights reserved.
  * Copyright (c) 2013  Los Alamos National Security, LLC. All rights
  * reserved.
+ * Copyright (c) 2014  Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -28,6 +30,7 @@
 #include "mpi.h"
 #include "ompi/mca/coll/coll.h"
 #include "ompi/mca/coll/base/base.h"
+#include "ompi/mca/topo/topo.h"
 #include "coll_basic.h"


@@ -70,6 +73,15 @@
 } else {
 size = ompi_comm_size(comm);
 }
+if (comm->c_flags & OMPI_COMM_CART) {
+int cart_size;
+assert (NULL != comm->c_topo);
+comm->c_topo->topo.cart.cartdim_get(comm, &cart_size);
+cart_size *= 2;
+if (cart_size > size) {
+size = cart_size;
+}
+}
 basic_module->mccb_num_reqs = size * 2;
 basic_module->mccb_reqs = (ompi_request_t**) 
 malloc(sizeof(ompi_request_t *) * basic_module->mccb_num_reqs);
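
The sizing rule in the coll_basic_module.c hunk can be checked in isolation. A minimal sketch with a made-up helper name (basic_num_reqs) and plain integers in place of the communicator: for the 2D periodic topology on COMM_SELF described by Lisandro, it yields 8 requests, which matches indegree + outdegree = 4 + 4.

```c
#include <assert.h>

/* Hypothetical helper mirroring the sizing logic in the patch.
 * comm_size: communicator size (remote size for an intercommunicator)
 * is_cart:   non-zero if the communicator carries a Cartesian topology
 * ndims:     number of Cartesian dimensions (ignored unless is_cart) */
int basic_num_reqs(int comm_size, int is_cart, int ndims)
{
    int size = comm_size;

    assert(comm_size >= 1 && ndims >= 0);
    if (is_cart) {
        int cart_size = 2 * ndims;  /* two neighbours per dimension */
        if (cart_size > size) {
            size = cart_size;
        }
    }
    return size * 2;  /* same factor of two as the existing code */
}
```

For example, basic_num_reqs(1, 1, 2) gives 8, while a non-topology communicator of size 1 keeps the previous value of 2. As Nathan notes, graph and distributed graph topologies would need their own rule; this sketch covers only the Cartesian case.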