Re: [OMPI devel] SM component init unload

2012-07-03 Thread George Bosilca
Juan,

Something weird is going on there. The selection mechanism for the SM coll and 
the SM BTL should be very similar. However, the SM BTL successfully selects 
itself while the SM coll fails to determine that all processes are local.

In the coll SM the issue is that the remote procs do not have the LOCAL flag 
set, even when they are on the local node (however, the proc returned by 
ompi_proc_local() has a special flag stating that all processes in the job are 
local). I compared the initialization of the SM BTL and the SM coll. It turns 
out that somehow the procs returned by ompi_proc_all() and the procs provided 
to the add_procs of the BTLs are not identical. The latter have the local flag 
correctly set, so I went a little bit deeper.

Here is what I found while toying with gdb inside:

breakpoint 1, mca_coll_sm_init_query (enable_progress_threads=false, 
enable_mpi_threads=false) at coll_sm_module.c:132

(gdb) p procs[0]
$1 = (ompi_proc_t *) 0x109a1e8c0
(gdb) p procs[1]
$2 = (ompi_proc_t *) 0x109a1e970
(gdb) p procs[0]->proc_flags
$3 = 0
(gdb) p procs[1]->proc_flags
$4 = 4095

Breakpoint 2, mca_btl_sm_add_procs (btl=0x109baa1c0, nprocs=2, 
procs=0x109a319e0, peers=0x109a319f0, reachability=0x7fff691378e8) at 
btl_sm.c:427

(gdb) p procs[0]
$5 = (struct ompi_proc_t *) 0x109a1e8c0
(gdb) p procs[1]
$6 = (struct ompi_proc_t *) 0x109a1e970
(gdb) p procs[0]->proc_flags
$7 = 1920
(gdb) p procs[1]->proc_flags
$8 = 4095

Thus the problem seems to come from the fact that during the initialization of 
the SM coll the flags are not correctly set. However, this is somewhat expected 
… as the call to the initialization happens before the exchange of the business 
cards (and therefore there is no way to have any knowledge about the remote 
procs).

So, either something changed drastically in the way we set the flags for remote 
processes, or we have not used the SM coll for the last 3 years. I think the 
culprit is r21967 (https://svn.open-mpi.org/trac/ompi/changeset/21967), which 
added a "selection" logic based on knowledge about remote procs to the coll SM 
initialization function. But this selection logic is run way too early!

I would strongly encourage you not to use this SM collective component in 
anything related to production runs.

  george.

PS: However, if you want to toy with the SM coll apply the following patch:
Index: coll_sm_module.c
===
--- coll_sm_module.c(revision 26737)
+++ coll_sm_module.c(working copy)
@@ -128,6 +128,7 @@
 int mca_coll_sm_init_query(bool enable_progress_threads,
bool enable_mpi_threads)
 {
+#if 0
 ompi_proc_t *my_proc, **procs;
 size_t i, size;
 
@@ -158,7 +159,7 @@
 "coll:sm:init_query: no other local procs; 
disqualifying myself");
 return OMPI_ERR_NOT_AVAILABLE;
 }
-
+#endif
 /* Don't do much here because we don't really want to allocate any
shared memory until this component is selected to be used. */
 opal_output_verbose(10, mca_coll_base_output,





On Jul 4, 2012, at 02:05 , Ralph Castain wrote:

> Okay, please try this again with r26739 or above. You can remove the rest of 
> the "verbose" settings and the --display-map so we declutter the output. 
> Please add "-mca orte_nidmap_verbose 20" to your cmd line.
> 
> Thanks!
> Ralph
> 
> 
> On Tue, Jul 3, 2012 at 1:50 PM, Juan A. Rico  wrote:
> Here is the output.
> 
> [jarico@Metropolis-01 examples]$ 
> /home/jarico/shared/packages/openmpi-cas-dbg/bin/mpiexec --bind-to-core 
> --bynode --mca mca_base_verbose 100 --mca mca_coll_base_output 100  --mca 
> coll_sm_priority 99 -mca hwloc_base_verbose 90 --display-map --mca 
> mca_verbose 100 --mca mca_base_verbose 100 --mca coll_base_verbose 100 -n 2 
> -mca grpcomm_base_verbose 5 ./bmem
> [Metropolis-01:24563] mca: base: components_open: Looking for hwloc components
> [Metropolis-01:24563] mca: base: components_open: opening hwloc components
> [Metropolis-01:24563] mca: base: components_open: found loaded component 
> hwloc142
> [Metropolis-01:24563] mca: base: components_open: component hwloc142 has no 
> register function
> [Metropolis-01:24563] mca: base: components_open: component hwloc142 has no 
> open function
> [Metropolis-01:24563] hwloc:base:get_topology
> [Metropolis-01:24563] hwloc:base: no cpus specified - using root available 
> cpuset
> [Metropolis-01:24563] mca:base:select:(grpcomm) Querying component [bad]
> [Metropolis-01:24563] mca:base:select:(grpcomm) Query of component [bad] set 
> priority to 10
> [Metropolis-01:24563] mca:base:select:(grpcomm) Selected component [bad]
> [Metropolis-01:24563] [[36265,0],0] grpcomm:base:receive start comm
> --
> WARNING: a request was made to bind a process. While the system
> supports binding the process itself, at least one node does NOT
> support binding memory to the process location.

Re: [OMPI devel] non-blocking barrier

2012-07-06 Thread George Bosilca
No, it is not right. With the ibarrier usage you're making below, the output 
should be similar to the first case (all should leave at the earliest at 6.0). 
The ibarrier is still a synchronizing point: all processes MUST reach it before 
anyone is allowed to leave.

However, if you move the ibarrier before the sleep on the two delayed ranks 
(me < 2), the output you got becomes possible.
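
For illustration, here is a sketch (mine, not Eugene's test; it assumes an
MPI-3 MPI_Ibarrier) where the ibarrier is started before the delay. This is
the only legitimate way to obtain timings like the second set below: the
non-delayed ranks enter immediately, and their MPI_Wait can complete as soon
as everyone has entered.

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

/* run with e.g. "mpirun -n 4 ./a.out" */
int main(int argc, char **argv)
{
    int me;
    double t0, t1, t2;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();                      /* set "time zero" */

    MPI_Ibarrier(MPI_COMM_WORLD, &req);    /* everyone enters right away */
    if (me < 2) sleep(3);                  /* laggards delay *after* entering */
    t1 = MPI_Wtime() - t0;
    MPI_Wait(&req, MPI_STATUS_IGNORE);     /* ranks 2,3 may leave near t=0 */
    t2 = MPI_Wtime() - t0;
    printf("%d woke at %3.1lf and exited at %3.1lf\n", me, t1, t2);

    MPI_Finalize();
    return 0;
}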

  George



On Jul 6, 2012, at 7:53, Eugene Loh  wrote:

> Either there is a problem with MPI_Ibarrier or I don't understand the 
> semantics.
> 
> The following example is with openmpi-1.9a1r26747.  (Thanks for the fix in 
> 26757.  I tried with that as well with same results.)  I get similar results 
> for different OSes, compilers, bitness, etc.
> 
> % cat ibarrier.c
> #include 
> #include 
> #include 
> #include 
> 
> int main(int argc, char** argv) {
>int i, me;
>double t0, t1, t2;
>MPI_Request req;
> 
>    MPI_Init(&argc, &argv);
>    MPI_Comm_rank(MPI_COMM_WORLD, &me);
> 
>MPI_Barrier(MPI_COMM_WORLD);
>MPI_Barrier(MPI_COMM_WORLD);
>MPI_Barrier(MPI_COMM_WORLD);
>t0 = MPI_Wtime();  /* set "time zero" */
> 
>if ( me < 2 ) sleep(3);/* two processes delay before hitting 
> barrier */
>t1 = MPI_Wtime() - t0;
>MPI_Barrier(MPI_COMM_WORLD);
>t2 = MPI_Wtime() - t0;
>printf("%d entered at %3.1lf and exited at %3.1lf\n", me, t1, t2);
> 
>if ( me < 2 ) sleep(3);/* two processes delay before hitting 
> barrier */
>t1 = MPI_Wtime() - t0;
>    MPI_Ibarrier(MPI_COMM_WORLD, &req);
>    MPI_Wait(&req, MPI_STATUS_IGNORE);
>t2 = MPI_Wtime() - t0;
>printf("%d entered at %3.1lf and exited at %3.1lf\n", me, t1, t2);
> 
>MPI_Finalize();
>return 0;
> }
> % mpirun -n 4 ./a.out
> 0 entered at 3.0 and exited at 3.0
> 1 entered at 3.0 and exited at 3.0
> 2 entered at 0.0 and exited at 3.0
> 3 entered at 0.0 and exited at 3.0
> 0 entered at 6.0 and exited at 6.0
> 1 entered at 6.0 and exited at 6.0
> 2 entered at 3.0 and exited at 3.0
> 3 entered at 3.0 and exited at 3.0
> 
> With the first barrier, no one leaves until the last process has entered.  
> With the non-blocking barrier, two processes enter and leave before the two 
> laggards arrive at the barrier.  Is that right?
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] SM component init unload

2012-07-06 Thread George Bosilca
You're right, the code was overzealous. I fixed it by removing the parsing of the 
modex data completely. In any case, the collective module has another chance to 
deselect itself upon creation of a new communicator (thus, after the modex has 
completed).

  George



On Jul 6, 2012, at 2:20, Ralph Castain <rhc.open...@gmail.com> wrote:

> George: is there any reason for opening and selecting the coll framework so 
> early in mpi_init? I'm wondering if we can move that code to the end of the 
> procedure so we wouldn't need the locality info until later.
> 
> Sent from my iPad
> 
> On Jul 5, 2012, at 10:05 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
> 
>> Thanks George.  I filed https://svn.open-mpi.org/trac/ompi/ticket/3162 about 
>> this.
>> 
>> 
>> On Jul 4, 2012, at 5:34 AM, Juan A. Rico wrote:
>> 
>>> Thanks all of you for your time and early responses.
>>> 
>>> After applying the patch, SM can be used by raising its priority. It is 
>>> enough for me (I hope so). But it continues failing when I specify --mca 
>>> coll sm,self in the command line (with tuned too).
>>> I am not going to use this release in production, only for playing with the 
>>> code :-)
>>> 
>>> Regards,
>>> Juan Antonio.
>>> 
>>> El 04/07/2012, a las 02:59, George Bosilca escribió:
>>> 
>>>> Juan,
>>>> 
>>>> Something weird is going on there. The selection mechanism for the SM coll 
>>>> and SM BTL should be very similar. However, the SM BTL successfully select 
>>>> itself while the SM coll fails to determine that all processes are local.
>>>> 
>>>> In the coll SM the issue is that the remote procs do not have the LOCAL 
>>>> flag set, even when they are on the local node (however the 
>>>> ompi_proc_local() return has a special flag stating that all processes in 
>>>> the job are local). I compared the initialization of the SM BTL and the SM 
>>>> coll. It turns out that somehow the procs returned by ompi_proc_all() and 
>>>> the procs provided to the add_proc of the BTLs are not identical. The 
>>>> second have the local flag correctly set, so I went a little bit deeper.
>>>> 
>>>> Here is what I found while toying with gdb inside:
>>>> 
>>>> breakpoint 1, mca_coll_sm_init_query (enable_progress_threads=false, 
>>>> enable_mpi_threads=false) at coll_sm_module.c:132
>>>> 
>>>> (gdb) p procs[0]
>>>> $1 = (ompi_proc_t *) 0x109a1e8c0
>>>> (gdb) p procs[1]
>>>> $2 = (ompi_proc_t *) 0x109a1e970
>>>> (gdb) p procs[0]->proc_flags
>>>> $3 = 0
>>>> (gdb) p procs[1]->proc_flags
>>>> $4 = 4095
>>>> 
>>>> Breakpoint 2, mca_btl_sm_add_procs (btl=0x109baa1c0, nprocs=2, 
>>>> procs=0x109a319e0, peers=0x109a319f0, reachability=0x7fff691378e8) at 
>>>> btl_sm.c:427
>>>> 
>>>> (gdb) p procs[0]
>>>> $5 = (struct ompi_proc_t *) 0x109a1e8c0
>>>> (gdb) p procs[1]
>>>> $6 = (struct ompi_proc_t *) 0x109a1e970
>>>> (gdb) p procs[0]->proc_flags
>>>> $7 = 1920
>>>> (gdb) p procs[1]->proc_flags
>>>> $8 = 4095
>>>> 
>>>> Thus the problem seems to come from the fact that during the 
>>>> initialization of the SM coll the flags are not correctly set. However, 
>>>> this is somehow expected … as the call to the initialization happens 
>>>> before the exchange of the business cards (and therefore there is no way 
>>>> to have any knowledge about the remote procs).
>>>> 
>>>> So, either something changed drastically in the way we set the flags for 
>>>> remote processes or we did not use the SM coll for the last 3 years. I 
>>>> think the culprit is r21967 
>>>> (https://svn.open-mpi.org/trac/ompi/changeset/21967) who added a 
>>>> "selection" logic based on knowledge about remote procs in the coll SM 
>>>> initialization function. But this selection logic was way to early !!!
>>>> 
>>>> I would strongly encourage you not to use this SM collective component in 
>>>> anything related to production runs.
>>>> 
>>>> george.
>>>> 
>>>> PS: However, if you want to toy with the SM coll apply the following patch:
>>>> Index: coll_sm_module.c
>>>> ===
>>

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26801 - trunk/ompi/include

2012-07-19 Thread George Bosilca
Thanks.

  george.

On Jul 19, 2012, at 14:30 , Ralph Castain wrote:

> I had to revert this commit so the trunk would build again. Perhaps you and 
> Brian could work together to update the implementation to match whatever API 
> is correct, and then commit the entire change as one revision so the trunk 
> remains buildable?
> 
> 
> On Jul 18, 2012, at 7:23 AM, svn-commit-mai...@open-mpi.org wrote:
> 
>> Author: bosilca (George Bosilca)
>> Date: 2012-07-18 10:23:23 EDT (Wed, 18 Jul 2012)
>> New Revision: 26801
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/26801
>> 
>> Log:
>> Fix the non-blocking collective prototypes.
>> 
>> Text files modified: 
>>  trunk/ompi/include/mpi.h.in | 6 ++  
>>  1 files changed, 2 insertions(+), 4 deletions(-)
>> 
>> Modified: trunk/ompi/include/mpi.h.in
>> ==
>> --- trunk/ompi/include/mpi.h.in  Wed Jul 18 10:22:45 2012(r26800)
>> +++ trunk/ompi/include/mpi.h.in  2012-07-18 10:23:23 EDT (Wed, 18 Jul 
>> 2012)  (r26801)
>> @@ -2003,12 +2003,10 @@
>> MPI_Datatype datatype, MPI_Op);
>> OMPI_DECLSPEC  int PMPI_Reduce_scatter(void *sendbuf, void *recvbuf, int 
>> *recvcounts,
>>   MPI_Datatype datatype, MPI_Op op, 
>> MPI_Comm comm);
>> -OMPI_DECLSPEC  int PMPI_Ireduce_scatter(void *sendbuf, void *recvbuf, int 
>> *recvcounts,
>> -MPI_Datatype datatype, MPI_Op op, 
>> MPI_Comm comm, MPI_Request *request);
>> -OMPI_DECLSPEC  int PMPI_Reduce_scatter(void *sendbuf, void *recvbuf, int 
>> *recvcounts,
>> -   MPI_Datatype datatype, MPI_Op op, 
>> MPI_Comm comm);
>> OMPI_DECLSPEC  int PMPI_Reduce_scatter_block(void *sendbuf, void *recvbuf, 
>> int recvcount,
>> MPI_Datatype datatype, MPI_Op 
>> op, MPI_Comm comm);
>> +OMPI_DECLSPEC  int PMPI_Ireduce_scatter(void *sendbuf, void *recvbuf, int 
>> *recvcounts,
>> +   MPI_Datatype datatype, MPI_Op op, 
>> MPI_Comm comm);
>> OMPI_DECLSPEC  int PMPI_Ireduce_scatter_block(void *sendbuf, void *recvbuf, 
>> int recvcount,
>>  MPI_Datatype datatype, MPI_Op 
>> op, MPI_Comm comm, MPI_Request *request);
>> OMPI_DECLSPEC  int PMPI_Register_datarep(char *datarep,
>> ___
>> svn mailing list
>> s...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Existing frameworks for remote device memory exclusive read/write

2012-07-23 Thread George Bosilca
Dima,

A while back we investigated the potential of a memcpy module in the OPAL 
layer. We had some proof of concept, but finally didn't go forward due to lack 
of resources. However, the skeleton of the code is still in the trunk (in 
opal/mca/memcpy). While I don't think it will cover all the cases expressed in 
your email, due to its synchronous nature, it can be a first step.

In Open MPI we avoid using memcpy directly. Instead, we use the convertor 
mechanism to deal with all memory-to-memory types of operations (as it hides 
the complexities of managing the complex memory layouts defined by MPI 
datatypes). A few weeks ago, Rolf (our NVIDIA guru) applied a patch allowing 
asynchronous memcpy in the OB1 PML for the latest version of CUDA. Dig in the 
code looking for the HAVE_CUDA define to see the code he used to achieve 
asynchronous memcpy.
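
For Dmitry's purpose, the underlying primitive is easier to see outside of the
PML. Below is a minimal, self-contained sketch (plain CUDA runtime calls, not
the OB1/HAVE_CUDA code itself) of the pattern such support builds on: queue
the copy on a stream, overlap other work, then test or wait for completion.

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    const size_t n = 1 << 20;
    float *host, *dev;
    cudaStream_t stream;

    cudaMallocHost((void **)&host, n * sizeof(float)); /* pinned, needed for true async */
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaStreamCreate(&stream);

    /* the device-to-host copy is only queued here, not performed */
    cudaMemcpyAsync(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost, stream);

    /* ... overlap: progress the engine, pack the next fragment, etc. ... */

    if (cudaSuccess == cudaStreamQuery(stream)) {
        printf("copy already complete\n");
    }
    cudaStreamSynchronize(stream);   /* completion point */

    cudaStreamDestroy(stream);
    cudaFree(dev);
    cudaFreeHost(host);
    return 0;
}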

  george.

On Jul 21, 2012, at 00:27 , Dmitry N. Mikushin wrote:

> Dear OpenMPI developers,
> 
> My question is not directly related to OpenMPI, but might be related to 
> internal project kitchen and your wide experiences.
> 
> Say, there is a need to implement a transparent read/write of PCI-Express 
> device internal memory from the host system. It is allowed to use only 
> software capabilities of PCI-E device, which can memcpy synchronously and 
> asynchronously in both directions. Memcpy can be initiated both by host and 
> device. Host is required to implement its device memory read/write in 
> critical sections: no PCI-E code could be using the same memory, while it is 
> in operation.
> 
> Question: could you please point related projects/subsystems, which code 
> could be reused to implement the described functionality? We are mostly 
> interested in ones implementing multiple strategies of memory 
> synchronization, since there could be quite some, depending on typical memory 
> access patterns, for example. This subsystem is necessary for our project, 
> however not its primary goal, that's why we would like to borrow existing 
> things in best possible way.
> 
> Thanks and best regards,
> - Dima.
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




[OMPI devel] Blame the compiler …

2012-07-23 Thread George Bosilca
These compiler guys who enforce standards with random limitations because 
they understand the benefit of never-ending "help" messages … ;)

  george

show_help_lex.c:1185: warning: 'input' defined but not used
../../../../ompi/opal/mca/hwloc/base/hwloc_base_open.c: In function 
'opal_hwloc_base_open':
../../../../ompi/opal/mca/hwloc/base/hwloc_base_open.c:99: warning: string 
length '556' is greater than the length '509' ISO C90 compilers are required to 
support
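
For the record, the usual way to appease that particular warning: the 509
characters are counted after concatenation, so gluing adjacent literals
together does not help; keeping each literal short and emitting the pieces
separately does. A sketch (not how hwloc_base_open.c actually handles it):

#include <stdio.h>

static const char *help_text[] = {
    "This is the first chunk of a long help message, kept well under the\n",
    "509-character minimum that ISO C90 guarantees for a single literal.\n",
    "Further chunks simply follow as separate array entries...\n",
    NULL
};

static void print_help(FILE *out)
{
    const char **p;
    for (p = help_text; NULL != *p; ++p) {
        fputs(*p, out);
    }
}

int main(void)
{
    print_help(stdout);
    return 0;
}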






Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread George Bosilca
Takahiro,

Indeed we were way too lax on canceling the requests. I modified your patch to 
correctly deal with the MEMCHECKER macro (removing the call from the branch that 
requires a completion function). The modified patch is attached below. I 
will commit asap.

  Thanks,
george.


Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
===
--- ompi/mca/pml/ob1/pml_ob1_recvreq.c  (revision 26870)
+++ ompi/mca/pml/ob1/pml_ob1_recvreq.c  (working copy)
@@ -3,7 +3,7 @@
  * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
  * University Research and Technology
  * Corporation.  All rights reserved.
- * Copyright (c) 2004-2009 The University of Tennessee and The University
+ * Copyright (c) 2004-2012 The University of Tennessee and The University
  * of Tennessee Research Foundation.  All rights
  * reserved.
  * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart, 
@@ -15,6 +15,7 @@
  * Copyright (c) 2012  NVIDIA Corporation.  All rights reserved.
  * Copyright (c) 2011-2012 Los Alamos National Security, LLC. All rights
  * reserved.
+ * Copyright (c) 2012  FUJITSU LIMITED.  All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -97,36 +98,26 @@
 mca_pml_ob1_recv_request_t* request = 
(mca_pml_ob1_recv_request_t*)ompi_request;
 mca_pml_ob1_comm_t* comm = request->req_recv.req_base.req_comm->c_pml_comm;

-if( true == ompi_request->req_complete ) { /* way to late to cancel this 
one */
-/*
- * Receive request completed, make user buffer accessable.
- */
-MEMCHECKER(
-memchecker_call(_memchecker_base_mem_defined,
-request->req_recv.req_base.req_addr,
-request->req_recv.req_base.req_count,
-request->req_recv.req_base.req_datatype);
-);
+if( true == request->req_match_received ) { /* way to late to cancel this 
one */
+assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG ); /* not 
matched isn't it */
 return OMPI_SUCCESS;
 }

 /* The rest should be protected behind the match logic lock */
 OPAL_THREAD_LOCK(>matching_lock);
-if( OMPI_ANY_TAG == ompi_request->req_status.MPI_TAG ) { /* the match has 
not been already done */
-   if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
-  opal_list_remove_item( >wild_receives, 
(opal_list_item_t*)request );
-   } else {
-  mca_pml_ob1_comm_proc_t* proc = comm->procs + 
request->req_recv.req_base.req_peer;
-  opal_list_remove_item(>specific_receives, 
(opal_list_item_t*)request);
-   }
-   PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
-&(request->req_recv.req_base), PERUSE_RECV );
-   /**
-* As now the PML is done with this request we have to force the 
pml_complete
-* to true. Otherwise, the request will never be freed.
-*/
-   request->req_recv.req_base.req_pml_complete = true;
+if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
+opal_list_remove_item( >wild_receives, 
(opal_list_item_t*)request );
+} else {
+mca_pml_ob1_comm_proc_t* proc = comm->procs + 
request->req_recv.req_base.req_peer;
+opal_list_remove_item(>specific_receives, 
(opal_list_item_t*)request);
 }
+PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
+ &(request->req_recv.req_base), PERUSE_RECV );
+/**
+ * As now the PML is done with this request we have to force the 
pml_complete
+ * to true. Otherwise, the request will never be freed.
+ */
+request->req_recv.req_base.req_pml_complete = true;
 OPAL_THREAD_UNLOCK(>matching_lock);

 OPAL_THREAD_LOCK(_request_lock);
@@ -138,7 +129,7 @@
 MCA_PML_OB1_RECV_REQUEST_MPI_COMPLETE(request);
 OPAL_THREAD_UNLOCK(_request_lock);
 /*
- * Receive request cancelled, make user buffer accessable.
+ * Receive request cancelled, make user buffer accessible.
  */
 MEMCHECKER(
 memchecker_call(_memchecker_base_mem_defined,

On Jul 26, 2012, at 13:41 , Kawashima, Takahiro wrote:

> Hi Open MPI developers,
> 
> I found a small bug in Open MPI.
> 
> See attached program cancelled.c.
> In this program, rank 1 tries to cancel a MPI_Irecv and calls a MPI_Recv
> instead if the cancellation succeeds. This program should terminate whether
> the cancellation succeeds or not. But it leads a deadlock in MPI_Recv after
> printing "MPI_Test_cancelled: 1".
> I confirmed it works fine with MPICH2.
> 
> The problem is in mca_pml_ob1_recv_request_cancel function in
> ompi/mca/pml/ob1/pml_ob1_recvreq.c. It accepts the cancellation unless
> the request has been completed. I think it 

Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if it has a matched recv frag

2012-07-26 Thread George Bosilca
Rich,

There is no matching in this case. Canceling a receive operation is possible 
only up to the moment the request has been matched. Up to this point the 
sequence numbers of the peers are not used, so removing a non-matched request 
has no impact on the sequence number.
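
For reference, the pattern Takahiro describes boils down to the following
(a reconstruction, not his cancelled.c verbatim). With the patch applied it
terminates whether or not the cancellation succeeds, precisely because the
cancel can only succeed while the request is still unmatched, in which case
the message is still pending for the fallback MPI_Recv.

#include <mpi.h>
#include <stdio.h>

/* run with "mpirun -n 2 ./a.out" */
int main(int argc, char **argv)
{
    int me, buf = -1, cancelled = 0;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);

    if (0 == me) {
        MPI_Send(&me, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (1 == me) {
        MPI_Irecv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Cancel(&req);
        MPI_Wait(&req, &status);
        MPI_Test_cancelled(&status, &cancelled);
        printf("MPI_Test_cancelled: %d\n", cancelled);
        if (cancelled) {
            /* the irecv never matched, so the message must still be pending */
            MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        }
    }

    MPI_Finalize();
    return 0;
}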

  george.

On Jul 26, 2012, at 16:31 , Richard Graham wrote:

> I do not see any resetting of sequence numbers.  It has been a long time 
> since I have looked at the matching code, so don't know if the out-of-order 
> handling has been taken out.  If not, the sequence number has to be dealt 
> with in some manner, or else there will be a gap in the arriving sequence 
> numbers, and the matching logic will prevent any further progress.
> 
> Rich
> 
> -Original Message-
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On 
> Behalf Of George Bosilca
> Sent: Thursday, July 26, 2012 10:06 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [patch] MPI_Cancel should not cancel a request if 
> it has a matched recv frag
> 
> Takahiro,
> 
> Indeed we were way to lax on canceling the requests. I modified your patch to 
> correctly deal with the MEMCHECK macro (remove the call from the branch that 
> will requires a completion function). The modified patch is attached below. I 
> will commit asap.
> 
>  Thanks,
>george.
> 
> 
> Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
> ===
> --- ompi/mca/pml/ob1/pml_ob1_recvreq.c(revision 26870)
> +++ ompi/mca/pml/ob1/pml_ob1_recvreq.c(working copy)
> @@ -3,7 +3,7 @@
>  * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
>  * University Research and Technology
>  * Corporation.  All rights reserved.
> - * Copyright (c) 2004-2009 The University of Tennessee and The University
> + * Copyright (c) 2004-2012 The University of Tennessee and The 
> + University
>  * of Tennessee Research Foundation.  All rights
>  * reserved.
>  * Copyright (c) 2004-2008 High Performance Computing Center Stuttgart, @@ 
> -15,6 +15,7 @@
>  * Copyright (c) 2012  NVIDIA Corporation.  All rights reserved.
>  * Copyright (c) 2011-2012 Los Alamos National Security, LLC. All rights
>  * reserved.
> + * Copyright (c) 2012  FUJITSU LIMITED.  All rights reserved.
>  * $COPYRIGHT$
>  *
>  * Additional copyrights may follow
> @@ -97,36 +98,26 @@
> mca_pml_ob1_recv_request_t* request = 
> (mca_pml_ob1_recv_request_t*)ompi_request;
> mca_pml_ob1_comm_t* comm = 
> request->req_recv.req_base.req_comm->c_pml_comm;
> 
> -if( true == ompi_request->req_complete ) { /* way to late to cancel this 
> one */
> -/*
> - * Receive request completed, make user buffer accessable.
> - */
> -MEMCHECKER(
> -memchecker_call(_memchecker_base_mem_defined,
> -request->req_recv.req_base.req_addr,
> -request->req_recv.req_base.req_count,
> -request->req_recv.req_base.req_datatype);
> -);
> +if( true == request->req_match_received ) { /* way to late to cancel 
> this one */
> +assert( OMPI_ANY_TAG != ompi_request->req_status.MPI_TAG ); /* 
> + not matched isn't it */
> return OMPI_SUCCESS;
> }
> 
> /* The rest should be protected behind the match logic lock */
> OPAL_THREAD_LOCK(>matching_lock);
> -if( OMPI_ANY_TAG == ompi_request->req_status.MPI_TAG ) { /* the match 
> has not been already done */
> -   if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
> -  opal_list_remove_item( >wild_receives, 
> (opal_list_item_t*)request );
> -   } else {
> -  mca_pml_ob1_comm_proc_t* proc = comm->procs + 
> request->req_recv.req_base.req_peer;
> -  opal_list_remove_item(>specific_receives, 
> (opal_list_item_t*)request);
> -   }
> -   PERUSE_TRACE_COMM_EVENT( PERUSE_COMM_REQ_REMOVE_FROM_POSTED_Q,
> -&(request->req_recv.req_base), PERUSE_RECV );
> -   /**
> -* As now the PML is done with this request we have to force the 
> pml_complete
> -* to true. Otherwise, the request will never be freed.
> -*/
> -   request->req_recv.req_base.req_pml_complete = true;
> +if( request->req_recv.req_base.req_peer == OMPI_ANY_SOURCE ) {
> +opal_list_remove_item( >wild_receives, 
> (opal_list_item_t*)request );
> +} else {
> +mca_pml_ob1_comm_proc_t* proc = com

Re: [OMPI devel] [OMPI svn] svn:open-mpi r26868 - in trunk/orte/mca/plm: base rsh

2012-07-26 Thread George Bosilca
r26868 seems to have some issues. It works well as long as all processes are 
started on the same node (aka. there is a single daemon), but it breaks with 
the error message attached below if there are more than two daemons.

$ mpirun -np 2 --bynode ./runme
[node01:07767] [[21341,0],1] ORTE_ERROR_LOG: A message is attempting to be sent 
to a process whose contact information is unknown in file 
../../../../../ompi/orte/mca/rml/oob/rml_oob_send.c at line 362
[node01:07767] [[21341,0],1] attempted to send to [[21341,0],2]: tag 15
[node01:07767] [[21341,0],1] ORTE_ERROR_LOG: A message is attempting to be sent 
to a process whose contact information is unknown in file 
../../../../ompi/orte/mca/grpcomm/base/grpcomm_base_xcast.c at line 157

I confirm that reverting this commit brings the trunk back to a normal state.

Please - a tad more care in what gets committed??

  george.


On Jul 25, 2012, at 23:46 , svn-commit-mai...@open-mpi.org wrote:

> Author: rhc (Ralph Castain)
> Date: 2012-07-25 17:46:45 EDT (Wed, 25 Jul 2012)
> New Revision: 26868
> URL: https://svn.open-mpi.org/trac/ompi/changeset/26868
> 
> Log:
> Reconnect the rsh/ssh error reporting code for remote spawns to report 
> failure to launch. Ensure the HNP correctly reports non-zero exit status when 
> ssh encounters a problem.
> 
> Thanks to Terry for spotting it!
> 
> Text files modified: 
>   trunk/orte/mca/plm/base/plm_base_launch_support.c |44 
> 
>   trunk/orte/mca/plm/base/plm_base_receive.c| 6 + 
>   
>   trunk/orte/mca/plm/base/plm_private.h | 4 +++   
>   
>   trunk/orte/mca/plm/rsh/plm_rsh_module.c   |18 +++-  
>   
>   4 files changed, 62 insertions(+), 10 deletions(-)
> 
> Modified: trunk/orte/mca/plm/base/plm_base_launch_support.c
> ==
> --- trunk/orte/mca/plm/base/plm_base_launch_support.c Wed Jul 25 12:32:51 
> 2012(r26867)
> +++ trunk/orte/mca/plm/base/plm_base_launch_support.c 2012-07-25 17:46:45 EDT 
> (Wed, 25 Jul 2012)  (r26868)
> @@ -741,6 +741,50 @@
> 
> }
> 
> +void orte_plm_base_daemon_failed(int st, orte_process_name_t* sender,
> + opal_buffer_t *buffer,
> + orte_rml_tag_t tag, void *cbdata)
> +{
> +int status, rc;
> +int32_t n;
> +orte_vpid_t vpid;
> +orte_proc_t *daemon;
> +
> +/* get the daemon job, if necessary */
> +if (NULL == jdatorted) {
> +jdatorted = orte_get_job_data_object(ORTE_PROC_MY_NAME->jobid);
> +}
> +
> +/* unpack the daemon that failed */
> +n=1;
> +if (OPAL_SUCCESS != (rc = opal_dss.unpack(buffer, , , 
> ORTE_VPID))) {
> +ORTE_ERROR_LOG(rc);
> +ORTE_UPDATE_EXIT_STATUS(ORTE_ERROR_DEFAULT_EXIT_CODE);
> +goto finish;
> +}
> +
> +/* unpack the exit status */
> +n=1;
> +if (OPAL_SUCCESS != (rc = opal_dss.unpack(buffer, , , 
> OPAL_INT))) {
> +ORTE_ERROR_LOG(rc);
> +status = ORTE_ERROR_DEFAULT_EXIT_CODE;
> +ORTE_UPDATE_EXIT_STATUS(ORTE_ERROR_DEFAULT_EXIT_CODE);
> +} else {
> +ORTE_UPDATE_EXIT_STATUS(WEXITSTATUS(status));
> +}
> +
> +/* find the daemon and update its state/status */
> +if (NULL == (daemon = 
> (orte_proc_t*)opal_pointer_array_get_item(jdatorted->procs, vpid))) {
> +ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
> +goto finish;
> +}
> +daemon->state = ORTE_PROC_STATE_FAILED_TO_START;
> +daemon->exit_code = status;
> +
> + finish:
> +ORTE_ACTIVATE_PROC_STATE(>name, ORTE_PROC_STATE_FAILED_TO_START);
> +}
> +
> int orte_plm_base_setup_orted_cmd(int *argc, char ***argv)
> {
> int i, loc;
> 
> Modified: trunk/orte/mca/plm/base/plm_base_receive.c
> ==
> --- trunk/orte/mca/plm/base/plm_base_receive.cWed Jul 25 12:32:51 
> 2012(r26867)
> +++ trunk/orte/mca/plm/base/plm_base_receive.c2012-07-25 17:46:45 EDT 
> (Wed, 25 Jul 2012)  (r26868)
> @@ -87,6 +87,12 @@
>   
> orte_plm_base_daemon_callback, NULL))) {
> ORTE_ERROR_LOG(rc);
> }
> +if (ORTE_SUCCESS != (rc = orte_rml.recv_buffer_nb(ORTE_NAME_WILDCARD,
> +  
> ORTE_RML_TAG_REPORT_REMOTE_LAUNCH,
> +  
> ORTE_RML_PERSISTENT,
> +  
> orte_plm_base_daemon_failed, NULL))) {
> +ORTE_ERROR_LOG(rc);
> +}
> }
> recv_issued = true;
> 
> 
> Modified: trunk/orte/mca/plm/base/plm_private.h
> ==
> --- 

[OMPI devel] The hostfile option

2012-07-27 Thread George Bosilca
I'm somewhat puzzled by the behavior of the -hostfile option in Open MPI. Based 
on the FAQ, it is supposed to provide the list of resources to be used by the 
launcher (in my case ssh) to start the processes. Makes sense so far.

However, if the configuration file contains a value for orte_default_hostfile, 
then the behavior of the hostfile option changes drastically, and the option 
becomes a filter (the machines must be on the original list or a cryptic error 
message is displayed).

Overall, we have a well-defined, [mostly] consistent behavior for parameters in 
Open MPI. We have a clearly defined order of precedence among the sources of MCA 
parameters, which makes it straightforward to understand where a value comes 
from. I'm absolutely certain there was a group discussion about this unique 
"eccentricity" of the hostfile option, but I fail to remember the reason we 
decided to go this way. Can I have a quick refresher, please?

Thanks,
 george.




Re: [OMPI devel] The hostfile option

2012-07-30 Thread George Bosilca
I think that as long as there is a single home area per cluster, the difference 
between the approaches might seem irrelevant to most people.

My problem is twofold. First, I have a common home area across several 
different development clusters, and thus direct ssh access to any machine. If I 
create a single large machinefile, it turns out that every mpirun will spawn a 
daemon on every single node, even if I only run a ping-pong test. Second, while 
I usually run my apps on the same set of resources, I need on a regular basis 
to switch my nodes for a few tests.

What I was hoping to achieve is a machinefile containing the "default" 
development cluster (i.e., the cluster where I'm almost alone, so my daemons 
have minimal chances of disturbing other people's experiments), and then use a 
machinefile to sporadically change the cluster where I run smaller tests. 
Unfortunately, this doesn't work due to the filtering behavior described in my 
original email.

  george.


On Jul 28, 2012, at 19:24 , Ralph Castain wrote:

> It's been awhile, but I vaguely remember the discussion. IIRC, the rationale 
> was that the default hostfile was equivalent to an RM allocation and should 
> be treated the same. So hostfile and -host become filters in that case.
> 
> FWIW, I believe the discussion was split on that question. I added a "none" 
> option to the default hostfile MCA param so it would be ignored in the case 
> where (a) the sys admin has given a default hostfile, but (b) someone wants 
> to use hosts outside of it.
> 
>MCA orte: parameter "orte_default_hostfile" (current value: 
> , data source: default value)
>  Name of the default hostfile (relative or absolute 
> path, "none" to ignore environmental or default MCA param setting)
> 
> That said, I can see a use-case argument for behaving somewhat differently. 
> We've even had cases where users have gotten an allocation from an RM, but 
> want to add hosts that are external to the cluster to the job.
> 
> It would be rather trivial to modify the logic:
> 
> 1. read the default hostfile or RM allocation for our baseline
> 
> 2. remove any hosts on that list that are *not* in the given hostfile
> 
> 3. add any hosts that are in the given hostfile, but weren't in the default 
> hostfile
> 
> And subsequently do the same for -host. I think that would retain the spirit 
> of the discussion, but provide more flexibility and provide a tad more 
> "expected" behavior.
> 
> I don't have an iron in this fire as I don't use hostfiles, so I'm happy to 
> implement whatever the community would like to see.
> Ralph
> 
> On Jul 27, 2012, at 6:30 PM, George Bosilca wrote:
> 
>> I'm somewhat puzzled by the behavior of the -hostfile in Open MPI. Based on 
>> the FAQ it is supposed to provide a list of resources to be used by the 
>> launcher (in my case ssh) to start the processes. Make sense so far.
>> 
>> However, if the configuration file contain a value for 
>> orte_default_hostfile, then the behavior of the hostfile option change 
>> drastically, and the option become a filter (the machines must be on the 
>> original list or a cryptic error message is displayed).
>> 
>> Overall, we have a well defined [mostly] consistent behavior for parameters 
>> in Open MPI. We have an order of precedence of sources of MCA parameters, 
>> clearly defined which make understanding where a value comes 
>> straightforward. I'm absolutely certain there was a group discussion about 
>> this unique "eccentricity" regarding the hostfile option, but I fail to 
>> remember what was the reason we decided to go this way. Can I have a quick 
>> refresh please?
>> 
>> Thanks,
>> george.
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] The hostfile option

2012-07-31 Thread George Bosilca

On Jul 30, 2012, at 15:29 , Ralph Castain wrote:

> 
> On Jul 30, 2012, at 2:37 AM, George Bosilca wrote:
> 
>> I think that as long as there is a single home area per cluster the 
>> difference between the different approaches might seem irrelevant to most of 
>> the people.
> 
> Yeah, I agree - after thinking about it, it probably didn't accomplish much.
> 
>> 
>> My problem is twofold. First, I have a common home area across several 
>> different development clusters. Thus I have direct access through ssh to any 
>> machine. If I create a single large machinefile, it turns out that every 
>> mpirun will spawn a daemon on every single node, even if I only run a 
>> ping-pong test.
> 
> That shouldn't happen if you specify the hosts you want to use, either via 
> -host or -hostfile. I assume you are specifying nothing and so you get that 
> behavior?
> 
>> Second, while I usually run my apps on the same set of resources I need on a 
>> regular base to switch my nodes for few tests.
>> 
>> What I was hoping to achieve is a machinefile containing the "default" 
>> development cluster (aka. the cluster where I'm almost alone so my deamons 
>> have minimal chances to disturb other people experiences), and then use a 
>> machinefile to sporadicly change the cluster where I run for smaller tests. 
>> Unfortunately, this doesn't work due to the filtering behavior described in 
>> my original email.
> 
> Why not just set the default hostfile to point to the new machinefile via the 
> "--default-hostfile foo" option to mpirun, or you can use the corresponding 
> MCA param?

I confirm: if I use --default-hostfile instead of -machinefile, I get the 
behavior I expected (it overrides the default).

> I'm not trying to re-open the hostfile discussion, but I would be interested 
> to hear how you feel -hostfile should work. I kinda gather you feel it should 
> override the default hostfile instead of filter it, yes? My point being that 
> I don't particularly know if anyone would disagree with that behavior, so we 
> might decide to modify things if you want to propose it.

Right, I would have expected it to work in the same way as almost all the other 
MCA parameters, with higher-priority sources overriding the lower-priority 
ones. But I don't mind typing "--default-hostfile" instead of "-machinefile" to 
get the behavior I like.

  george.

> 
> Ralph
> 
> 
>> 
>> george.
>> 
>> 
>> On Jul 28, 2012, at 19:24 , Ralph Castain wrote:
>> 
>>> It's been awhile, but I vaguely remember the discussion. IIRC, the 
>>> rationale was that the default hostfile was equivalent to an RM allocation 
>>> and should be treated the same. So hostfile and -host become filters in 
>>> that case.
>>> 
>>> FWIW, I believe the discussion was split on that question. I added a "none" 
>>> option to the default hostfile MCA param so it would be ignored in the case 
>>> where (a) the sys admin has given a default hostfile, but (b) someone wants 
>>> to use hosts outside of it.
>>> 
>>>  MCA orte: parameter "orte_default_hostfile" (current value: 
>>> , data source: default value)
>>>Name of the default hostfile (relative or absolute 
>>> path, "none" to ignore environmental or default MCA param setting)
>>> 
>>> That said, I can see a use-case argument for behaving somewhat differently. 
>>> We've even had cases where users have gotten an allocation from an RM, but 
>>> want to add hosts that are external to the cluster to the job.
>>> 
>>> It would be rather trivial to modify the logic:
>>> 
>>> 1. read the default hostfile or RM allocation for our baseline
>>> 
>>> 2. remove any hosts on that list that are *not* in the given hostfile
>>> 
>>> 3. add any hosts that are in the given hostfile, but weren't in the default 
>>> hostfile
>>> 
>>> And subsequently do the same for -host. I think that would retain the 
>>> spirit of the discussion, but provide more flexibility and provide a tad 
>>> more "expected" behavior.
>>> 
>>> I don't have an iron in this fire as I don't use hostfiles, so I'm happy to 
>>> implement whatever the community would like to see.
>>> Ralph
>>> 
>>> On Jul 27, 2012, at 6:30 PM, George Bosilca wrote:
>>> 
>>>> I'm somewhat puzzled by the behavior of the -hostfile in Open MPI. Based 
>>>> on the FAQ it is supposed to provide a list of reso

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27161 - trunk/orte/mca/grpcomm/base

2012-08-30 Thread George Bosilca
A strange race condition, happening for undisclosed reasons and only fixable by 
replication, is jeopardizing our reference-count system. That sounds 
definitely almost scary (!) 

I think that the proposed solution is just a band-aid. It somehow fixes this 
particular instance of the issue but leaves all the others unpatched, asking for 
trouble later on. This problem has been lingering around for years, but we have 
failed to address it correctly up to now.

Based on my understanding of the code, the problem is not with the ref count but 
with the way opal_buffer_t is handled. We have no way to retrieve the pointer 
to where the data of an opal_buffer_t is stored without a destructive operation. 
This means that every time we need the pointer of an opal_buffer_t (as in the 
send operation, to build the iovecs), we have to do a load followed by an 
unload, leaving the opal_buffer_t uninitialized for a short amount of time. As 
a result, it is completely unsafe to use the same opal_buffer_t concurrently for 
multiple operations, as some callbacks can find the buffer uninitialized when 
they fire.
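
To make the window concrete, here is a toy model (hypothetical types and
helpers, NOT the real opal_dss interface) of that peek-by-unload/reload
pattern; any callback that fires between the two calls sees an empty buffer.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    void  *base;    /* payload storage */
    size_t bytes;   /* payload length  */
} toy_buffer_t;

/* detach the payload from the buffer (destructive peek) */
static void toy_unload(toy_buffer_t *buf, void **payload, size_t *bytes)
{
    *payload   = buf->base;
    *bytes     = buf->bytes;
    buf->base  = NULL;      /* the buffer is now empty ... */
    buf->bytes = 0;
}

/* re-attach the payload */
static void toy_load(toy_buffer_t *buf, void *payload, size_t bytes)
{
    buf->base  = payload;
    buf->bytes = bytes;
}

int main(void)
{
    toy_buffer_t buf = { strdup("packed message"), 15 };
    void *ptr;
    size_t len;

    toy_unload(&buf, &ptr, &len);   /* the only way to learn the pointer */
    /* window: a concurrent callback looking at 'buf' right now would find
     * base == NULL and bytes == 0 */
    printf("window: base=%p bytes=%zu\n", buf.base, buf.bytes);
    toy_load(&buf, ptr, len);       /* put the payload back */

    printf("after reload: %s (%zu bytes)\n", (char *)buf.base, buf.bytes);
    free(buf.base);
    return 0;
}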


Now regarding the patch itself, I have to congratulate the Open MPI community 
for its unbelievable response time. A solution proposed, then tested on the 
faulty platforms, then the code carefully reviewed, and finally pushed into a 
stable branch, all in a mere 43 minutes (!). It shows that all the protection 
mechanisms we put in place around our stable branches are entirely functional 
and their role is completely fulfilled. I doubt any other open source project 
can claim such a feat. Congratulations!

commit in the trunk @ Timestamp: 08/28/12 13:17:34 (6 hours ago)
commit in the 1.7   @ Timestamp: 08/28/12 14:00:10 (5 hours ago)

  george.

On Aug 28, 2012, at 19:17 , svn-commit-mai...@open-mpi.org wrote:

> Author: rhc (Ralph Castain)
> Date: 2012-08-28 13:17:34 EDT (Tue, 28 Aug 2012)
> New Revision: 27161
> URL: https://svn.open-mpi.org/trac/ompi/changeset/27161
> 
> Log:
> Fix a strange race condition by creating a separate buffer for each send - 
> apparently, just a retain isn't enough protection on some systems




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27161 - trunk/orte/mca/grpcomm/base

2012-08-30 Thread George Bosilca

On Aug 30, 2012, at 17:15 , Ralph Castain <r...@open-mpi.org> wrote:

> 
> On Aug 30, 2012, at 8:10 AM, George Bosilca <bosi...@eecs.utk.edu> wrote:
> 
>> A strange race condition happening for undisclosed reasons, and only fixable 
>> by replication is jeopardizing our reference count system. That sounds 
>> definitively almost scary (!) 
>> 
>> I think that the proposed solution is just a band-aid. It somehow fixes this 
>> particular instance of the issue but leave all the others unpatched, asking 
>> for troubles later on. This problem has been lingering around for years, but 
>> we failed to address it correctly up to now.
>> 
>> Based on my understanding of the code the problem is not with the ref count 
>> but with the way opal_buffer_t is handled. We have no way to retrieve the 
>> pointer where the data in the opal_buffer_t is stored without a destructive 
>> operation. This means every time we need to have the pointer of the 
>> opal_buffer_t (like in the send operation to build the iovecs), we have to 
>> do a load followed by an unload, leaving the opal_buffer_t uninitialized for 
>> a short amount of time. As a result it is completely unsafe to use the same 
>> opal_buffer_t concurrently for multiple operations, as some callbacks can 
>> find the buffer uninitialized when they fire.
> 
> That is correct - and yes, it is a bandaid. Fixing the opal_buffer_t 
> situation is a much bigger issue that will require more time and effort than 
> we had at the moment.

I feel compelled to acknowledge the clarity of the commit message explaining 
the real reason behind the band-aid. And, of course, the fact that the community 
was made aware of this critical issue, especially now that we have an event-based 
runtime and people are toying with multiple threads (every ingredient is 
there for this issue to come back more often).

Anyway, I guess as long as you were aware of the issue, the community doesn't 
have to take any further action nor be informed about it. A proper software 
design will certainly be proposed in a timely manner.

  george.


>> Now regarding the patch itself, I have to congratulate the Open MPI 
>> community for its unbelievable response time. A solution proposed, then 
>> tested on the faulty platforms, then the code carefully reviewed and finally 
>> pushed in a stable branch all in a mere 43 minutes (!). It shows that all 
>> the protection mechanism we put in place around our stable branches are 
>> entirely functional and their role is completely fulfilled. I doubt any 
>> other open source project can claim such a feat. Congratulations!
> 
> As always, George - thanks for your positive, inspirational attitude. I'm 
> sure we all truly appreciate your input.
> 
> 
>> 
>> commit in the trunk @ Timestamp: 08/28/12 13:17:34 (6 hours ago)
>> commit in the 1.7   @ Timestamp: 08/28/12 14:00:10 (5 hours ago)
>> 
>> george.
>> 
>> On Aug 28, 2012, at 19:17 , svn-commit-mai...@open-mpi.org wrote:
>> 
>>> Author: rhc (Ralph Castain)
>>> Date: 2012-08-28 13:17:34 EDT (Tue, 28 Aug 2012)
>>> New Revision: 27161
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/27161
>>> 
>>> Log:
>>> Fix a strange race condition by creating a separate buffer for each send - 
>>> apparently, just a retain isn't enough protection on some systems
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: hwloc object userdata

2012-10-03 Thread George Bosilca
In case such functionality becomes necessary, I would suggest we use a 
mechanism similar to the attributes in MPI (but without the multi-language 
mess). That will allow whoever wants to attach data to an hwloc node to do so 
without having to deal with reserving a slot. It might require a little more 
memory, but so far the number of nodes in the hwloc data is limited.
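
A sketch of what I have in mind, built on the plain hwloc userdata pointer with
hypothetical helper names (this is not an existing OMPI or hwloc API): each
consumer obtains its own key, in the spirit of MPI_Comm_create_keyval, and
attaches data under that key instead of reserving a slot in a shared array.

#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct attr_node {
    int key;                  /* key handed out to the consumer */
    void *value;              /* consumer-owned data            */
    struct attr_node *next;
} attr_node_t;

static int next_key = 0;

/* analogous to MPI_Comm_create_keyval */
static int attr_create_key(void) { return next_key++; }

static void attr_set(hwloc_obj_t obj, int key, void *value)
{
    attr_node_t *n = malloc(sizeof(*n));
    n->key = key;
    n->value = value;
    n->next = (attr_node_t *)obj->userdata;   /* prepend to the keyed list */
    obj->userdata = n;
}

static void *attr_get(hwloc_obj_t obj, int key)
{
    attr_node_t *n;
    for (n = (attr_node_t *)obj->userdata; NULL != n; n = n->next) {
        if (n->key == key) return n->value;
    }
    return NULL;
}

int main(void)
{
    hwloc_topology_t topo;
    hwloc_obj_t root;
    int rmaps_key, openib_key;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);
    root = hwloc_get_root_obj(topo);

    rmaps_key  = attr_create_key();   /* each consumer gets its own key */
    openib_key = attr_create_key();
    attr_set(root, rmaps_key,  "rmaps data");
    attr_set(root, openib_key, "openib data");

    printf("rmaps: %s, openib: %s\n",
           (char *)attr_get(root, rmaps_key),
           (char *)attr_get(root, openib_key));

    /* a real implementation would free the attribute lists before destroy */
    hwloc_topology_destroy(topo);
    return 0;
}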

  george.

On Oct 3, 2012, at 16:13 , Jeff Squyres  wrote:

> WHAT: allowing multiple entities in the OMPI code base to hang data off 
> hwloc_obj->userdata
> 
> WHY: anticipating that more parts of the OMPI code base will be using the 
> hwloc data
> 
> WHERE: hwloc base
> 
> WHEN: no real hurry; Ralph and I just identified the potential for this issue 
> this morning.  We're not aware of it being an actual problem (yet).
> 
> MORE DETAIL:
> 
> The rmaps base (in mpirun) is currently hanging its own data off various 
> objects in the hwloc topology tree.  However, it should be noted that the 
> hwloc topology tree is a global data structure in each MPI processes; 
> multiple upper-level entities in the ORTE and OMPI layers may want to hang 
> their own userdata off hwloc objects.
> 
> Ralph and I figured that some functionality could be added to the hwloc base 
> to hang a opal_pointer_array off each hwloc object; each array value will be 
> a (void*).  Then upper-level entities can reserve a slot in all the pointer 
> arrays and store whatever they want in their (void*) slot.
> 
> For example, if the openib BTL wants to use the hwloc data and hang its own 
> userdata off hwloc objects, it can call the hwloc base and reserve a slot.  
> The hwloc base will say "Ok, you can have slot 7".  Then the openib BTL can 
> always use slot 7 in the opal_pointer_array off any hwloc object.
> 
> Does this sound reasonable?
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] MPI_Reduce Hangs in my Application

2012-10-10 Thread George Bosilca
Your code works for me on two platforms. Thus, I guess the problem is in the 
communication layer (BTL) of Open MPI. What network do you use? If Ethernet, how 
many interfaces?

  Thanks,
george.

On Oct 10, 2012, at 09:30 , Santhosh Kokala  
wrote:

> I have a problem with my MPI code, it hangs when the code is run on multiple 
> nodes. It successfully completes when run on a single node. I am not sure how 
> to debug this. Can someone help me debug this issue?
> Program Usage:
> 
> mpicc -o string string.cpp
> mpirun -np 4 -npernode 2 -hostfile hosts ./string 12 0.1 0.9 10 2
>  
> MPI_Reduce Hangs in 2nd iteration: (Output cout statements from my program)
>  
> 1st Iteration (Timestep 1)
> -
> 0 Waiting for MPI_Reduce()
> 0 Done Waiting for MPI_Reduce()
>  
> 1 Waiting for MPI_Reduce()
> 1 Done Waiting for MPI_Reduce()
>  
> 2 Waiting for MPI_Reduce()
> 2 Done Waiting for MPI_Reduce()
>  
> 3 Waiting for MPI_Reduce()
> 3 Done Waiting for MPI_Reduce()
>  
> 0 Sending to right  task  = 1
> 0 Receiving from right task   = 1
>  
> 1 Receiving from left task   = 0
> 1 Sending to left task   = 0
>  
> 1 Sending to right  task  = 2
> 1 Receiving from right task   = 2
>  
>  
> 2 Receiving from left task   = 1
> 2 Sending to left task   = 1
>  
> 2 Sending to right  task  = 3
> 2 Receiving from right task   = 3
>  
> 3 Receiving from left task   = 2
> 3 Sending to left task   = 2
>  
>  
>  
> 2nd Iteration (Timestep 2)
> -
> 0 Waiting for MPI_Reduce()
>  
> 1 Waiting for MPI_Reduce()
> 1 Done Waiting for MPI_Reduce()
>  
> 2 Waiting for MPI_Reduce()
>  
> 3 Waiting for MPI_Reduce()
>  
>  
>  
> My Code:
>  
> #include 
> #include 
> #include 
> #include 
> #include "mpi.h"
>  
> #define MASTER 0
> int RtoL = 10;
> int LtoR = 20;
>  
> int main ( int argc, char **argv )
> {
> int nprocs, taskid;
> FILE *f = NULL;
> int left, right, i_start, i_end;
> float sum = 0;
> MPI_Status status;
> float *y, *yold;
> float *v, *vold;
>  
> //  const int NUM_MASSES = 1000;
> //  const float Ktension = 0.1;
> //  const float Kdamping = 0.9;
> //  const float duration = 10.0;
>  
> #if 0
> if ( argc != 5 ) {
> std::cout << "usage: " << argv[0] << " NUM_MASSES durationInSecs 
> Ktension Kdamping\n";
> return 2;
> }
> #endif
>  
> int NUM_MASSES  = atoi ( argv[1] );
> float duration = atof ( argv[2] );
> float Ktension = atof ( argv[3] );
> float Kdamping = atof ( argv[4] );
> const int PICKUP_POS = NUM_MASSES / 7;
> const int OVERSAMPLING = 16;
>  
>     MPI_Init(&argc, &argv);
>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>     MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
>  
> if (taskid  == 0) {
> f = fopen ( "rstring.raw", "wb" );
> if (!f) {
> std::cout << "can't open output file\n";
> return 1;
> }
> }
>  
> y = new float[NUM_MASSES];
> yold = new float[NUM_MASSES];
> v = new float[NUM_MASSES];
>  
> for (int i = 0; i < NUM_MASSES; i++ ) {
> v[i]  = 0.0f;
> yold[i] = y[i] = 0.0f;
> if (i == NUM_MASSES/2 )
> yold[i] = 1.0;
> }
>  
> if (taskid == 0) {
> left = -1;
> right = 1;
> } else if (taskid == nprocs - 1) {
> left = taskid - 1;
> right = -1;
> } else {
> left = taskid - 1;
> right = taskid + 1;
> }
>  
> i_start = taskid * (NUM_MASSES/nprocs);
> i_end = i_start + (NUM_MASSES/nprocs);
>  
> int numIters = duration * 44100 * OVERSAMPLING;;
> if (argc == 6) {
> numIters = atoi(argv[5]);
> }
>  
> for ( int t = 0; t < numIters; t++ ) {
> float sum = 0;
> float gsum = 0;
>  
> for ( int i = i_start; i < i_end; i++ ) {
> if ( i == 0 || i == NUM_MASSES-1 ) {
> } else {
> float accel = Ktension * (yold[i+1] + yold[i-1] - 2*yold[i]);
> v[i] += accel;
> v[i] *= Kdamping;
> y[i] = yold[i] + v[i];
> sum += y[i];
> }
> }
>  
> std::cout << taskid << " Waiting for MPI_Reduce()" << std::endl;
>         MPI_Reduce(&sum, &gsum, 1, MPI_FLOAT, MPI_SUM, MASTER, MPI_COMM_WORLD);
> std::cout << taskid << " Done Waiting for MPI_Reduce()" << std::endl;
>  
> if (taskid != 0) {
> MPI_Recv([i_start-1], 1, MPI_FLOAT, left, LtoR, MPI_COMM_WORLD, 
> );
> std::cout << taskid << " Receiving from left task   = " << left 
> << std::endl;
> MPI_Send([i_start],   1, MPI_FLOAT, left, RtoL, MPI_COMM_WORLD);
> std::cout << taskid << " Sending to left task   = " << left 
> << std::endl;
> }
> if (taskid != nprocs - 1) {
> MPI_Send([i_end-1],1, MPI_FLOAT, right, LtoR, MPI_COMM_WORLD);
>  

Re: [OMPI devel] MPI_Reduce Hangs in my Application

2012-10-10 Thread George Bosilca
I guess the TCP BTL gets confused by your virtual interfaces (vmnet?). Try to 
limit the used interfaces using the "--mca btl_tcp_if_include eth0" argument. 
Let us know if this solves your issue.

  Thanks,
george.


On Oct 10, 2012, at 18:54 , Santhosh Kokala <santhosh.kok...@riverbed.com> 
wrote:

> George,
> I am using each host with  4 interfaces including loopback interface. Can you 
> please let me know more about your environment?
>  
> eth0  Link encap:Ethernet  HWaddr bc:30:5b:db:ae:6f
>   inet addr:xxx.xxx.xxx.134  Bcast:xxx.xxx.xxx.255  Mask:255.255.255.0
>   inet6 addr: fe80::be30:5bff:fedb:ae6f/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:1375598 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:709644 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:1431654357 (1.4 GB)  TX bytes:69604165 (69.6 MB)
>   Interrupt:17
>  
> loLink encap:Local Loopback
>   inet addr:127.0.0.1  Mask:255.0.0.0
>   inet6 addr: ::1/128 Scope:Host
>   UP LOOPBACK RUNNING  MTU:16436  Metric:1
>   RX packets:944 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:944 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:0
>   RX bytes:264692 (264.6 KB)  TX bytes:264692 (264.6 KB)
>  
> vmnet1Link encap:Ethernet  HWaddr 00:50:56:c0:00:01
>   inet addr: xxx.xxx.xxx.1  Bcast: xxx.xxx.xxx.255  Mask:255.255.255.0
>   inet6 addr: fe80::250:56ff:fec0:1/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:245 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
>  
> vmnet8Link encap:Ethernet  HWaddr 00:50:56:c0:00:08
>   inet addr: xxx.xxx.xxx..1  Bcast: xxx.xxx.xxx..255  
> Mask:255.255.255.0
>   inet6 addr: fe80::250:56ff:fec0:8/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:58357 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:238 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>       RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
>  
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On 
> Behalf Of George Bosilca
> Sent: Wednesday, October 10, 2012 4:41 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] MPI_Reduce Hangs in my Application
>  
> Your code works for me on two platforms. Thus, I guess the problem is with 
> the communication layer (BTL) is Open MPI. What network do you use? If 
> Ethernet how many interfaces?
>  
>   Thanks,
> george.
>  
> On Oct 10, 2012, at 09:30 , Santhosh Kokala <santhosh.kok...@riverbed.com> 
> wrote:
> 
> 
> I have a problem with my MPI code, it hangs when the code is run on multiple 
> nodes. It successfullycompletes when run on a single node. I am not sure how 
> to debug this. Can someone help me debug this issue?
> Program Usage:
> 
> mpicc -o string string.cpp
> mpirun -np 4 -npernode 2 -hostfile hosts ./string 12 0.1 0.9 10 2
>  
> MPI_Reduce Hangs in 2nd iteration: (Output cout statements from my program)
>  
> 1st Iteration (Timestep 1)
> -
> 0 Waiting for MPI_Reduce()
> 0 Done Waiting for MPI_Reduce()
>  
> 1 Waiting for MPI_Reduce()
> 1 Done Waiting for MPI_Reduce()
>  
> 2 Waiting for MPI_Reduce()
> 2 Done Waiting for MPI_Reduce()
>  
> 3 Waiting for MPI_Reduce()
> 3 Done Waiting for MPI_Reduce()
>  
> 0 Sending to right  task  = 1
> 0 Receiving from right task   = 1
>  
> 1 Receiving from left task   = 0
> 1 Sending to left task   = 0
>  
> 1 Sending to right  task  = 2
> 1 Receiving from right task   = 2
>  
>  
> 2 Receiving from left task   = 1
> 2 Sending to left task   = 1
>  
> 2 Sending to right  task  = 3
> 2 Receiving from right task   = 3
>  
> 3 Receiving from left task   = 2
> 3 Sending to left task   = 2
>  
>  
>  
> 2nd Iteration (Timestep 2)
> -
> 0 Waiting for MPI_Reduce()
>  
> 1 Waiting for MPI_Reduce()
> 1 Done Waiting for MPI_Reduce()
>  
> 2 Waiting for MPI_Reduce()
>  
> 3 Waiting for MPI_Reduce()
>  
>  
>  
> My Code:
>  
> #include 
> #include 
> #include 
> #include 
> #include "mpi.h"
>  
> #define MASTER 0
> int RtoL = 10;
> int LtoR

Re: [OMPI devel] [patch] Invalid MPI_Status for null or inactive request

2012-10-15 Thread George Bosilca
Takahiro,

I fail to see the cases your patch addresses. I recognize I did not have the 
time to look over all the instances where we deal with persistent inactive 
requests, but at the first occurrence, the one in req_test.c line 68, the case 
you exhibit there is already covered by the test "request->req_state == 
OMPI_REQUEST_INACTIVE". I see similar checks in all the other test/wait files. 
Basically, it doesn't matter that we leave the last returned error code on an 
inactive request, as we always return MPI_STATUS_EMPTY in the status for such 
requests.
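
For the record, the expected semantics are easy to check from user code (a
small standalone test, not part of the patch): a persistent request that has
never been started is inactive, so MPI_Test must complete immediately and
return an empty status (MPI_ANY_SOURCE / MPI_ANY_TAG / MPI_SUCCESS).

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int flag, payload = 42;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);

    /* inactive until MPI_Start is called (which we never do) */
    MPI_Send_init(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
    MPI_Test(&req, &flag, &status);
    printf("flag=%d source=%d tag=%d error=%d\n",
           flag, status.MPI_SOURCE, status.MPI_TAG, status.MPI_ERROR);

    MPI_Request_free(&req);
    MPI_Finalize();
    return 0;
}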

Thanks,
  george.


On Oct 15, 2012, at 07:02 , "Kawashima, Takahiro"  
wrote:

> Hi Open MPI developers,
> 
> How is my updated patch?
> If there is an another concern, I'll try to update it.
> 
> The bugs are:
> 
> (1) MPI_SOURCE of MPI_Status for a null request must be MPI_ANY_SOURCE.
> 
> (2) MPI_Status for an inactive request must be an empty status.
> 
> (3) Possible BUS errors on sparc64 processors.
> 
>  r23554 fixed possible BUS errors on sparc64 processors.
>  But the fix seems to be insufficient.
> 
>  We should use OMPI_STATUS_SET macro for all user-supplied
>  MPI_Status objects.
>>>> Regarding #3, see also trac 3218. I'm putting a fix back today. Sorry
>>>> for the delay. One proposed solution was extending the use of the
>>>> OMPI_STATUS_SET macros, but I think the consensus was to fix the problem
>>>> in the Fortran layer. Indeed, the Fortran layer already routinely
>>>> converts between Fortran and C statuses. The problem was that we started
>>>> introducing optimizations to bypass the Fortran-to-C conversion and that
>>>> optimization was employed too liberally (e.g., in situations that would
>>>> introduce the alignment errors you're describing). My patch will clean
>>>> that up. I'll try to put it back in the next few hours.
>>> 
>>> Sorry, I didn't notice the ticket 3218.
>>> Now I've confirmed your commit r27403.
>>> Your modification is better for my issue (3).
>>> 
>>> With r27403, my patch for issue (1) and (2) needs modification.
>>> I'll re-send modified patch in a few hours.
>> 
>> The updated patch is attached.
>> This patch addresses bugs (1) and (2) in my previous mail
>> and fixes some typos in comments.
> 
> Regards,
> 
> Takahiro Kawashima,
> MPI development team,
> Fujitsu
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Cross Memory Attach: What am I Missing?

2012-10-18 Thread George Bosilca
Check the permissions granted by PAM. Look in /etc/security for any type of 
restrictions.
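
For what it's worth, a stand-alone probe like the one below (not Open MPI code;
written for this thread) can help separate a kernel/security restriction -- e.g.
Yama ptrace scoping or limits under /etc/security -- from an MPI-level problem.
It assumes glibc 2.15 or newer so that process_vm_readv is exposed:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdint.h>
    #include <errno.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    /* Try to read a few bytes from another process' address space and
     * report errno.  EPERM here, as the root-vs-user difference suggests,
     * points at a security policy rather than at the MPI library. */
    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <pid> <remote-hex-address>\n", argv[0]);
            return 1;
        }
        pid_t pid = (pid_t)atol(argv[1]);
        void *remote_addr = (void *)(uintptr_t)strtoull(argv[2], NULL, 16);
        char buf[16];

        struct iovec local  = { .iov_base = buf,         .iov_len = sizeof(buf) };
        struct iovec remote = { .iov_base = remote_addr, .iov_len = sizeof(buf) };

        ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
        if (n < 0) {
            fprintf(stderr, "process_vm_readv: %s\n", strerror(errno));
            return 1;
        }
        printf("read %zd bytes from pid %ld\n", n, (long)pid);
        return 0;
    }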

  george.


On Oct 17, 2012, at 23:30 , "Gutierrez, Samuel K"  wrote:

> Hi,
> 
> I'm trying to run with CMA support, but process_vm_readv is failing with 
> EPERM when trying to use it as a regular user (everything seems to work fine 
> as root). I've looked around for some solutions, but I can't seem to find 
> what I'm looking for. The documentation states that the target and source 
> processes need to have the same GID and UID to work properly. It appears that 
> they do, so my feeling is that I'm missing something.
> 
> Any help is greatly appreciated.
> 
> Thanks,
> 
> Sam
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] [patch] SEGV on processing unexpected messages

2012-10-18 Thread George Bosilca
Takahiro,

Nice catch. A nicer fix will be to check the type of the header, and copy the 
header accordingly. Attached is a patch following this idea.
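
For readers following along, a rough fragment of what "copy the header
accordingly" means (the struct and constant names below are assumed from the
usual ob1 naming and may differ from the attached patch):

    /* Copy only the bytes belonging to the header type that actually
     * arrived, instead of sizeof(mca_pml_ob1_hdr_t) bytes from a buffer
     * that may be shorter than the full union. */
    size_t hdr_size;
    switch (((const mca_pml_ob1_common_hdr_t *)hdr)->hdr_type) {
    case MCA_PML_OB1_HDR_TYPE_MATCH:
        hdr_size = sizeof(mca_pml_ob1_match_hdr_t);
        break;
    case MCA_PML_OB1_HDR_TYPE_RNDV:
        hdr_size = sizeof(mca_pml_ob1_rendezvous_hdr_t);
        break;
    default:                       /* fall back to the full union */
        hdr_size = sizeof(mca_pml_ob1_hdr_t);
        break;
    }
    memcpy(&frag->hdr, hdr, hdr_size);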

  Thanks,
george.



hdr_copy.patch
Description: Binary data

On Oct 18, 2012, at 03:06 , "Kawashima, Takahiro"  
wrote:

> Hi Open MPI developers,
> 
> I found another issue in Open MPI.
> 
> In MCA_PML_OB1_RECV_FRAG_INIT macro in ompi/mca/pml/ob1/pml_ob1_recvfrag.h
> file, we copy a PML header from an arrived message to another buffer,
> as follows:
> 
>frag->hdr = *(mca_pml_ob1_hdr_t*)hdr;
> 
> On this copy, we cast hdr to mca_pml_ob1_hdr_t, which is a union
> of all actual header structs such as mca_pml_ob1_match_hdr_t.
> This means we copy the buffer of the size of the largest header
> even if the arrived message is smaller than it. This can cause
> SEGV if the arrived message is small and it is laid on the bottom
> of the page. Actually, my tofu BTL, the BTL component of Fujitsu
> MPI for K computer, suffered from this.
> 
> The attached patch will be one of possible fixes for this issue.
> This fix assume that the arrived header has at least segs[0].seg_len
> bytes. This is always true for current Open MPI code because hdr
> equals to segs[0].seg_addr.pval. There may exist a smarter fix.
> 
> Regards,
> 
> Takahiro Kawashima,
> MPI development team,
> Fujitsu
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] [OMPI svn] svn:open-mpi r27451 - in trunk: ompi/mca/allocator/bucket ompi/mca/bcol/basesmuma ompi/mca/bml/base ompi/mca/btl ompi/mca/btl/base ompi/mca/btl/openib ompi/mca/btl/sm ompi/

2012-10-24 Thread George Bosilca
I have some issues starting my applications lately. Here is an example:

mpirun -x LD_LIBRARY_PATH -np 8 -hostfile /etc/hostfile -bynode  
./testing_dtrmm -N 4000 -p 4 -x

And the corresponding output:
/home/bosilca/opt/trunk/debug/bin/orted: Error: unknown option "-p"

And then the daemons segfault …

  george.


On Oct 22, 2012, at 17:43 , Ralph Castain  wrote:

> Ah, I see - I checked and I don't see any unusual behavior in ompi_info or in 
> picking up the params in components, so it looks like it didn't matter.
> 
> 
> On Oct 22, 2012, at 8:04 AM, "Hjelm, Nathan T"  wrote:
> 
>> The change was due to the removal of the deprecated functions in 
>> mca_base_param. Comments in the command line parser suggested the change was 
>> expected eventually:
>> 
>> 179  struct opal_cmd_line_init_t {
>> 180  /** If want to set an MCA parameter, set its type name here.
>> 181  WARNING: This MCA tuple (type, component, param) will
>> 182  eventually be replaced with a single name! */
>> 183  const char *ocl_mca_type_name;
>> 184  /** If want to set an MCA parameter, set its component name
>> 185  here.  WARNING: This MCA tuple (type, component, param)
>> 186  will eventually be replaced with a single name! */
>> 187  const char *ocl_mca_component_name;
>> 188  /** If want to set an MCA parameter, set its parameter name
>> 189  here.  WARNING: This MCA tuple (type, component, param)
>> 190  will eventually be replaced with a single name! */
>> 191  const char *ocl_mca_param_name;
>> 
>> -Nathan
>> 
>> 
>> From: devel-boun...@open-mpi.org [devel-boun...@open-mpi.org] on behalf of 
>> Ralph Castain [r...@open-mpi.org]
>> Sent: Thursday, October 18, 2012 5:48 PM
>> To: de...@open-mpi.org
>> Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r27451 - in trunk:
>> ompi/mca/allocator/bucket ompi/mca/bcol/basesmuma   ompi/mca/bml/base 
>> ompi/mca/btl ompi/mca/btl/baseompi/mca/btl/openib ompi/mca/btl/sm 
>> ompi/mca/btl/smcuda ompi/mca/btl/template ompi/mca/btl/va...
>> 
>> Hmmm...this didn't just remove deprecated functions. It actually changed the 
>> way the cmd line parser works. Was that intentional?
>> 
>> I haven't fully grok'd what that did to us, but wonder if the change was 
>> intentional or just got caught in the commit?
>> 
>> 
>> On Oct 17, 2012, at 1:17 PM, svn-commit-mai...@open-mpi.org wrote:
>> 
>>> Author: hjelmn (Nathan Hjelm)
>>> Date: 2012-10-17 16:17:37 EDT (Wed, 17 Oct 2012)
>>> New Revision: 27451
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/27451
>>> 
>>> Log:
>>> MCA: remove deprecated mca_base_param functions 
>>> (mca_base_param_register_int, mca_base_param_register_string, 
>>> mca_base_param_environ_variable). Remove all uses of deprecated functions.
>>> cmr:v1.7
>>> 
>>> Text files modified:
>>> trunk/ompi/mca/allocator/bucket/allocator_bucket.c | 5
>>> trunk/ompi/mca/bcol/basesmuma/bcol_basesmuma_component.c   | 7
>>> trunk/ompi/mca/bml/base/bml_base_open.c|22 
>>> +--
>>> trunk/ompi/mca/btl/base/btl_base_open.c|10 
>>> +-
>>> trunk/ompi/mca/btl/btl.h   | 4
>>> trunk/ompi/mca/btl/openib/btl_openib_component.c   | 8 +
>>> trunk/ompi/mca/btl/sm/btl_sm_component.c   |14 
>>> ++
>>> trunk/ompi/mca/btl/smcuda/btl_smcuda_component.c   |14 
>>> ++
>>> trunk/ompi/mca/btl/template/btl_template_component.c   |14 
>>> ++
>>> trunk/ompi/mca/btl/vader/btl_vader_component.c |14 
>>> +-
>>> trunk/ompi/mca/btl/wv/btl_wv_component.c   | 6
>>> trunk/ompi/mca/coll/demo/coll_demo_component.c | 8 +
>>> trunk/ompi/mca/coll/ml/coll_ml_component.c |19 
>>> ---
>>> trunk/ompi/mca/coll/self/coll_self_component.c | 3
>>> trunk/ompi/mca/pml/base/pml_base_bsend.c   | 5
>>> trunk/ompi/mca/pml/bfo/pml_bfo_component.c | 6
>>> trunk/ompi/mca/pml/csum/pml_csum_component.c   | 6
>>> trunk/ompi/mca/pml/dr/pml_dr_component.c   | 6
>>> trunk/ompi/mca/pml/example/pml_example_component.c | 6
>>> trunk/ompi/mca/pml/ob1/pml_ob1_component.c | 6
>>> trunk/ompi/mca/pml/v/pml_v_component.c |12 +
>>> trunk/ompi/mca/sbgp/basesmsocket/sbgp_basesmsocket_component.c | 7 +
>>> trunk/ompi/mca/sbgp/basesmuma/sbgp_basesmuma_component.c   | 7 +
>>> trunk/ompi/mca/sbgp/ibnet/sbgp_ibnet_component.c   | 9 -
>>> trunk/ompi/mca/sbgp/p2p/sbgp_p2p_component.c  

Re: [OMPI devel] Multirail + Open MPI 1.6.1 = very big latency for the first communication

2012-11-01 Thread George Bosilca
It will depend on the protocol used by the OpenIB BTL to wire up the peers 
(OOB, UDCM, RDMACM). In the worst case (OOB), the connection process will be 
done using TCP. We are looking at a handshake (over TCP a 40 ms latency for a 
one-way message is standard, so the handshake will take at least 80 ms). Moreover, 
we only check the status of the sockets once in a while (to avoid impacting the 
performance), so this should be added to the handshake as well. Plus the time 
to set up the local queues (which should be significantly smaller than all the 
others). The connection time goes up pretty quickly!
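
For anyone hitting this in production, the two workarounds Paul mentions below
translate into command lines roughly like the following (parameter names are
taken from the quoted mail and the linked FAQ entry; verify them with ompi_info
on your installation):

    mpirun -mca btl_openib_max_btls 1 ./app      # stick to a single rail
    mpirun -mca mpi_preconnect_mpi 1 ./app       # pay the wire-up cost in MPI_Init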

  george.

On Oct 31, 2012, at 15:36 , Paul Kapinos  wrote:

> Hello all,
> 
> Open MPI is clever and use by default multiple IB adapters, if available.
> http://www.open-mpi.org/faq/?category=openfabrics#ofa-port-wireup
> 
> Open MPI is lazy and establishes connections only if needed.
> 
> Both is good.
> 
> We have kinda special nodes: up to 16 sockets, 128 cores, 4 boards, 4 IB 
> cards. Multirail works!
> 
> The crucial thing is that, starting with v1.6.1, the latency of the very first 
> PingPong sample between two nodes takes really a lot of time - some 100x - 
> 200x of the usual latency. You cannot see this using the usual latency benchmarks(*) 
> because they tend to omit the first samples as a "warmup phase", but we use a 
> kind of self-written parallel test which clearly shows this (and led me to muse 
> for some days).
> If multirail is forbidden (-mca btl_openib_max_btls 1), or if v1.5.3 is used, 
> or if the MPI processes are preconnected 
> (http://www.open-mpi.org/faq/?category=running#mpi-preconnect), there are no 
> such huge latency outliers for the first sample.
> 
> Well, we know about the warm-up and lazy connections.
> 
> But 200x ?!
> 
> Any comments about that is OK so?
> 
> Best,
> 
> Paul Kapinos
> 
> (*) E.g. HPCC explicitely say in http://icl.cs.utk.edu/hpcc/faq/index.html#132
> > Additional startup latencies are masked out by starting the measurement 
> > after
> > one non-measured ping-pong.
> 
> P.S. Sorry for cross-posting to both Users and Developers, but my last 
> questions to Users have no reply until yet, so trying to broadcast...
> 
> 
> -- 
> Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
> RWTH Aachen University, Center for Computing and Communication
> Seffenter Weg 23,  D 52074  Aachen (Germany)
> Tel: +49 241/80-24915
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: fix frameworks usage of opal_output

2012-11-01 Thread George Bosilca

On Nov 1, 2012, at 16:18 , Nathan Hjelm <hje...@lanl.gov> wrote:

> On Thu, Nov 01, 2012 at 04:07:32PM -0400, George Bosilca wrote:
>> Nathan,
>> 
>> Here is a quick question regarding the topi framework. 
>> 
>> - The mca_topo_base_output is opened unconditionally in topo_base_open.c:62
>> 
>> - with your patch, mca_topo_base_output is closed conditionally in 
>> topo_base_close.c:46, but only in case mca_topo_base_components_opened_valid 
>> and mca_topo_base_components_available_valid are NULL. However, 
>> mca_topo_base_output is set to -1 in all cases right after.
>> 
>> Why is that so?
> 
> mca_base_components_close closes the output if the third argument is NULL. So 
> in this case calling opal_output_close after mca_base_components_close will 
> result in a second call to opal_output_close.

Indeed, the behavior of mca_base_components_close seems quite weird to me, 
as it lacks consistency:
- the symmetric function (mca_components_open) doesn't open the output stream
- the stream is closed but the corresponding variable is not set to a meaningful 
value (-1)
- it forces us to have one specific output for each framework.

  george.


> 
>> In fact I think the mca_topo_base_close is entirely wrong. It should close 
>> all mca_topo_base_components_opened_valid component, then all 
>> mca_topo_base_components_available_valid components and then close the 
>> mca_topo_base_output and set it to -1.
> 
> I looked into this as well. The select function for topo OBJ_DESTRUCTs 
> mca_topo_base_components_opened and sets 
> mca_topo_base_components_opened_valid to false. So if 
> mca_topo_base_components_opened_valid is false it isn't safe to call 
> mca_base_components_close. It is a little confusing and I don't know why the 
> author if topo decided to do it that way.
> 
> -Nathan Hjelm
> HPC-3, LANL
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: fix frameworks usage of opal_output

2012-11-01 Thread George Bosilca

On Nov 1, 2012, at 19:07 , Nathan Hjelm  wrote:

> I was going to address this second inconsistency with another patch but now 
> seems like a good time to get a see if anyone has an opinion about how this 
> should be fixed. I can think of two simple fixes:
> 1) Since mca_base_components_open calls OBJ_CONSTRUCT should 
> mca_base_components_close call OBJ_DESTRUCT?
> 2) Should the caller be responsible for both the OBJ_CONSTRUCT and 
> OBJ_DESTRUCT calls?

I'm fine either way, but I do have a slight preference for 1.

>> - it force us to have one specific output for each framework
> 
> This isn't the case at the moment since frameworks can call opal_output_close 
> on any extra output streams. It would be better if frameworks have to close 
> all open output streams using opal_output_close instead of using 
> mca_base_components_close. If we want to change the semantics of 
> mca_base_components_close I can redo this patch. Anyone have an opinion on 
> this?

mca_base_components_close should not close an output stream opened by another 
entity (or if it does the arguments should be changed to int* and it should set 
it to -1). I think that counts as having an opinion ;)
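
To make that concrete, the variant being argued for would look roughly like this
(a sketch of the suggested prototype only, not the existing mca_base code; the
body is elided except for the part under discussion):

    /* Take the output index by pointer so that, if the close routine owns
     * the stream, the framework's handle is invalidated on the way out. */
    int mca_base_components_close(int *output_id, opal_list_t *components,
                                  const mca_base_component_t *skip)
    {
        /* ... close and release the components as today ... */
        if (NULL != output_id && -1 != *output_id) {
            opal_output_close(*output_id);
            *output_id = -1;    /* stream is gone, don't reuse the handle */
        }
        return OPAL_SUCCESS;
    }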

  george.

> 
> -Nathan Hjelm
> HPC-3, LANL
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: fix various leaks in trunk (touches coll/ml, vprotocol, pml/v, btl/openib, and mca/base)

2012-11-05 Thread George Bosilca
+1!

  george.

On Nov 5, 2012, at 18:59 , Jeff Squyres  wrote:

> +1 on the ompi/mca/btl/openib/btl_openib_mca.c and 
> opal/mca/base/mca_base_param.c.
> 
> I didn't check the others.
> 
> 
> On Nov 5, 2012, at 6:31 PM, Nathan Hjelm wrote:
> 
>> What: I used valgrind on ompi_info and found several leaks in the trunk. 
>> This patch fixes some of the leaks.
>> 
>> pml/v:
>> - If vprotocol is not being used vprotocol_include_list is leaked. Assume 
>> vprotocol never takes ownership (see below) and always free the string.
>> 
>> coll/ml:
>> - (patch verified) calling mca_base_param_lookup_string after 
>> mca_base_param_reg_string is unnecessary. The call to 
>> mca_base_param_lookup_string causes the value returned by 
>> mca_base_param_reg_string to be leaked.
>> - Need to free mca_coll_ml_component.config_file_name on component close.
>> 
>> btl/openib:
>> - calling mca_base_param_lookup_string after mca_base_param_reg_string is 
>> unnecessary. The call to mca_base_param_lookup_string causes the value 
>> returned by mca_base_param_reg_string to be leaked.
>> 
>> vprotocol/base:
>> - There was no way for pml/v to determine if vprotocol took ownership of 
>> vprotocol_include_list. Fix by always never ownership (use strdup).
>> 
>> mca/base:
>> - param_lookup will result in storage->stringval to be a newly allocated 
>> string if the mca parameter has a string value. ensure this string is always 
>> freed.
>> 
>> 
>> When: This is a simple patch. Timeout set for tomorrow @ 12:00 PM MST
>> 
>> Why: Always a good idea to clean up all allocated memory. With this patch 
>> and some others I have in the pipeline valgrind no longer reports and 
>> "possibly leaked" or "definitely leaked" blocks in ompi_info.
>> 
>> 
>> -Nathan Hjelm
>> HPC-3, LANL
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn] svn:open-mpi r27580 - in trunk: ompi/mca/btl/openib ompi/mca/btl/wv ompi/mca/coll/ml opal/util/keyval orte/mca/rmaps/rank_file

2012-12-03 Thread George Bosilca
I remember there were some discussions about lex (or flex) and their version, 
but I don't remember the specifics. Whatever the outcome was, we're back to 
having a problem there, more specifically a missing reference 
(opal_util_keyval_yylex_destroy), which seems to indicate the issue was not 
fixed.

  george.

 

config.log.bz2
Description: BZip2 compressed data


On Nov 9, 2012, at 23:00 , svn-commit-mai...@open-mpi.org wrote:

> Author: hjelmn (Nathan Hjelm)
> Date: 2012-11-09 17:00:27 EST (Fri, 09 Nov 2012)
> New Revision: 27580
> URL: https://svn.open-mpi.org/trac/ompi/changeset/27580
> 
> Log:
> add prototypes for lex destroy functions
> 
> Text files modified: 
>   trunk/ompi/mca/btl/openib/btl_openib_lex.h   | 1 +  
>  
>   trunk/ompi/mca/btl/wv/btl_wv_lex.h   | 1 +  
>  
>   trunk/ompi/mca/coll/ml/coll_ml_lex.h | 1 +  
>  
>   trunk/opal/util/keyval/keyval_lex.h  | 1 +  
>  
>   trunk/orte/mca/rmaps/rank_file/rmaps_rank_file.h | 2 ++ 
>  
>   5 files changed, 6 insertions(+), 0 deletions(-)



Re: [OMPI devel] [OMPI svn] svn:open-mpi r27580 - in trunk: ompi/mca/btl/openib ompi/mca/btl/wv ompi/mca/coll/ml opal/util/keyval orte/mca/rmaps/rank_file

2012-12-03 Thread George Bosilca
Cool, I'm looking forward to it. Meanwhile I'll go back to using the 1.6 series, 
which hopefully compiles.

  Thanks,
george.


On Dec 4, 2012, at 02:34 , "Hjelm, Nathan T" <hje...@lanl.gov> wrote:

> Sorry about that. We are working on a fix that both supports flex 2.5.4 
> (*#*&@ redhat) and cleans up the lex state correctly in modern flex. It 
> should be done in the next day or so.
> 
> -Nathan
> 
> On Monday, December 03, 2012 6:28 PM, devel-boun...@open-mpi.org 
> [devel-boun...@open-mpi.org] on behalf of George Bosilca 
> [bosi...@icl.utk.edu] wrote:
>> To: de...@open-mpi.org
>> Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r27580 - in trunk:
>> ompi/mca/btl/openib ompi/mca/btl/wv ompi/mca/coll/mlopal/util/keyval 
>> orte/mca/rmaps/rank_file
>> 
>> I remember there were some discussions about lex (or flex) and their 
>> version, but I don't remember the specifics. Whatever the outcome was, we're 
>> back at having a problem there, more specifically a missing reference 
>> (opal_util_keyval_yylex_destroy) which seems to indicate the issue was not 
>> fixed.
>> 
>>  george.
>> 
>> 
>> 
>> 
>> On Nov 9, 2012, at 23:00 , svn-commit-mai...@open-mpi.org wrote:
>> 
>>> Author: hjelmn (Nathan Hjelm)
>>> Date: 2012-11-09 17:00:27 EST (Fri, 09 Nov 2012)
>>> New Revision: 27580
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/27580
>>> 
>>> Log:
>>> add prototypes for lex destroy functions
>>> 
>>> Text files modified:
>>>  trunk/ompi/mca/btl/openib/btl_openib_lex.h   | 1 +
>>>  trunk/ompi/mca/btl/wv/btl_wv_lex.h   | 1 +
>>>  trunk/ompi/mca/coll/ml/coll_ml_lex.h | 1 +
>>>  trunk/opal/util/keyval/keyval_lex.h  | 1 +
>>>  trunk/orte/mca/rmaps/rank_file/rmaps_rank_file.h | 2 ++
>>>  5 files changed, 6 insertions(+), 0 deletions(-)
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn] svn:open-mpi r27580 - in trunk: ompi/mca/btl/openib ompi/mca/btl/wv ompi/mca/coll/ml opal/util/keyval orte/mca/rmaps/rank_file

2012-12-04 Thread George Bosilca
This doesn't work on a subversion checkout. However, there is a similar trick 
that seems to work in this case. If one copies the 
opal/util/keyval/keyval_lex.c file from a platform with a recent version of 
flex (2.5.37 in my case), the compilation proceeds without issues.

  george.

On Dec 4, 2012, at 02:39 , "Hjelm, Nathan T" <hje...@lanl.gov> wrote:

> Oh, and I don't know if you have tried this but we don't recreate the 
> keyval_lex.c file if it already exists. This allows use to not put a flex 
> requirement on the end user. Have you tried removing 
> opal/util/keyval/keyval_lex.c? If that works you might want to run 
> configure/make from an empty directory.
> 
> -Nathan
> 
> On Monday, December 03, 2012 6:28 PM, devel-boun...@open-mpi.org 
> [devel-boun...@open-mpi.org] on behalf of George Bosilca 
> [bosi...@icl.utk.edu] wrote:
>> To: de...@open-mpi.org
>> Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r27580 - in trunk:
>> ompi/mca/btl/openib ompi/mca/btl/wv ompi/mca/coll/mlopal/util/keyval 
>> orte/mca/rmaps/rank_file
>> 
>> I remember there were some discussions about lex (or flex) and their 
>> version, but I don't remember the specifics. Whatever the outcome was, we're 
>> back at having a problem there, more specifically a missing reference 
>> (opal_util_keyval_yylex_destroy) which seems to indicate the issue was not 
>> fixed.
>> 
>>  george.
>> 
>> 
>> 
>> 
>> On Nov 9, 2012, at 23:00 , svn-commit-mai...@open-mpi.org wrote:
>> 
>>> Author: hjelmn (Nathan Hjelm)
>>> Date: 2012-11-09 17:00:27 EST (Fri, 09 Nov 2012)
>>> New Revision: 27580
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/27580
>>> 
>>> Log:
>>> add prototypes for lex destroy functions
>>> 
>>> Text files modified:
>>>  trunk/ompi/mca/btl/openib/btl_openib_lex.h   | 1 +
>>>  trunk/ompi/mca/btl/wv/btl_wv_lex.h   | 1 +
>>>  trunk/ompi/mca/coll/ml/coll_ml_lex.h | 1 +
>>>  trunk/opal/util/keyval/keyval_lex.h  | 1 +
>>>  trunk/orte/mca/rmaps/rank_file/rmaps_rank_file.h | 2 ++
>>>  5 files changed, 6 insertions(+), 0 deletions(-)
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] CRIU checkpoint support in Open-MPI?

2012-12-06 Thread George Bosilca
Samuel,

Yes, all contributions are welcome. It should be almost trivial to write a new 
backend in Open MPI to support whatever the kernel developers agree to add as 
C/R capabilities. A good starting point is to look at the existing modules in 
opal/mca/crs.

  george.


On Dec 6, 2012, at 03:56 , Christopher Samuel  wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> Hi folks,
> 
> I don't know if people have seen that the Linux kernel community is
> following its own different checkpoint/restart path to those currently
> supported by OMPI, namely that of the OpenVZ developers
> "checkpoint/restore in user space" project (CRIU).
> 
> You can read more about its current state here:
> 
> https://lwn.net/Articles/525675/
> 
> The CRIU website is here:
> 
> http://criu.org/
> 
> CRIU will also be up for discussion at LCA2013 in Canberra this year
> (though I won't be there):
> 
> http://linux.conf.au/schedule/30116/view_talk?day=thursday
> 
> Is there interest from OMPI in supporting this, given it looks like
> it's quite likely to make it into the mainline kernel?
> 
> Or is better to wait for it to be merged, and then take a look?
> 
> All the best,
> Chris
> - -- 
> Christopher SamuelSenior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/  http://twitter.com/vlsci
> 
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with undefined - http://www.enigmail.net/
> 
> iEYEARECAAYFAlDACXYACgkQO2KABBYQAh8LIQCfagfyZNzK3KVKb+W0etJV4tyL
> AxwAn0z6q7TVNcOTom0tmvy7brfFf4QV
> =SLvF
> -END PGP SIGNATURE-
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27739 - in trunk: ompi/mca/btl/sm ompi/mca/common/sm ompi/mca/mpool/sm opal/mca/shmem opal/mca/shmem/mmap opal/mca/shmem/posix opal/mca/shmem/sysv opal/m

2013-01-04 Thread George Bosilca
Sam,

This is a major change and would have deserved an RFC, as it imposes a 
drastic/major non-scalable change (up to now the backend file creation was 
centralized; now, in addition, we exchange the data through the modex). A quick 
look highlights the fact that quite a lot of new modex entries have appeared 
after this patch. On a 4 proc (2x2) run we got more than 20 entries, each one of 
them up to 32 bytes (the list is attached at the end of this email).

Clearly this new approach is significantly less scalable compared with the old 
one. In the past we had issues adding one single integer per process; I fail to 
understand how our standards changed so much that now a few hundred bytes per 
process become acceptable. Moreover, what is the benefit this change provides 
in exchange for this loss of scalability?
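
To put a rough number on it, using only the counts above: the log shows five 
SM-related keys published per peer, each up to 32 bytes, i.e. already ~640 bytes 
held by every process in this 4-process run, and on the order of 160 * N bytes 
per process for an N-process job -- before any other component adds its own keys.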

  George.

PS: The exhaustive list of new SM-related modex entries:
[dancer01:01049] [[50563,1],0] db:hash:store: storing key 
btl.sm.1.9-0-0[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
[dancer01:01049] [[50563,1],0] db:hash:store: storing key 
btl.sm.1.9-0-1[OPAL_STRING] for proc [[50563,1],0]
[dancer01:01049] [[50563,1],0] db:hash:store: storing key 
btl.sm.1.9-1-0[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
[dancer01:01049] [[50563,1],0] db:hash:store: storing key 
btl.sm.1.9-1-1[OPAL_STRING] for proc [[50563,1],0]
[dancer01:01049] [[50563,1],0] db:hash:store: storing key 
btl.sm.1.9-2[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
[dancer02:01720] [[50563,1],1] db:hash:store: storing key 
btl.sm.1.9-0-0[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
[dancer02:01720] [[50563,1],1] db:hash:store: storing key 
btl.sm.1.9-0-1[OPAL_STRING] for proc [[50563,1],1]
[dancer02:01720] [[50563,1],1] db:hash:store: storing key 
btl.sm.1.9-1-0[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
[dancer02:01720] [[50563,1],1] db:hash:store: storing key 
btl.sm.1.9-1-1[OPAL_STRING] for proc [[50563,1],1]
[dancer02:01720] [[50563,1],1] db:hash:store: storing key 
btl.sm.1.9-2[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
[dancer02:01720] [[50563,1],1] db:hash:store: storing pointer of key 
btl.sm.1.9-0-0[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
[dancer02:01720] [[50563,1],1] db:hash:store: storing pointer of key 
btl.sm.1.9-0-1[OPAL_STRING] for proc [[50563,1],0]
[dancer02:01720] [[50563,1],1] db:hash:store: storing pointer of key 
btl.sm.1.9-1-0[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
[dancer02:01720] [[50563,1],1] db:hash:store: storing pointer of key 
btl.sm.1.9-1-1[OPAL_STRING] for proc [[50563,1],0]
[dancer02:01720] [[50563,1],1] db:hash:store: storing pointer of key 
btl.sm.1.9-2[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
[dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
btl.sm.1.9-0-0[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
[dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
btl.sm.1.9-0-1[OPAL_STRING] for proc [[50563,1],0]
[dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
btl.sm.1.9-1-0[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
[dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
btl.sm.1.9-1-1[OPAL_STRING] for proc [[50563,1],0]
[dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
btl.sm.1.9-2[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
[dancer01:01050] [[50563,1],2] db:hash:store: storing pointer of key 
btl.sm.1.9-0-0[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
[dancer01:01050] [[50563,1],2] db:hash:store: storing pointer of key 
btl.sm.1.9-0-1[OPAL_STRING] for proc [[50563,1],0]
[dancer01:01050] [[50563,1],2] db:hash:store: storing pointer of key 
btl.sm.1.9-1-0[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
[dancer01:01050] [[50563,1],2] db:hash:store: storing pointer of key 
btl.sm.1.9-1-1[OPAL_STRING] for proc [[50563,1],0]
[dancer01:01050] [[50563,1],2] db:hash:store: storing pointer of key 
btl.sm.1.9-2[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
[dancer01:01049] [[50563,1],0] db:hash:store: storing pointer of key 
btl.sm.1.9-0-0[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
[dancer01:01049] [[50563,1],0] db:hash:store: storing pointer of key 
btl.sm.1.9-0-1[OPAL_STRING] for proc [[50563,1],1]
[dancer01:01049] [[50563,1],0] db:hash:store: storing pointer of key 
btl.sm.1.9-1-0[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
[dancer01:01049] [[50563,1],0] db:hash:store: storing pointer of key 
btl.sm.1.9-1-1[OPAL_STRING] for proc [[50563,1],1]
[dancer01:01049] [[50563,1],0] db:hash:store: storing pointer of key 
btl.sm.1.9-2[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
[dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
btl.sm.1.9-0-0[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
[dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
btl.sm.1.9-0-1[OPAL_STRING] for proc [[50563,1],1]
[dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
btl.sm.1.9-1-0[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
[dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
btl.sm.1.9-1-1[OPAL_STRING] for proc [[50563,1],1]
[dancer02:01721] 

Re: [OMPI devel] [OMPI svn] svn:open-mpi r27744 - trunk/ompi/runtime

2013-01-04 Thread George Bosilca
Ralph,

This function now belongs to our svn history, and will therefore be resurrected 
as soon as the need for it becomes essential. Until then, there is no real value 
in having such a function.

  George.

On Jan 4, 2013, at 22:08 , Ralph Castain <r...@open-mpi.org> wrote:

> I guess it's actually the "recv_string_pointer" function that is used for 
> this purpose, but I'd rather not just willy-nilly prune functions out of the 
> code base because they aren't currently used. If we apply that criteria, a 
> lot of functions that are there for future and/or historical reasons would be 
> eliminated - and eventually likely restored.
> 
> I don't see how this function hurt anyone - other than esthetics, is there a 
> reason why this particular function must be removed?
> 
> 
> On Jan 4, 2013, at 1:01 PM, Ralph Castain <r...@open-mpi.org> wrote:
> 
>> Whoa - that function is used, I believe, to retrieve the pointer to the 
>> hostname info in the ompi_proc_t
>> 
>> 
>> On Jan 4, 2013, at 12:50 PM, svn-commit-mai...@open-mpi.org wrote:
>> 
>>> Author: bosilca (George Bosilca)
>>> Date: 2013-01-04 15:50:25 EST (Fri, 04 Jan 2013)
>>> New Revision: 27744
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/27744
>>> 
>>> Log:
>>> Remove the unnecessary ompi_modex_recv_pointer function.
>>> 
>>> Text files modified: 
>>> trunk/ompi/runtime/ompi_module_exchange.c |22 --
>>>   
>>> trunk/ompi/runtime/ompi_module_exchange.h | 5 - 
>>>   
>>> 2 files changed, 0 insertions(+), 27 deletions(-)
>>> 
>>> Modified: trunk/ompi/runtime/ompi_module_exchange.c
>>> ==
>>> --- trunk/ompi/runtime/ompi_module_exchange.c   Fri Jan  4 15:47:25 
>>> 2013(r27743)
>>> +++ trunk/ompi/runtime/ompi_module_exchange.c   2013-01-04 15:50:25 EST 
>>> (Fri, 04 Jan 2013)  (r27744)
>>> @@ -90,28 +90,6 @@
>>>   return rc;
>>> }
>>> 
>>> -/* return a pointer to the data, but don't create a new copy of it */
>>> -int ompi_modex_recv_pointer(const mca_base_component_t *component,
>>> -const ompi_proc_t *proc,
>>> -void **buffer, opal_data_type_t type)
>>> -{
>>> -int rc;
>>> -char *name = mca_base_component_to_string(component);
>>> -
>>> -/* set defaults */
>>> -*buffer = NULL;
>>> -
>>> -if (NULL == name) {
>>> -return OMPI_ERR_OUT_OF_RESOURCE;
>>> -}
>>> -
>>> -/* the fetch_poointer API returns a pointer to the data */
>>> -rc = orte_db.fetch_pointer(>proc_name, name, buffer, type);
>>> -free(name);
>>> -
>>> -return rc;
>>> -}
>>> -
>>> int
>>> ompi_modex_send_string(const char* key,
>>>  const void *buffer, size_t size)
>>> 
>>> Modified: trunk/ompi/runtime/ompi_module_exchange.h
>>> ==
>>> --- trunk/ompi/runtime/ompi_module_exchange.h   Fri Jan  4 15:47:25 
>>> 2013(r27743)
>>> +++ trunk/ompi/runtime/ompi_module_exchange.h   2013-01-04 15:50:25 EST 
>>> (Fri, 04 Jan 2013)  (r27744)
>>> @@ -191,11 +191,6 @@
>>> const ompi_proc_t *source_proc,
>>> void **buffer, size_t *size);
>>> 
>>> -
>>> -OMPI_DECLSPEC int ompi_modex_recv_pointer(const mca_base_component_t 
>>> *component,
>>> -  const ompi_proc_t *proc,
>>> -  void **buffer, opal_data_type_t 
>>> type);
>>> -
>>> /**
>>> * Receive a buffer from a given peer
>>> *
>>> ___
>>> svn mailing list
>>> s...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>> 
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn] svn:open-mpi r27744 - trunk/ompi/runtime

2013-01-04 Thread George Bosilca
Having unused and untested functions lingering in the interface just for the 
sake of it is counter-productive. Less code is usually equivalent to cleaner 
code, potentially fewer bugs, faster reading and understanding of the code, and 
faster immersion for newbies. The removed function was trivial, not even 
worth keeping as a reference. It can be re-written in a few seconds if the need 
arises.

Of course don't take my word on it: YAGNI 
(http://en.wikipedia.org/wiki/You_ain't_gonna_need_it).

Moreover, I am interested in your first statement. Can you enlighten us by 
pinpointing an example where this was an issue?

  George.

On Jan 4, 2013, at 22:24 , Ralph Castain <r...@open-mpi.org> wrote:

> We've had zero luck using that approach in the past - finding a function that 
> has been removed is hard, to say the least. The modex_recv area contains a 
> balanced set of functions that includes both send/recv for each class of API. 
> It was done that way to make it easy for developers to use whatever they 
> needed - otherwise, people tend to write code directly into their local areas.
> 
> I'd prefer to have some currently-unused function that completes the set. Or 
> let's set a policy and go thru every class and framework defined in 
> opal/orte/ompi and remove all APIs that aren't currently used - after all, we 
> can restore those from svn someday too, can't we?
> 
> 
> On Jan 4, 2013, at 1:18 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>> Ralph,
>> 
>> This function now belong to our svn history, and will therefore be 
>> resurrected as soon as the need for it become essential. Until then, there 
>> is no real value of having such a function.
>> 
>> George.
>> 
>> On Jan 4, 2013, at 22:08 , Ralph Castain <r...@open-mpi.org> wrote:
>> 
>>> I guess it's actually the "recv_string_pointer" function that is used for 
>>> this purpose, but I'd rather not just willy-nilly prune functions out of 
>>> the code base because they aren't currently used. If we apply that 
>>> criteria, a lot of functions that are there for future and/or historical 
>>> reasons would be eliminated - and eventually likely restored.
>>> 
>>> I don't see how this function hurt anyone - other than esthetics, is there 
>>> a reason why this particular function must be removed?
>>> 
>>> 
>>> On Jan 4, 2013, at 1:01 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>>> Whoa - that function is used, I believe, to retrieve the pointer to the 
>>>> hostname info in the ompi_proc_t
>>>> 
>>>> 
>>>> On Jan 4, 2013, at 12:50 PM, svn-commit-mai...@open-mpi.org wrote:
>>>> 
>>>>> Author: bosilca (George Bosilca)
>>>>> Date: 2013-01-04 15:50:25 EST (Fri, 04 Jan 2013)
>>>>> New Revision: 27744
>>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/27744
>>>>> 
>>>>> Log:
>>>>> Remove the unnecessary ompi_modex_recv_pointer function.
>>>>> 
>>>>> Text files modified: 
>>>>> trunk/ompi/runtime/ompi_module_exchange.c |22 --  
>>>>> 
>>>>> trunk/ompi/runtime/ompi_module_exchange.h | 5 -   
>>>>> 
>>>>> 2 files changed, 0 insertions(+), 27 deletions(-)
>>>>> 
>>>>> Modified: trunk/ompi/runtime/ompi_module_exchange.c
>>>>> ==
>>>>> --- trunk/ompi/runtime/ompi_module_exchange.c Fri Jan  4 15:47:25 
>>>>> 2013(r27743)
>>>>> +++ trunk/ompi/runtime/ompi_module_exchange.c 2013-01-04 15:50:25 EST 
>>>>> (Fri, 04 Jan 2013)  (r27744)
>>>>> @@ -90,28 +90,6 @@
>>>>> return rc;
>>>>> }
>>>>> 
>>>>> -/* return a pointer to the data, but don't create a new copy of it */
>>>>> -int ompi_modex_recv_pointer(const mca_base_component_t *component,
>>>>> -const ompi_proc_t *proc,
>>>>> -void **buffer, opal_data_type_t type)
>>>>> -{
>>>>> -int rc;
>>>>> -char *name = mca_base_component_to_string(component);
>>>>> -
>>>>> -/* set defaults */
>>>>> -*buffer = NULL;
>>>>> -
>>>>> -if (NULL == name) {
>>>>> -return OMPI_ERR_OUT_OF_

Re: [OMPI devel] [OMPI svn] svn:open-mpi r27744 - trunk/ompi/runtime

2013-01-04 Thread George Bosilca
Ralph,

First and foremost let me remind you that you are referring here to an OMPI 
level API, a layer where your expertise is not at its greatest. The interface 
you propose to keep and promote is unsafe, as it returns a pointer to a 
non-aligned object, which will crash on almost all non-x86 environments. So, in 
addition to never being used in the current code base, this function was never 
tested, and had a promiscuous behavior that makes it unsafe to use without 
special care. That is three reasons, more than enough to remove it.
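
As a stand-alone illustration of the alignment point (generic C, nothing to do
with the function in question), assuming a strict-alignment target such as
sparc64:

    #include <string.h>

    /* Reading through a pointer that is not suitably aligned for the
     * accessed type is undefined behaviour; strict-alignment CPUs
     * (sparc64, some ARM/MIPS configurations) typically raise SIGBUS,
     * even though the same code happens to work on x86. */
    double load_misaligned(const unsigned char *p)
    {
        return *(const double *)(p + 1);     /* likely SIGBUS off x86 */
    }

    /* The safe way to pull a value out of an arbitrarily aligned buffer. */
    double load_safe(const unsigned char *p)
    {
        double v;
        memcpy(&v, p + 1, sizeof(v));
        return v;
    }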

Your email was full of factual statements, with little to no foundation.

You want to resurrect the bproc code we stopped maintaining in 2007, dropped 
support for in 1.2, and completely removed in 1.5. Even google can give 
you the answer to that (https://svn.open-mpi.org/trac/ompi/ticket/2755).

Now about the bitmap, I have to guess you're referring to the orte_bitmap. 
Where is that code today? Nowhere, as it was an almost identical copy of 
another similar structure. The direct result of my action of removing the 
orte_bitmap was that the two bitmap structures got blended together, and 
today we only have a single consistent implementation of the bitmap class used 
in all the layers. I'll bear the responsibility.

However, this discussion has lived way beyond all expectations. Let's focus on 
constructive topics.

  George.

On Jan 5, 2013, at 00:16 , Ralph Castain <r...@open-mpi.org> wrote:

> 
> On Jan 4, 2013, at 2:20 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>> Having unused and untested function lingering out of the interface just for 
>> the sake of it is counter-productive. Less code is usually equivalent to 
>> cleaner code, potentially less bugs, to a faster reading and understanding 
>> of the code, to a faster immersion for newbies. The removed function was 
>> trivial, not even worth keeping as a reference. It can be re-written in few 
>> seconds is the need arise.
> 
> So let me try to articulate this more clearly. You routinely complain about 
> changes being made to the code base that impact your "hidden" code in your 
> offline repos. Yet you feel free to remove a function from  the code base - 
> without warning - because you personally don't see it being used in the svn 
> repo or in *your* private code bases.
> 
> Does that summarize your point-of-view?
> 
> My point is that we routinely "flesh out" APIs to provide broader 
> functionality so it is available when needed. Many of our classes follow that 
> example. Having an appropriate function that fills out a capability also 
> follows that example. It may not be used by the code in the svn repo, or by 
> you personally - but it might have proven useful to others.
> 
> The fact that this function is trivial only makes its removal more laughable 
> - it didn't remove a ton of code, didn't cleanup any code, and couldn't have 
> contributed to bugs in other functions. So its removal was arbitrary - which 
> is why I'm annoyed by it.
> 
> 
>> 
>> Of course don't take my word on it: YAGNI 
>> (http://en.wikipedia.org/wiki/You_ain't_gonna_need_it).
>> 
>> Moreover, I am interesting in your first statement. Can you enlighten us by 
>> pinpoint to an example where this was an issue?
> 
> Sure - I'd like to see anyone go back and recover the bproc code. You may 
> find pieces of it in the repo, assuming you know what frameworks to look for, 
> and you may even be able to figure out a way to expose the code - but good 
> luck trying to re-integrate it into the system.
> 
> I've had to do it a couple of times - like when you whacked the opal_bitmap 
> class because you weren't seeing it used. At least in that case, the time 
> hadn't been too long and the code was contained enough so recovery wasn't too 
> painful. Still, I had to dig thru svn to find the specific changeset that 
> whacked it.
> 
> So whacking something just because *you* don't see it being used isn't the 
> best policy, IMO.
> 
> 
>> 
>> George.
>> 
>> On Jan 4, 2013, at 22:24 , Ralph Castain <r...@open-mpi.org> wrote:
>> 
>>> We've had zero luck using that approach in the past - finding a function 
>>> that has been removed is hard, to say the least. The modex_recv area 
>>> contains a balanced set of functions that includes both send/recv for each 
>>> class of API. It was done that way to make it easy for developers to use 
>>> whatever they needed - otherwise, people tend to write code directly into 
>>> their local areas.
>>> 
>>> I'd prefer to have some currently-unused function that completes the set. 
>>> Or let's set a policy and go thru every class and framework defined in 
&g

[OMPI devel] mpirun @ 100%

2013-01-07 Thread George Bosilca
I just noticed that mpirun (r27751) is taking 100% of CPU even for apps with no 
output.

George.


Re: [OMPI devel] "Open MPI"-based MPI library used by K computer

2013-01-10 Thread George Bosilca
Our policy so far has been that adding a paper to the list of publications on the 
Open MPI website is a discretionary action at the authors' request. I don't 
see any compelling reason to change. Moreover, Fujitsu being a contributor to 
the Open MPI community, there is no obstacle to adding a link to their paper -- 
at their request.

  George.

On Jan 10, 2013, at 00:15 , Rayson Ho <raysonlo...@gmail.com> wrote:

> Hi Ralph,
> 
> Since the whole journal is available online, and is reachable by
> Google, I don't believe we can get into copyright issues by providing
> a link to it (but then, I also know that there are countries that have
> more crazy web page linking rules!).
> 
> http://www.fujitsu.com/global/news/publications/periodicals/fstj/archives/vol48-3.html
> 
> Rayson
> 
> ==
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
> 
> Scalable Cloud HPC: 10,000-node OGS/GE Amazon EC2 cluster
> http://blogs.scalablelogic.com/2012/11/running-1-node-grid-engine-cluster.html
> 
> 
> On Thu, Sep 20, 2012 at 6:46 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> I'm unaware of any formal criteria. The papers currently located there are 
>> those written by members of the OMPI community, but we can certainly link to 
>> something written by someone else, so long as we don't get into copyright 
>> issues.
>> 
>> On Sep 19, 2012, at 11:57 PM, Rayson Ho <raysonlo...@gmail.com> wrote:
>> 
>>> I found this paper recently, "MPI Library and Low-Level Communication
>>> on the K computer", available at:
>>> 
>>> http://www.fujitsu.com/downloads/MAG/vol48-3/paper11.pdf
>>> 
>>> What are the criteria for adding papers to the "Open MPI Publications" page?
>>> 
>>> Rayson
>>> 
>>> ==
>>> Open Grid Scheduler - The Official Open Source Grid Engine
>>> http://gridscheduler.sourceforge.net/
>>> 
>>> 
>>> On Fri, Nov 18, 2011 at 5:32 AM, George Bosilca <bosi...@eecs.utk.edu> 
>>> wrote:
>>>> Dear Yuki and Takahiro,
>>>> 
>>>> Thanks for the bug report and for the patch. I pushed a [nearly identical] 
>>>> patch in the trunk in https://svn.open-mpi.org/trac/ompi/changeset/25488. 
>>>> A special version for the 1.4 has been prepared and has been attached to 
>>>> the ticket #2916 (https://svn.open-mpi.org/trac/ompi/ticket/2916).
>>>> 
>>>> Thanks,
>>>> george.
>>>> 
>>>> 
>>>> On Nov 14, 2011, at 02:27 , Y.MATSUMOTO wrote:
>>>> 
>>>>> Dear Open MPI community,
>>>>> 
>>>>> I'm a member of MPI library development team in Fujitsu,
>>>>> Takahiro Kawashima, who sent mail before, is my colleague.
>>>>> We start to feed back.
>>>>> 
>>>>> First, we fixed about MPI_LB/MPI_UB and data packing problem.
>>>>> 
>>>>> Program crashes when it meets all of the following conditions:
>>>>> a: The type of sending data is contiguous and derived type.
>>>>> b: Either or both of MPI_LB and MPI_UB is used in the data type.
>>>>> c: The size of sending data is smaller than extent(Data type has gap).
>>>>> d: Send-count is bigger than 1.
>>>>> e: Total size of data is bigger than "eager limit"
>>>>> 
>>>>> This problem occurs in attachment C program.
>>>>> 
>>>>> An incorrect-address accessing occurs
>>>>> because an unintended value of "done" inputs and
>>>>> the value of "max_allowd" becomes minus
>>>>> in the following place in "ompi/datatype/datatype_pack.c(in version 
>>>>> 1.4.3)".
>>>>> 
>>>>> 
>>>>> (ompi/datatype/datatype_pack.c)
>>>>> 188 packed_buffer = (unsigned char *) iov[iov_count].iov_base;
>>>>> 189 done = pConv->bConverted - i * pData->size;  /* partial 
>>>>> data from last pack */
>>>>> 190 if( done != 0 ) {  /* still some data to copy from the 
>>>>> last time */
>>>>> 191 done = pData->size - done;
>>>>> 192 OMPI_DDT_SAFEGUARD_POINTER( user_memory, done, 
>>>>> pConv->pBaseBuf, pData, pConv->count );
>>>>> 193  

Re: [OMPI devel] mpirun @ 100%

2013-01-15 Thread George Bosilca
That works perfectly. Thanks!

  George.

On Jan 15, 2013, at 00:07 , Ralph Castain <r...@open-mpi.org> wrote:

> Sorry for delay - was recovering from IT-induced computer failure. Fixed in 
> r27815 - an artifact caused by not using the progress threads. Thanks!
> 
> On Jan 7, 2013, at 1:30 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>> I just noticed that mpirun (r27751) is taking 100% of CPU even for apps with 
>> no output.
>> 
>> George.
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [patch] MPI-2.2: Ordering of attribution deletion callbacks on MPI_COMM_SELF

2013-01-17 Thread George Bosilca
Takahiro,

Thanks for the patch. I deplore the loss of the hash table in the attribute 
management, as the potential of turning every attribute operation into one of 
linear complexity is not very appealing.

As you already took decision (C), it means that at the communicator 
destruction stage the hash table is not relevant anymore. Thus, I would have 
converted the hash table to an ordered list (ordered by the creation index, a 
global entity atomically updated every time an attribute is created), and 
proceeded to destroy the attributes in the desired order. Thus, instead of having 
a linear operation for every operation on attributes, we only have a single 
linear operation per communicator (and this during the destruction stage).
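
A generic sketch of that scheme (plain C, not the actual OMPI attribute code;
it assumes the reverse-of-creation ordering that ticket #3123 targets for
MPI_COMM_SELF):

    #include <stdlib.h>

    typedef struct {
        unsigned long creation_index;  /* assigned atomically at set_attr time */
        int           keyval;
        void         *value;
    } attr_entry_t;

    static int by_creation_index(const void *a, const void *b)
    {
        const attr_entry_t *x = a, *y = b;
        return (x->creation_index > y->creation_index) -
               (x->creation_index < y->creation_index);
    }

    /* Snapshot the attributes into an array, order them once, and fire the
     * delete callbacks newest-first.  Every per-attribute operation stays
     * O(1); the only sorting cost is paid at communicator destruction. */
    static void destroy_attributes(attr_entry_t *attrs, size_t n,
                                   void (*delete_cb)(int keyval, void *value))
    {
        qsort(attrs, n, sizeof(*attrs), by_creation_index);
        for (size_t i = n; i > 0; i--) {
            delete_cb(attrs[i - 1].keyval, attrs[i - 1].value);
        }
    }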

  George.

On Jan 16, 2013, at 16:37 , KAWASHIMA Takahiro  
wrote:

> Hi,
> 
> I've implemented ticket #3123 "MPI-2.2: Ordering of attribution deletion
> callbacks on MPI_COMM_SELF".
> 
>  https://svn.open-mpi.org/trac/ompi/ticket/3123
> 
> As this ticket says, attributes had been stored in unordered hash.
> So I've replaced opal_hash_table_t with opal_list_t and made necessary
> modifications for it. And I've also fixed some multi-threaded concurrent
> (get|set|delete)_attr call issues.
> 
> By this modification, following behavior changes are introduced.
> 
>  (A) MPI_(Comm|Type|Win)_(get|set|delete)_attr function may be slower
>  for MPI objects that has many attributes attached.
>  (B) When the user-defined delete callback function is called, the
>  attribute is already removed from the list. In other words,
>  if MPI_(Comm|Type|Win)_get_attr is called by the user-defined
>  delete callback function for the same attribute key, it returns
>  flag = false.
>  (C) Even if the user-defined delete callback function returns non-
>  MPI_SUCCESS value, the attribute is not reverted to the list.
> 
> (A) is due to a sequential list search instead of a hash. See find_value
> function for its implementation.
> (B) and (C) are due to an atomic deletion of the attribute to allow
> multi-threaded concurrent (get|set|delete)_attr call in MPI_THREAD_MULTIPLE.
> See ompi_attr_delete function for its implementation. I think this does
> not matter because MPI standard doesn't specify behavior in such cases.
> 
> The patch for Open MPI trunk is attached. If you like it, take in
> this patch.
> 
> Though I'm a employee of a company, this is my independent and private
> work at my home. No intellectual property from my company. If needed,
> I'll sign to Individual Contributor License Agreement.
> 
> Regards,
> KAWASHIMA Takahiro
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] MPI-2.2 status #2223, #3127

2013-01-18 Thread George Bosilca
Takahiro,

The MPI_Dist_graph effort is happening in 
ssh://h...@bitbucket.org/bosilca/ompi-topo. I would definitely be interested in 
seeing some test cases, and giving this branch a tough test.

  George.

On Jan 18, 2013, at 02:43 , "Kawashima, Takahiro"  
wrote:

> Hi,
> 
> Fujitsu is interested in completing MPI-2.2 on Open MPI and Open MPI
> -based Fujitsu MPI.
> 
> We've read wiki and tickets. These two tickets seem to be almost done
> but need testing and bug fixing.
> 
>  https://svn.open-mpi.org/trac/ompi/ticket/2223
>  MPI-2.2: MPI_Dist_graph_* functions missing
> 
>  https://svn.open-mpi.org/trac/ompi/ticket/3127
>  MPI-2.2: Add reduction support for MPI_C_*COMPLEX and MPI::*COMPLEX
> 
> My colleagues are planning to work on these. They will write test codes
> and try to fix bugs. Test codes and patches can be contributed to the
> community. If they cannot fix some bugs, we will report details. They
> are planning to complete them in around March.
> 
> With that two questions.
> 
> The latest statuses written in these ticket comments are correct?
> Is there any more progress?
> 
> Where are the latest codes?
> In ticket #2223 says it is on Jeff's ompi-topo-fixes bitbucket branch.
>  https://bitbucket.org/jsquyres/ompi-topo-fixes
> But Jeff seems to have one more branch with a similar name.
>  https://bitbucket.org/jsquyres/ompi-topo-fixes-fixed
> Ticket #3127 says it is on Jeff's mpi22-c-complex bitbucket branch.
> But there is no such branch now.
>  https://bitbucket.org/jsquyres/mpi22-c-complex
> 
> Best regards,
> Takahiro Kawashima,
> MPI development team,
> Fujitsu
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Open MPI (not quite) on Cray XC30

2013-01-18 Thread George Bosilca
Luckily for us all the definitions contain the same constant (orte). r27864 
should fix this.

  George.


On Jan 18, 2013, at 06:21 , Paul Hargrove  wrote:

> My employer has a nice new Cray XC30 (aka Cascade), and I thought I'd give 
> Open MPI a quick test.
> 
> Given that it is INTENDED to be API-compatible with the XE series, I began 
> configuring with
> CC=cc CXX=CC FC=ftn --with-platform=lanl/cray_xe6/optimized-nopanasas
> However, since this is Intel h/w, I commented-out the following 2 lines in 
> the platform file:
> with_wrapper_cflags="-march=amdfam10"
> CFLAGS=-march=amdfam10
> 
> I am using PrgEnv-gnu/5.0.15, though PrgEnv-intel is the default on our system
> 
> As far as I know, use of 1.6.x is out - no ugni at all, right?
> So, I didn't even try.
> 
> I gave openmpi-1.7rc6 a try, but the ALPS headers and libs have moved (as 
> mentioned in ompi-trunk/config/orte_check_alps.m4).
> Perhaps one should CMR the updated-for-CLE-5 configure logic to the 1.7 
> branch?
> 
> Next, I tried a trunk nightly tarball: openmpi-1.9a1r27862.tar.bz2
> As I mentioned above, the trunk has the right logic for locating ALPS.
> However, it looks like there is some untested code, protected by "#if 
> WANT_CRAY_PMI2_EXT", that needs work:
> 
> make[2]: Entering directory 
> `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte/mca/db/pmi'
>   CC   db_pmi_component.lo
>   CC   db_pmi.lo
> ../../../../../orte/mca/db/pmi/db_pmi.c: In function 'store':
> ../../../../../orte/mca/db/pmi/db_pmi.c:202: error: 'ptr' undeclared (first 
> use in this function)
> ../../../../../orte/mca/db/pmi/db_pmi.c:202: error: (Each undeclared 
> identifier is reported only once
> ../../../../../orte/mca/db/pmi/db_pmi.c:202: error: for each function it 
> appears in.)
> make[2]: *** [db_pmi.lo] Error 1
> make[2]: Leaving directory 
> `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte/mca/db/pmi'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory 
> `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte'
> make: *** [all-recursive] Error 1
> 
> I added the missing "char *ptr" declaration a few lines before it's first 
> use, and resumed the build.
> This time the build terminated at
> 
> make[2]: Entering directory 
> `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/opal/tools/wrappers'
>   CC   opal_wrapper.o
>   CCLD opal_wrapper
> /usr/bin/ld: attempted static link of dynamic object 
> `../../../opal/.libs/libopen-pal.so'
> collect2: error: ld returned 1 exit status
> 
> So I went back to the platform file and changed
>enable_shared=yes
> to
>enable_shared=no
> No big deal there - I had to make the same change for our XE6.
> 
> And so I started back at configure (after a "make distclean", to be safe), 
> and here is the next error:
> 
> Making all in tools/orte-info
> make[2]: Entering directory 
> `/global/scratch/sd/hargrove/OMPI/openmpi-1.9a1r27862/BUILD/orte/tools/orte-info'
>   CCLD orte-info
> ../../../orte/.libs/libopen-rte.a(orte_info_support.o): In function 
> `orte_info_show_orte_version':
> orte_info_support.c:(.text+0xd70): multiple definition of 
> `orte_info_show_orte_version'
> version.o:version.c:(.text+0x4b0): first defined here
> ../../../orte/.libs/libopen-rte.a(orte_info_support.o):(.data+0x0): multiple 
> definition of `orte_info_type_orte'
> orte-info.o:(.data+0x10): first defined here
> /usr/bin/ld: link errors found, deleting executable `orte-info'
> collect2: error: ld returned 1 exit status
> make[2]: *** [orte-info] Error 1
> 
> I am not sure how to fix this, but I would guess this is probably a simple 
> fix for somebody who knows OMPI's build infrastructure better than I.
> 
> -Paul
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] MPI-2.2 status #2223, #3127

2013-01-18 Thread George Bosilca
It's a fork from the official ompi (well the hg version of it). We will push 
back once we're done.

  George.

On Jan 18, 2013, at 15:42 , "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> 
wrote:

> George --
> 
> Should I pull from your repo into 
> https://bitbucket.org/jsquyres/ompi-topo-fixes-fixed?  Or did you effectively 
> fork, and you guys will put back to SVN when you're done?
> 
> 
> On Jan 18, 2013, at 5:47 AM, George Bosilca <bosi...@icl.utk.edu>
> wrote:
> 
>> Takahiro,
>> 
>> The MPI_Dist_graph effort is happening in 
>> ssh://h...@bitbucket.org/bosilca/ompi-topo. I would definitely be interested 
>> in seeing some test cases, and giving this branch a tough test.
>> 
>> George.
>> 
>> On Jan 18, 2013, at 02:43 , "Kawashima, Takahiro" 
>> <t-kawash...@jp.fujitsu.com> wrote:
>> 
>>> Hi,
>>> 
>>> Fujitsu is interested in completing MPI-2.2 on Open MPI and Open MPI
>>> -based Fujitsu MPI.
>>> 
>>> We've read wiki and tickets. These two tickets seem to be almost done
>>> but need testing and bug fixing.
>>> 
>>> https://svn.open-mpi.org/trac/ompi/ticket/2223
>>> MPI-2.2: MPI_Dist_graph_* functions missing
>>> 
>>> https://svn.open-mpi.org/trac/ompi/ticket/3127
>>> MPI-2.2: Add reduction support for MPI_C_*COMPLEX and MPI::*COMPLEX
>>> 
>>> My colleagues are planning to work on these. They will write test codes
>>> and try to fix bugs. Test codes and patches can be contributed to the
>>> community. If they cannot fix some bugs, we will report details. They
>>> are planning to complete them in around March.
>>> 
>>> With that two questions.
>>> 
>>> The latest statuses written in these ticket comments are correct?
>>> Is there any more progress?
>>> 
>>> Where are the latest codes?
>>> In ticket #2223 says it is on Jeff's ompi-topo-fixes bitbucket branch.
>>> https://bitbucket.org/jsquyres/ompi-topo-fixes
>>> But Jeff seems to have one more branch with a similar name.
>>> https://bitbucket.org/jsquyres/ompi-topo-fixes-fixed
>>> Ticket #3127 says it is on Jeff's mpi22-c-complex bitbucket branch.
>>> But there is no such branch now.
>>> https://bitbucket.org/jsquyres/mpi22-c-complex
>>> 
>>> Best regards,
>>> Takahiro Kawashima,
>>> MPI development team,
>>> Fujitsu
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] MPI-2.2 status #2223, #3127

2013-01-18 Thread George Bosilca
Long story short: it is freshly forked from the OMPI trunk, patched with 
topo-fixes (and not topo-fixes-fixed for some reason). Don't whack them yet, 
let me take a closer look at the details.

  George.


On Jan 18, 2013, at 17:10 , "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> 
wrote:

> Ok.  If it contains everything you put on the original topo-fixes (and 
> topo-fixes-fixed), I might as well kill those two repos and put your repo URL 
> on the ticket.
> 
> So -- before I whack those two -- can you absolutely confirm that you've got 
> everything from the topo-fixes-fixed repo?  IIRC, there was some other 
> fixes/updates to the topo base in there, not just the new dist_graph 
> improvements.
> 
> 
> On Jan 18, 2013, at 11:06 AM, George Bosilca <bosi...@icl.utk.edu>
> wrote:
> 
>> It's a fork from the official ompi (well the hg version of it). We will push 
>> back once we're done.
>> 
>> George.
>> 
>> On Jan 18, 2013, at 15:42 , "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> 
>> wrote:
>> 
>>> George --
>>> 
>>> Should I pull from your repo into 
>>> https://bitbucket.org/jsquyres/ompi-topo-fixes-fixed?  Or did you 
>>> effectively fork, and you guys will put back to SVN when you're done?
>>> 
>>> 
>>> On Jan 18, 2013, at 5:47 AM, George Bosilca <bosi...@icl.utk.edu>
>>> wrote:
>>> 
>>>> Takahiro,
>>>> 
>>>> The MPI_Dist_graph effort is happening in 
>>>> ssh://h...@bitbucket.org/bosilca/ompi-topo. I would definitely be 
>>>> interested in seeing some test cases, and giving this branch a tough test.
>>>> 
>>>> George.
>>>> 
>>>> On Jan 18, 2013, at 02:43 , "Kawashima, Takahiro" 
>>>> <t-kawash...@jp.fujitsu.com> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> Fujitsu is interested in completing MPI-2.2 on Open MPI and Open MPI
>>>>> -based Fujitsu MPI.
>>>>> 
>>>>> We've read wiki and tickets. These two tickets seem to be almost done
>>>>> but need testing and bug fixing.
>>>>> 
>>>>> https://svn.open-mpi.org/trac/ompi/ticket/2223
>>>>> MPI-2.2: MPI_Dist_graph_* functions missing
>>>>> 
>>>>> https://svn.open-mpi.org/trac/ompi/ticket/3127
>>>>> MPI-2.2: Add reduction support for MPI_C_*COMPLEX and MPI::*COMPLEX
>>>>> 
>>>>> My colleagues are planning to work on these. They will write test codes
>>>>> and try to fix bugs. Test codes and patches can be contributed to the
>>>>> community. If they cannot fix some bugs, we will report details. They
>>>>> are planning to complete them in around March.
>>>>> 
>>>>> With that two questions.
>>>>> 
>>>>> The latest statuses written in these ticket comments are correct?
>>>>> Is there any more progress?
>>>>> 
>>>>> Where are the latest codes?
>>>>> In ticket #2223 says it is on Jeff's ompi-topo-fixes bitbucket branch.
>>>>> https://bitbucket.org/jsquyres/ompi-topo-fixes
>>>>> But Jeff seems to have one more branch with a similar name.
>>>>> https://bitbucket.org/jsquyres/ompi-topo-fixes-fixed
>>>>> Ticket #3127 says it is on Jeff's mpi22-c-complex bitbucket branch.
>>>>> But there is no such branch now.
>>>>> https://bitbucket.org/jsquyres/mpi22-c-complex
>>>>> 
>>>>> Best regards,
>>>>> Takahiro Kawashima,
>>>>> MPI development team,
>>>>> Fujitsu
>>>>> ___
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> 
>>>> 
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to: 
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27881 - trunk/ompi/mca/btl/tcp

2013-01-22 Thread George Bosilca
Nobody has cared about error cases so far, so I don't personally see any incentive 
to push this patch into 1.7 right now. But I won't be against it, as it is not 
hurting anything either.

  George.


On Jan 22, 2013, at 16:28 , "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> 
wrote:

> George --
> 
> Similar question on this one: should it be CMR'ed to v1.7?  (I kinda doubt 
> it's appropriate for v1.6)
> 
> 
> On Jan 21, 2013, at 6:41 AM, svn-commit-mai...@open-mpi.org wrote:
> 
>> Author: bosilca (George Bosilca)
>> Date: 2013-01-21 06:41:08 EST (Mon, 21 Jan 2013)
>> New Revision: 27881
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/27881
>> 
>> Log:
>> Make the TCP BTL really fail-safe. It now trigger the error callback on
>> all pending fragments when the destination goes down. This allows the PML
>> to recalibrate its behavior, either find an alternate route or just give up.
>> 
>> Text files modified: 
>>  trunk/ompi/mca/btl/tcp/btl_tcp_endpoint.c |29 
>> +++--   
>>  trunk/ompi/mca/btl/tcp/btl_tcp_frag.c | 7 ++-   
>>   
>>  trunk/ompi/mca/btl/tcp/btl_tcp_proc.c | 2 +-
>>   
>>  3 files changed, 34 insertions(+), 4 deletions(-)
>> 
>> Modified: trunk/ompi/mca/btl/tcp/btl_tcp_endpoint.c
>> ==
>> --- trunk/ompi/mca/btl/tcp/btl_tcp_endpoint.cMon Jan 21 06:35:42 
>> 2013(r27880)
>> +++ trunk/ompi/mca/btl/tcp/btl_tcp_endpoint.c2013-01-21 06:41:08 EST 
>> (Mon, 21 Jan 2013)  (r27881)
>> @@ -2,7 +2,7 @@
>> * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
>> * University Research and Technology
>> * Corporation.  All rights reserved.
>> - * Copyright (c) 2004-2008 The University of Tennessee and The University
>> + * Copyright (c) 2004-2013 The University of Tennessee and The University
>> * of Tennessee Research Foundation.  All rights
>> * reserved.
>> * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, 
>> @@ -295,6 +295,7 @@
>>if(opal_socket_errno != EINTR && opal_socket_errno != EAGAIN && 
>> opal_socket_errno != EWOULDBLOCK) {
>>BTL_ERROR(("send() failed: %s (%d)",
>>   strerror(opal_socket_errno), opal_socket_errno));
>> +btl_endpoint->endpoint_state = MCA_BTL_TCP_FAILED;
>>mca_btl_tcp_endpoint_close(btl_endpoint);
>>return -1;
>>}
>> @@ -359,6 +360,7 @@
>>mca_btl_tcp_endpoint_close(btl_endpoint);
>>btl_endpoint->endpoint_sd = sd;
>>if(mca_btl_tcp_endpoint_send_connect_ack(btl_endpoint) != 
>> OMPI_SUCCESS) {
>> +btl_endpoint->endpoint_state = MCA_BTL_TCP_FAILED;
>>mca_btl_tcp_endpoint_close(btl_endpoint);
>>OPAL_THREAD_UNLOCK(&btl_endpoint->endpoint_send_lock);
>>OPAL_THREAD_UNLOCK(&btl_endpoint->endpoint_recv_lock);
>> @@ -389,7 +391,6 @@
>> {
>>if(btl_endpoint->endpoint_sd < 0)
>>return;
>> -btl_endpoint->endpoint_state = MCA_BTL_TCP_CLOSED;
>>btl_endpoint->endpoint_retries++;
>>opal_event_del(&btl_endpoint->endpoint_recv_event);
>>opal_event_del(&btl_endpoint->endpoint_send_event);
>> @@ -401,6 +402,24 @@
>>btl_endpoint->endpoint_cache_pos= NULL;
>>btl_endpoint->endpoint_cache_length = 0;
>> #endif  /* MCA_BTL_TCP_ENDPOINT_CACHE */
>> +/**
>> + * If we keep failing to connect to the peer let the caller know about
>> + * this situation by triggering all the pending fragments callback and
>> + * reporting the error.
>> + */
>> +if( MCA_BTL_TCP_FAILED == btl_endpoint->endpoint_state ) {
>> +mca_btl_tcp_frag_t* frag = btl_endpoint->endpoint_send_frag;
>> +if( NULL == frag ) 
>> +frag = 
>> (mca_btl_tcp_frag_t*)opal_list_remove_first(&btl_endpoint->endpoint_frags);
>> +while(NULL != frag) {
>> +frag->base.des_cbfunc(&frag->btl->super, frag->endpoint, 
>> &frag->base, OMPI_ERR_UNREACH);
>> +
>> +frag = 
>> (mca_btl_tcp_frag_t*)opal_list_remove_first(&btl_endpoint->endpoint_frags);
>> +}
>> +} else {
>> +btl_endpoint->endpoint_state = MCA_BTL_TCP_C

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27880 - trunk/ompi/request

2013-01-22 Thread George Bosilca
To be honest it has been sitting in one of my repos for some time. If I'm not 
mistaken it is somehow related to an active ticket (but I couldn't find the 
info). It might be good to push it upstream.

  George.
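
To make the guaranteed behavior concrete, a minimal check of what r27880 enforces might look like the sketch below (not a test from the tree; error checking omitted):

    #include <mpi.h>
    #include <assert.h>

    int main(int argc, char *argv[])
    {
        int payload = 42;
        MPI_Request req;
        MPI_Status status;

        MPI_Init(&argc, &argv);

        /* a persistent request is inactive until it is started */
        MPI_Send_init(&payload, 1, MPI_INT, 0, 0, MPI_COMM_SELF, &req);

        /* waiting on the inactive request must return immediately and fill
         * in the empty status (MPI_ANY_SOURCE / MPI_ANY_TAG, zero count) */
        MPI_Wait(&req, &status);
        assert(MPI_ANY_SOURCE == status.MPI_SOURCE);
        assert(MPI_ANY_TAG == status.MPI_TAG);

        MPI_Request_free(&req);
        MPI_Finalize();
        return 0;
    }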

On Jan 22, 2013, at 16:27 , "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> 
wrote:

> George --
> 
> Is there any reason not to CMR this to v1.6 and v1.7?
> 
> 
> On Jan 21, 2013, at 6:35 AM, svn-commit-mai...@open-mpi.org wrote:
> 
>> Author: bosilca (George Bosilca)
>> Date: 2013-01-21 06:35:42 EST (Mon, 21 Jan 2013)
>> New Revision: 27880
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/27880
>> 
>> Log:
>> My understanding is that an MPI_WAIT() on an inactive request should
>> return the empty status (MPI 3.0 page 52 line 46).
>> 
>> Text files modified: 
>>  trunk/ompi/request/req_wait.c | 3 +++   
>>   
>>  1 files changed, 3 insertions(+), 0 deletions(-)
>> 
>> Modified: trunk/ompi/request/req_wait.c
>> ==
>> --- trunk/ompi/request/req_wait.cSat Jan 19 19:33:42 2013(r27879)
>> +++ trunk/ompi/request/req_wait.c2013-01-21 06:35:42 EST (Mon, 21 Jan 
>> 2013)  (r27880)
>> @@ -61,6 +61,9 @@
>>}
>>if( req->req_persistent ) {
>>if( req->req_state == OMPI_REQUEST_INACTIVE ) {
>> +if (MPI_STATUS_IGNORE != status) {
>> +*status = ompi_status_empty;
>> +}
>>return OMPI_SUCCESS;
>>}
>>req->req_state = OMPI_REQUEST_INACTIVE;
>> ___
>> svn-full mailing list
>> svn-f...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27881 - trunk/ompi/mca/btl/tcp

2013-01-23 Thread George Bosilca
While we always strive to improve this functionality, it has been available as a 
separate software package for quite some time.

  George.


On Jan 23, 2013, at 08:05 , Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> Are you going to develop anything further with regards to this functionality, 
> and target that stuff for v1.7?  Or should all of this just wait until 1.9?
> 
> (I don't really care either way; I'm asking out of curiosity)
> 
> 
> On Jan 22, 2013, at 7:24 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>> Nobody cared about error cases so far, I don't personally see any incentive 
>> to push this patch in the 1.7 right now. But I won't be against as it is not 
>> hurting either.
>> 
>> George.
>> 
>> 
>> On Jan 22, 2013, at 16:28 , "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> 
>> wrote:
>> 
>>> George --
>>> 
>>> Similar question on this one: should it be CMR'ed to v1.7?  (I kinda doubt 
>>> it's appropriate for v1.6)
>>> 
>>> 
>>> On Jan 21, 2013, at 6:41 AM, svn-commit-mai...@open-mpi.org wrote:
>>> 
>>>> Author: bosilca (George Bosilca)
>>>> Date: 2013-01-21 06:41:08 EST (Mon, 21 Jan 2013)
>>>> New Revision: 27881
>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/27881
>>>> 
>>>> Log:
>>>> Make the TCP BTL really fail-safe. It now trigger the error callback on
>>>> all pending fragments when the destination goes down. This allows the PML
>>>> to recalibrate its behavior, either find an alternate route or just give 
>>>> up.
>>>> 
>>>> Text files modified: 
>>>> trunk/ompi/mca/btl/tcp/btl_tcp_endpoint.c |29 
>>>> +++--   
>>>> trunk/ompi/mca/btl/tcp/btl_tcp_frag.c | 7 ++-  
>>>>
>>>> trunk/ompi/mca/btl/tcp/btl_tcp_proc.c | 2 +-   
>>>>
>>>> 3 files changed, 34 insertions(+), 4 deletions(-)
>>>> 
>>>> Modified: trunk/ompi/mca/btl/tcp/btl_tcp_endpoint.c
>>>> ==
>>>> --- trunk/ompi/mca/btl/tcp/btl_tcp_endpoint.c  Mon Jan 21 06:35:42 
>>>> 2013(r27880)
>>>> +++ trunk/ompi/mca/btl/tcp/btl_tcp_endpoint.c  2013-01-21 06:41:08 EST 
>>>> (Mon, 21 Jan 2013)  (r27881)
>>>> @@ -2,7 +2,7 @@
>>>> * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
>>>> * University Research and Technology
>>>> * Corporation.  All rights reserved.
>>>> - * Copyright (c) 2004-2008 The University of Tennessee and The University
>>>> + * Copyright (c) 2004-2013 The University of Tennessee and The University
>>>> * of Tennessee Research Foundation.  All rights
>>>> * reserved.
>>>> * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, 
>>>> @@ -295,6 +295,7 @@
>>>>  if(opal_socket_errno != EINTR && opal_socket_errno != EAGAIN && 
>>>> opal_socket_errno != EWOULDBLOCK) {
>>>>  BTL_ERROR(("send() failed: %s (%d)",
>>>> strerror(opal_socket_errno), opal_socket_errno));
>>>> +btl_endpoint->endpoint_state = MCA_BTL_TCP_FAILED;
>>>>  mca_btl_tcp_endpoint_close(btl_endpoint);
>>>>  return -1;
>>>>  }
>>>> @@ -359,6 +360,7 @@
>>>>  mca_btl_tcp_endpoint_close(btl_endpoint);
>>>>  btl_endpoint->endpoint_sd = sd;
>>>>  if(mca_btl_tcp_endpoint_send_connect_ack(btl_endpoint) != 
>>>> OMPI_SUCCESS) {
>>>> +btl_endpoint->endpoint_state = MCA_BTL_TCP_FAILED;
>>>>  mca_btl_tcp_endpoint_close(btl_endpoint);
>>>>  OPAL_THREAD_UNLOCK(&btl_endpoint->endpoint_send_lock);
>>>>  OPAL_THREAD_UNLOCK(&btl_endpoint->endpoint_recv_lock);
>>>> @@ -389,7 +391,6 @@
>>>> {
>>>>  if(btl_endpoint->endpoint_sd < 0)
>>>>  return;
>>>> -btl_endpoint->endpoint_state = MCA_BTL_TCP_CLOSED;
>>>>  btl_endpoint->endpoint_retries++;
>>>>  opal_event_del(&btl_endpoint->endpoint_recv_event);
>>>

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27881 - trunk/ompi/mca/btl/tcp

2013-01-24 Thread George Bosilca
http://fault-tolerance.org/

  George.

On Wed, Jan 23, 2013 at 5:10 PM, Jeff Squyres (jsquyres)
<jsquy...@cisco.com> wrote:
> On Jan 23, 2013, at 10:27 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
>> While we always strive to improve this functionality, it was available as a 
>> separate software packages for quite some time.
>
> What separate software package are you referring to?
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


Re: [OMPI devel] Open MPI on Cray XC30 - suspicous configury

2013-01-28 Thread George Bosilca
What Paul is saying is that there is a path mismatch between the two
cases. A few lines above, using_cle5_install is only set to yes if
/usr/lib/alps/libalps.a exists. Then, in the snippet pasted in Paul's
email, if using_cle5_install is yes you set orte_check_alps_libdir to
something under /opt/cray/. Why not to /usr/, as in the test a few
lines above?
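
Condensed, the mismatch reads like this (the detection line is paraphrased from the description above; the libdir defaults are the ones quoted from config/orte_check_alps.m4):

    # detection: CLE5 is assumed when the /usr layout is present
    test -f /usr/lib/alps/libalps.a && using_cle5_install="yes"

    # default libdir: yet "yes" maps back to the /opt/cray layout
    if test "$using_cle5_install" = "yes"; then
        orte_check_alps_libdir="/opt/cray/alps/default/lib64"   # why not /usr, as above?
    else
        orte_check_alps_libdir="/usr/lib/alps"
    fi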

On Mon, Jan 28, 2013 at 9:14 PM, Ralph Castain  wrote:
>
> On Jan 28, 2013, at 6:10 PM, Paul Hargrove  wrote:
>
> The following 2 fragment from config/orte_check_alps.m4 appear to be
> contradictory.
> By that I mean the first appears to mean that "--with-alps" with no argument
> means /opt/cray/alps/default/... for CLE5 and /usr/... for CLE4, while the
> second fragment appears to be doing the opposite:
>
>if test "$using_cle5_install" = "yes"; then
>orte_check_alps_libdir="/opt/cray/alps/default/lib64"
>else
>orte_check_alps_libdir="/usr/lib/alps"
>fi
>
>
>if test "$using_cle5_install" = "yes" ; then
>   AS_IF([test "$with_alps" = "yes"],
> [orte_check_alps_dir="/usr"],
> [orte_check_alps_dir="$with_alps"])
>else
>   AS_IF([test "$with_alps" = "yes"],
> [orte_check_alps_dir="/opt/cray/alps/default"],
> [orte_check_alps_dir="$with_alps"])
>fi
>
> At least based on header and lib locations on NERSC's XC30 (CLE 5.0.15) and
> XE6 (CLE 4.1.40), the first fragment is correctwhile the second fragment is
> "backwards" (the two calls to AS_IF should be exchanged, or the initial
> "test" should be inverted).
>
>
> ?? It looks correct to me - if with_alps is "yes", then no path was given
> and we have to look at a default location. If it isn't yes, then a path was
> given and we use it.
>
> Am I missing something?
>
>
> Note this same logic is present in both trunk and v1.7 (in SVN - I am not
> looking at tarballs this time).
>
> -Paul
>
>
>
>
>
>
> --
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


Re: [OMPI devel] [OMPI bugs] [Open MPI] #3489: Move r27954 to v1.7 branch

2013-01-28 Thread George Bosilca
Ralph,

What if I say it wasn't a "stale" option nobody cares about? You just
removed one of the critical pieces of the configury, completely
disabling the work of other people.

I am absolutely sorry that I didn't make it in the 27 minutes you
generously provided for comments. Removing it from the trunk and
pushing it into the 1.7 branch, in absolute agreement with yourself,
all in a mere 27 minutes is an absolute feat (and not your first). For
some obscure reason I had the feeling we had some level of protection
(gk, rm, a reasonable amount of time for people to comment), but I
guess those rules are for weaklings.

  George.

PS: I have so much fun reading a barely 3-week-old thread on our
mailing list. Absolutely terrific:
http://www.open-mpi.org/community/lists/devel/2013/01/11901.php.



On Mon, Jan 28, 2013 at 9:22 PM, Open MPI  wrote:
> #3489: Move r27954 to v1.7 branch
> ---+---
> Reporter:  rhc |   Owner:  rhc
> Type:  changeset move request  |  Status:  closed
> Priority:  major   |   Milestone:  Open MPI 1.7
>  Version:  trunk   |  Resolution:  fixed
> Keywords:  |
> ---+---
> Changes (by rhc):
>
>  * status:  new => closed
>  * resolution:   => fixed
>
>
> Comment:
>
>  (In [27957]) Fixes #3489: Move r27954 to v1.7 branch
>
>  ---svn-pre-commit-ignore-below---
>
>  r27954 [[BR]]
>  Remove stale ft options.
>
>  cmr:v1.7
>
> --
> Ticket URL: 
> Open MPI 
>
> ___
> bugs mailing list
> b...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/bugs


Re: [OMPI devel] [OMPI svn] svn:open-mpi r28016 - trunk/ompi/mca/btl/tcp

2013-02-01 Thread George Bosilca
Jeff,

So far, all interfaces specified via MCA parameters for the BTL TCP
are required to exist. Otherwise an error message is printed and an
error is returned to the upper level, with the intent that no BTLs of
this type will be enabled (see for example btl_tcp_component.c:682).

If I correctly understand your commit, it changes this [so far
consistent] behavior for a single one of our TCP MCA parameters
(if_seq) to: print an error message and then continue. As you set
mca_btl_tcp_component.tcp_if_seq to NULL, this is as if the argument
was never provided.

I prefer the old behavior for its corrective meaning (you fix it and
then it works), as well as for its consistency with the other BTL TCP
parameters.

  George.



On Fri, Feb 1, 2013 at 3:17 PM,   wrote:
> Author: jsquyres (Jeff Squyres)
> Date: 2013-02-01 15:17:43 EST (Fri, 01 Feb 2013)
> New Revision: 28016
> URL: https://svn.open-mpi.org/trac/ompi/changeset/28016
>
> Log:
> As the help message states, it's not an ''error'' if the specified
> interface is not found.  It should just be skipped.
>
> Text files modified:
>trunk/ompi/mca/btl/tcp/btl_tcp_component.c | 8 +---
>1 files changed, 5 insertions(+), 3 deletions(-)
>
> Modified: trunk/ompi/mca/btl/tcp/btl_tcp_component.c
> ==
> --- trunk/ompi/mca/btl/tcp/btl_tcp_component.c  Fri Feb  1 09:27:37 2013  
>   (r28015)
> +++ trunk/ompi/mca/btl/tcp/btl_tcp_component.c  2013-02-01 15:17:43 EST (Fri, 
> 01 Feb 2013)  (r28016)
> @@ -314,10 +314,12 @@
> ompi_process_info.nodename,
> mca_btl_tcp_component.tcp_if_seq,
> "Interface does not exist");
> -return OMPI_ERR_BAD_PARAM;
> +free(mca_btl_tcp_component.tcp_if_seq);
> +mca_btl_tcp_component.tcp_if_seq = NULL;
> +} else {
> +BTL_VERBOSE(("Node rank %d using TCP interface %s",
> + node_rank, mca_btl_tcp_component.tcp_if_seq));
>  }
> -BTL_VERBOSE(("Node rank %d using TCP interface %s",
> - node_rank, mca_btl_tcp_component.tcp_if_seq));
>  }
>  }
>
> ___
> svn mailing list
> s...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn


Re: [OMPI devel] [OMPI svn] svn:open-mpi r28016 - trunk/ompi/mca/btl/tcp

2013-02-01 Thread George Bosilca
I did not say we abort, I said we prevent the BTL TCP from being used.
In your example, I guess TCP is disabled but the PML finds another
available interface and keeps going. If I try the same thing with
"--mca btl tcp,self" it does abort on my cluster.

---
mpirun -np 2 --mca btl tcp,self --mca btl_tcp_if_include eth3 ./ring_c
[dancer02][[48001,1],1][../../../../../ompi/ompi/mca/btl/tcp/btl_tcp_component.c:682:mca_btl_tcp_component_create_instances]
invalid interface "eth3"
[dancer01][[48001,1],0][../../../../../ompi/ompi/mca/btl/tcp/btl_tcp_component.c:682:mca_btl_tcp_component_create_instances]
invalid interface "eth3"
--
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[48001,1],0]) is on host: node01
  Process 2 ([[48001,1],1]) is on host: node02
  BTLs attempted: self

Your MPI job is now going to abort; sorry.
---

The only reason I see for having the if_seq in the first place is to
nicely balance the TCP traffic over multiple interfaces. As your patch
sets the if_seq to NULL, it basically allows the TCP BTL to use __all__
available interfaces, achieving exactly the opposite of what the user
asked for with if_seq. As a result the application will run over all
available interfaces, and the outcome (especially in terms of
performance) might not be what the user expected. Very confusing from
my perspective.

  George.
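
A sketch of the balancing use case described above (interface names are purely illustrative; the per-local-rank round-robin is the intended semantics of btl_tcp_if_seq as discussed here):

    # hand one interface to each local rank, round-robin across the list
    mpirun -np 4 --bynode \
           --mca btl tcp,self \
           --mca btl_tcp_if_seq eth0,eth1 \
           ./ring_c
    # local rank 0 uses eth0, local rank 1 uses eth1, and so on; a typo such
    # as "eth3" on a node without it is exactly what the strict check catches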


On Fri, Feb 1, 2013 at 6:50 PM, Jeff Squyres (jsquyres)
<jsquy...@cisco.com> wrote:
> On Feb 1, 2013, at 6:28 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
>> So far, all interfaces specified via MCA parameters for the BTL TCP
>> are required to exist. Otherwise an error message is printed and an
>> error returned to the upper level, with the intent that no BTLs of
>> this type will be enabled (as an example btl_tcp_component.c:682).
>
> Actually, it doesn't -- that's why I made this one match the other behavior.
>
> For example, if I exclude an interface that doesn't exist (on v1.6 HEAD):
>
> -
> [15:40] savbu-usnic:~/svn/ompi-1.6/examples % mpirun -np 2 --mca 
> btl_tcp_if_exclude lo,bogus ring_c
> Process 0 sending 10 to 1, tag 201 (2 processes in ring)
> Process 0 sent to 1
> Process 0 decremented value: 9
> Process 0 decremented value: 8
> Process 0 decremented value: 7
> Process 0 decremented value: 6
> Process 0 decremented value: 5
> Process 0 decremented value: 4
> Process 0 decremented value: 3
> Process 0 decremented value: 2
> Process 0 decremented value: 1
> Process 0 decremented value: 0
> Process 0 exiting
> Process 1 exiting
> [15:40] savbu-usnic:~/svn/ompi-1.6/examples %
> -
>
> Or if I include an interface that doesn't exist (although this one warns):
>
> -
> [15:40] savbu-usnic:~/svn/ompi-1.6/examples % mpirun -np 2 --mca 
> btl_tcp_if_include eth0,bogus ring_c
> [savbu-usnic][[7221,1],0][btl_tcp_component.c:682:mca_btl_tcp_component_create_instances]
>  invalid interface "bogus"
> [savbu-usnic][[7221,1],1][btl_tcp_component.c:682:mca_btl_tcp_component_create_instances]
>  invalid interface "bogus"
> Process 0 sending 10 to 1, tag 201 (2 processes in ring)
> Process 0 sent to 1
> Process 0 decremented value: 9
> Process 0 decremented value: 8
> Process 0 decremented value: 7
> Process 0 decremented value: 6
> Process 0 decremented value: 5
> Process 0 decremented value: 4
> Process 0 decremented value: 3
> Process 0 decremented value: 2
> Process 0 decremented value: 1
> Process 0 decremented value: 0
> Process 0 exiting
> Process 1 exiting
> [15:42] savbu-usnic:~/svn/ompi-1.6/examples %
> -
>
> Are there other cases that I'm missing where we *do* abort?
>
> If so, we should probably be consistent: pick one way (abort or not abort) 
> and do that in all cases.  I don't think I have much of an opinion here on 
> which way we should go; I can see multiple arguments:
>
> - We should abort: we have a large precedent in many other place in OMPI that 
> if a human asks for something OMPI can't deliver, we abort and make the human 
> figure it out.
>
> - We should warn/not abort: this is the behavior we've had for a long time.  
> Changing it may break backwards compatibility.
>
>
>
>> If I correctly understand your commit, it change this [so far
>> consistent] behavior for a single of our TCP MCA parameter (if_seq)
>>

Re: [OMPI devel] [OMPI svn] svn:open-mpi r28016 - trunk/ompi/mca/btl/tcp

2013-02-04 Thread George Bosilca
If it ain't broke, don't fix it. I am more than skeptical about the
interest of this new notation.

The two behaviors you describe for include and exclude do not look
conflicting to me. Inclusion is a strong request: the user enforces the
usage of a specific interface, and if the interface is not available
then we have a problem. Exclude, on the other side, must ensure that a
specific interface is not in use, something that is trivially satisfied
if the interface is not available.

I'm not a fan of the nowarn option. Seems like a lot of code with
limited interest, especially if we only plan to support it in TCP.

If you need specialized arguments for some of your nodes, here is what
I do: rename the binaries to .orig, and use the original name for a sh
script that sets mca_param_files to a host-specific file (if such a
file exists) and then calls the .orig executable. Works like a charm,
even when a batch scheduler is used.

  George.

On Mon, Feb 4, 2013 at 12:02 PM, Jeff Squyres (jsquyres)
 wrote:
> On Feb 1, 2013, at 9:59 PM, "Barrett, Brian W"  wrote:
>
>> I don't think this is right either. Excluding a device that doesn't exist 
>> has many use cases. Such as disabling a network that only exists on part of 
>> the cluster.  I'm not sure about what to do with seq; it's more like include 
>> than exclude.
>
> Hmm.  I've now given this quite a bit of thought.  Here's what I think:
>
> 1. Just like there might be good reasons to exclude non-existent interfaces 
> (e.g., networks that only include on part of the cluster), the same argument 
> could be made for *including* non-existent interfaces.
>
> 2. It seems odd to me to have different behavior for non-existent interfaces 
> between include, exclude, and/or seq.
>
> 3. We have a very strong precedent throughout OMPI that if a human asks for 
> something that OMPI can't deliver, OMPI should error.  According to this, and 
> according to the Law of Least Surprise, I would think that if I typo an 
> exclude interface name, OMPI should error and make a human figure it out.
>
> 4. If someone wants different includes/excludes in different parts of the 
> cluster, then they should have per-node values for these MCA params.
>
> 5. That being said, #4 is not always feasible.  Concrete example (which is 
> why this whole thing started, incidentally): in my MTT cluster at Cisco, I 
> have *some* nodes with back-to-back interfaces.  I can't think of a good way 
> to have per-node MCA params in an MTT run that is SLURM-queued and may end up 
> on random nodes in my cluster -- that may or may not include nodes with 
> loopback interfaces.
>
> So how about this compromise:
>
> If an invalid include, exclude, or if_seq interface is specified:
> - If that interface is prefaced with "nowarn:", silently ignore that token
> - Otherwise, display a show_help message and ignore the TCP BTL
>
> For example:
>
> mpirun --mca btl_tcp_if_include nowarn:eth5,eth6
>
> - If eth5 doesn't exist, the job will continue just as if eth5 wasn't 
> specified
> - If eth6 doesn't exist, the TCP BTL will disqualify itself
>
> (BTW: yes, I'm volunteering to code up whatever we agree on)
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] [OMPI svn] svn:open-mpi r28029 - trunk/opal/class

2013-02-04 Thread George Bosilca
Ralph,

There are valid reasons why we decided not to add such macros.

Adding an element to a list does not increase the element's ref count.
Similarly, removing an element from a list does not decrease its
refcount either. Thus, there is no obvious link between the refcount
of the elements in a list and the list itself. As a result, we cannot
assume that decreasing the refcount by one is correct, even when we
plan to get rid of one of our lists.

In addition, the list can contain elements that have been
OBJ_CONSTRUCT'ed, in which case these macros will lead to unexpected
behavior.

  George.
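
A short sketch of the OBJ_CONSTRUCT hazard mentioned above; the item type and function are hypothetical and used only for illustration, while the class and list primitives are the existing OPAL ones:

    #include "opal/class/opal_list.h"

    typedef struct {
        opal_list_item_t super;
        int payload;
    } my_item_t;
    OBJ_CLASS_INSTANCE(my_item_t, opal_list_item_t, NULL, NULL);

    static void example(void)
    {
        opal_list_t list;
        my_item_t item;                        /* not OBJ_NEW'ed: lives on the stack */

        OBJ_CONSTRUCT(&list, opal_list_t);
        OBJ_CONSTRUCT(&item, my_item_t);
        opal_list_append(&list, &item.super);  /* append does not retain the item */

        /* OPAL_LIST_DESTRUCT() OBJ_RELEASEs every item it removes: the stack
         * item's refcount drops to zero, its destructors run, and free() is
         * attempted on stack memory */
        OPAL_LIST_DESTRUCT(&list);
    }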


On Mon, Feb 4, 2013 at 2:42 PM,   wrote:
> Author: rhc (Ralph Castain)
> Date: 2013-02-04 14:42:57 EST (Mon, 04 Feb 2013)
> New Revision: 28029
> URL: https://svn.open-mpi.org/trac/ompi/changeset/28029
>
> Log:
> The opal_list_t destructor doesn't release the items on the list prior to 
> destructing or releasing it. Provide two convenience macros for doing so.
>
> Text files modified:
>trunk/opal/class/opal_list.h |26 ++
>1 files changed, 26 insertions(+), 0 deletions(-)
>
> Modified: trunk/opal/class/opal_list.h
> ==
> --- trunk/opal/class/opal_list.hMon Feb  4 12:36:55 2013
> (r28028)
> +++ trunk/opal/class/opal_list.h2013-02-04 14:42:57 EST (Mon, 04 Feb 
> 2013)  (r28029)
> @@ -160,6 +160,32 @@
>   */
>  typedef struct opal_list_t opal_list_t;
>
> +/** Cleanly destruct a list
> + *
> + * The opal_list_t destructor doesn't release the items on the
> + * list - so provide two convenience macros that do so and then
> + * destruct/release the list object itself
> + *
> + * @param[in] list List to destruct or release
> + */
> +#define OPAL_LIST_DESTRUCT(list)\
> +do {\
> +opal_list_item_t *it;   \
> +while (NULL != (it = opal_list_remove_first(list))) {   \
> +OBJ_RELEASE(it);\
> +}   \
> +OBJ_DESTRUCT(list); \
> +} while(0);
> +
> +#define OPAL_LIST_RELEASE(list) \
> +do {\
> +opal_list_item_t *it;   \
> +while (NULL != (it = opal_list_remove_first(list))) {   \
> +OBJ_RELEASE(it);\
> +}   \
> +OBJ_RELEASE(list);  \
> +} while(0);
> +
>
>  /**
>   * Loop over a list.
> ___
> svn mailing list
> s...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn


Re: [OMPI devel] [OMPI svn] svn:open-mpi r28016 - trunk/ompi/mca/btl/tcp

2013-02-04 Thread George Bosilca
On Mon, Feb 4, 2013 at 8:45 PM, Jeff Squyres (jsquyres)
 wrote:
> That will still be quite difficult to do in MTT.  Remember: all the tests 
> that are run in MTT are shared across all of us via the ompi-tests SVN repo.  
> Are you suggesting that I alias every test in the ompi-tests SVN with a 
> public script that you should run that should look for some site-specific MCA 
> override param file?

As I was doing this with a single file it worked quite nicely, but
with MTT it can easily turn into a horror story. Extending the concept
to a more generic form doesn't seem too complicated (see below).

George.

#!/bin/sh

me=$(hostname)

[ -f $HOME/.openmpi/${me}.conf ] && export OMPI_MCA_param_files=$HOME/.openmpi/${me}.conf

exec $*



Re: [OMPI devel] MCA variable system slides and notes

2013-02-05 Thread George Bosilca
The major benefit of the second method is that it has the obvious potential to 
save us some memory. Not much I guess, but somewhere in the order of a few KB.

But in order to save this memory, the originator must keep a pointer to the 
data in order to be able to free it after the mca_params framework is closed. 
This means that for each string saved (due to the lack of a strdup in the 
mca_params framework), there will be sizeof(char*) bytes spent in bookkeeping. 
Thus the memory savings are drastically reduced, and the benefit of the second 
approach is strongly compromised.


  George.

On Feb 5, 2013, at 12:46 , Nathan Hjelm  wrote:

> Notes:
> 
> Variable system currently takes ownership of string values. This is done so 
> strings can be freed when overwritten (by mca_base_var_set_value) or when the 
> variable is deregistered. This requires that initial string values be 
> allocated on the heap (not .DATA, heap, etc). Brian raised a good point that 
> projects/frameworks/components should be responsible for freeing anything 
> they allocate and that it shouldn't be the responsibility of the MCA variable 
> system to free these strings (though we have to handle the 
> mca_base_var_set_value case).
> 
> Some options:
> 1) Always duplicate the string passed in by the caller. The caller will have 
> to free the original value if it was allocated. Ex:
> 
> tmp = strdup ("some_string_value");
> backing_store = tmp;
> mca_base_var_register (..., MCA_BASE_VAR_TYPE_STRING, ..., &backing_store);
> free (tmp);
> 
> 2) Add a flag indicating whether the variable system should call free on the 
> initial value. Ex:
> 
> backing_store = "some_string_value";
> mca_base_var_register (..., MCA_BASE_VAR_TYPE_STRING, ..., 
> MCA_BASE_VAR_FLAG_STATIC, ..., &backing_store);
> 
> If the STATIC flag is not set the variable system takes ownership of the 
> string and frees it later. If the STATIC flag is set the variable system can 
> either 1) use the initial value, or 2) strdup the initial value. There are 
> issues with using the initial value without duplication since the registree 
> would need to ensure the initial value lives as long as the registered 
> variable (not a problem if the value is in .DATA or .BSS).
> 
> Thoughts on these options? Other options?
> 
> 
> List of initial supported types is adequate: char *, int, and bool. We can 
> re-evaluate later if there is a need for more types.
> 
> 
> We need to figure out how Open MPI could read all file values and build an 
> environment that could be passed to the backend to prevent the need to read 
> from files on the backend. This may necessitate modifying the mca_base_var 
> API.
> 
> 
> -Nathan
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] mpi/java question

2013-02-20 Thread George Bosilca
What is wrong with MPI_INT64_T? (MPI 3.0 standard, page 26.)

  George.
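
In other words, the JNI side of the bindings can map Java's long straight onto the fixed-width MPI datatype, with no need to probe the platform's C long size (a sketch only, with a hypothetical helper, not the actual binding code):

    #include <jni.h>
    #include <mpi.h>

    /* jlong is guaranteed to be a 64-bit signed integer by the JNI spec,
     * which is exactly what MPI_INT64_T describes on the MPI side */
    void send_java_long(jlong value, int dest, int tag, MPI_Comm comm)
    {
        MPI_Send(&value, 1, MPI_INT64_T, dest, tag, comm);
    }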

On Feb 20, 2013, at 21:12 , Ralph Castain  wrote:

> 
> On Feb 20, 2013, at 12:08 PM, Dmitri Gribenko  wrote:
> 
>> On Wed, Feb 20, 2013 at 10:05 PM, Ralph Castain  wrote:
>>> 
>>> On Feb 20, 2013, at 11:39 AM, Dmitri Gribenko  wrote:
>>> 
 On Wed, Feb 20, 2013 at 9:34 PM, Jeff Squyres (jsquyres)
  wrote:
> If someone could write some generic java code to figure out the size of a 
> java type (and either printf it out, or write it to a file, or otherwise 
> be able to give that value to a shell script), that would be a good start.
 
 No need for that -- type sizes in Java are fixed.
 
 http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
>>> 
>>> True - but the ones on the C-side are not, and that's the problem.
>> 
>> My point was that there is no need to write java code to detect type
>> sizes.  About C types -- don't we already check those anyway?  Sure,
>> we need to match these with java side, but there's no need to write
>> new code to check type sizes.
> 
> 
> I think you misunderstood - we are talking about writing build-system code 
> that matches the discovered C-type sizes to the corresponding known Java 
> type. This is the source of the reported problem.
> 
> And yes - Jeff misspoke in his note. I've straightened him out over the 
> phone. :-)
> 
>> 
>> Dmitri
>> 
>> -- 
>> main(i,j){for(i=2;;i++){for(j=2;j> (j){printf("%d\n",i);}}} /*Dmitri Gribenko */
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] v1.7.0rc7

2013-02-26 Thread George Bosilca
These warnings are now fixed (r28106). Thanks for reporting them.

  George.

On Feb 26, 2013, at 04:27 , marco atzeri  wrote:

>  CC   to_self.o
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:
>  In function ‘create_indexed_constant_gap_ddt’:
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:48:5:
>  warning: ‘MPI_Type_struct’ is deprecated (declared at 
> ../../ompi/include/mpi.h:1579): MPI_Type_struct is superseded by 
> MPI_Type_create_struct in MPI-2.0
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:
>  In function ‘create_indexed_gap_ddt’:
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:89:5:
>  warning: ‘MPI_Address’ is deprecated (declared at 
> ../../ompi/include/mpi.h:1057): MPI_Address is superseded by MPI_Get_address 
> in MPI-2.0
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:90:5:
>  warning: ‘MPI_Address’ is deprecated (declared at 
> ../../ompi/include/mpi.h:1057): MPI_Address is superseded by MPI_Get_address 
> in MPI-2.0
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:93:5:
>  warning: ‘MPI_Type_struct’ is deprecated (declared at 
> ../../ompi/include/mpi.h:1579): MPI_Type_struct is superseded by 
> MPI_Type_create_struct in MPI-2.0
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:99:5:
>  warning: ‘MPI_Address’ is deprecated (declared at 
> ../../ompi/include/mpi.h:1057): MPI_Address is superseded by MPI_Get_address 
> in MPI-2.0
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:100:5:
>  warning: ‘MPI_Address’ is deprecated (declared at 
> ../../ompi/include/mpi.h:1057): MPI_Address is superseded by MPI_Get_address 
> in MPI-2.0
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:105:5:
>  warning: ‘MPI_Type_struct’ is deprecated (declared at 
> ../../ompi/include/mpi.h:1579): MPI_Type_struct is superseded by 
> MPI_Type_create_struct in MPI-2.0
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:
>  In function ‘create_indexed_gap_optimized_ddt’:
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:139:5:
>  warning: ‘MPI_Type_struct’ is deprecated (declared at 
> ../../ompi/include/mpi.h:1579): MPI_Type_struct is superseded by 
> MPI_Type_create_struct in MPI-2.0
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:
>  In function ‘do_test_for_ddt’:
> /pub/devel/openmpi/openmpi-1.7rc7-1/src/openmpi-1.7rc7/test/datatype/to_self.c:307:5:
>  warning: ‘MPI_Type_extent’ is deprecated (declared at 
> ../../ompi/include/mpi.h:1541): MPI_Type_extent is superseded by 
> MPI_Type_get_extent in MPI-2.0
>  CCLD to_self.exe




Re: [OMPI devel] Open MPI BTL meeting in Knoxville

2013-03-05 Thread George Bosilca
All,

[This complements the internal notes that Jeff sent out earlier.]

As you might have heard, some of us had a meeting a few weeks ago at UTK to talk 
about the BTLs and their possible move down to the OPAL level. As a result, 
several key components have been identified as candidates that must be moved 
before the BTLs. You might have already noticed that some of the changes 
identified during this meeting have already begun.

Here is a comprehensive list of things to be moved. The ones marked with * have 
already been completed.
* Modex (ortedb)
- Mpool + rcache + conv
* Help messaging (get rid of ompi_show_help by replacing it with opal_show_help)
- RML / OOB
- BTL

Two additional things to be addressed and clearly defined during this move are:

- Naming + Endpoints: For now we'll go with a uint64_t packaged as an OPAL 
type (to be defined). This naming will only be used during the initial steps, 
up to the point where the upper layer (RTE or OMPI) takes control and its own 
naming scheme is used. The name is provided by the upper layer; OPAL will only 
use it as an index into the opal_db.

- Thread safety: Minimize the locking per unit of usage. For this we will 
clean up the locking to keep only two flavors: lower-case and upper-case 
(almost as they are today). The lower-case version __always__ locks, while 
the upper-case version is surrounded by an "if(threads_active)". Moreover, 
the upper-case version will be moved from the OPAL level into the OMPI layer 
(thus its name will change from OPAL_* to OMPI_*). 

From a technical perspective, a few other ideas have been thrown around:

- orte_show_help should lose the DECLSPEC, and its usage should be confined to 
the ORTE layer.
- fix UDCM !!!
- everything that is not performance critical from the MPI standard will be 
protected by a big lock, one lock per type of resource (attributes, info, 
whatever else)
- redo the dynamic processing layer

After all these discussions we ended up with a plan to move forward.
Step -2: Remove Solaris threads
Step -1: Fix UDCM/openib
Step 0: opal_db/modex_db down in the OPAL
Step 0.5: shared opal_db on the node (may be delayed; it is not critical).
Step 1: move the BTLs and all the other needed components.
Step 2: Always enable locking in BTL. Evaluate the impact on the performance 
before enabling.
Step 3: Fix the atomics (lower case and upper case). The condition in 
upper-case should disappear.
Step 4: Fix the performance (if necessary), and redesign the locking strategy.

  George.
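
For illustration, the lower-case vs. upper-case convention in the thread-safety item boils down to something like the following, modeled on the existing opal/threads/mutex.h primitives (the lock_always and THREAD_LOCK_SKETCH names are illustrative, this is not a patch):

    #include "opal/threads/mutex.h"

    /* lower-case flavor: always takes the lock, no questions asked */
    static inline void lock_always(opal_mutex_t *mutex)
    {
        opal_mutex_lock(mutex);
    }

    /* upper-case flavor: locks only when threads are active; per the notes
     * this conditional flavor would move up and become an OMPI_* macro */
    #define THREAD_LOCK_SKETCH(mutex)       \
        do {                                \
            if (opal_using_threads()) {     \
                opal_mutex_lock(mutex);     \
            }                               \
        } while (0)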






On Mar 5, 2013, at 16:33 , Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> Sorry it took so long to forward these notes to everyone.  Here's some notes 
> from the BTL meeting we had in Knoxville a few weeks ago.
> 
>> Date: 
>>   Feb. 12, 2013
>> 
>> People: 
>>   Thomas Herault
>>   George Bosilca
>>   Jeff Squyres
>>   Brian Barrett
>>   Aurelien Bouteiller
>>   Ralph Castain
>>   Nathan Hjelm
>> 
>> Goal: 
>>   Lay out the general design of moving the BTL framework into OPAL. 
>> 
>> 
>> -== Identifying dependencies ==-
>> 
>> BTL
>> +--> Modex
>> +--> Mpool + rcache + conv
>> +--> bml / allocators
>> +--> Help/*
>> +--> Naming + Endpoints?
>> +--> (RML/OOB)
>> +--> Threads
>> 
>> 
>>  ACTION PLAN 
>> 
>> 0. Remove Solaris Threads (--with-thread option is attached)
>> 1. Opal DB/modex
>> 1.b OpenIB UDCM independent from OOB
>> 2. Move BTL down to OPAL
>> 3. Move to locks to lowercase versions (that are always locking), look at 
>> perf.
>> 4. Look at conditions, atomics, etc
>> 4.5: add big locks on things that are maybe not thread safe and not 
>> performance critical
>> 5. Fix perf/redesign locking (in SM, in particular)
>> 6. Use BTL tcp in place of OOB in ORTE
>> 
>> 
>>  DETAIL OF ISSUES 
>> 
>> -== IB BTL boostrapping ==-
>> 
>> IB BTL is the only one that depends on OOB/RML
>> Options: 
>> 1. Use the TCP BTL to bootstrap the IB BTL. Brian doesn't like this, because
>> making it available is an enabler for bad practice that will creep in 
>> the codebase
>> 2. Remove OOB, Fix UDCM so it stops doing things it should not have 
>> done anyway. 
>> We settle for option 2.
>> 
>> -== SM initialization ==- 
>> 
>> Some technical discussion on the way the shared segment is created and 
>> the sync mechanism for the shared file. There are a number of issues, 
>> that seem to benefit from the fact that the modex synchronize before 
>> we attempt the file access. There may be trouble if the modex is 
>> removed (or is not 

Re: [OMPI devel] RFC: assert() to ensure OBJ_CONSTRUCT'ed objects don't get destroyed

2013-03-07 Thread George Bosilca
Please refrain from doing so: the assumption that #1 is based on is false. 
First, OBJ_CONSTRUCT can be used to construct a specific type of object in a 
preallocated memory region (not only on the stack or heap). In fact, it is the 
only way we can dynamically initialize an object in memory allocated outside 
the OBJ_NEW system. Second, OBJ_CONSTRUCT is not necessarily matched with an 
OBJ_DESTRUCT; it works just fine with OBJ_RELEASE. In fact I use this feature 
in several places.

An example would be a memory region without a predefined size that I manipulate 
as an opal_list_item_t. The fragment gets allocated once its size is known, 
then gets OBJ_CONSTRUCT'ed and used. The reference count plays its role: when 
nobody is using the object anymore, it will be automatically released. With the 
change you propose such usage would be prohibited.
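
A minimal sketch of that pattern, with a hypothetical fragment type and helper (not code from the tree; the class and allocation primitives are the existing OPAL ones):

    #include <stdlib.h>
    #include "opal/class/opal_list.h"

    typedef struct {
        opal_list_item_t super;
        size_t length;
        unsigned char data[];           /* payload whose size is known late */
    } my_frag_t;
    OBJ_CLASS_INSTANCE(my_frag_t, opal_list_item_t, NULL, NULL);

    static my_frag_t *frag_alloc(size_t payload)
    {
        my_frag_t *frag = (my_frag_t *) malloc(sizeof(my_frag_t) + payload);
        OBJ_CONSTRUCT(frag, my_frag_t); /* refcount starts at 1, no OBJ_NEW involved */
        frag->length = payload;
        return frag;
    }

    /* consumers simply OBJ_RETAIN()/OBJ_RELEASE() the fragment; when the
     * count reaches zero the destructors run and the malloc()ed block is
     * reclaimed -- no matching OBJ_DESTRUCT anywhere */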

The feature you are looking for, the one that might have shortened Ralph's 
debugging time, is already in opal_object_t. One should use the 
cls_init_file_name and cls_init_lineno fields to see where the object was first 
initialized, as these fields are set either by the OBJ_NEW or by the 
OBJ_CONSTRUCT call.

  George.

PS: The second patch (ref count == 1 in OBJ_DESTRUCT) is trivial but reasonable.

On Mar 7, 2013, at 22:10 , Jeff Squyres (jsquyres)  wrote:

> WHAT: Simple patch that will fail an assert() if you OBJ_CONSTRUCT an object 
> and its ref count goes to 0 (in debug builds only).
> 
> WHY: To catch bugs.
> 
> WHERE: opal/class/opal_class.h
> 
> WHEN: Timeout of 1 week -- COB, Thurs, 14 Mar, 2013
> 
> MORE DETAIL:
> 
> On the call on Tuesday, we talked about some ideas for catching bugs with 
> respect to object reference counting.  After the call, Brian, Ralph, and I 
> came up with two useful asserts to help catch bugs (in debug builds only):
> 
> 1. If you OBJ_CONSTRUCT an object, its ref count should never go to zero.
> 2. When a object is destroyed, its refcount should be exactly 1.
> 
> This RFC is for #1.  The attached patch doesn't seem to cause any problems 
> (and we didn't expect it to).  But it's a good addition to the other 
> asserts() that are already in the object code already.
> 
> As for #2, Ralph has previously found bugs in the ORTE layer that would have 
> been much easier to track down if #2 were in place.  I'll send an RFC for #2 
> when I have managed to fix all the problems that it has found in the OMPI 
> layer...  :-)
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> Index: opal/class/opal_object.h
> ===
> --- opal/class/opal_object.h  (revision 28147)
> +++ opal/class/opal_object.h  (working copy)
> @@ -169,7 +169,7 @@
>  * @param NAME   Name of the class to initialize
>  */
> #if OPAL_ENABLE_DEBUG
> -#define OPAL_OBJ_STATIC_INIT(BASE_CLASS) { OPAL_OBJ_MAGIC_ID, 
> OBJ_CLASS(BASE_CLASS), 1, __FILE__, __LINE__ }
> +#define OPAL_OBJ_STATIC_INIT(BASE_CLASS) { OPAL_OBJ_MAGIC_ID, 1, 
> OBJ_CLASS(BASE_CLASS), 1, __FILE__, __LINE__ }
> #else
> #define OPAL_OBJ_STATIC_INIT(BASE_CLASS) { OBJ_CLASS(BASE_CLASS), 1 }
> #endif
> @@ -184,6 +184,10 @@
> /** Magic ID -- want this to be the very first item in the
> struct's memory */
> uint64_t obj_magic_id;
> +/* flag whether this was initialized using construct
> + * versus obj_new
> + */
> +bool constructed;
> #endif
> opal_class_t *obj_class;/**< class descriptor */
> volatile int32_t obj_reference_count;   /**< reference count */
> @@ -252,6 +256,7 @@
> object->obj_magic_id = OPAL_OBJ_MAGIC_ID;
> object->cls_init_file_name = file;
> object->cls_init_lineno = line;
> +object->constructed = false;
> return object;
> }
> #define OBJ_NEW(type)   \
> @@ -313,6 +318,8 @@
> assert(NULL != ((opal_object_t *) (object))->obj_class);\
> assert(OPAL_OBJ_MAGIC_ID == ((opal_object_t *) 
> (object))->obj_magic_id); \
> if (0 == opal_obj_update((opal_object_t *) (object), -1)) { \
> +/* constructed objects are not allowed to reach 0 */\
> +assert(!(((opal_object_t *) (object))->constructed));   \
> OBJ_SET_MAGIC_ID((object), 0);  \
> opal_obj_run_destructors((opal_object_t *) (object));   \
> OBJ_REMEMBER_FILE_AND_LINENO( object, __FILE__, __LINE__ ); \
> @@ -344,6 +351,7 @@
> OBJ_CONSTRUCT_INTERNAL((object), OBJ_CLASS(type));  \
> } while (0)
> 
> +#if OPAL_ENABLE_DEBUG
> #define OBJ_CONSTRUCT_INTERNAL(object, type)\
> do {\
> OBJ_SET_MAGIC_ID((object), OPAL_OBJ_MAGIC_ID);  \
> @@ -352,11 +360,24 @@
> } 

Re: [OMPI devel] RFC: assert() to ensure OBJ_CONSTRUCT'ed objects don't get destroyed

2013-03-08 Thread George Bosilca

On Mar 8, 2013, at 11:55 , Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> On Mar 7, 2013, at 7:37 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>> An example will be a memory region without a predefined size, that I 
>> manipulate as opal_list_item_t. This fragment gets allocated when it's size 
>> is know, then gets OBJ_CONSTRUCT'ed and then used. The reference count is 
>> playing its role, when nobody is using the object anymore, it will be 
>> automatically released. With the change you propose such usage will be 
>> prohibited. 
> 
> Ah, ok -- are you saying to do the following:
> 
> myobj = malloc(...);
> OBJ_CONSTRUCT(myobj, ...);
> 
> ?
> 
> If so, yes, I agree, #1 would disallow that [valid] use case.  And we 
> wouldn't want to disallow that.

Yes, this is what I do. As I explained in my previous email, I don't have any 
other choice, as I don't know the size of the object beforehand (preventing the 
usage of OBJ_NEW). I thought about a new version of OBJ_NEW with an argument 
(which would be the size of memory to allocate), but I discarded it as 
overkill.

> But that's ok; #2 is the important one -- #1 just seemed like a good 
> compliment to what was already there that we figured we'd do at the same 
> time.  But we didn't know if there were other valid use cases that #1 would 
> violate, which is why we RFC'ed/asked.
> 
>> PS: The second patch (ref count == 1 in OBJ_DESTRUCT) is trivial but 
>> reasonable.
> 
> Yeah -- unfortunately, while the patch to add that assert() is trivial, it's 
> finding lots of ref counting bugs in the MPI layer, so I don't want to commit 
> it yet.  :-)  I'll come back with more info after I've sorted through them…

Correctly used, OBJ_NEW / OBJ_CONSTRUCT / OBJ_DESTRUCT are not a bad set of 
macros. When an object is not needed and is known not to be referenced anymore, 
it can safely be OBJ_DESTRUCT'ed despite the fact that its reference count is 
not 1. Otherwise, in all BTLs we would have to put all fragments back in the 
right place, and remove them from all lists, before calling the destructor. In 
other words, no collection class in Open MPI would work correctly, especially 
not the one we use most often, the ompi_free_list.

  George.

> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: assert() to ensure OBJ_CONSTRUCT'ed objects don't get destroyed

2013-03-08 Thread George Bosilca
I'm sorry Ralph, I'm puzzled by your approach. You knowingly use a broken 
example to justify a patch that, under correct/consistent usage, solves a 
non-issue?

Why can't you use any decent memory debugger (fence, purify or valgrind) to 
identify the issue you describe below?

  George.

On Mar 8, 2013, at 02:19 , Ralph Castain <r...@open-mpi.org> wrote:

> 
> On Mar 7, 2013, at 4:37 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>> Please refrain from doing so, the assumption #1 this patch is based on is 
>> false. First, OBJ_CONSTRUCT can be run to construct a specific type of 
>> object in a preallocated memory region (not only on the stack or heap). In 
>> fact, it is the only way we can dynamically initialize an object in a memory 
>> allocated outside the OBJ_NEW system. Second, OBJ_CONSTRUCT is not 
>> necessarily matched with an OBJ_DESTRUCT, it work just fine with 
>> OBJ_RELEASE. In fact I use these feature in several places.
>> 
>> An example will be a memory region without a predefined size, that I 
>> manipulate as opal_list_item_t. This fragment gets allocated when it's size 
>> is know, then gets OBJ_CONSTRUCT'ed and then used. The reference count is 
>> playing its role, when nobody is using the object anymore, it will be 
>> automatically released. With the change you propose such usage will be 
>> prohibited. 
>> 
>> The feature you are looking for, the one that might have shorten Ralph's 
>> debugging time, is already in the opal_object_t. One should use the 
>> cls_init_file_name and cls_init_lineno fields to see where the object was 
>> first initialized as these fields are set either by the OBJ_NEW or by the 
>> OBJ_CONSTRUCT call.
> 
> Not exactly. Consider the case where we have a library - e.g., ORTE. Down in 
> the library, perhaps several function calls down, we receive a pointer to an 
> object. The library, not knowing any better, uses OBJ_RETAIN to indicate that 
> this object is being used and therefore should not be released. It then 
> returns and allows an async procedure to run.
> 
> Up above, one caller to the library uses OBJ_NEW to create the object. Thus, 
> the reference count system is in-play and governs when the data goes away.
> 
> However, in another place, the caller uses OBJ_CONSTRUCT to initialize the 
> object, and OBJ_DESTRUCTs it when the high-level call returns. In this case, 
> the reference count system is ignored - OBJ_DESTRUCT destroys the object 
> regardless of the reference count. So suddenly the library is working with 
> garbage, with no way to know that it happened.
> 
> So now let's be specific to see how your suggestion doesn't solve the problem 
> (I actually had tried it). Consider the OOB operating asynchronously. In the 
> first case, where the opal_buffer_t is created via OBJ_NEW, we can point the 
> message system at the data field in the buffer and just OBJ_RETAIN it.
> 
> However, in the second case, the OBJ_RETAIN won't work - the calling code 
> releases the data area, but the OOB has no idea that happened. So the pointer 
> to the data field actually isn't valid any more, but there is no way to 
> detect it. Likewise, looking for opal_object_t info is useless as the fields 
> no longer exist.
> 
> Yes, I realize this is an incorrect program - the caller isn't allowed to 
> release data until the async operation concludes. But the fact that a library 
> function, especially one that is hidden down low in the code base, is async 
> may not be immediately apparent. The resulting "bug" is extremely hard to 
> chase down, especially as it almost inevitably is exposed as a race condition.
> 
> The goal of this RFC was to make such problems immediately apparent. Perhaps 
> one solution would be to get rid of OBJ_DESTRUCT and just have everyone use 
> OBJ_RELEASE, augmented with a flag to indicate whether or not to free the 
> backing memory. Another might be to have OBJ_DESTRUCT respect ref counts, but 
> Jeff, Brian and I didn't like that one. The proposed solution was another way 
> to approach it that would force the above scenario to be recognized only 
> during debug builds, thus allowing it to be identified and corrected.
> 
> Your example use-case is certainly something I hadn't considered - so perhaps 
> we'll have to think of another way to detect my situation while still 
> allowing what you do, or perhaps add a new OBJ_CONSTRUCT_WITH_RETAIN (or 
> whatever) macro for use in your use-case that corrects the ref count?
> 
> 
>> 
>> George.
>> 
>> PS: The second patch (ref count == 1 in OBJ_DESTRUCT) is trivial but 
>> reasonable.
>> 
>> On Mar 7, 2013, at 22:10 , Jeff Squyres (jsquyres) <jsquy...

Re: [OMPI devel] RFC: assert() to ensure OBJ_CONSTRUCT'ed objects don't get destroyed

2013-03-08 Thread George Bosilca

On Mar 8, 2013, at 15:10 , Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> Isn't that what assert()'s are for?  :-)
> 
> The point is that for Ralph's case, the #2 assert would cause a notification 
> *right at the point of the error* (i.e., the errant DESTRUCT).  The other 
> tools you mention all cause notifications long after the DESTRUCT (i.e., when 
> the freed memory is used, at which time it may be totally jumbled/corrupt/the 
> FILE/line info gone).  You know how hard it can be to track down memory 
> corruption; having a notification right at the point of the error is a Good 
> Thing.

This is indeed true. However, a tool like valgrind keeps track of where the 
memory was allocated and freed, so this might provide enough info to identify 
and fix the issue.

> Don't forget that we have lots of other asserts in the OBJ system (in the 
> debug case only, of course).
> 
> And per my other mail, if I find legitimate cases where the destructors are 
> invoked where refcount!=1, we'll figure something else out.

I am not arguing about the validity of the assert, but about the fact that now 
we can't just OBJ_DESTRUCT something, we are required to first set its 
refcount to 1.
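
To make the objection concrete, the pattern at stake boils down to something 
like the following sketch (the retaining callee is hypothetical, not actual 
OMPI code):

    #include "opal/class/opal_list.h"

    /* hypothetical callee that keeps a reference, e.g. for an async operation */
    static void layer_that_retains(opal_list_t *l)
    {
        OBJ_RETAIN(l);                       /* refcount: 1 -> 2 */
    }

    static void caller(void)
    {
        opal_list_t list;

        OBJ_CONSTRUCT(&list, opal_list_t);   /* refcount == 1 */
        layer_that_retains(&list);           /* refcount == 2 */
        OBJ_DESTRUCT(&list);                 /* today the destructor runs regardless of the
                                              * count; with the proposed assert(1 == refcount)
                                              * this aborts unless an OBJ_RELEASE first brings
                                              * the count back to 1 */
    }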

I have a more advanced use case for you. Based on the MPI standard, 
MPI_Finalize can be called while the user still has non-complete requests 
returned by any of the non-blocking calls (there are some drawbacks of course, 
but this is not specifically prohibited). Thus, these requests will not have a 
ref count of 1, so they will not be able to be destructed. This is exactly what 
our code is doing today:

pml_base_close.c:58:OBJ_DESTRUCT(&mca_pml_base_send_requests);
pml_base_close.c:59:OBJ_DESTRUCT(&mca_pml_base_recv_requests);

and then ompi_freelist.c:86.

  George.


> 
> 
> 
> On Mar 8, 2013, at 7:22 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>> I'm sorry Ralph, I'm puzzled by your approach. You knowingly use a broken 
>> example to justify a patch that under correct/consistent usage solves a 
>> non-issue?
>> 
>> Why can't you use any decent memory debugger (fence, purify or valgrind) to 
>> identify the issue you describe below?
>> 
>> George.
>> 
>> On Mar 8, 2013, at 02:19 , Ralph Castain <r...@open-mpi.org> wrote:
>> 
>>> 
>>> On Mar 7, 2013, at 4:37 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>> 
>>>> Please refrain from doing so, the assumption #1 this patch is based on is 
>>>> false. First, OBJ_CONSTRUCT can be run to construct a specific type of 
>>>> object in a preallocated memory region (not only on the stack or heap). In 
>>>> fact, it is the only way we can dynamically initialize an object in a 
>>>> memory allocated outside the OBJ_NEW system. Second, OBJ_CONSTRUCT is not 
>>>> necessarily matched with an OBJ_DESTRUCT, it works just fine with 
>>>> OBJ_RELEASE. In fact I use this feature in several places.
>>>> 
>>>> An example would be a memory region without a predefined size that I 
>>>> manipulate as an opal_list_item_t. This fragment gets allocated when its 
>>>> size is known, then gets OBJ_CONSTRUCT'ed and then used. The reference 
>>>> count is playing its role, when nobody is using the object anymore, it 
>>>> will be automatically released. With the change you propose such usage 
>>>> will be prohibited. 
>>>> 
>>>> The feature you are looking for, the one that might have shortened Ralph's 
>>>> debugging time, is already in the opal_object_t. One should use the 
>>>> cls_init_file_name and cls_init_lineno fields to see where the object was 
>>>> first initialized as these fields are set either by the OBJ_NEW or by the 
>>>> OBJ_CONSTRUCT call.
>>> 
>>> Not exactly. Consider the case where we have a library - e.g., ORTE. Down 
>>> in the library, perhaps several function calls down, we receive a pointer 
>>> to an object. The library, not knowing any better, uses OBJ_RETAIN to 
>>> indicate that this object is being used and therefore should not be 
>>> released. It then returns and allows an async procedure to run.
>>> 
>>> Up above, one caller to the library uses OBJ_NEW to create the object. 
>>> Thus, the reference count system is in-play and governs when the data goes 
>>> away.
>>> 
>>> However, in another place, the caller uses OBJ_CONSTRUCT to initialize the 
>>> object, and OBJ_DESTRUCTs it when the high-level call returns. In this 
>>> case, the reference count system is ignored - OBJ_DESTRUCT destroys the 
>>> object regardless of t

Re: [OMPI devel] RFC: assert() to ensure OBJ_CONSTRUCT'ed objects don't get destroyed

2013-03-08 Thread George Bosilca
On Mar 8, 2013, at 15:56 , "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:

> On Mar 8, 2013, at 9:39 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>> I have a more advanced use case for you. Based on the MPI standard, 
>> MPI_Finalize can be called while the user still has non-complete requests 
>> returned by any of the non-blocking calls (there are some drawbacks of 
>> course, but this is not specifically prohibited).
> 
> Actually, it is prohibited by MPI-3 p359:41-48:
> 
> Before an MPI process invokes MPI_FINALIZE, the process must perform all MPI 
> calls needed to complete its involvement in MPI communications: It must 
> locally complete all MPI operations that it initiated and must execute 
> matching calls needed to complete MPI communications initiated by other 
> processes. For example, if the process executed a non-blocking send, it must 
> eventually call MPI_WAIT, MPI_TEST, MPI_REQUEST_FREE, or any derived 
> function; if the process is the target of a send, then it must post the 
> matching receive; if it is part of a group executing a collective operation, 
> then it must have completed its participation in the operation.
> 
>> Thus, these requests will not have a ref count of 1, so they will not be 
>> able to be destructed. This is exactly what our code is doing today:
>> 
>> pml_base_close.c:58:OBJ_DESTRUCT(&mca_pml_base_send_requests);
>> pml_base_close.c:59:OBJ_DESTRUCT(&mca_pml_base_recv_requests);
>> 
>> and then ompi_freelist.c:86.
> 
> 
> If the app REQUEST_FREE'd a nonblocking send/receive, don't we block in 
> ompi_mpi_finalize() before the call to pml_base_close(), such that the PMLs 
> will be drained before we get to destroying the PMLs?

We don't, as we have no way of knowing there are pending requests in the 
pipeline. There is a separation between who creates the requests and who releases 
them. They are created by the selected PML, and are destroyed by the base, 
after the selected PML has been turned off.

  George.


> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: assert() to ensure OBJ_CONSTRUCT'ed objects don't get destroyed

2013-03-08 Thread George Bosilca
It depends. The usage of MPI is valid. Totally weird and absolutely grotesque, 
but valid. What is invalid is the access of the array value. There is no 
completion call for the irecv and no guarantee for completion on MPI_Finalize, 
so making a decision on the content of buf[i] is incorrect.

I think the rationale for allowing MPI_Request_free was to take advantage of 
the FIFO ordering for the match to allow the user to implement its own 
consistency protocols.

    if (0 == rank) {
        MPI_Isend(buf, SIZE, MPI_CHAR, 1, 123, MPI_COMM_WORLD, &req);
        MPI_Request_free(&req);
        MPI_Send(buf, SIZE, MPI_CHAR, 1, 123, MPI_COMM_WORLD);
    } else if (1 == rank) {
        MPI_Irecv(buf, SIZE, MPI_CHAR, 0, 123, MPI_COMM_WORLD, &req);
        MPI_Request_free(&req);
        MPI_Recv(buf, SIZE, MPI_CHAR, 0, 123, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

Is a valid usage.

  George.

On Mar 8, 2013, at 16:38 , "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:

> On Mar 8, 2013, at 10:20 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>>> If the app REQUEST_FREE'd a nonblocking send/receive, don't we block in 
>>> ompi_mpi_finalize() before the call to pml_base_close(), such that the PMLs 
>>> will be drained before we get to destroying the PMLs?
>> 
>> We don't, as we have no way of knowing there are pending requests in the 
>> pipeline. There is a separation between who creates the requests and who 
>> releases them. They are created by the selected PML, and are destroyed by the 
>> base, after the selected PML has been turned off.
> 
> 
> Here's an interesting case -- do you think that this is a valid MPI 
> application?  And if it is, what is the expected behavior?
> 
> -
> #include <assert.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <mpi.h>
> 
> #define SIZE 33554432
> 
> int main(int argc, char *argv[])
> {
>int i, rank;
>char *buf;
>MPI_Request req;
> 
>MPI_Init(NULL, NULL);
>MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> 
>buf = malloc(SIZE);
>assert(buf);
>memset(buf, rank, SIZE);
>if (0 == rank) {
>MPI_Isend(buf, SIZE, MPI_CHAR, 1, 123, MPI_COMM_WORLD, &req);
>MPI_Request_free(&req);
>} else if (1 == rank) {
>MPI_Irecv(buf, SIZE, MPI_CHAR, 0, 123, MPI_COMM_WORLD, &req);
>MPI_Request_free(&req);
>}
> 
>MPI_Finalize();
> 
>if (1 == rank) {
>for (i = 0; i < SIZE; ++i) {
>assert(1 == buf[i]);
>}
>}
> 
>return 0;
> }
> -
> 
> On the SVN trunk, this application will fail the assert(1 == buf[i]).
> 
> MPI-3 p360 shows a *similar* case, but it's not exactly the same (example 8.7 
> shows Request_free on *one* side, not *both* sides).
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: assert() to ensure OBJ_CONSTRUCT'ed objects don't get destroyed

2013-03-08 Thread George Bosilca

On Mar 8, 2013, at 17:37 , "Jeff Squyres (jsquyres)"  wrote:

> He removed a bunch of text in the middle (see 
> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/143).  In short: there is 
> NO way for a user to know when a REQUEST_FREEd request has completed, because 
> *matching* happens in order.  In your example below, it's possible for the 
> Send to overtake the Isend, as long as the matching happened in order:
> 
>>  if (0 == rank) {
>>  MPI_Isend(buf, SIZE, MPI_CHAR, 1, 123, MPI_COMM_WORLD, &req);
>>  MPI_Request_free(&req);
>>  MPI_Send(buf, SIZE, MPI_CHAR, 1, 123, MPI_COMM_WORLD);
>>  } else if (1 == rank) {
>>  MPI_Irecv(buf, SIZE, MPI_CHAR, 0, 123, MPI_COMM_WORLD, &req);
>>  MPI_Request_free(&req);
>>  MPI_Recv(buf, SIZE, MPI_CHAR, 0, 123, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>  }

Right: because both operations are non-blocking, only the matching is 
important. Moreover, using MPI_Send instead of the MPI_Isend is not good either, 
as MPI_Send does not guarantee completion. If we go for an MPI_Recv instead of 
the MPI_Irecv as rank 1's first operation, my example becomes correct.
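
In code, that corrected variant would look roughly like this (buf, SIZE and req 
as in the earlier snippet; the only change is that rank 1's first operation is 
now blocking):

    if (0 == rank) {
        MPI_Isend(buf, SIZE, MPI_CHAR, 1, 123, MPI_COMM_WORLD, &req);
        MPI_Request_free(&req);
        MPI_Send(buf, SIZE, MPI_CHAR, 1, 123, MPI_COMM_WORLD);
    } else if (1 == rank) {
        /* blocking first receive: it completes before the second one is posted,
         * so rank 1 no longer relies on a freed request ever completing */
        MPI_Recv(buf, SIZE, MPI_CHAR, 0, 123, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(buf, SIZE, MPI_CHAR, 0, 123, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }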

> Regardless, this therefore probably makes a case for destroying something 
> when the refcount is 1.  I had to stop working on that for the moment and 
> will likely get back to it next week -- I'll check and see what happens; it 
> may still be possible that those lists are empty when we close the PML, 
> anyway.

Check pml_ob1_component.c:230. The commented-out test seems to be doing what 
you are looking for, making sure that when the PML is closed no allocated requests are 
outside of the free list (like the matching or pending queues).

  George.





Re: [OMPI devel] assert in opal_datatype_is_contiguous_memory_layout

2013-04-08 Thread George Bosilca
Eric,

Thanks for the report. I used your example to replicate the issue and I confirm 
it appears in all versions in debug mode. However, the assert in the convertor 
code is correct, and so is your code. The issue is more complex, and it is 
triggered by a usage of the convertor which should have been prevented.

If I'm not mistaken, Edgar (CC'ed on this email) is the maintainer of that 
particular code path. Hopefully, he will be able to fix the code based on the 
following analysis.

The underlying issue is that when the convertor is created with no data to 
convert, it is automatically marked as COMPLETED. Once in this state, no 
further conversion calls should be made, or they will trigger the issue you 
encountered. Unfortunately, the code in OMPIIO doesn't check whether there is 
more data to handle before going into the opal_convertor_raw function (a 
function which, as I said above, is not supposed to be called on a completed 
convertor). The function ompi_io_ompio_decode_datatype assumes that there is at 
least one segment in the file, a fact that explains the call to 
opal_convertor_raw. 

I modified opal_convertor_raw to accept the case where the convertor is 
already completed and return the same value as opal_convertor_pack/unpack 
(r28305), so now we have a consistent interface for the convertor. However, 
this leads to a division by zero in the OMPIIO layer, as the number of iovecs 
returned by opal_convertor_raw is now zero and this is not handled. I hope 
Edgar will be able to fix that part.
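
Until that happens, the kind of guard the OMPIIO decode path presumably needs 
looks roughly like the fragment below (the names and the iovec array size are 
illustrative, not the actual ompio code):

    struct iovec iov[16];                /* illustrative size */
    uint32_t iov_count = 16;
    size_t max_data = 0;

    opal_convertor_raw(&convertor, iov, &iov_count, &max_data);
    if (0 == iov_count) {
        /* this process contributes no data to the file view: skip the
         * per-iovec bookkeeping instead of dividing by iov_count */
        return OMPI_SUCCESS;
    }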

  George.


On Apr 5, 2013, at 23:10 , Eric Chamberland  
wrote:

> Hi all,
> 
> (Sorry, I have sent this to "users" but I should have sent it to "devel" list 
> instead.  Sorry for the mess...)
> 
> I have attached a very small example which raise an assertion.
> 
> The problem is arising from a process which does not have any element to 
> write in a file (and then in the MPI_File_set_view)...
> 
> You can see this "bug" with openmpi 1.6.3, 1.6.4 and 1.7.0 configured with:
> 
> ./configure --enable-mem-debug --enable-mem-profile --enable-memchecker
> --with-mpi-param-check --enable-debug
> 
> Just compile the given example (idx_null.cc) as-is with
> 
> mpicxx -o idx_null idx_null.cc
> 
> and run with 3 processes:
> 
> mpirun -n 3 idx_null
> 
> You can modify the example by commenting "#define WITH_ZERO_ELEMNT_BUG" to 
> see that everything is going well when all processes have something to write.
> 
> There is no "bug" if you use openmpi 1.6.3 (and higher) without the debugging 
> options.
> 
> Also, all is working well with mpich-3.0.3 configured with:
> 
> ./configure --enable-g=yes
> 
> 
> So, is this a wrong "assert" in openmpi?
> 
> Is there a real problem to use this example in a "release" mode?
> 
> Thanks,
> 
> Eric
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r28319 - trunk/opal/datatype

2013-04-10 Thread George Bosilca
2 * yes.

  George.

On Apr 10, 2013, at 15:04 , "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> 
wrote:

> George --
> 
> I'm guessing this should be CMR'ed to v1.7, right?
> 
> Does it need to go to v1.6, too?
> 
> 
> On Apr 9, 2013, at 7:01 PM, svn-commit-mai...@open-mpi.org wrote:
> 
>> Author: bosilca (George Bosilca)
>> Date: 2013-04-09 19:01:54 EDT (Tue, 09 Apr 2013)
>> New Revision: 28319
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/28319
>> 
>> Log:
>> Fix an issue identified by Thomas Jahns and his colleague when the data
>> representation is not correctly optimized (it is off by the extent).
>> 
>> During the data representation process, if the opportunity to merge several
>> items appears, we replace them with the new merged element. However, if one
>> of the components of this merged element was coming from a "loop 
>> representation"
>> then the new first element of this loop must have a displacement moved by the
>> extent of the loop.
>> 
>> Text files modified: 
>>  trunk/opal/datatype/opal_datatype_optimize.c |56 
>> +++ 
>>  1 files changed, 33 insertions(+), 23 deletions(-)
>> 
>> Modified: trunk/opal/datatype/opal_datatype_optimize.c
>> ==
>> --- trunk/opal/datatype/opal_datatype_optimize.c Tue Apr  9 18:08:03 
>> 2013(r28318)
>> +++ trunk/opal/datatype/opal_datatype_optimize.c 2013-04-09 19:01:54 EDT 
>> (Tue, 09 Apr 2013)  (r28319)
>> @@ -73,15 +73,12 @@
>> {
>>dt_elem_desc_t* pElemDesc;
>>ddt_elem_desc_t opt_elem;
>> -OPAL_PTRDIFF_TYPE last_disp = 0;
>>dt_stack_t* pStack;/* pointer to the position on the stack */
>>int32_t pos_desc = 0;  /* actual position in the description of 
>> the derived datatype */
>> -int32_t stack_pos = 0, last_type = OPAL_DATATYPE_UINT1;
>> -int32_t type = OPAL_DATATYPE_LOOP, nbElems = 0, changes = 0;
>> -int32_t optimized = 0, continuity;
>> +int32_t stack_pos = 0, last_type = OPAL_DATATYPE_UINT1, last_length = 0;
>> +int32_t type = OPAL_DATATYPE_LOOP, nbElems = 0, continuity;
>> +OPAL_PTRDIFF_TYPE total_disp = 0, last_extent = 1, last_disp = 0;
>>uint16_t last_flags = 0x;  /* keep all for the first datatype */
>> -OPAL_PTRDIFF_TYPE total_disp = 0, last_extent = 1;
>> -int32_t last_length = 0;
>>uint32_t i;
>> 
>>pStack = (dt_stack_t*)alloca( sizeof(dt_stack_t) * 
>> (pData->btypes[OPAL_DATATYPE_LOOP]+2) );
>> @@ -134,7 +131,8 @@
>>/* the whole loop is contiguous */
>>if( !continuity ) {
>>if( 0 != last_length ) {
>> -CREATE_ELEM( pElemDesc, last_type, 
>> OPAL_DATATYPE_FLAG_BASIC, last_length, last_disp, last_extent );
>> +CREATE_ELEM( pElemDesc, last_type, 
>> OPAL_DATATYPE_FLAG_BASIC,
>> + last_length, last_disp, 
>> last_extent );
>>pElemDesc++; nbElems++;
>>last_length = 0;
>>}
>> @@ -144,9 +142,9 @@
>>   + loop->loops * end_loop->size);
>>last_type   = OPAL_DATATYPE_UINT1;
>>last_extent = 1;
>> -optimized++;
>>} else {
>>int counter = loop->loops;
>> +OPAL_PTRDIFF_TYPE merged_disp = 0;
>>/* if the previous data is contiguous with this piece and 
>> it has a length not ZERO */
>>if( last_length != 0 ) {
>>if( continuity ) {
>> @@ -155,27 +153,42 @@
>>last_type= OPAL_DATATYPE_UINT1;
>>last_extent  = 1;
>>counter--;
>> +merged_disp = loop->extent;  /* merged loop, 
>> update the disp of the remaining elems */
>>}
>> -CREATE_ELEM( pElemDesc, last_type, 
>> OPAL_DATATYPE_FLAG_BASIC, last_length, last_disp, last_extent );
>> +CREATE_ELEM( pElemDesc, last_type, 
>> OPAL_DATATYPE_FLAG_BASIC,
>> + last_length, last_disp, last_extent );
>>pElemDesc++; nbElems++;
>>  

Re: [OMPI devel] Bugfix for pending zero byte packages

2013-04-25 Thread George Bosilca
Sure, it should be included in the 1.6 as well.

  George.

On Apr 25, 2013, at 03:39 , Jeff Squyres (jsquyres)  wrote:

> Ok; thanks.
> 
> It looks like this should go to v1.6, too -- right (Nathan/George/Brian)?
> 
> 
> 
> On Apr 24, 2013, at 9:31 PM, Ralph Castain  wrote:
> 
>> This was already resolved - Nathan applied it, and it has been moved into 
>> v1.7
>> 
>> On Apr 24, 2013, at 5:53 PM, "Jeff Squyres (jsquyres)"  
>> wrote:
>> 
>>> George / Brian / Nathan --
>>> 
>>> Can you guys comment on this patch?
>>> 
>>> 
>>> On Apr 4, 2013, at 4:40 PM, Martin SCHREIBER  
>>> wrote:
>>> 
 Dear developers,
 
 it seems that for messages of size 0 no convertor is created due to
 optimization issues.
 However, this is not considered in MCA_PML_OB1_SEND_REQUEST_RESET when
 processing pending send requests.
 
 A fix (or something similar) for this issue is provided below. Please be
 aware that I'm not an OpenMPI developer and that e.g. req_bytes_packed
 is the wrong field to check.
 
 If you have any questions or if you need any further information (stack
 trace, etc.), don't hesitate to ask!
 
 Best regards & thank you for developing OpenMPI,
 
 Martin
 
 
 
 
 
 
 affected versions:
openmpi-1.7
openmpi-1.6.4
and probably versions below
 
 
 file which needs a fix:
pml_ob1_sendreq.h
 
 
 Inserting the if statement of the code-snippet below fixes the problem for 
 me.
 
 
 #define MCA_PML_OB1_SEND_REQUEST_RESET(sendreq)                                        \
     /* check for zero-package since convertor is not created for zero-size packages */ \
     if (sendreq->req_send.req_bytes_packed > 0)                                        \
     {                                                                                  \
         size_t _position = 0;                                                          \
         opal_convertor_set_position(&sendreq->req_send.req_base.req_convertor,         \
                                     &_position);                                       \
         assert( 0 == _position );                                                      \
     }
 
 -- 
 Dipl.-Inf. Martin Schreiber
 Chair of Scientific Computing, http://www5.in.tum.de
 Technische Universität München, Fakultät für Informatik
 Boltzmannstr. 3 / Zi. 2.5.57, 85748 Garching, Germany
 Phone: +49-89-289-18630, Fax: +49-89-289-18607
 
 ___
 devel mailing list
 de...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to: 
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [EXTERNAL] Developer meeting: mid/late summer?

2013-04-27 Thread George Bosilca
I would, but that particular week I'm teaching a summer school. Hopefully you 
can set up a WebEx.

  George.

On Apr 27, 2013, at 00:21 , "Jeff Squyres (jsquyres)"  
wrote:

> Ok, we can probably do this.
> 
> Is anyone else interested?
> 
> 
> On Apr 24, 2013, at 1:25 PM, "Barrett, Brian W"  wrote:
> 
>> I could probably do Monday afternoon and Tuesday morning?  The problem with 
>> Friday afternoon is that it means I can't grab the Friday afternoon flights 
>> home and have to stay until Saturday.  Monday afternoon / Tuesday morning 
>> would mean no weekend travel.
>> 
>> Brian
>> 
>> --
>> Brian W. Barrett
>> Scalable System Software Group
>> Sandia National Laboratories
>> 
>> From: devel-boun...@open-mpi.org [devel-boun...@open-mpi.org] on behalf of 
>> Jeff Squyres (jsquyres) [jsquy...@cisco.com]
>> Sent: Wednesday, April 24, 2013 7:54 AM
>> To: Open MPI Developers List
>> Subject: [EXTERNAL] [OMPI devel] Developer meeting: mid/late summer?
>> 
>> The idea came up on the call yesterday that it might well be worthwhile to 
>> have another development meeting mid/late summer.  Of particular interest is 
>> design discussions about asynchronous progress.
>> 
>> The next MPI Forum meeting runs 1pm June 4 (Tue) through noon June 7 (Fri).  
>> Assuming that several/many of us may be attending that meeting any way, we 
>> could try to schedule around that (e.g., morning of Tuesday and/or afternoon 
>> of Friday).
>> 
>> The next MPI Forum meeting is in Madrid (11-13 Sep), followed by EuroMPI 
>> (15-18 Sep).  That doesn't seem like a good opportunity for an Open MPI dev 
>> meeting.
>> 
>> Thoughts?
>> 
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn] svn:open-mpi r28417 - trunk/ompi/mca/vprotocol/base

2013-04-30 Thread George Bosilca
This commit broke the trunk.

  George.

On Apr 30, 2013, at 17:21 , svn-commit-mai...@open-mpi.org wrote:

> Author: hjelmn (Nathan Hjelm)
> Date: 2013-04-30 11:21:42 EDT (Tue, 30 Apr 2013)
> New Revision: 28417
> URL: https://svn.open-mpi.org/trac/ompi/changeset/28417
> 
> Log:
> vprotocol: remove the old output and use the framework output
> 
> Text files modified: 
>   trunk/ompi/mca/vprotocol/base/base.h  | 1 - 
>   
>   trunk/ompi/mca/vprotocol/base/vprotocol_base_select.c | 2 +-
>   
>   2 files changed, 1 insertions(+), 2 deletions(-)
> 
> Modified: trunk/ompi/mca/vprotocol/base/base.h
> ==
> --- trunk/ompi/mca/vprotocol/base/base.h  Tue Apr 30 06:10:23 2013
> (r28416)
> +++ trunk/ompi/mca/vprotocol/base/base.h  2013-04-30 11:21:42 EDT (Tue, 
> 30 Apr 2013)  (r28417)
> @@ -21,7 +21,6 @@
> BEGIN_C_DECLS
> 
> struct mca_pml_v_t {
> -int output;
> size_t  host_pml_req_recv_size;
> size_t  host_pml_req_send_size;
> mca_pml_base_component_thost_pml_component;
> 
> Modified: trunk/ompi/mca/vprotocol/base/vprotocol_base_select.c
> ==
> --- trunk/ompi/mca/vprotocol/base/vprotocol_base_select.c Tue Apr 30 
> 06:10:23 2013(r28416)
> +++ trunk/ompi/mca/vprotocol/base/vprotocol_base_select.c 2013-04-30 
> 11:21:42 EDT (Tue, 30 Apr 2013)  (r28417)
> @@ -124,7 +124,7 @@
> free(om);
> }
> 
> -mca_base_components_close(mca_pml_v.output, 
> +mca_base_components_close(ompi_vprotocol_base_framework.framework_output,
>   
> &ompi_vprotocol_base_framework.framework_components, 
>   (mca_base_component_t *) best_component);
> 
> ___
> svn mailing list
> s...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27880 - trunk/ompi/request

2013-04-30 Thread George Bosilca
Takahiro,

I went over this ticket and attached a new patch. Basically I went over all the 
possible cases, both in test and wait, and ensured the behavior is always 
consistent. Please give it a try, and let us know of the outcome.
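
For reference, the "empty status" involved here is the one defined by MPI-3.0 
(page 52); a test program could check it along these lines (request, count and 
<assert.h> are assumed to be available in the enclosing test):

    MPI_Status status;

    MPI_Wait(&request, &status);            /* request inactive or MPI_REQUEST_NULL */
    assert(MPI_ANY_SOURCE == status.MPI_SOURCE);
    assert(MPI_ANY_TAG    == status.MPI_TAG);
    assert(MPI_SUCCESS    == status.MPI_ERROR);
    MPI_Get_count(&status, MPI_BYTE, &count);   /* count must come back as 0 */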

  Thanks,
George.



On Jan 25, 2013, at 00:53 , "Kawashima, Takahiro" <t-kawash...@jp.fujitsu.com> 
wrote:

> Jeff,
> 
> I've filed the ticket.
> https://svn.open-mpi.org/trac/ompi/ticket/3475
> 
> Thanks,
> Takahiro Kawashima,
> MPI development team,
> Fujitsu
> 
>> Many thanks for the summary!
>> 
>> Can you file tickets about this stuff against 1.7?  Included your patches, 
>> etc. 
>> 
>> These are pretty obscure issues and I'm ok not fixing them in the 1.6 branch 
>> (unless someone has a burning desire to get them fixed in 1.6). 
>> 
>> But we should properly track and fix these in the 1.7 series. I'd mark them 
>> as "critical" so that they don't get lost in the wilderness of other bugs. 
>> 
>> Sent from my phone. No type good. 
>> 
>> On Jan 22, 2013, at 8:57 PM, "Kawashima, Takahiro" 
>> <t-kawash...@jp.fujitsu.com> wrote:
>> 
>>> George,
>>> 
>>> I reported the bug three months ago.
>>> Your commit r27880 resolved one of the bugs reported by me,
>>> in another approach.
>>> 
>>> http://www.open-mpi.org/community/lists/devel/2012/10/11555.php
>>> 
>>> But other bugs are still open.
>>> 
>>> "(1) MPI_SOURCE of MPI_Status for a null request must be MPI_ANY_SOURCE."
>>> in my previous mail is not fixed yet. This can be fixed by my patch
>>> (ompi/mpi/c/wait.c and ompi/request/request.c part only) attached
>>> in my another mail.
>>> 
>>> http://www.open-mpi.org/community/lists/devel/2012/10/11561.php
>>> 
>>> "(2) MPI_Status for an inactive request must be an empty status."
>>> in my previous mail is partially fixed. MPI_Wait is fixed by your
>>> r27880. But MPI_Waitall and MPI_Testall should be fixed.
>>> Codes similar to your r27880 should be inserted to
>>> ompi_request_default_wait_all and ompi_request_default_test_all.
>>> 
>>> You can confirm the fixes by the test program status.c attached in
>>> my previous mail. Run with -n 2. 
>>> 
>>> http://www.open-mpi.org/community/lists/devel/2012/10/11555.php
>>> 
>>> Regards,
>>> Takahiro Kawashima,
>>> MPI development team,
>>> Fujitsu
>>> 
>>>> To be honest it was hanging in one of my repos for some time. If I'm not 
>>>> mistaken it is somehow related to one active ticket (but I couldn't find 
>>>> the info). It might be good to push it upstream.
>>>> 
>>>> George.
>>>> 
>>>> On Jan 22, 2013, at 16:27 , "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> 
>>>> wrote:
>>>> 
>>>>> George --
>>>>> 
>>>>> Is there any reason not to CMR this to v1.6 and v1.7?
>>>>> 
>>>>> 
>>>>> On Jan 21, 2013, at 6:35 AM, svn-commit-mai...@open-mpi.org wrote:
>>>>> 
>>>>>> Author: bosilca (George Bosilca)
>>>>>> Date: 2013-01-21 06:35:42 EST (Mon, 21 Jan 2013)
>>>>>> New Revision: 27880
>>>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/27880
>>>>>> 
>>>>>> Log:
>>>>>> My understanding is that an MPI_WAIT() on an inactive request should
>>>>>> return the empty status (MPI 3.0 page 52 line 46).
>>>>>> 
>>>>>> Text files modified: 
>>>>>> trunk/ompi/request/req_wait.c | 3 +++
>>>>>>  
>>>>>> 1 files changed, 3 insertions(+), 0 deletions(-)
>>>>>> 
>>>>>> Modified: trunk/ompi/request/req_wait.c
>>>>>> ==
>>>>>> --- trunk/ompi/request/req_wait.cSat Jan 19 19:33:42 2013(r27879)
>>>>>> +++ trunk/ompi/request/req_wait.c2013-01-21 06:35:42 EST (Mon, 21 
>>>>>> Jan 2013)(r27880)
>>>>>> @@ -61,6 +61,9 @@
>>>>>>  }
>>>>>>  if( req->req_persistent ) {
>>>>>>  if( req->req_state == OMPI_REQUEST_INACTIVE ) {
>>>>>> +if (MPI_STATUS_IGNORE != status) {
>>>>>> +*status = ompi_status_empty;
>>>>>> +}
>>>>>>  return OMPI_SUCCESS;
>>>>>>  }
>>>>>>  req->req_state = OMPI_REQUEST_INACTIVE;
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Datatype initialization bug?

2013-05-17 Thread George Bosilca
Takahiro,

Nice catch, I really wonder how this one survived for so long. I pushed a 
patch in r28535 addressing this issue. It is not the best solution, but it 
provides an easy way to address the issue.

A little bit of history. A datatype is composed of (let's keep it short) two 
components: a high-level description containing, among others, the size and the 
name of the datatype, and a low-level description (the desc_t part) containing 
the basic predefined elements in the datatype. As most of the predefined 
datatypes defined in the MPI layer are synonyms of some basic predefined 
datatypes (such as the equivalent POSIX types MPI_INT32_T), the design of the 
datatype allowed for the sharing of the desc_t part between datatypes. This 
approach allows us to have similar datatypes (MPI_INT and MPI_INT32_T) with 
different names but with the same backend internal description. However, when 
we split the datatype engine in two, we duplicated this common description (in 
OPAL and OMPI). The OMPI desc_t was pointing to the OPAL desc_t for almost 
everything … except the datatypes that were not defined by OPAL, such as the 
Fortran ones. This turned the management of the common desc_t into a nightmare … 
with the effect you noticed a few days ago. Too bad for the optimization part. I 
now duplicate the desc_t between the two layers, and all OMPI datatypes now have 
their own desc_t.
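
The sharing hazard can be condensed into a self-contained illustration (plain C 
with made-up structures, not the real OPAL/OMPI declarations):

    #include <stdio.h>

    struct desc  { int type; };
    struct dtype { const char *name; struct desc *d; };

    static struct desc  shared    = { 7 };                     /* think: OPAL_INT8's element   */
    static struct dtype opal_int8 = { "OPAL_INT8", &shared };
    static struct dtype ompi_aint = { "MPI_AINT",  &shared };  /* aliases the same description */

    int main(void)
    {
        ompi_aint.d->type = 37;    /* the OMPI init path rewrites the shared description */
        printf("%s now reports type %d\n", opal_int8.name, opal_int8.d->type);
        return 0;
    }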

Thanks for finding and analyzing this issue so deeply.
  George.




On May 16, 2013, at 12:04 , KAWASHIMA Takahiro  
wrote:

> Hi,
> 
> I'm reading the datatype code in Open MPI trunk and have a question.
> A bit long.
> 
> See the following program.
> 
> 
> #include <stdio.h>
> #include <mpi.h>
> 
> struct opal_datatype_t;
> extern int opal_init(int *pargc, char ***pargv);
> extern int opal_finalize(void);
> extern void opal_datatype_dump(struct opal_datatype_t *type);
> extern struct opal_datatype_t opal_datatype_int8;
> 
> int main(int argc, char **argv)
> {
>opal_init(NULL, NULL);
>opal_datatype_dump(&opal_datatype_int8);
>MPI_Init(NULL, NULL);
>opal_datatype_dump(&opal_datatype_int8);
>MPI_Finalize();
>opal_finalize();
>return 0;
> }
> 
> 
> All variables/functions declared as 'extern' are defined in OPAL.
> opal_datatype_dump() function outputs internal data of a datatype.
> I expect the same output on two opal_datatype_dump() calls.
> But when I run it on an x86_64 machine, I get the following output.
> 
> 
> ompi-trunk/opal-datatype-dump && ompiexec -n 1 ompi-trunk/opal-datatype-dump
> [ppc.rivis.jp:27886] Datatype 0x600c60[OPAL_INT8] size 8 align 8 id 7 length 
> 1 used 1
> true_lb 0 true_ub 8 (true_extent 8) lb 0 ub 8 (extent 8)
> nbElems 1 loops 0 flags 136 (commited contiguous )-cC---P-DB-[---][---]
>   contain OPAL_INT8
> --C---P-D--[---][---]  OPAL_INT8 count 1 disp 0x0 (0) extent 8 (size 8)
> No optimized description
> 
> [ppc.rivis.jp:27886] Datatype 0x600c60[OPAL_INT8] size 8 align 8 id 7 length 
> 1 used 1
> true_lb 0 true_ub 8 (true_extent 8) lb 0 ub 8 (extent 8)
> nbElems 1 loops 0 flags 136 (commited contiguous )-cC---P-DB-[---][---]
>   contain OPAL_INT8
> --C---P-D--[---][---]   count 1 disp 0x0 (0) extent 8 (size 
> 8971008)
> No optimized description
> 
> 
> The former output is what I expected. But the latter one is not
> identical to the former one and its content datatype has no name
> and a very large size.
> 
> This line is output in opal_datatype_dump_data_desc() function in
> opal/datatype/opal_datatype_dump.c file. It refers
> opal_datatype_basicDatatypes[pDesc->elem.common.type]->name and
> opal_datatype_basicDatatypes[pDesc->elem.common.type]->size for
> the content datatype.
> 
> In this case, pDesc->elem.common.type is
> opal_datatype_int8.desc.desc[0].elem.common.type and is initialized to 7
> in opal_datatype_init() function in opal/datatype/opal_datatype_module.c
> file, which is called during opal_init() function.
> opal_datatype_int8.desc.desc points to &opal_datatype_predefined_elem_desc[7*2].
> 
> But if we call MPI_Init() function, the value is overwritten.
> ompi_datatype_init() function in ompi/datatype/ompi_datatype_module.c
> file, which is called during MPI_Init() function, has similar
> procedure to initialize OMPI datatypes.
> 
> On initializing ompi_mpi_aint in it, ompi_mpi_aint.dt.super.desc.desc
> points to &opal_datatype_predefined_elem_desc[7*2], which is also pointed to
> by opal_datatype_int8, because ompi_mpi_aint is defined by
> OMPI_DATATYPE_INIT_PREDEFINED_BASIC_TYPE macro and it uses
> OPAL_DATATYPE_INITIALIZER_INT8 macro. So
> opal_datatype_int8.desc.desc[0].elem.common.type is overwritten
> to 37.
> 
> Therefore in the second opal_datatype_dump() function call in my
> program, 

Re: [OMPI devel] Datatype initialization bug?

2013-05-22 Thread George Bosilca
Takahiro,

I used your second patch, the one that removes the copy of the description at 
the OMPI level (r28553). Thanks for your help and your patience in 
investigating this issue.

  George.


On May 22, 2013, at 02:05 , "Kawashima, Takahiro"  
wrote:

> George,
> 
> Thanks for your quick response.
> Your fix seemed good to me last week, but this week my colleague
> found it's not sufficient. There are two issues.
> 
> (A) We should update opt_desc too.
> 
>In current ompi_datatype_init, we copy OPAL desc to OMPI desc.
>But opt_desc still points to OPAL desc. We should update
>opt_desc to point copied OMPI desc.
> 
> (B) Fortran desc is not properly set.
> 
>See the attached result-before.txt. It is the output of the
>attached show_ompi_datatype.c. Fortran basic datatypes,
>like MPI_CHARACTER, MPI_REAL, MPI_DOUBLE_PRECISION, have
>wrong desc_t.
> 
>It is because these datatypes are statically initialized with
>OMPI_DATATYPE_INIT_PREDEFINED_BASIC_TYPE_FORTRAN macro and
>desc and opt_desc point to one element of
>ompi_datatype_predefined_elem_desc array with an OPAL index.
>For example, desc of ompi_mpi_character points to
>ompi_datatype_predefined_elem_desc[OPAL_DATATYPE_INT1].
>If we use ompi_datatype_predefined_elem_desc, we should use
>an OMPI datatype index (OMPI_DATATYPE_MPI_INT8_T etc.) and not
>an OPAL datatype index (OPAL_DATATYPE_INT1 etc.).
> 
>Therefore the condition (pDesc != datatype->super.desc.desc)
>in ompi_datatype_init becomes true and we copy desc from the
>wrong part currently.
>i.e. copy from ompi_datatype_predefined_elem_desc[OPAL_DATATYPE_INT1]
>  to   
> ompi_datatype_predefined_elem_desc[OMPI_DATATYPE_MPI_CHARACTER].
> 
> The initialization part of ompi_mpi_character in
> ompi_datatype_internal.h and ompi_datatype_module.c:
> 
> ompi_predefined_datatype_t ompi_mpi_character =  
> OMPI_DATATYPE_INIT_PREDEFINED_BASIC_TYPE_FORTRAN (INT, CHARACTER, 1, 
> OPAL_ALIGNMENT_CHAR, 0 );
> 
> #define OMPI_DATATYPE_INITIALIZER_FORTRAN( TYPE, NAME, SIZE, ALIGN, FLAGS )   
>\
>{  
>   \
>OPAL_OBJ_STATIC_INIT(opal_datatype_t), 
>   \
>OPAL_DATATYPE_FLAG_BASIC | 
>   \
>OMPI_DATATYPE_FLAG_PREDEFINED |
>   \
>OMPI_DATATYPE_FLAG_DATA_FORTRAN | (FLAGS) /*flag*/,
>   \
>OPAL_DATATYPE_ ## TYPE ## SIZE /*id*/, 
>   \
>(((uint32_t)1)<<(OPAL_DATATYPE_ ## TYPE ## SIZE)) /*bdt_used*/,
>   \
>SIZE /*size*/, 
>   \
>0 /*true_lb*/, SIZE /*true_ub*/, 0 /*lb*/, SIZE /*ub*/,
>   \
>(ALIGN) /*align*/, 
>   \
>1 /*nbElems*/, 
>   \
>OPAL_DATATYPE_INIT_NAME(TYPE ## SIZE) /*name*/,
>   \
>OMPI_DATATYPE_INIT_DESC_PREDEFINED(TYPE, SIZE) /*desc*/,   
>   \
>OMPI_DATATYPE_INIT_DESC_PREDEFINED(TYPE, SIZE) /*opt_desc*/,   
>   \
>OPAL_DATATYPE_INIT_BTYPES_ARRAY_ ## TYPE ## SIZE /*btypes*/
>   \
> 
> #define OMPI_DATATYPE_INIT_DESC_PREDEFINED(TYPE, SIZE)
>\
>{  
>   \
>1 /*length*/, 1 /*used*/,  
>   \
>&(ompi_datatype_predefined_elem_desc[2 * OPAL_DATATYPE_ ## TYPE ## 
> SIZE]) /*desc*/ \
>}
> 
> int32_t ompi_datatype_init( void )
> {
>int32_t i;
> 
>for( i = 0; i < OMPI_DATATYPE_MPI_MAX_PREDEFINED; i++ ) {
>ompi_datatype_t* datatype = 
> (ompi_datatype_t*)ompi_datatype_basicDatatypes[i];
>dt_elem_desc_t* pDesc;
> 
>if( 0 == datatype->super.size ) continue;
> 
>/**
> * Most of the OMPI datatypes have been initialized with the basic 
> desc of the
> * OPAL datatypes. Thus don't modify the desc, instead rebase the desc 
> back into
> * the OMPI predefined_elem_desc and update the fields there.
> */
>pDesc = &ompi_datatype_predefined_elem_desc[2 * i];
>if( pDesc != datatype->super.desc.desc ) {
>memcpy(pDesc, datatype->super.desc.desc, 2 * 
> sizeof(dt_elem_desc_t));
>datatype->super.desc.desc = pDesc;
>} else {
>datatype->super.desc.desc[0].elem.common.flags = 
> OPAL_DATATYPE_FLAG_PREDEFINED |
> 
> OPAL_DATATYPE_FLAG_DATA 

Re: [OMPI devel] RFC: Add static initializer for opal_mutex_t

2013-06-08 Thread George Bosilca
All Windows objects that are managed as HANDLES can easily be modified to have 
a static initializer. A clean solution is attached to the question at 
stackoverflow:
http://stackoverflow.com/questions/3555859/is-it-possible-to-do-static-initialization-of-mutexes-in-windows

That being said I think having a static initializer for a synchronization 
object is a dangerous thing. It has many subtleties and too many hidden 
limitations. As an example, they can only be used in the declaration of the 
object, and can't be safely used for locally static objects (they must be 
global).

What are the instances in the Open MPI code where such a statically defined 
mutex need to be used before it has a chance of being correctly initialized?

  George.


On Jun 7, 2013, at 18:38 , "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:

> Perhaps I was wrong -- I thought we had no static initializer because there 
> was no static initializer for mutexes in windows.  
> 
> 
> On Jun 7, 2013, at 9:28 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>> Im curious to know why Windows support is to be blamed for the lack of such 
>> functionality?
>> 
>> George.
>> 
>> On Jun 7, 2013, at 18:08 , Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>> wrote:
>> 
>>> Nathan forgot to mention that we didn't have this before because of 
>>> Windows.  But now we don't have Windows support, so...
>>> 
>>> 
>>> On Jun 7, 2013, at 9:01 AM, "Hjelm, Nathan T" <hje...@lanl.gov> wrote:
>>> 
>>>> What: Add a static initializer for opal_mutex_t for both posix and solaris 
>>>> threads.
>>>> 
>>>> Why: Enables the use of opal locks that don't have to be OBJ_CONSTRUCT'ed.
>>>> 
>>>> When: This is a trivial addition but I would like some review/testing of 
>>>> the code (I don't have solaris). Setting timeout to Tuesday, June 11, 2013
>>>> 
>>>> 
>>>> diff --git a/opal/threads/mutex_unix.h b/opal/threads/mutex_unix.h
>>>> index 27528e6..28b1744 100644
>>>> --- a/opal/threads/mutex_unix.h
>>>> +++ b/opal/threads/mutex_unix.h
>>>> @@ -81,6 +81,25 @@ OPAL_DECLSPEC OBJ_CLASS_DECLARATION(opal_mutex_t);
>>>> * POSIX threads
>>>> /
>>>> 
>>>> +#if !OPAL_ENABLE_MULTI_THREADS && OPAL_ENABLE_DEBUG
>>>> +#define OPAL_MUTEX_STATIC_INIT  \
>>>> +  { \
>>>> +  .super = OPAL_OBJ_STATIC_INIT(opal_object_t), \
>>>> +  .m_lock_pthread = PTHREAD_MUTEX_INITIALIZER,  \
>>>> +  .m_lock_debug = 0,\
>>>> +  .m_lock_file = NULL,  \
>>>> +  .m_lock_line = 0, \
>>>> +  .m_lock_atomic = 0\
>>>> +  }
>>>> +#else
>>>> +#define OPAL_MUTEX_STATIC_INIT  \
>>>> +  { \
>>>> +  .super = OPAL_OBJ_STATIC_INIT(opal_object_t), \
>>>> +  .m_lock_pthread = PTHREAD_MUTEX_INITIALIZER,  \
>>>> +  .m_lock_atomic = 0\
>>>> +  }
>>>> +#endif
>>>> +
>>>> static inline int opal_mutex_trylock(opal_mutex_t *m)
>>>> {
>>>> #if OPAL_ENABLE_DEBUG
>>>> @@ -130,6 +149,25 @@ static inline void opal_mutex_unlock(opal_mutex_t *m)
>>>> * Solaris threads
>>>> /
>>>> 
>>>> +#if !OPAL_ENABLE_MULTI_THREADS && OPAL_ENABLE_DEBUG
>>>> +#define OPAL_MUTEX_STATIC_INIT  \
>>>> +  { \
>>>> +  .super = OPAL_OBJ_STATIC_INIT(opal_object_t), \
>>>> +  .m_lock_solaris = DEFAULTMUTEX,   \
>>>> +  .m_lock_debug = 0,\
>>>> +  .m_lock_file = NULL,  \
>>>> +  .m_lock_line = 0, \
>>>> +  .m_lock_atomic = 0\
>>>> +  }
>>>> +#else
>>>> +#define OPAL_MUTEX_STATIC_INIT  \
>>>> +  { \
>>>> +  .super = OPAL_OBJ_STATIC_INIT(opal_object_t), \
>>>> +

Re: [OMPI devel] RFC: Add static initializer for opal_mutex_t

2013-06-10 Thread George Bosilca

On Jun 10, 2013, at 17:18 , Nathan Hjelm <hje...@lanl.gov> wrote:

> On Sat, Jun 08, 2013 at 12:28:02PM +0200, George Bosilca wrote:
>> All Windows objects that are managed as HANDLES can easily be modified to 
>> have a static initializer. A clean solution is attached to the question at 
>> stackoverflow:
>> http://stackoverflow.com/questions/3555859/is-it-possible-to-do-static-initialization-of-mutexes-in-windows
> 
> Not the cleanest solution (and I don't know how handles work) so I held off 
> on proposing adding a static initializer until the windows code was gone.

Nothing really fancy, a HANDLE is basically an untyped storage location (a 
void*).

>> That being said I think having a static initializer for a synchronization 
>> object is a dangerous thing. It has many subtleties and too many hidden 
>> limitations. As an example they can only be used on the declaration of the 
>> object, and can't be safely used for locally static object (they must be 
>> global).
> 
> I have never seen any indication that a statically initialized mutex is not 
> safe for static objects. The man page for pthread_mutex_init uses the static 
> initializer on a static mutex: http://linux.die.net/man/3/pthread_mutex_init

It is thread safe for global static objects, but might not be thread safe for 
local static objects.

>> What are the instances in the Open MPI code where such a statically defined 
>> mutex need to be used before it has a chance of being correctly initialized?
> 
> MPI_T_thread_init may be called from any thread (or multiple threads at the 
> same time). The current code uses atomics to protect the initialization of 
> the mutex. I would prefer to declare the mpit lock like:
> 
> opal_mutex_t mpit_big_lock = OPAL_MUTEX_STATIC_INIT;
> 
> and remove the atomics. It would be much cleaner and should work fine on all 
> currently supported platforms.

OK, almost a corner-case.
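
A minimal sketch of that corner case, assuming the proposed 
OPAL_MUTEX_STATIC_INIT goes in (the function name is illustrative, not the real 
MPI_T entry point):

    #include "opal/constants.h"
    #include "opal/threads/mutex.h"

    /* file-scope (global) object, so the static initializer is safe per the
     * POSIX text referenced below */
    static opal_mutex_t mpit_big_lock = OPAL_MUTEX_STATIC_INIT;

    int mpit_init_sketch(void)
    {
        opal_mutex_lock(&mpit_big_lock);    /* usable on the very first call, no atomics */
        /* ... one-time MPI_T initialization ... */
        opal_mutex_unlock(&mpit_big_lock);
        return OPAL_SUCCESS;
    }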

> how does mutex static initializer works

A more detailed explanation is in the "Static Initializers for Mutexes and 
Condition Variables" part of the 
http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_init.html

  George.

> 
> -Nathan
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: improve the hash function used by opal_hash_table_t

2013-06-11 Thread George Bosilca
The one-at-a-time version computes on chars; if the performance of the hash 
function is a critical element in the equation then you will be better off 
avoiding its usage. I would suggest going with Murmur 
(http://en.wikipedia.org/wiki/MurmurHash) instead, which is faster and performs 
well in random distributions. Another interesting feature is that there are 
specialized derivatives for string-based keys, a feature that might prove 
helpful with the MCA parameters and the MPI_T stuff.
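
For reference, a sketch of the 32-bit MurmurHash3 variant (transcribed from 
memory of the public-domain reference; byte handling is simplified, so treat it 
as illustrative rather than a drop-in replacement for the patch quoted below):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    static inline uint32_t rotl32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

    static uint32_t murmur3_32(const void *key, size_t len, uint32_t seed)
    {
        const uint8_t *data = (const uint8_t *) key;
        const size_t nblocks = len / 4;
        const uint32_t c1 = 0xcc9e2d51, c2 = 0x1b873593;
        uint32_t h = seed, k;
        size_t i;

        for (i = 0; i < nblocks; i++) {          /* body: 4-byte blocks */
            memcpy(&k, data + i * 4, 4);
            k *= c1; k = rotl32(k, 15); k *= c2;
            h ^= k; h = rotl32(h, 13); h = h * 5 + 0xe6546b64;
        }

        k = 0;                                   /* tail: remaining 1-3 bytes */
        switch (len & 3) {
        case 3: k ^= (uint32_t) data[nblocks * 4 + 2] << 16; /* fall through */
        case 2: k ^= (uint32_t) data[nblocks * 4 + 1] << 8;  /* fall through */
        case 1: k ^= (uint32_t) data[nblocks * 4];
                k *= c1; k = rotl32(k, 15); k *= c2; h ^= k;
        }

        h ^= (uint32_t) len;                     /* finalization mix */
        h ^= h >> 16; h *= 0x85ebca6b;
        h ^= h >> 13; h *= 0xc2b2ae35;
        h ^= h >> 16;
        return h;
    }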

  George.


On Jun 11, 2013, at 23:32 , Nathan Hjelm  wrote:

> What: Implement a better hash function in opal_hash_table_t. The function is 
> a simple one-at-a-time Jenkin's hash (see 
> http://en.wikipedia.org/wiki/Jenkins_hash_function) and has good collision 
> rates and isn't overly complex or slow.
> 
> Why: I am preparing an update to the MCA variable system (adding performance 
> variables) which will include a hash-based lookup function (in preperation 
> for the inevitable MPI_T_cvar/pvar/category lookup functions-- MPI 3.0 
> errata). The current hash function is not very good so now seems like a good 
> time to update it.
> 
> When: Will push this to trunk on Thursday if there are no objections.
> 
> Patch below
> 
> Index: opal/class/opal_hash_table.c
> ===
> --- opal/class/opal_hash_table.c  (revision 28609)
> +++ opal/class/opal_hash_table.c  (working copy)
> @@ -356,14 +356,20 @@
> static inline uint32_t opal_hash_value(size_t mask, const void *key,
>size_t keysize)
> {
> -size_t h, i;
> -const unsigned char *p;
> -
> -h = 0;
> -p = (const unsigned char *)key;
> -for (i = 0; i < keysize; i++, p++)
> -h = HASH_MULTIPLIER*h + *p;
> -return (uint32_t)(h & mask);
> +const unsigned char *p = (const unsigned char *) key;
> +uint32_t h = 0, i;
> +
> +for (i = 0 ; i < keysize ; ++i, ++p) {
> +h += *p;
> +h += h << 10;
> +h ^= h >> 6;
> +}
> +
> +h += h << 3;
> +h ^= h >> 11;
> +h += h << 15;
> +
> +return h & (uint32_t) mask;
> }
> 
> int opal_hash_table_get_value_ptr(opal_hash_table_t* ht, const void* key,
> 
> 
> 
> 
> -Nathan
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: improve the hash function used by opal_hash_table_t

2013-06-11 Thread George Bosilca
On Jun 12, 2013, at 00:22 , Nathan Hjelm  wrote:

> Though a hardware accelerated crc32 (if available) would probably work great 
> as well.

http://google-opensource.blogspot.fr/2011/04/introducing-cityhash.html
with code available under MIT @ https://code.google.com/p/cityhash/

  George. 


Re: [OMPI devel] BTL sendi

2013-06-19 Thread George Bosilca
Then let me provide a more elaborate answer.

In the original design of the btl_sendi operation we do not provide an upper 
limit for the sendi (in the same sense as the eager protocol). Thus, an upper 
layer (PML in this instance) cannot know if the sendi will succeed or not 
before the call itself. So, in order to avoid several ping-pongs between 
software layers, we force the sendi to either succeed or return a descriptor 
(up to the BTL eager size), identical to what the btl_alloc would have returned.

At this point the PML is forced to pack the data itself into the returned 
descriptor, without knowing if the BTL itself can do better (some kind of 
IN_PLACE). Therefore, this approach makes sense in the case where a copy of the 
data is to be done, aka the start_copy function. In the case of start_prepare 
there is no copy of the data, as the PML entrusts the BTL with the preparation 
of the optimal descriptor.

That being said, I would not be against changing the btl_sendi rules slightly. 
Let's imagine that upon failure to immediately send the data we require 
btl_sendi to return a descriptor that, instead of being similar to what 
btl_alloc would return, is similar to what btl_prepare_src would return. Such a 
scenario is possible as both calls have access to the convertor, which is the 
most critical piece of data. This will cover the cases of both start_prepare 
and start_copy, and will allow us to expand the usage of the btl_sendi 
capability.
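
For context, here is a rough sketch of how a PML consumes the current contract 
(the btl_sendi argument order is paraphrased from btl.h and the helper is 
illustrative; this is not the actual ob1 code):

    #include "ompi/constants.h"
    #include "ompi/mca/btl/btl.h"

    static int try_send_immediate(mca_btl_base_module_t *btl,
                                  struct mca_btl_base_endpoint_t *endpoint,
                                  opal_convertor_t *convertor,
                                  void *hdr, size_t hdr_size, size_t payload_size,
                                  uint8_t order, uint32_t flags, mca_btl_base_tag_t tag)
    {
        mca_btl_base_descriptor_t *des = NULL;
        int rc = btl->btl_sendi(btl, endpoint, convertor, hdr, hdr_size,
                                payload_size, order, flags, tag, &des);
        if (OMPI_SUCCESS == rc) {
            return OMPI_SUCCESS;         /* the small message left immediately */
        }
        if (NULL != des) {
            /* the BTL handed back an eager-sized descriptor, as btl_alloc() would:
             * the caller now has to pack the data into it and call btl_send() */
        }
        return rc;
    }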

  George.

On Jun 18, 2013, at 22:52 , "Jeff Squyres (jsquyres)"  
wrote:

> George replied to me in IM -- posting here for completeness:
> 
>> Yes, there is a reason. If sendi succeeds, it sends a very small amount of 
>> data (at least on the devices that support it), otherwise it returns a descriptor 
>> similar to btl_alloc()
>> thus you will have to pack the data yourself, and the PML doesn't know if 
>> IN_PLACE should be used or not
>> thus the resulting solution is slower than the default in the start_prepare 
>> case (which is prepare_src + send)
> 
> 
> On Jun 14, 2013, at 3:46 PM, Jeff Squyres (jsquyres)  
> wrote:
> 
>> In working on the upcoming Cisco USNIC BTL, we noticed that btl.sendi is 
>> invoked by OB1 in the non-MCA_BTL_FLAGS_SEND_INPLACE case.
>> 
>> Is there a reason for this?  Or is it only because no one who uses INPLACE 
>> has cared about sendi?
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Problem when using struct types at specific offsets

2013-06-21 Thread George Bosilca
Thomas,

I'm not aware of any other issues with the datatypes.

There might be an easy way to see what the issue with your application is. If you 
can debug your application, and know exactly which datatype has problems, then 
attach with gdb and call ompi_datatype_dump(type), where type is the datatype 
creating problems. With the resulting output it should be pretty easy to 
reproduce a test case and/or identify the problem.
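
For example (the pid, the frame and the variable name are placeholders for 
whatever the application actually uses):

    $ gdb -p <pid of the failing rank>
    (gdb) frame <the frame where the troublesome datatype is visible>
    (gdb) call ompi_datatype_dump(ddt)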

  George.


On Jun 21, 2013, at 16:33 , Thomas Jahns  wrote:

> our IT service provider has applied the patch to openmpi 1.6.4 and the C
> test-case I provided now works but the original code which uses a bigger 
> number
> of struct datatypes still fails.
> 
> Has anyone already discovered a potential problem with the fix provided in
> r28319? I'm asking because developing the C test case is quite some amount of
> work and is not easily reproducible with every Fortran compiler because it
> depends on the stack layout.
> 
> Regards, Thomas
> -- 
> Thomas Jahns
> DKRZ GmbH, Department: Application software
> 
> Deutsches Klimarechenzentrum
> Bundesstraße 45a
> D-20146 Hamburg
> 
> Phone: +49-40-460094-151
> Fax: +49-40-460094-270
> Email: Thomas Jahns 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RGET issue when send is less than receive

2013-06-21 Thread George Bosilca
The amount of bytes received is atomically updated on the completion callback, 
and the completion test is clearly spelled out in the 
recv_request_pml_complete_check function (of course minus the lock part). Rolf, 
I think your patch is correct.

That being said, req_bytes_expected is a special value, one that should only be 
used to check for truncation. Otherwise, req_bytes_packed is the value we 
should compare against.

  George.

On Jun 21, 2013, at 17:40 , Nathan Hjelm  wrote:

> I thought I fixed this problem awhile back (though looking at the code its 
> possible I never committed the fix). I will have to look through my local 
> repository and see what happened to that fix. Your fix might not work 
> correctly since a RGET can be broken up into multiple get operations. It may 
> work, I would just need to test it to make sure.
> 
> -Nathan
> 
> On Fri, Jun 21, 2013 at 08:25:29AM -0700, Rolf vandeVaart wrote:
>> I ran into a hang in a test in which the sender sends less data than the 
>> receiver is expecting.  For example, the following shows the receiver 
>> expecting twice what the sender is sending.
>> 
>> Rank 0:  MPI_Send(buf, BUFSIZE, MPI_INT, 1, 99, MPI_COMM_WORLD)
>> Rank 1: MPI_Recv(buf, BUFSIZE*2,  MPI_INT, 0, 99, MPI_COMM_WORLD)
>> 
>> This is also reproducible using one of the intel tests and adjusting the 
>> eager value for the openib BTL.
>> 
>> ?  mpirun -np 2 -host frick,frack -mca btl_openib_eager_limit 56 
>> MPI_Send_overtake_c
>> 
>> In most cases, this works just fine.  However, when the PML protocol used is 
>> the RGET protocol, the test hangs.   Below is a proposed fix for this issue.
>> I believe we want to be checking against req_bytes_packed rather than 
>> req_bytes_expected as req_bytes_expected is what the user originally told us.
>> Otherwise, with the current code, we never send a FIN message back to the 
>> sender.
>> 
>> Any thoughts?
>> 
>> [rvandevaart@sm065 ompi-trunk]$ svn diff ompi/mca/pml/ob1/pml_ob1_recvreq.c
>> Index: ompi/mca/pml/ob1/pml_ob1_recvreq.c
>> ===
>> --- ompi/mca/pml/ob1/pml_ob1_recvreq.c(revision 28633)
>> +++ ompi/mca/pml/ob1/pml_ob1_recvreq.c (working copy)
>> @@ -335,7 +335,7 @@
>> /* is receive request complete */
>> OPAL_THREAD_ADD_SIZE_T(&recvreq->req_bytes_received, frag->rdma_length);
>> -if (recvreq->req_bytes_expected <= recvreq->req_bytes_received) {
>> +if (recvreq->req_recv.req_bytes_packed <= recvreq->req_bytes_received) {
>> mca_pml_ob1_send_fin(recvreq->req_recv.req_base.req_proc,
>>   bml_btl,
>>  frag->rdma_hdr.hdr_rget.hdr_des,
>> 
>> 
>> 
>> ---
>> This email message is for the sole use of the intended recipient(s) and may 
>> contain
>> confidential information.  Any unauthorized review, use, disclosure or 
>> distribution
>> is prohibited.  If you are not the intended recipient, please contact the 
>> sender by
>> reply email and destroy all copies of the original message.
>> ---
> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Problem when using MPI_Type_create_struct + MPI_Type_dup

2013-06-24 Thread George Bosilca
Thomas,

I tried your test with the current svn version of the 1.6 (to be 1.6.5 I 
guess), and your test passes without any issues.

  George.

On Jun 24, 2013, at 15:22 , Thomas Jahns  wrote:

> Hello,
> 
> the following code exposes a problem we are experiencing with our OpenMPI
> 1.6.[24] installations.
> 
> My colleague Moritz Hanke isolated the problem to an interaction of
> MPI_Type_create_struct with a previous MPI_Type_dup. When MPI_Type_dup on line
> 67 of the example is replaced with a straight assignment of MPI_INT to 
> sends[0],
> the problem goes away.
> 
> We are using a patched version of OpenMPI which includes the changes from 
> r28319.
> 
> Regards, Thomas Jahns
> -- 
> Thomas Jahns
> DKRZ GmbH, Department: Application software
> 
> Deutsches Klimarechenzentrum
> Bundesstraße 45a
> D-20146 Hamburg
> 
> Phone: +49-40-460094-151
> Fax: +49-40-460094-270
> Email: Thomas Jahns 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




[OMPI devel] RFC MPI 2.2 Dist_graph addition

2013-06-24 Thread George Bosilca
WHAT: Support for MPI 2.2 dist_graph

WHY:  To become [almost entirely] MPI 2.2 compliant

WHEN: Monday July 1st

As discussed during the last phone call, a missing functionality of the MPI 2.2 
standard (the distributed graph topology) is ready for prime-time. The attached 
patch provides a minimal version (no components supporting reordering) that 
will complete the topology support in Open MPI.

It is somehow a major change compared with what we had before and it reshapes 
the way we deal with topologies completely. Where our topologies were mainly 
storage components (they were not capable of creating the new communicator as 
an example), the new version is built around a [possibly] common representation 
(in mca/topo/topo.h), but the functions to attach and retrieve the topological 
information are specific to each component. As a result the ompi_create_cart 
and ompi_create_graph functions become useless and have been removed.

In addition to adding the internal infrastructure to manage the topology 
information, it updates the MPI interface and the debugger support, and 
provides all Fortran interfaces. From a correctness point of view it passes all 
the tests we have in ompi-tests for the cart and graph topology, and some 
tests/applications for the dist_graph interface.
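
For anyone who wants to exercise the new code, a minimal usage sketch (a 
directed ring built with MPI_Dist_graph_create_adjacent; no reordering is 
requested, matching the minimal version described above):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm ring;
        int rank, size, src, dst, weight = 1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        src = (rank + size - 1) % size;   /* single incoming neighbor */
        dst = (rank + 1) % size;          /* single outgoing neighbor */

        MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD, 1, &src, &weight,
                                       1, &dst, &weight,
                                       MPI_INFO_NULL, 0 /* no reorder */, &ring);

        MPI_Comm_free(&ring);
        MPI_Finalize();
        return 0;
    }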

I don't think there is a need for a long wait on this one so I would like to 
propose a short deadline, a week from now on Monday July 1st. A patch based on 
Open MPI trunk r28670 is attached below.

Thanks,
  George.






dist_graph.patch
Description: Binary data


Re: [OMPI devel] Cross Memory Attach support in OpenMPI

2013-06-27 Thread George Bosilca
https://svn.open-mpi.org/trac/ompi/changeset/26134

  George.

On Jun 27, 2013, at 16:43 , Lukasz Flis  wrote:

> Dear All,
> 
> Some time ago there was a discussion on this list regarding enabling CMA
> support in OpenMPI. There were 2 positive votes
> 
> http://www.open-mpi.org/community/lists/devel/2012/01/10208.php
> 
> I have checked the latest releases today and haven't seen any trace of
> CMA support.
> 
> Since CMA is available from kernel 3.2 and in RHEL 6.3 and above maybe
> it would be worth to consider adding this feature?
> 
> Is there any reason for not including the patch yet?
> 
> Best Regards
> --
> Lukasz Flis
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] RFC MPI 2.2 Dist_graph addition

2013-07-01 Thread George Bosilca
Guys,

Thanks for the patch and for the tests. All these changes/cleanups are correct; 
I have incorporated them all in the patch. Please find below the new patch.

As the deadline for the RFC is today, I'll move forward and push the changes 
into the trunk, and if there are still issues we can work them out directly in 
the trunk.

Thanks,
  George.

PS: I will push your tests into our test base as well.


On Jul 1, 2013, at 06:39 , "Kawashima, Takahiro"  
wrote:

> George,
> 
> My colleague was working on your ompi-topo bitbucket repository
> but it was not completed. But he found bugs in your patch attached
> in your previous mail and created the fixing patch. See the attached
> patch, which is a patch against Open MPI trunk + your patch.
> 
> His test programs are also attached. test_1 and test_2 can run
> with nprocs=5, and test_3 and test_4 can run with nprocs>=3.
> 
> Though I'm not sure about the contents of the patch and the test
> programs, I can ask him if you have any questions.
> 
> Regards,
> Takahiro Kawashima,
> MPI development team,
> Fujitsu
> 
>> WHAT:Support for MPI 2.2 dist_graph
>> 
>> WHY: To become [almost entierly] MPI 2.2 compliant
>> 
>> WHEN:Monday July 1st
>> 
>> As discussed during the last phone call, a missing functionality of the MPI 
>> 2.2 standard (the distributed graph topology) is ready for prime-time. The 
>> attached patch provide a minimal version (no components supporting 
>> reordering), that will complete the topology support in Open MPI.
>> 
>> It is somehow a major change compared with what we had before and it reshape 
>> the way we deal with topologies completely. Where our topologies were mainly 
>> storage components (they were not capable of creating the new communicator 
>> as an example), the new version is built around a [possibly] common 
>> representation (in mca/topo/topo.h), but the functions to attach and 
>> retrieve the topological information are specific to each component. As a 
>> result the ompi_create_cart and ompi_create_graph functions become useless 
>> and have been removed.
>> 
>> In addition to adding the internal infrastructure to manage the topology 
>> information, it updates the MPI interface, and the debuggers support and 
>> provides all Fortran interfaces. From a correctness point of view it passes 
>> all the tests we have in ompi-tests for the cart and graph topology, and 
>> some tests/applications for the dist_graph interface.
>> 
>> I don't think there is a need for a long wait on this one so I would like to 
>> propose a short deadline, a week from now on Monday July 1st. A patch based 
>> on Open MPI trunk r28670 is attached below.
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC MPI 2.2 Dist_graph addition

2013-07-01 Thread George Bosilca
The patch has been pushed into the trunk in r28687.

  George.


On Jul 1, 2013, at 13:55 , George Bosilca <bosi...@icl.utk.edu> wrote:

> Guys,
> 
> Thanks for the patch and for the tests. All these changes/cleanups are 
> correct, I have incorporate them all in the patch. Please find below the new 
> patch.
> 
> As the deadline for the RFC is today, I'll move forward and push the changes 
> into the trunk, and if there are still issues we can work them out directly 
> in the trunk.
> 
> Thanks,
>  George.
> 
> PS: I will push your tests in our tests base as well.
> 
> 
> On Jul 1, 2013, at 06:39 , "Kawashima, Takahiro" <t-kawash...@jp.fujitsu.com> 
> wrote:
> 
>> George,
>> 
>> My colleague was working on your ompi-topo bitbucket repository
>> but it was not completed. But he found bugs in your patch attached
>> in your previous mail and created the fixing patch. See the attached
>> patch, which is a patch against Open MPI trunk + your patch.
>> 
>> His test programs are also attached. test_1 and test_2 can run
>> with nprocs=5, and test_3 and test_4 can run with nprocs>=3.
>> 
>> Though I'm not sure about the contents of the patch and the test
>> programs, I can ask him if you have any questions.
>> 
>> Regards,
>> Takahiro Kawashima,
>> MPI development team,
>> Fujitsu
>> 
>>> WHAT:Support for MPI 2.2 dist_graph
>>> 
>>> WHY: To become [almost entierly] MPI 2.2 compliant
>>> 
>>> WHEN:Monday July 1st
>>> 
>>> As discussed during the last phone call, a missing functionality of the MPI 
>>> 2.2 standard (the distributed graph topology) is ready for prime-time. The 
>>> attached patch provide a minimal version (no components supporting 
>>> reordering), that will complete the topology support in Open MPI.
>>> 
>>> It is somehow a major change compared with what we had before and it 
>>> reshape the way we deal with topologies completely. Where our topologies 
>>> were mainly storage components (they were not capable of creating the new 
>>> communicator as an example), the new version is built around a [possibly] 
>>> common representation (in mca/topo/topo.h), but the functions to attach and 
>>> retrieve the topological information are specific to each component. As a 
>>> result the ompi_create_cart and ompi_create_graph functions become useless 
>>> and have been removed.
>>> 
>>> In addition to adding the internal infrastructure to manage the topology 
>>> information, it updates the MPI interface, and the debuggers support and 
>>> provides all Fortran interfaces. From a correctness point of view it passes 
>>> all the tests we have in ompi-tests for the cart and graph topology, and 
>>> some tests/applications for the dist_graph interface.
>>> 
>>> I don't think there is a need for a long wait on this one so I would like 
>>> to propose a short deadline, a week from now on Monday July 1st. A patch 
>>> based on Open MPI trunk r28670 is attached below.
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 




Re: [OMPI devel] Barrier Implementation Oddity

2013-07-01 Thread George Bosilca
Yes, Bruck for barrier is a variant of the dissemination algorithm as described 
in:
 - Debra Hensgen, Raphael Finkel, and Udi Manber. Two algorithms for barrier 
synchronization. International Journal of Parallel Programming, 17(1):1–17, 
1988.
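For reference, the dissemination exchange pattern looks roughly like this (a 
standalone sketch, not the actual ompi_coll_tuned_barrier_intra_bruck code; the 
orientation of the send/receive partners is exactly the detail Ronny is asking 
about, and the real code may use the mirrored variant):

#include <mpi.h>

/* Dissemination barrier: ceil(log2(n)) rounds; in round k each rank exchanges
   an empty token with the partners at distance 2^k. */
static void dissemination_barrier(MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    for (int dist = 1; dist < size; dist <<= 1) {
        int to   = (rank + dist) % size;         /* send "forward"     */
        int from = (rank - dist + size) % size;  /* receive "backward" */
        MPI_Sendrecv(NULL, 0, MPI_BYTE, to,   0,
                     NULL, 0, MPI_BYTE, from, 0,
                     comm, MPI_STATUS_IGNORE);
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    dissemination_barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}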

  George.


On Jun 29, 2013, at 12:05 , Ronny Brendel  wrote:

> Hi,
> 
> I am digging through openmpi to find out what algorithm the barrier actually 
> uses.
> seems to be bruck/dissemination.
> 
> However i stumbled upon something odd.
> in: ompi/mca/coll/tuned/coll_tuned_barrier.c
> function: ompi_coll_tuned_barrier_intra_bruck
> 
> I think you intend (by the comments and the function name) to send to the 
> previous node and receive from the next. But actually it looks like you are 
> doing the reverse. (which should then be the dissemination algorithm)
> 
> It's no big deal, I'm just a bit confused right now, and wonder if I'm 
> missing something. I hope you can help me understand.
> 
> cheers,
> Ronny
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC MPI 2.2 Dist_graph addition

2013-07-01 Thread George Bosilca
Ahem …

  George.



topo.patch
Description: Binary data

On Jul 1, 2013, at 13:55 , George Bosilca <bosi...@icl.utk.edu> wrote:

> Guys,
> 
> Thanks for the patch and for the tests. All these changes/cleanups are 
> correct, I have incorporate them all in the patch. Please find below the new 
> patch.
> 
> As the deadline for the RFC is today, I'll move forward and push the changes 
> into the trunk, and if there are still issues we can work them out directly 
> in the trunk.
> 
> Thanks,
>  George.
> 
> PS: I will push your tests in our tests base as well.
> 
> 
> On Jul 1, 2013, at 06:39 , "Kawashima, Takahiro" <t-kawash...@jp.fujitsu.com> 
> wrote:
> 
>> George,
>> 
>> My colleague was working on your ompi-topo bitbucket repository
>> but it was not completed. But he found bugs in your patch attached
>> in your previous mail and created the fixing patch. See the attached
>> patch, which is a patch against Open MPI trunk + your patch.
>> 
>> His test programs are also attached. test_1 and test_2 can run
>> with nprocs=5, and test_3 and test_4 can run with nprocs>=3.
>> 
>> Though I'm not sure about the contents of the patch and the test
>> programs, I can ask him if you have any questions.
>> 
>> Regards,
>> Takahiro Kawashima,
>> MPI development team,
>> Fujitsu
>> 
>>> WHAT:Support for MPI 2.2 dist_graph
>>> 
>>> WHY: To become [almost entierly] MPI 2.2 compliant
>>> 
>>> WHEN:Monday July 1st
>>> 
>>> As discussed during the last phone call, a missing functionality of the MPI 
>>> 2.2 standard (the distributed graph topology) is ready for prime-time. The 
>>> attached patch provide a minimal version (no components supporting 
>>> reordering), that will complete the topology support in Open MPI.
>>> 
>>> It is somehow a major change compared with what we had before and it 
>>> reshape the way we deal with topologies completely. Where our topologies 
>>> were mainly storage components (they were not capable of creating the new 
>>> communicator as an example), the new version is built around a [possibly] 
>>> common representation (in mca/topo/topo.h), but the functions to attach and 
>>> retrieve the topological information are specific to each component. As a 
>>> result the ompi_create_cart and ompi_create_graph functions become useless 
>>> and have been removed.
>>> 
>>> In addition to adding the internal infrastructure to manage the topology 
>>> information, it updates the MPI interface, and the debuggers support and 
>>> provides all Fortran interfaces. From a correctness point of view it passes 
>>> all the tests we have in ompi-tests for the cart and graph topology, and 
>>> some tests/applications for the dist_graph interface.
>>> 
>>> I don't think there is a need for a long wait on this one so I would like 
>>> to propose a short deadline, a week from now on Monday July 1st. A patch 
>>> based on Open MPI trunk r28670 is attached below.
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 



Re: [OMPI devel] RFC MPI 2.2 Dist_graph addition

2013-07-01 Thread George Bosilca
As all examples are working perfectly in my version of the code, I was puzzled 
by Jeff's issue. It turns out it's a side effect of trying to push as few items 
as possible instead of just pushing everything into the trunk. I'll fix it in a 
few minutes; meanwhile, here are a few words about what the issue was.

One might have noticed that this framework came without any component. The 
reason is that all the components in development are still in the "paper in 
progress" stage, and thus not being pushed into the trunk. However, the level of 
functionality required by the MPI 2.2 standard is provided by the functions in 
the base, so it will work reasonably well as is. It does need a "module", 
though; otherwise the functions in the base don't have a placeholder to attach 
to. Thus it is crucial to have a decoy component, one that can provide the empty 
module into which the base functions are copied.

So the problem Jeff noticed was the lack of a basic component in the topo 
framework.

  George.



On Jul 1, 2013, at 15:51 , "Jeff Squyres (esquires)" <jsquy...@cisco.com> wrote:

> George --
> 
> All 4 tests fail for me -- can you have a look?
> 
> -
> [6:50] savbu-usnic-a:~/s/o/dist_graph ❯❯❯ mpirun --mca btl tcp,sm,self --host 
> mpi001,mpi002,mpi003,mpi004 -np 5 --bynode distgraph_test_1
> [mpi002:5304] *** An error occurred in MPI_Dist_graph_create
> [mpi002:5304] *** reported by process [46910457249793,46909632806913]
> [mpi002:5304] *** on communicator MPI_COMM_WORLD
> [mpi002:5304] *** MPI_ERR_OTHER: known error not in list
> [mpi002:5304] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
> now abort,
> [mpi002:5304] ***and potentially your MPI job)
> [savbu-usnic-a:24610] 4 more processes have sent help message 
> help-mpi-errors.txt / mpi_errors_are_fatal
> [savbu-usnic-a:24610] Set MCA parameter "orte_base_help_aggregate" to 0 to 
> see all help / error messages
> [6:50] savbu-usnic-a:~/s/o/dist_graph ❯❯❯ mpirun --mca btl tcp,sm,self --host 
> mpi001,mpi002,mpi003,mpi004 -np 5 --bynode distgraph_test_2
> [mpi002:5316] *** An error occurred in MPI_Dist_graph_create_adjacent
> [mpi002:5316] *** reported by process [46910457053185,46909632806913]
> [mpi002:5316] *** on communicator MPI_COMM_WORLD
> [mpi002:5316] *** MPI_ERR_OTHER: known error not in list
> [mpi002:5316] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
> now abort,
> [mpi002:5316] ***and potentially your MPI job)
> [savbu-usnic-a:24615] 4 more processes have sent help message 
> help-mpi-errors.txt / mpi_errors_are_fatal
> [savbu-usnic-a:24615] Set MCA parameter "orte_base_help_aggregate" to 0 to 
> see all help / error messages
> [6:51] savbu-usnic-a:~/s/o/dist_graph ❯❯❯ mpirun --mca btl tcp,sm,self --host 
> mpi001,mpi002,mpi003,mpi004 -np 5 --bynode distgraph_test_3
> [mpi001:5338] *** An error occurred in MPI_Dist_graph_create_adjacent
> [mpi001:5338] *** reported by process [46910469242881,46909632806916]
> [mpi001:5338] *** on communicator MPI_COMM_WORLD
> [mpi001:5338] *** MPI_ERR_OTHER: known error not in list
> [mpi001:5338] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
> now abort,
> [mpi001:5338] ***and potentially your MPI job)
> [savbu-usnic-a:24797] 4 more processes have sent help message 
> help-mpi-errors.txt / mpi_errors_are_fatal
> [savbu-usnic-a:24797] Set MCA parameter "orte_base_help_aggregate" to 0 to 
> see all help / error messages
> [6:51] savbu-usnic-a:~/s/o/dist_graph ❯❯❯ mpirun --mca btl tcp,sm,self --host 
> mpi001,mpi002,mpi003,mpi004 -np 5 --bynode distgraph_test_4
> [mpi001:5351] *** An error occurred in MPI_Dist_graph_create
> [mpi001:5351] *** reported by process [46910442110977,46909632806912]
> [mpi001:5351] *** on communicator MPI_COMM_WORLD
> [mpi001:5351] *** MPI_ERR_OTHER: known error not in list
> [mpi001:5351] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
> now abort,
> [mpi001:5351] ***and potentially your MPI job)
> [savbu-usnic-a:24891] 4 more processes have sent help message 
> help-mpi-errors.txt / mpi_errors_are_fatal
> [savbu-usnic-a:24891] Set MCA parameter "orte_base_help_aggregate" to 0 to 
> see all help / error messages
> [6:51] savbu-usnic-a:~/s/o/dist_graph ❯❯❯ 
> -
> 
> 
> 
> On Jul 1, 2013, at 8:41 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>> The patch has been pushed into the trunk in r28687.
>> 
>> George.
>> 
>> 
>> On Jul 1, 2013, at 13:55 , George Bosilca <bosi...@icl.utk.edu> wrote:
>> 
>>> Guys,
>>> 
>>> Thanks for the patch and for the tests. All these changes/cleanups are 
>>> correct, I have incorporate them all in the patch. Please fi

[OMPI devel] RFC: OMPI_FREE_LIST_{GET|WAIT} lose the rc argument

2013-07-02 Thread George Bosilca
Our macros for the OMPI-level free list had one extra argument, a possible 
return value to signal that the operation of retrieving an element from the 
free list failed. However, in this case the returned pointer was set to NULL as 
well, so the error code was redundant. Moreover, it was a continuous source of 
warnings when the picky mode was on.

The attached patch removes the rc argument from the OMPI_FREE_LIST_GET and 
OMPI_FREE_LIST_WAIT macros, and changes the call sites to check whether the 
item is NULL instead of using the return code.
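A sketch of how a call site changes (the free list and item names here are 
hypothetical, and since these are OMPI-internal macros the fragment is 
illustrative rather than standalone):

    ompi_free_list_item_t *item;
    int rc;

    /* before */
    OMPI_FREE_LIST_GET(&some_free_list, item, rc);
    if (OMPI_SUCCESS != rc) {
        /* handle allocation failure */
    }

    /* after this RFC */
    OMPI_FREE_LIST_GET(&some_free_list, item);
    if (NULL == item) {
        /* handle allocation failure */
    }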

Deadline: July 4th

  George.




free_list_get.patch
Description: Binary data


Re: [OMPI devel] [EXTERNAL] RFC: OMPI_FREE_LIST_{GET|WAIT} lose the rc argument

2013-07-02 Thread George Bosilca
I definitely wonder why? Whoever was the "resistance" might have had a good (or 
at least valid) reason. I can't find any trace of your patch, but I would 
definitely be interested in taking a look at it (if you can resend it) to avoid 
triggering the same type of opposition.

  George.


On Jul 2, 2013, at 17:17 , Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> +1.
> 
> I submitted a patch like this a while ago, and it met violent resistance.  
> :-)  Although no one on the call today remembers exactly who raised the 
> resistance... 
> 
> 
> 
> On Jul 2, 2013, at 10:40 AM, "Barrett, Brian W" <bwba...@sandia.gov> wrote:
> 
>> On 7/2/13 8:22 AM, "George Bosilca" <bosi...@icl.utk.edu> wrote:
>> 
>>> Our macros for the OMPI-level free list had one extra argument, a possible 
>>> return value to signal that the operation of retrieving the element from 
>>> the free list failed. However in this case the returned pointer was set to 
>>> NULL as well, so the error code was redundant. Moreover, this was a 
>>> continuous source of warnings when the picky mode was on.
>>> 
>>> The attached parch remove the rc argument from the OMPI_FREE_LIST_GET and 
>>> OMPI_FREE_LIST_WAIT macros, and change to check if the item is NULL instead 
>>> of using the return code.
>>> 
>>> Deadline: July 4th
>> 
>> Works for me.
>> 
>> Brian
>> 
>> --
>>  Brian W. Barrett
>>  Scalable System Software Group
>>  Sandia National Laboratories
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [EXTERNAL] RFC: OMPI_FREE_LIST_{GET|WAIT} lose the rc argument

2013-07-04 Thread George Bosilca
RFC completed at revision r28722.

  George.


On Jul 2, 2013, at 18:17 , "Barrett, Brian W" <bwba...@sandia.gov> wrote:

> Jeff thought it was me and I thought it was you, so I think we're ok :).
> 
> Brian
> 
> On 7/2/13 9:45 AM, "George Bosilca" <bosi...@icl.utk.edu> wrote:
> 
>> I definitively wonder why ? Whoever was the "resistance" might have had a
>> good (r at least valid) orison. I can't find any trace of your patch, but
>> I would definitively be interested to take a look at it (if you can
>> resend it) to avoid triggering the same type of opposition.
>> 
>> George.
>> 
>> 
>> On Jul 2, 2013, at 17:17 , Jeff Squyres (jsquyres) <jsquy...@cisco.com>
>> wrote:
>> 
>>> +1.
>>> 
>>> I submitted a patch like this a while ago, and it met violent
>>> resistance.  :-)  Although no one on the call today remembers exactly
>>> who raised the resistance...
>>> 
>>> 
>>> 
>>> On Jul 2, 2013, at 10:40 AM, "Barrett, Brian W" <bwba...@sandia.gov>
>>> wrote:
>>> 
>>>> On 7/2/13 8:22 AM, "George Bosilca" <bosi...@icl.utk.edu> wrote:
>>>> 
>>>>> Our macros for the OMPI-level free list had one extra argument, a
>>>>> possible return value to signal that the operation of retrieving the
>>>>> element from the free list failed. However in this case the returned
>>>>> pointer was set to NULL as well, so the error code was redundant.
>>>>> Moreover, this was a continuous source of warnings when the picky mode
>>>>> was on.
>>>>> 
>>>>> The attached parch remove the rc argument from the OMPI_FREE_LIST_GET
>>>>> and OMPI_FREE_LIST_WAIT macros, and change to check if the item is
>>>>> NULL instead of using the return code.
>>>>> 
>>>>> Deadline: July 4th
>>>> 
>>>> Works for me.
>>>> 
>>>> Brian
>>>> 
>>>> --
>>>> Brian W. Barrett
>>>> Scalable System Software Group
>>>> Sandia National Laboratories
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
> 
> 
> --
>  Brian W. Barrett
>  Scalable System Software Group
>  Sandia National Laboratories
> 
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] Annual OMPI membership review: SVN accounts

2013-07-09 Thread George Bosilca
Indeed Thomas is now part of UTK.

  George.

On Jul 9, 2013, at 7:47, Brice Goglin  wrote:

> Le 09/07/2013 00:32, Jeff Squyres (jsquyres) a écrit :
>> INRIA
>> 
>> bgoglin:  Brice Goglin 
>> arougier: Antoine Rougier 
>> sthibaul: Samuel Thibault 
>> mercier:  Guillaume Mercier  **NO COMMITS IN LAST YEAR**
>> nfurmento:Nathalie Furmento  **NO COMMITS IN 
>> LAST YEAR**
>> herault:  Thomas Herault  **NO COMMITS IN LAST YEAR**
> 
> You can remove arougier.
> herault isn't Inria anymore, he's UTK now, not sure what to do with him.
> 
> Brice
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] [bug] One-sided communication with a duplicated datatype

2013-07-14 Thread George Bosilca
Takahiro,

Nice catch. That particular code was an over-optimization … that failed. Please 
try with the patch below.

Let me know if it's working as expected; I will push it into the trunk once 
confirmed.

  George.


Index: ompi/datatype/ompi_datatype_args.c
===
--- ompi/datatype/ompi_datatype_args.c  (revision 28787)
+++ ompi/datatype/ompi_datatype_args.c  (working copy)
@@ -449,9 +449,10 @@
 }
 /* For duplicated datatype we don't have to store all the information */
 if( MPI_COMBINER_DUP == args->create_type ) {
-position[0] = args->create_type;
-position[1] = args->d[0]->id; /* On the OMPI - layer, copy the 
ompi_datatype.id */
-return OMPI_SUCCESS;
+ompi_datatype_t* temp_data = args->d[0];
+return __ompi_datatype_pack_description(temp_data,
+packed_buffer,
+next_index );
 }
 position[0] = args->create_type;
 position[1] = args->ci;
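For context, the failing pattern Takahiro describes below boils down to roughly 
the following (a sketch, not his attached put_dup_type program):

#include <mpi.h>
#include <stdio.h>

/* Run with: mpiexec -n 1 ./put_dup_sketch */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Datatype contig, dup;
    MPI_Type_contiguous(1, MPI_INT, &contig);  /* equivalent to MPI_INT ...  */
    MPI_Type_dup(contig, &dup);                /* ... and then duplicated    */
    MPI_Type_commit(&dup);

    int target = 0, origin = 42;
    MPI_Win win;
    MPI_Win_create(&target, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    /* The dup'ed derived type is used on the target side of the Put. */
    MPI_Put(&origin, 1, MPI_INT, 0, 0, 1, dup, win);
    MPI_Win_fence(0, win);

    printf("target = %d (expected 42)\n", target);

    MPI_Win_free(&win);
    MPI_Type_free(&dup);
    MPI_Type_free(&contig);
    MPI_Finalize();
    return 0;
}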



On Jul 14, 2013, at 14:30 , KAWASHIMA Takahiro  
wrote:

> Hi,
> 
> I encountered an assertion failure in Open MPI trunk and found a bug.
> 
> See the attached program. This program can be run with mpiexec -n 1.
> This program calls MPI_Put and writes one int value to the target side.
> The target side datatype is equivalent to MPI_INT, but is a derived
> datatype created by MPI_Type_contiguous and MPI_Type_Dup.
> 
> This program aborts with the following output.
> 
> ==
>  dt1 (0x2626160) 
> type 2 count ints 1 count disp 0 count datatype 1
> ints: 1 
> types:MPI_INT 
>  dt2 (0x2626340) 
> type 1 count ints 0 count disp 0 count datatype 1
> types:0x2626160 
> put_dup_type: ../../../ompi/datatype/ompi_datatype_args.c:565: 
> __ompi_datatype_create_from_packed_description: Assertion `data_id < 45' 
> failed.
> [ppc:05244] *** Process received signal ***
> [ppc:05244] Signal: Aborted (6)
> [ppc:05244] Signal code:  (-6)
> [ppc:05244] [ 0] /lib/libpthread.so.0(+0xeff0) [0x7fe58a275ff0]
> [ppc:05244] [ 1] /lib/libc.so.6(gsignal+0x35) [0x7fe589f371b5]
> [ppc:05244] [ 2] /lib/libc.so.6(abort+0x180) [0x7fe589f39fc0]
> [ppc:05244] [ 3] /lib/libc.so.6(__assert_fail+0xf1) [0x7fe589f30301]
> [ppc:05244] [ 4] /ompi/lib/libmpi.so.0(+0x6504e) [0x7fe58a4e804e]
> [ppc:05244] [ 5] 
> /ompi/lib/libmpi.so.0(ompi_datatype_create_from_packed_description+0x23) 
> [0x7fe58a4e8cf6]
> [ppc:05244] [ 6] /ompi/lib/openmpi/mca_osc_rdma.so(+0xd04b) [0x7fe5839a104b]
> [ppc:05244] [ 7] 
> /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_sendreq_recv_put+0xa8) 
> [0x7fe5839a3ae5]
> [ppc:05244] [ 8] /ompi/lib/openmpi/mca_osc_rdma.so(+0x86cc) [0x7fe58399c6cc]
> [ppc:05244] [ 9] /ompi/lib/openmpi/mca_btl_self.so(mca_btl_self_send+0x87) 
> [0x7fe58510bb04]
> [ppc:05244] [10] /ompi/lib/openmpi/mca_osc_rdma.so(+0xc44b) [0x7fe5839a044b]
> [ppc:05244] [11] /ompi/lib/openmpi/mca_osc_rdma.so(+0xd69d) [0x7fe5839a169d]
> [ppc:05244] [12] /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_flush+0x50) 
> [0x7fe5839a1776]
> [ppc:05244] [13] 
> /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_module_fence+0x8e6) 
> [0x7fe5839a84ab]
> [ppc:05244] [14] /ompi/lib/libmpi.so.0(MPI_Win_fence+0x16f) [0x7fe58a54127d]
> [ppc:05244] [15] ompi-trunk/put_dup_type() [0x400d10]
> [ppc:05244] [16] /lib/libc.so.6(__libc_start_main+0xfd) [0x7fe589f23c8d]
> [ppc:05244] [17] put_dup_type() [0x400b09]
> [ppc:05244] *** End of error message ***
> --
> mpiexec noticed that process rank 0 with PID 5244 on node ppc exited on 
> signal 6 (Aborted).
> --
> ==
> 
> __ompi_datatype_create_from_packed_description function, in which the
> assertion failure occurred, seems to expect the value of data_id is an
> ID of a predefined datatype. In my environment, the value of data_id
> is 68, that is an ID of the datatype created by MPI_Type_contiguous.
> 
> On one-sided communication, the target side datatype is encoded as
> 'description' at the origin side and then it is decoded at the target
> side. I think there are problems in both encoding stage and decoding
> stage.
> 
> __ompi_datatype_pack_description function in
> ompi/datatype/ompi_datatype_args.c file encodes the datatype.
> For MPI_COMBINER_DUP on line 451, it encodes only create_type and id
> and returns immediately. It doesn't encode the information of the base
> dataype (in my case, the datatype created by MPI_Type_contiguous).
> 
> __ompi_datatype_create_from_packed_description function in
> ompi/datatype/ompi_datatype_args.c file decodes the description.
> For MPI_COMBINER_DUP in line 557, it expects the value of data_id is
> an ID 

Re: [OMPI devel] [bug] One-sided communication with a duplicated datatype

2013-07-14 Thread George Bosilca
Takahiro,

Please find below another patch, this time hopefully fixing all issues. The 
problem with my original patch, and with yours, was that they tried to address 
the packing of the data representation without fixing the computation of the 
required length. As a result, the lengths on the packer and the unpacker 
differed, and the unpacking of the subsequent data was done from a wrong 
location.

I changed the code to force the preparation of the packed data representation 
before returning the length the first time. This way we can compute exactly how 
many bytes we need, including the potential alignment requirements. As a result, 
the amounts on both sides (the packer and the unpacker) are now identical, and 
the entire process works flawlessly (or so I hope).

Let me know if you still notice issues with this patch. I'll push it tomorrow 
into the trunk, so it can soak for a few days before propagating to the 
branches.

George.




packed.patch
Description: Binary data

On Jul 14, 2013, at 20:28 , KAWASHIMA Takahiro  
wrote:

> George,
> 
> A improved patch is attached. Latter half is same as your patch.
> But again, I'm not sure this is a correct solution.
> 
> It works correctly for my attached put_dup_type_3.c.
> Run as "mpiexec -n 1 ./put_dup_type_3".
> It will print seven OKs if succeeded.
> 
> Regards,
> KAWASHIMA Takahiro
> 
>> No. My patch doesn't work for a more simple case,
>> just a duplicate of MPI_INT.
>> 
>> Datatype is too complex for me ...
>> 
>> Regards,
>> KAWASHIMA Takahiro
>> 
>>> George,
>>> 
>>> Thanks. But no, your patch does not work correctly.
>>> 
>>> The assertion failure disappeared by your patch but the value of the
>>> target buffer of MPI_Put is not a correct one.
>>> 
>>> In rdma OSC (and pt2pt OSC), the following data are packed into
>>> the send buffer in ompi_osc_rdma_sendreq_send function on the
>>> origin side.
>>> 
>>>  - header
>>>  - datatype description
>>>  - user data
>>> 
>>> User data are written at the offset of
>>> (sizeof(ompi_osc_rdma_send_header_t) + total_pack_size).
>>> 
>>> In the case of my program attached in my previous mail, total_pack_size
>>> is 32 because ompi_datatype_set_args set 8 for MPI_COMBINER_DUP and
>>> 24 for MPI_COMBINER_CONTIGUOUS. See the following code.
>>> 
>>> 
>>> int32_t ompi_datatype_set_args(... snip ...)
>>> {
>>>... snip ...
>>>switch(type){
>>>... snip ...
>>>case MPI_COMBINER_DUP:
>>>/* Recompute the data description packed size based on the 
>>> optimization
>>> * for MPI_COMBINER_DUP.
>>> */
>>>pArgs->total_pack_size = 2 * sizeof(int);  total_pack_size = 8
>>>break;
>>>... snip ...
>>>}
>>>...
>>>for( pos = 0; pos < cd; pos++ ) {
>>>... snip ...
>>>if( !(ompi_datatype_is_predefined(d[pos])) ) {
>>>... snip ...
>>>pArgs->total_pack_size += 
>>> ((ompi_datatype_args_t*)d[pos]->args)->total_pack_size;  
>>> total_pack_size += 24
>>>... snip ...
>>>}
>>>... snip ...
>>>}
>>>... snip ...
>>> }
>>> 
>>> 
>>> But on the target side, user data are read at the offset of
>>> (sizeof(ompi_osc_rdma_send_header_t) + 24)
>>> because ompi_osc_base_datatype_create function, which is called
>>> by ompi_osc_rdma_sendreq_recv_put function, progress the offset
>>> only 24 bytes. Not 32 bytes.
>>> 
>>> So the wrong data are written to the target buffer.
>>> 
>>> We need to take care of total_pack_size in the origin side.
>>> 
>>> I modified ompi_datatype_set_args function as a trial.
>>> 
>>> Index: ompi/datatype/ompi_datatype_args.c
>>> ===
>>> --- ompi/datatype/ompi_datatype_args.c  (revision 28778)
>>> +++ ompi/datatype/ompi_datatype_args.c  (working copy)
>>> @@ -129,7 +129,7 @@
>>> /* Recompute the data description packed size based on the 
>>> optimization
>>>  * for MPI_COMBINER_DUP.
>>>  */
>>> -pArgs->total_pack_size = 2 * sizeof(int);
>>> +pArgs->total_pack_size = 0;
>>> break;
>>> 
>>> case MPI_COMBINER_CONTIGUOUS:
>>> 
>>> This patch in addition to your patch works correctly for my program.
>>> But I'm not sure this is a correct solution.
>>> 
>>> Regards,
>>> KAWASHIMA Takahiro
>>> 
 Takahiro,
 
 Nice catch. That particular code was an over-optimizations … that failed. 
 Please try with the patch below.
 
 Let me know if it's working as expected, I will push it in the trunk once 
 confirmed.
 
  George.
 
 
 Index: ompi/datatype/ompi_datatype_args.c
 ===
 --- ompi/datatype/ompi_datatype_args.c (revision 28787)
 +++ ompi/datatype/ompi_datatype_args.c (working copy)
 @@ -449,9 +449,10 

Re: [OMPI devel] [bug] One-sided communication with a duplicated datatype

2013-07-15 Thread George Bosilca
Thanks for testing it. It is now in trunk r28790.

  George.


On Jul 15, 2013, at 12:29 , KAWASHIMA Takahiro  
wrote:

> George,
> 
> Thanks. I've confirmed your patch.
> I wrote a simple program to test your patch and no problems are found.
> The test program is attached to this mail.
> 
> Regards,
> KAWASHIMA Takahiro
> 
>> Takahiro,
>> 
>> Please find below another patch, this time hopefully fixing all issues. The 
>> problem with my original patch and with yours was that they try to address 
>> the packing of the data representation without fixing the computation of the 
>> required length. As a result the length on the packer and unpacker differs 
>> and the unpacking of the subsequent data is done from a wrong location.
>> 
>> I changed the code to force the preparation of the packed data 
>> representation before returning the length the first time. This way we can 
>> compute exactly how many bytes we need, including the potential alignment 
>> requirements. As a result the amount on both sides (the packer and the 
>> unpacker) are now identical, and the entire process works flawlessly (or so 
>> I hope).
>> 
>> Let me know if you still notice issues with this patch. I'll push the 
>> tomorrow in the trunk, so it can soak for a few days before propagation to 
>> the branches.
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: revised ORTE error handling

2013-07-15 Thread George Bosilca
Ralph,

Sorry for the late answer; we have quite a few things on our to-do list right 
now. Here are a few concerns I have about the proposed approach.

1. We would have preferred a list of processes for the 
ompi_errhandler_runtime_callback function. We don't necessarily care about the 
error code, but having a list would allow us to move the notifications in bulk 
instead of one by one.

2. You made the registration of the callbacks ordered, and added special 
arguments to append or prepend callbacks to the list. Right now I can't figure 
out a good way to use this, especially since the order might be imposed by the 
order in which the modules are loaded by the frameworks, which is not something 
we can easily control.

3. The callback list. The concept is useful; I am less sure about the 
implementation. The current version doesn't support stopping the propagation of 
the error signal, which might be an issue in some cases. I can picture one level 
knowing about the issue and knowing how to fix it, so that the error does not 
need to propagate to the other levels. This can be implemented the way 
interrupts were managed in DOS, with basically a simple _get / _set type of 
interface: if a callback wants to propagate the error, it first retrieves the 
ancestor at the moment it registers its own callback, and then explicitly calls 
that ancestor upon error (see the sketch after this list).
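A sketch of the _get / _set chaining idea from point 3 (all names are invented 
for illustration; this is not the proposed ORTE API):

#include <stdio.h>

typedef void (*err_cb_t)(const char *procname, int errcode);

static err_cb_t installed_cb = NULL;

/* Install a new handler and return the previously installed one (the "ancestor"). */
static err_cb_t err_cb_set(err_cb_t cb)
{
    err_cb_t prev = installed_cb;
    installed_cb  = cb;
    return prev;
}

/* A layer that handles some errors locally and forwards the rest up the chain. */
static err_cb_t ancestor = NULL;

static void my_layer_cb(const char *procname, int errcode)
{
    if (42 == errcode) {   /* an error this layer knows how to fix */
        printf("%s: error %d resolved locally, propagation stopped\n",
               procname, errcode);
        return;
    }
    if (NULL != ancestor) {
        ancestor(procname, errcode);   /* explicitly continue the chain */
    }
}

int main(void)
{
    ancestor = err_cb_set(my_layer_cb);   /* remember the ancestor at registration */
    installed_cb("proc0", 42);            /* stops here */
    installed_cb("proc0", 7);             /* would be forwarded if an ancestor existed */
    return 0;
}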

Again, nothing major in the short term as it will take a significant amount of 
work to move the only user of such error handling capability (the FT prototype) 
back over the current version of the ORTE.

Regards,
  George.



On Jul 3, 2013, at 06:45 , Ralph Castain  wrote:

>  NOTICE: This RFC modifies the MPI-RTE interface 
> 
> WHAT: revise the RTE error handling to allow registration of callbacks upon 
> RTE-detected errors
> 
> WHY: currently, the RTE aborts the process if an RTE-detected error occurs. 
> This allows the upper layers (e.g., MPI) no chance to implement their own 
> error response strategy, and it precludes allowing user-defined error 
> handling.
> 
> TIMEOUT:  let's go for July 19th, pending further discussion
> 
> George and I were talking about ORTE's error handling the other day in 
> regards to the right way to deal with errors in the updated OOB. 
> Specifically, it seemed a bad idea for a library such as ORTE to be aborting 
> the job on its own prerogative. If we lose a connection or cannot send a 
> message, then we really should just report it upwards and let the application 
> and/or upper layers decide what to do about it.
> 
> The current code base only allows a single error callback to exist, which 
> seemed unduly limiting. So, based on the conversation, I've modified the 
> errmgr interface to provide a mechanism for registering any number of error 
> handlers (this replaces the current "set_fault_callback" API). When an error 
> occurs, these handlers will be called in order until one responds that the 
> error has been "resolved" - i.e., no further action is required. The default 
> MPI layer error handler is specified to go "last" and calls mpi_abort, so the 
> current "abort" behavior is preserved unless other error handlers are 
> registered.
> 
> In the register_callback function, I provide an "order" param so you can 
> specify "this callback must come first" or "this callback must come last". 
> Seemed to me that we will probably have different code areas registering 
> callbacks, and one might require it go first (the default "abort" will always 
> require it go last). So you can append and prepend, or go first/last.
> 
> The errhandler callback function passes the name of the proc involved (which 
> can be yourself for internal errors) and the error code. This is a change 
> from the current fault callback which returned an opal_pointer_array of 
> process names.
> 
> The work is available for review in my bitbucket:
> 
> https://bitbucket.org/rhc/ompi-errmgr
> 
> I've attached the svn diff as well.
> 
> Appreciate your comments - nothing in concrete.
> Ralph
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] RFC: add support for large counts using derived datatypes

2013-07-16 Thread George Bosilca
Nathan,

I read your code and it's definitely looking good. I have, however, a few minor 
issues with your patch.

1. MPI_Aint is unsigned as it must represent the difference between two 
arbitrary memory locations. In your MPI_Type_get_[true_]extent_x you go through 
size_t, possibly reducing its extent. I would suggest you use ssize_t instead.
2. In several other locations size_t is used as a conversion base. In some of 
these locations there is even a comment talking about ssize_t …
3. We had a policy of exporting only one single MPI-level function per file in 
the mpi directory. You changed this, as some of the files now export two 
functions (the original function together with the _x version).
4. In the OPAL datatype stuff sometimes you use size_t and sometimes ssize_t 
for the same type of logic (set and get count as an example). Why?
5. You changed the comments in opal_datatype.h to "question marks"? The cache 
boundary must be known; it can't be somewhere between x and y bytes …
6. I'm not sure the change of nbElems from uint32_t to size_t (in 
opal/datatype/opal_datatype.h) is doing what you expect…


Btw, I have a question for you fellow MPI Forum attendees. I just can't remember 
why the MPI Forum felt there was a need for MPI_Type_get[_true]_extent_x. 
MPI_Count can't be bigger than MPI_Aint, so I don't see what the benefit is of 
extending MPI_Type_get_true_extent(MPI_Datatype, MPI_Aint*, MPI_Aint*) and 
MPI_Type_get_extent(MPI_Datatype, MPI_Aint*, MPI_Aint*) with the corresponding 
_X versions.
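For reference, a minimal example of the new accessors under discussion (standard 
MPI-3 calls, not code taken from Nathan's patch; it assumes a 64-bit build where 
MPI_Aint and MPI_Count are both 64 bits):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* 2^28 ints per block (1 GiB), 32 blocks: the total size (32 GiB) no
       longer fits in the int returned by MPI_Type_size(). */
    MPI_Datatype block, big;
    MPI_Type_contiguous(1 << 28, MPI_INT, &block);
    MPI_Type_contiguous(32, block, &big);
    MPI_Type_commit(&big);

    MPI_Count size, lb, extent;
    MPI_Type_size_x(big, &size);
    MPI_Type_get_extent_x(big, &lb, &extent);
    printf("size = %lld bytes, extent = %lld bytes\n",
           (long long)size, (long long)extent);

    MPI_Type_free(&big);
    MPI_Type_free(&block);
    MPI_Finalize();
    return 0;
}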

George.

On Jul 16, 2013, at 21:14 , Nathan Hjelm  wrote:

> What: Add support for the MPI-3 MPI_Count datatype and functions: 
> MPI_Get_elements_x, MPI_Status_set_elements_x, MPI_Type_get_extent_x, 
> MPI_Type_get_true_extent_x, and MPI_Type_size_x. This will be CMR'd to 1.7.3 
> if there are no objections.
> 
> Why: MPI_Count is required by the MPI 3.0 standard. This will add another 
> checkmark by MPI 3 support.
> 
> When: Setting a short timeout of one week (Tues, July 23, 2013). Most of the 
> changes add the new functionality but there are some changes that affect the 
> datatype engine.
> 
> Details follow.
> 
> -Nathan
> 
> Repository @ github: https://github.com/hjelmn/ompi-count.git
> 
> Relevant commits:
> General support: 
> https://github.com/hjelmn/ompi-count/commit/db54d13404a241642fa783d5b3cc74edcb1103f2
> Fortran support: 
> https://github.com/hjelmn/ompi-count/commit/293adf84be52c2cd8acfe31be19cfe0afe14752d
> Others: 
> https://github.com/hjelmn/ompi-count/commit/6c6ca8e539da675632c249c891ff93fdbc9d8de8
>
> https://github.com/hjelmn/ompi-count/commit/9638ef1f245f12bb98abbf5f47e1ecfd1a018862
>
> https://github.com/hjelmn/ompi-count/commit/e158aa152d122e554b89498f5a71284ce1361a99
> 
> Add support for MPI_Count type and MPI_COUNT datatype and add the required
> MPI-3 functions MPI_Get_elements_x, MPI_Status_set_elements_x,
> MPI_Type_get_extent_x, MPI_Type_get_true_extent_x, and MPI_Type_size_x.
> This commit adds only the C bindings. Fortran bindins will be added in
> another commit. For now the MPI_Count type is define to have the same size
> as MPI_Offset. The type is required to be at least as large as MPI_Offset
> and MPI_Aint. The type was initially intended to be a ssize_t (if it was
> the same size as a long long) but there were issues compiling romio with
> that definition (despite the inclusion of stddef.h).
> 
> I updated the datatype engine to use size_t instead of uint32_t to support
> large datatypes. This will require some review to make sure that 1) the
> changes are beneficial, 2) nothing was broken by the change (I doubt
> anything was), and 3) there are no performance regressions due to this
> change.
> 
> George, please look over these changes and let me know if you see anything 
> wrong with my updates to the datatype engine.
> 
> -Nathan
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: add support for large counts using derived datatypes

2013-07-16 Thread George Bosilca

On Jul 16, 2013, at 22:29 , Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> On Jul 16, 2013, at 4:22 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>> Btw, I have a question to you fellow MPI Forum attendees. I just can't 
>> remember why the MPI forum felt there was a need for the 
>> MPI_Type_get[_true]_extent_x? MPI_Count can't be bigger than MPI_Aint,
> 
> Yes, it can -- it has to be the largest integer type (i.e., it even has to be 
> able to handle an MPI_Offset).

Technicalities! In the entire standard, MPI_Offset is only used to access files, 
not to build datatypes. As such there is no way for the extent of a datatype to 
be bigger than MPI_Aint. Thus, these accessors returning MPI_Count are useless 
overkill, as they cannot offer more precision than what the versions returning 
MPI_Aint already offer.

  George.

PS: I hope nobody has the idea to define the MPI_Offset as a signed type …


>> so I don't see what is the benefit of extending the 
>> MPI_Type_get_true_extent(MPI_Datatype, MPI_Aint*, MPI_Aint*) and 
>> MPI_Type_get_extent(MPI_Datatype, MPI_Aint*, MPI_Aint*) with the 
>> corresponding _X versions?
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: add support for large counts using derived datatypes

2013-07-16 Thread George Bosilca
Apparently I just can't type that freaking word. Thanks Nathan for pointing out 
the truth ;)

  George.

On Jul 16, 2013, at 22:56 , Nathan Hjelm <hje...@lanl.gov> wrote:

> I think you meant signed. It is signed in both configure.ac and 
> ompi_datatype_module.c.
> 
> -Nathan
> 
> On Tue, Jul 16, 2013 at 10:48:12PM +0200, George Bosilca wrote:
>> It's a typo, MPI_Aint is of course unsigned.
>> 
>>  George.
>> 
>> On Jul 16, 2013, at 22:37 , David Goodell (dgoodell) <dgood...@cisco.com> 
>> wrote:
>> 
>>> On Jul 16, 2013, at 3:22 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>> 
>>>> I read your code and it's definitively looking good. I have however few 
>>>> minor issues with your patch.
>>>> 
>>>> 1. MPI_Aint is unsigned as it must represent the difference between two 
>>>> memory arbitrary locations. In your MPI_Type_get_[true_]extent_x you go 
>>>> through size_t possibly reducing it's extent. I would suggest you used 
>>>> ssize_t instead.
>>> 
>>> MPI_Aint must be signed for Fortran compatibility (among other reasons).  
>>> If OMPI's MPI_Aint is unsigned then that's a bug in OMPI.
>>> 
>>> -Dave
>>> 
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: add support for large counts using derived datatypes

2013-07-16 Thread George Bosilca

On Jul 16, 2013, at 23:07 , "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> 
wrote:

> On Jul 16, 2013, at 5:03 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>>> Yes, it can -- it has to be the largest integer type (i.e., it even has to 
>>> be able to handle an MPI_Offset).
>> 
>> Technicalities! In the entire standard MPI_Offset is only used to access 
>> files, not to build datatypes. As such there is no way to have the extent of 
>> an datatype bigger than MPI_Aint.
> 
> Datatypes are used in FILE_SET_VIEW.

Doesn't matter. There you don't create a datatype, you force one on the view 
you have of the file. I guess the forum was a little overzealous …

  George.


> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: add support for large counts using derived datatypes

2013-07-16 Thread George Bosilca

On Jul 16, 2013, at 23:11 , "David Goodell (dgoodell)" <dgood...@cisco.com> 
wrote:

> On Jul 16, 2013, at 4:03 PM, George Bosilca <bosi...@icl.utk.edu>
> wrote:
> 
>> On Jul 16, 2013, at 22:29 , Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>> wrote:
>> 
>>> On Jul 16, 2013, at 4:22 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>> 
>>>> Btw, I have a question to you fellow MPI Forum attendees. I just can't 
>>>> remember why the MPI forum felt there was a need for the 
>>>> MPI_Type_get[_true]_extent_x? MPI_Count can't be bigger than MPI_Aint,
>>> 
>>> Yes, it can -- it has to be the largest integer type (i.e., it even has to 
>>> be able to handle an MPI_Offset).
>> 
>> Technicalities! In the entire standard MPI_Offset is only used to access 
>> files, not to build datatypes. As such there is no way to have the extent of 
>> an datatype bigger than MPI_Aint.
> 
> That's not true.  You can obtain a datatype with an extent outside the range 
> of an MPI_Aint by nesting types.  Just create a config of size 1, then create 
> a type a very large extent from your contig with MPI_Type_create_resized, 
> then create a second contig of that resized with a count >1.

Sure. But the only reason you create such a nested type is to access files 
(otherwise you can't go over the MPI_Aint boundary safely). Thus I would have 
expected the limit to be MPI_Offset, and not a new type, MPI_Count …

Oh, I see now. MPI_Aint is the largest difference in memory and MPI_Offset is 
the largest difference for files. Thus, MPI_Count is the larger of the two, so 
it can adapt in all cases. I'm happy with this conclusion … Thanks, everyone.

  George.

> 
>> Thus, these accessors returning MPI_Count are a useless overkill, as they 
>> cannot offer more precision that what the version returning MPI_Aint is 
>> already offering.
>> 
>> George.
>> 
>> PS: I hope nobody has the idea to define the MPI_Offset as a signed type …
> 
> Not sure if you're joking here... MPI_Offset must also be signed, again, for 
> Fortran interoperability.
> 
> -Dave
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




[OMPI devel] ompi_info

2013-07-16 Thread George Bosilca
I would like to question the choice of the new … spartan ompi_info output. I 
would not mind restoring the default behavior, i.e., having a verbose "--all" 
instead of some [random] MCA params.

Btw, something is wrong in the following output. I have "btl = sm,self" in my 
.openmpi/mca-params.conf, so I should not even see the BTL TCP parameters.

Thanks,
  George.


$ompi_info --param all all
 MCA btl: parameter "btl_tcp_if_include" (current value: "",
  data source: default, level: 1 user/basic, type:
  string)
  Comma-delimited list of devices and/or CIDR
  notation of networks to use for MPI communication
  (e.g., "eth0,192.168.0.0/16").  Mutually exclusive
  with btl_tcp_if_exclude.
 MCA btl: parameter "btl_tcp_if_exclude" (current value:
  "127.0.0.1/8,sppp", data source: default, level: 1
  user/basic, type: string)
  Comma-delimited list of devices and/or CIDR
  notation of networks to NOT use for MPI
  communication -- all devices not matching these
  specifications will be used (e.g.,
  "eth0,192.168.0.0/16").  If set to a non-default
  value, it is mutually exclusive with
  btl_tcp_if_include.
 MCA pml: performance "pml_ob1_unexpected_msgq_length" (type:
  unsigned, class: size)
  Number of unexpected messages received by each peer
  in a communicator
 MCA pml: performance "pml_ob1_posted_recvq_length" (type:
  unsigned, class: size)
  Number of unmatched receives posted for each peer
  in a communicator




