[OMPI devel] Trunk problem: VT breakage

2011-06-29 Thread Ralph Castain
It appears I cannot build the trunk on Mac - I hit this issue when I updated 
from the trunk and rebuilt from autogen this evening:

make[7]: *** No rule to make target `vt_filthandler.cc', needed by 
`vtfilter-vt_filthandler.o'.  Stop.

Vanilla configure - I didn't turn VT off like I usually do.

Any help would be appreciated. 




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r24836

2011-06-29 Thread Jeff Squyres
That's still not quite right, per OMPI conventions.

If you don't find it, you shouldn't just warn; you should have some other 
AC_MSG_CHECKING and then indicate that that particular check failed.

E.g.

AC_MSG_CHECKING([if can use dynamic SL support])
AS_IF([test "$1_have_dynamic_sl" = "1"],
      [AC_MSG_RESULT([yes])],
      [AC_MSG_RESULT([no])
       AS_IF([test "$enable_openib_dynamic_sl" = "yes"],
             [AC_MSG_WARN([--enable-openib-dynamic-sl was specified but the])
              AC_MSG_WARN([appropriate header files could not be found.])
              AC_MSG_ERROR([Cannot continue])])
      ])

(I just typed that in my mail client; I don't know if all the [] and () match 
properly)



On Jun 29, 2011, at 10:52 AM, klit...@osl.iu.edu wrote:

> Author: kliteyn
> Date: 2011-06-29 10:52:11 EDT (Wed, 29 Jun 2011)
> New Revision: 24836
> URL: https://svn.open-mpi.org/trac/ompi/changeset/24836
> 
> Log:
> Changed default behavior when opensm-devel package not found - warn, not exit
> 
> 
> Text files modified: 
>   trunk/ompi/config/ompi_check_openib.m4 | 2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
> 
> Modified: trunk/ompi/config/ompi_check_openib.m4
> ==============================================================================
> --- trunk/ompi/config/ompi_check_openib.m4	(original)
> +++ trunk/ompi/config/ompi_check_openib.m4	2011-06-29 10:52:11 EDT (Wed, 29 Jun 2011)
> @@ -195,7 +195,7 @@
>     # ib_types.h, but it doesn't include any other IB-related files.
>     AC_CHECK_HEADER([infiniband/complib/cl_types_osd.h],
>                     [$1_have_dynamic_sl=1],
> -                   [AC_MSG_ERROR([opensm-devel package not found - please install it or disable dynamic SL support with \"--disable-openib-dynamic-sl\"])],
> +                   [AC_MSG_WARN([opensm-devel package not found - please install it or disable dynamic SL support with \"--disable-openib-dynamic-sl\"])],
>                     [])
>     fi
> 
> ___
> svn-full mailing list
> svn-f...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




[OMPI devel] TIPC BTL Segmentation fault

2011-06-29 Thread Xin He

Hi,

As I advanced my implementation of the TIPC BTL, I added the component 
and tried to run the hello_c program as a test.

Then I got this segmentation fault. It seems to happen after the call 
to "mca_btl_tipc_add_procs".


The error message displayed:

[oak:23192] *** Process received signal ***
[oak:23192] Signal: Segmentation fault (11)
[oak:23192] Signal code:  (128)
[oak:23192] Failing at address: (nil)
[oak:23192] [ 0] /lib/libpthread.so.0(+0xfb40) [0x7fec2a40fb40]
[oak:23192] [ 1] /usr/lib/libmpi.so.0(+0x1e6c10) [0x7fec2b2afc10]
[oak:23192] [ 2] /usr/lib/libmpi.so.0(+0x1e71f2) [0x7fec2b2b01f2]
[oak:23192] [ 3] /usr/lib/openmpi/mca_pml_ob1.so(+0x59f2) [0x7fec264fc9f2]
[oak:23192] [ 4] /usr/lib/openmpi/mca_pml_ob1.so(+0x5e5a) [0x7fec264fce5a]
[oak:23192] [ 5] /usr/lib/openmpi/mca_pml_ob1.so(+0x2386) [0x7fec264f9386]
[oak:23192] [ 6] /usr/lib/openmpi/mca_pml_ob1.so(+0x24a0) [0x7fec264f94a0]
[oak:23192] [ 7] /usr/lib/openmpi/mca_pml_ob1.so(+0x22fb) [0x7fec264f92fb]
[oak:23192] [ 8] /usr/lib/openmpi/mca_pml_ob1.so(+0x3a60) [0x7fec264faa60]
[oak:23192] [ 9] /usr/lib/libmpi.so.0(+0x67f51) [0x7fec2b130f51]
[oak:23192] [10] /usr/lib/libmpi.so.0(MPI_Init+0x173) [0x7fec2b161c33]
[oak:23192] [11] hello_i(main+0x22) [0x400936]
[oak:23192] [12] /lib/libc.so.6(__libc_start_main+0xfe) [0x7fec2a09bd8e]
[oak:23192] [13] hello_i() [0x400859]
[oak:23192] *** End of error message ***

I used gdb to check the stack:
(gdb) bt
#0  0x77afac10 in opal_obj_run_constructors (object=0x6ca980) at ../opal/class/opal_object.h:427
#1  0x77afb1f2 in opal_list_construct (list=0x6ca958) at class/opal_list.c:88
#2  0x72d479f2 in opal_obj_run_constructors (object=0x6ca958) at ../../../../opal/class/opal_object.h:427
#3  0x72d47e5a in mca_pml_ob1_comm_construct (comm=0x6ca8c0) at pml_ob1_comm.c:55
#4  0x72d44386 in opal_obj_run_constructors (object=0x6ca8c0) at ../../../../opal/class/opal_object.h:427
#5  0x72d444a0 in opal_obj_new (cls=0x72f6c040) at ../../../../opal/class/opal_object.h:477
#6  0x72d442fb in opal_obj_new_debug (type=0x72f6c040, file=0x72d62840 "pml_ob1.c", line=182) at ../../../../opal/class/opal_object.h:252
#7  0x72d45a60 in mca_pml_ob1_add_comm (comm=0x601060) at pml_ob1.c:182
#8  0x7797bf51 in ompi_mpi_init (argc=1, argv=0x7fffdf58, requested=0, provided=0x7fffde28) at runtime/ompi_mpi_init.c:770
#9  0x779acc33 in PMPI_Init (argc=0x7fffde5c, argv=0x7fffde50) at pinit.c:84
#10 0x00400936 in main (argc=1, argv=0x7fffdf58) at hello_c.c:17

It seems the error happened when an object is constructed. Any idea why 
this is happening?


Thanks.

Best regards,
Xin




Re: [OMPI devel] "Open MPI"-based MPI library used by K computer

2011-06-29 Thread Kawashima
Hi Jeff,

> > First, we created a new BTL component, 'tofu BTL'. It's not so special
> > one but dedicated to our Tofu interconnect. But its latency was not
> > enough for us.
> > 
> > So we created a new framework, 'LLP', and its component, 'tofu LLP'.
> > It bypasses request object creation in PML and BML/BTL, and sends
> > a message immediately if possible.
> 
> Gotcha.  Was the sendi pml call not sufficient?  (sendi = "send immediate")  
> This call was designed to be part of a latency reduction mechanism.  I forget 
> offhand what we don't do before calling sendi, but the rationale was that if 
> the message was small enough, we could skip some steps in the sending process 
> and "just send it."

I know sendi, but its latency was not sufficient for us.
Before reaching the sendi call, we must:
  - allocate send request (MCA_PML_OB1_SEND_REQUEST_ALLOC)
  - initialize send request (MCA_PML_OB1_SEND_REQUEST_INIT)
  - select BTL module (mca_pml_ob1_send_request_start)
  - select protocol (mca_pml_ob1_send_request_start_btl)
We want to eliminate these overheads. We want to send more immediately.

Here is a code snippet:



#if OMPI_ENABLE_LLP
static inline int mca_pml_ob1_call_llp_send(void *buf,
size_t size,
int dst,
int tag,
ompi_communicator_t *comm)
{
int rc;
mca_pml_ob1_comm_proc_t *proc = &comm->c_pml_comm->procs[dst];
mca_pml_ob1_match_hdr_t *match = mca_pml_ob1.llp_send_buf;

match->hdr_common.hdr_type = MCA_PML_OB1_HDR_TYPE_MATCH;
match->hdr_common.hdr_flags = 0;
match->hdr_ctx = comm->c_contextid;
match->hdr_src = comm->c_my_rank;
match->hdr_tag = tag;
match->hdr_seq = proc->send_sequence + 1;

rc = MCA_LLP_CALL(send(buf, size, OMPI_PML_OB1_MATCH_HDR_LEN,
   (bool)OMPI_ENABLE_OB1_PAD_MATCH_HDR,
   ompi_comm_peer_lookup(comm, dst),
   MCA_PML_OB1_HDR_TYPE_MATCH));

if (rc == OMPI_SUCCESS) {
/* NOTE this is not thread safe */
OPAL_THREAD_ADD32(&proc->send_sequence, 1);
}

return rc;
}
#endif

int mca_pml_ob1_send(void *buf,
 size_t count,
 ompi_datatype_t * datatype,
 int dst,
 int tag,
 mca_pml_base_send_mode_t sendmode,
 ompi_communicator_t * comm)
{
int rc;
mca_pml_ob1_send_request_t *sendreq;

#if OMPI_ENABLE_LLP
/* try to send message via LLP if
 *   - one of LLP modules is available, and
 *   - datatype is basic, and
 *   - data is small, and
 *   - communication mode is standard, buffered, or ready, and
 *   - destination is not myself
 */
if (((datatype->flags & DT_FLAG_BASIC) == DT_FLAG_BASIC) &&
(datatype->size * count < mca_pml_ob1.llp_max_payload_size) &&
(sendmode == MCA_PML_BASE_SEND_STANDARD ||
 sendmode == MCA_PML_BASE_SEND_BUFFERED ||
 sendmode == MCA_PML_BASE_SEND_READY) &&
(dst != comm->c_my_rank)) {
rc = mca_pml_ob1_call_llp_send(buf, datatype->size * count, dst, tag, 
comm);
if (rc != OMPI_ERR_NOT_AVAILABLE) {
/* successfully sent out via LLP or unrecoverable error occurred */
return rc;
}
}
#endif

MCA_PML_OB1_SEND_REQUEST_ALLOC(comm, dst, sendreq, rc);
if (rc != OMPI_SUCCESS)
return rc;

MCA_PML_OB1_SEND_REQUEST_INIT(sendreq,
  buf,
  count,
  datatype,
  dst, tag,
  comm, sendmode, false);

PERUSE_TRACE_COMM_EVENT (PERUSE_COMM_REQ_ACTIVATE,
 &(sendreq)->req_send.req_base,
 PERUSE_SEND);

MCA_PML_OB1_SEND_REQUEST_START(sendreq, rc);
if (rc != OMPI_SUCCESS) {
MCA_PML_OB1_SEND_REQUEST_RETURN( sendreq );
return rc;
}

ompi_request_wait_completion(&sendreq->req_send.req_base.req_ompi);

rc = sendreq->req_send.req_base.req_ompi.req_status.MPI_ERROR;
ompi_request_free( (ompi_request_t**)&sendreq );
return rc;
}



mca_pml_ob1_send is the body of MPI_Send in Open MPI. The region
guarded by OMPI_ENABLE_LLP was added by us.

We don't need a send request if we can "send immediately",
so we try to send via LLP first. If LLP cannot send immediately
because the interconnect is busy or similar, it returns
OMPI_ERR_NOT_AVAILABLE, and we continue with the normal PML/BML/BTL send(i).
Since we want to use a simple memcpy instead of the complex convertor,
we restrict the datatypes that can go into LLP.

Of course, we cannot use LLP on MPI_Isend.

> Note, too, that the coll modules can be laid overtop of each other -- e.g., if 
> you only implement barrier (and some others) in tofu coll, then you can supply 
> NULL for the other function pointers and the coll base will resolve those 
> functions to other coll modules automatically.

Re: [OMPI devel] "Open MPI"-based MPI library used by K computer

2011-06-29 Thread Jeff Squyres
On Jun 29, 2011, at 3:57 AM, Kawashima wrote:

> First, we created a new BTL component, 'tofu BTL'. It's not so special
> one but dedicated to our Tofu interconnect. But its latency was not
> enough for us.
> 
> So we created a new framework, 'LLP', and its component, 'tofu LLP'.
> It bypasses request object creation in PML and BML/BTL, and sends
> a message immediately if possible.

Gotcha.  Was the sendi pml call not sufficient?  (sendi = "send immediate")  
This call was designed to be part of a latency reduction mechanism.  I forget 
offhand what we don't do before calling sendi, but the rationale was that if 
the message was small enough, we could skip some steps in the sending process 
and "just send it."

Note, too, that the coll modules can be laid overtop of each other -- e.g., if 
you only implement barrier (and some others) in tofu coll, then you can supply 
NULL for the other function pointers and the coll base will resolve those 
functions to other coll modules automatically.

> Also, we modified tuned COLL to implement interconnect-and-topology-
> specific bcast/allgather/alltoall/allreduce algorithm. These algorithm
> implementations also bypass PML/BML/BTL to eliminate protocol and software
> overhead.

Good.  As Sylvain mentioned, that was the intent of the coll framework -- it 
certainly isn't *necessary* for coll's to always implement their underlying 
sends/receives with the BTL.  The sm coll does this, for example -- it uses its 
own shared memory block for talking to the sm colls in other processes 
on the same node, but it doesn't go through the sm BTL.

> To achieve above, we created 'tofu COMMON', like sm (ompi/mca/common/sm/).
> 
> Is there interesting one?
> 
> Though our BTL and COLL are quite interconnect-specific, LLP may be
> contributed in the future.

Yes, it may be interesting to see what you did there.





Re: [OMPI devel] "Open MPI"-based MPI library used by K computer

2011-06-29 Thread Kawashima
Hi Sylvain,

> > Also, we modified tuned COLL to implement interconnect-and-topology-
> > specific bcast/allgather/alltoall/allreduce algorithm. These algorithm
> > implementations also bypass PML/BML/BTL to eliminate protocol and 
> software
> > overhead.
> This seems perfectly valid to me. The current coll components use normal 
> MPI_Send/Recv semantics, hence the PML/BML/BTL chain, but I always saw the 
> coll framework as a way to be able to integrate smoothly "custom" 
> collective components for a specific interconnect. I think that Mellanox 
> also did a specific collective component using directly their ConnectX HCA 
> capabilities.
> 
> However, modifying the "tuned" component may not be the better way to 
> integrate your collective work. You may consider creating a "tofu" coll 
> component which would only provide the collectives you optimized (and the 
> coll framework will fallback on tuned for the ones you didn't optimize).

Yes. I agree.
But sadly, my colleague implemented it badly.

We created another COLL component that uses the interconnect barrier,
like Mellanox FCA.

> > To achieve above, we created 'tofu COMMON', like sm 
> (ompi/mca/common/sm/).
> > 
> > Is there interesting one?
> It may be interesting, yes. I don't know the tofu model, but if it is not 
> secret, contributing it is usually a good thing.
> 
> Your communication model may be similar to others and portions of code may 
> be shared with other technologies (I'm thinking of IB, MX, PSM,...). 
> People writing new code would also consider your model and let you take 
> advantage of it. Knowing how tofu is integrated into Open MPI may also 
> impact major decisions the open-source community is taking.

The Tofu communication model is similar to that of IB RDMA.
Actually, we used the source code of the openib BTL as a reference.
We'll consider contributing some code, and we'll join the discussion.

Regards,

Takahiro Kawashima,
MPI development team,
Fujitsu


Re: [OMPI devel] "Open MPI"-based MPI library used by K computer

2011-06-29 Thread sylvain . jeaugey
Kawashima-san,

Congratulations on your machine, this is a stunning achievement!

> Kawashima  wrote :
> Also, we modified tuned COLL to implement interconnect-and-topology-
> specific bcast/allgather/alltoall/allreduce algorithm. These algorithm
> implementations also bypass PML/BML/BTL to eliminate protocol and 
software
> overhead.
This seems perfectly valid to me. The current coll components use normal 
MPI_Send/Recv semantics, hence the PML/BML/BTL chain, but I always saw the 
coll framework as a way to be able to integrate smoothly "custom" 
collective components for a specific interconnect. I think that Mellanox 
also did a specific collective component using directly their ConnectX HCA 
capabilities.

However, modifying the "tuned" component may not be the best way to 
integrate your collective work. You may consider creating a "tofu" coll 
component which would only provide the collectives you optimized (and the 
coll framework will fall back on tuned for the ones you didn't optimize).

> To achieve above, we created 'tofu COMMON', like sm 
(ompi/mca/common/sm/).
> 
> Is there interesting one?
It may be interesting, yes. I don't know the tofu model, but if it is not 
secret, contributing it is usually a good thing.

Your communication model may be similar to others and portions of code may 
be shared with other technologies (I'm thinking of IB, MX, PSM,...). 
People writing new code would also consider your model and let you take 
advantage of it. Knowing how tofu is integrated into Open MPI may also 
impact major decisions the open-source community is taking.

Sylvain

Re: [OMPI devel] "Open MPI"-based MPI library used by K computer

2011-06-29 Thread Kawashima
Hi Jeff, Ralph, and all,

Thank you for your reply.
RIKEN and Fujitsu will continue to work toward 10 Pflops with Open MPI.

Here we can explain some parts of our MPI:

As page 13 of Koh Hotta's presentation shows, we extended OMPI
communication layers.

> http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf
# Sorry, this figure is somewhat broken. The arrows point to incorrect layers.

First, we created a new BTL component, 'tofu BTL'. It's nothing special,
just dedicated to our Tofu interconnect. But its latency was not
enough for us.

So we created a new framework, 'LLP', and its component, 'tofu LLP'.
It bypasses request object creation in PML and BML/BTL, and sends
a message immediately if possible.

Also, we modified tuned COLL to implement interconnect-and-topology-
specific bcast/allgather/alltoall/allreduce algorithm. These algorithm
implementations also bypass PML/BML/BTL to eliminate protocol and software
overhead.

To achieve the above, we created 'tofu COMMON', like sm (ompi/mca/common/sm/).

Is any of this interesting to you?

Though our BTL and COLL are quite interconnect-specific, LLP may be
contributed in the future.

Regards,

Takahiro Kawashima,
MPI development team,
Fujitsu

> I echo what Ralph said -- congratulations!
> 
> Let us know when you'll be ready to contribute back what you can.
> 
> Thanks!
> 
> 
> On Jun 27, 2011, at 9:58 PM, Takahiro Kawashima wrote:
> 
> > Dear Open MPI community,
> > 
> > I'm a member of MPI library development team in Fujitsu. Shinji
> > Sumimoto, whose name appears in Jeff's blog, is one of our bosses.
> > 
> > As Rayson and Jeff noted, K computer, world's most powerful HPC system
> > developed by RIKEN and Fujitsu, utilizes Open MPI as a base of its MPI
> > library. We, Fujitsu, are pleased to announce that, and also have special
> > thanks to Open MPI community.
> > We are sorry for the late announcement!
> > 
> > Our MPI library is based on Open MPI 1.4 series, and has a new point-
> > to-point component (BTL) and new topology-aware collective communication
> > algorithms (COLL). Also, it is adapted to our runtime environment (ESS,
> > PLM, GRPCOMM etc).
> > 
> > K computer connects 68,544 nodes by our custom interconnect.
> > Its runtime environment is our proprietary one. So we don't use orted.
> > We cannot tell start-up time yet because of disclosure restriction, sorry.
> > 
> > We are surprised by the extensibility of Open MPI, and have proved that
> > Open MPI is scalable to 68,000 processes level! We feel pleasure to
> > utilize such a great open-source software.
> > 
> > We cannot tell details of our technology yet because of our contract
> > with RIKEN AICS; however, we plan to feed back our improvements
> > and bug fixes. We can contribute some bug fixes soon, but
> > contribution of our improvements will be next year, under the Open MPI
> > agreement.
> > 
> > Best regards,
> > 
> > MPI development team,
> > Fujitsu
> > 
> > 
> >> I got more information:
> >> 
> >>   http://blogs.cisco.com/performance/open-mpi-powers-8-petaflops/
> >> 
> >> Short version: yes, Open MPI is used on K and was used to power the 8PF 
> >> runs.
> >> 
> >> w00t!
> >> 
> >> 
> >> 
> >> On Jun 24, 2011, at 7:16 PM, Jeff Squyres wrote:
> >> 
> >>> w00t!  
> >>> 
> >>> OMPI powers 8 petaflops!
> >>> (at least I'm guessing that -- does anyone know if that's true?)
> >>> 
> >>> 
> >>> On Jun 24, 2011, at 7:03 PM, Rayson Ho wrote:
> >>> 
>  Interesting... page 11:
>  
>  http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf
>  
>  Open MPI based:
>  
>  * Open Standard, Open Source, Multi-Platform including PC Cluster.
>  * Adding extension to Open MPI for "Tofu" interconnect
>  
>  Rayson
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 