Re: [OMPI devel] modex getting corrupted

2016-05-23 Thread dpchoudh .
Hello Ralph and all Please ignore this mail. It is indeed due to a syntax error in my code. Sorry for the noise; I'll be more careful with my homework from now on. Best regards Durga We learn from history that we never learn from history. On Mon, May 23, 2016 at 2:13 AM, dpchoudh .

Re: [OMPI devel] modex getting corrupted

2016-05-23 Thread dpchoudh .
Hello Ralph Thanks for your input. The routine that does the send is this: static int btl_lf_modex_send(lfgroup lfgroup) { char *grp_name = lf_get_group_name(lfgroup, NULL, 0); btl_lf_modex_t lf_modex; int rc; strncpy(lf_modex.grp_name, grp_name, GRP_NAME_MAX_LEN);
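A side note on the strncpy() call quoted above, as a minimal sketch only: strncpy() does not NUL-terminate the destination when the source is GRP_NAME_MAX_LEN bytes or longer, so a fixed-size modex field can be published without a terminator. This is not claimed to be the bug the thread was chasing (the poster later attributes it to a syntax error). The value of GRP_NAME_MAX_LEN, the layout of btl_lf_modex_t, and the helper name lf_modex_set_name are all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Assumption: the real GRP_NAME_MAX_LEN is not shown in the thread; 32 is a placeholder. */
#define GRP_NAME_MAX_LEN 32

/* Hypothetical stand-in for the btl_lf_modex_t type named in the post. */
typedef struct {
    char grp_name[GRP_NAME_MAX_LEN];
} btl_lf_modex_t;

/* Hypothetical helper: copy a group name into the fixed-size modex field,
 * always leaving it NUL-terminated.
 * Returns 0 on success, -1 if the name had to be truncated. */
static int lf_modex_set_name(btl_lf_modex_t *m, const char *grp_name)
{
    assert(m != NULL && grp_name != NULL);

    size_t len = strlen(grp_name);
    int truncated = (len >= GRP_NAME_MAX_LEN);
    if (truncated) {
        len = GRP_NAME_MAX_LEN - 1;   /* leave room for the terminator */
    }

    /* Zero the whole field first so no uninitialized stack bytes are sent. */
    memset(m->grp_name, 0, sizeof m->grp_name);
    memcpy(m->grp_name, grp_name, len);
    return truncated ? -1 : 0;
}
```

A caller would fill the struct with lf_modex_set_name(&lf_modex, grp_name) and check the return value before handing the struct to the modex send.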

Re: [OMPI devel] modex getting corrupted

2016-05-21 Thread Ralph Castain
Please provide the exact code used for both send/recv - you likely have an error in the syntax > On May 20, 2016, at 9:36 PM, dpchoudh . wrote: > > Hello all > > I have a naive question: > > My 'cluster' consists of two nodes, connected back to back with a proprietary >

[OMPI devel] modex getting corrupted

2016-05-21 Thread dpchoudh .
Hello all I have a naive question: My 'cluster' consists of two nodes, connected back to back with a proprietary link as well as GbE (over a switch). I am calling OPAL_MODEX_SEND() and the modex consists of just this: struct modex {char name[20]; unsigned mtu;}; The mtu field is not currently
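For readers following along, here is a hedged sketch of one way to give the two-field modex described above a fixed byte layout before it is published, so that compiler padding and host byte order cannot change what the peer reads. It does not call OPAL_MODEX_SEND() or any other Open MPI API, and it is not the fix reported in the thread. The names MODEX_NAME_LEN, MODEX_WIRE_LEN, modex_pack and modex_unpack are hypothetical, and the MTU is assumed to fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEX_NAME_LEN 20                      /* matches char name[20] in the post */
#define MODEX_WIRE_LEN (MODEX_NAME_LEN + 4)    /* name bytes + 32-bit MTU */

struct modex {
    char name[MODEX_NAME_LEN];
    unsigned mtu;
};

/* Serialize into a fixed 24-byte layout: name, then MTU in big-endian order. */
static void modex_pack(const struct modex *m, unsigned char out[MODEX_WIRE_LEN])
{
    assert(m != NULL && out != NULL);
    uint32_t mtu = (uint32_t)m->mtu;

    memcpy(out, m->name, MODEX_NAME_LEN);
    out[MODEX_NAME_LEN + 0] = (unsigned char)(mtu >> 24);
    out[MODEX_NAME_LEN + 1] = (unsigned char)(mtu >> 16);
    out[MODEX_NAME_LEN + 2] = (unsigned char)(mtu >> 8);
    out[MODEX_NAME_LEN + 3] = (unsigned char)(mtu);
}

/* Rebuild the struct on the receiving side from the same layout. */
static void modex_unpack(const unsigned char in[MODEX_WIRE_LEN], struct modex *m)
{
    assert(m != NULL && in != NULL);

    memset(m, 0, sizeof *m);
    memcpy(m->name, in, MODEX_NAME_LEN);
    m->name[MODEX_NAME_LEN - 1] = '\0';        /* never trust the peer to terminate */

    uint32_t mtu = ((uint32_t)in[MODEX_NAME_LEN + 0] << 24) |
                   ((uint32_t)in[MODEX_NAME_LEN + 1] << 16) |
                   ((uint32_t)in[MODEX_NAME_LEN + 2] << 8)  |
                   ((uint32_t)in[MODEX_NAME_LEN + 3]);
    m->mtu = (unsigned)mtu;
}
```

The sender would pass the packed buffer and MODEX_WIRE_LEN as the data and size of the modex, and the receiver would run modex_unpack() on whatever bytes come back.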

Re: [OMPI devel] modex receive

2016-04-29 Thread dpchoudh .
Hello Ralph and Gilles Thanks for the clarification. My understanding was that if a BTL was specified to mpirun, then only BTL (and, therefore, the ob1 PML) will be used. However, I always saw that this is not the case and now I know why. I do have PSM capable cards (Qlogic IB) in my nodes, and this

Re: [OMPI devel] modex receive

2016-04-29 Thread Gilles Gouaillardet
My basic understanding is that ob1 works with btl, and cm works with mtl (someone please correct me if I am wrong). Another way to put this is that cm cannot use the tcp btl. So I can only guess that one mtl (PSM ?) is available, and so cm is preferred over ob1. What if you mpirun --mca mtl ^psm ... is

Re: [OMPI devel] modex receive

2016-04-29 Thread Ralph Castain
CM is not being selected for TCP - you specified TCP for the BTLs, but that assumes that a BTL will be selected. You obviously have something in your system that is supported by an MTL, and that will always be selected before a BTL. > On Apr 28, 2016, at 8:22 PM, dpchoudh .

Re: [OMPI devel] modex receive

2016-04-29 Thread dpchoudh .
Hello Gilles You are absolutely right: 1. Adding --mca pml_base_verbose 100 does show that it is the cm PML that is being picked by default (even for TCP) 2. Adding --mca pml ob1 does cause add_procs() and related BTL friends to be invoked. With a command line of mpirun -np 2 -hostfile

Re: [OMPI devel] modex receive

2016-04-28 Thread George Bosilca
In Open MPI a process only retrieves information about a peer if they communicate. Thus, the add_proc is called from the two sides of a connection establishment: when a connection is decided locally, or when a network packet requires the existence of a proc (for the initiator of the connection).
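The on-demand behaviour described above can be pictured with a small, self-contained sketch. It is an illustration of lazy lookup only, not Open MPI code: peer_t, MAX_PEERS, fetch_peer_info and get_peer are hypothetical names, and the "endpoint" is a made-up integer standing in for whatever contact information a real transport would retrieve.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PEERS 8            /* hypothetical job size */

/* Hypothetical per-peer record: filled in only on first use. */
typedef struct {
    int resolved;              /* 0 until the peer's info has been fetched */
    int endpoint;              /* stand-in for real contact information */
} peer_t;

static peer_t peers[MAX_PEERS];
static int fetch_count;        /* counts how many lookups actually happened */

/* Stand-in for the expensive step (e.g. retrieving a peer's modex data). */
static int fetch_peer_info(int rank)
{
    fetch_count++;
    return 1000 + rank;
}

/* Return the peer record, fetching its information only the first time
 * this process needs to talk to that rank. */
static const peer_t *get_peer(int rank)
{
    assert(rank >= 0 && rank < MAX_PEERS);
    if (!peers[rank].resolved) {
        peers[rank].endpoint = fetch_peer_info(rank);
        peers[rank].resolved = 1;
    }
    return &peers[rank];
}
```

Calling get_peer() twice for the same rank performs one fetch; ranks that are never contacted cost nothing, which is the point of retrieving information only for peers that communicate.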

Re: [OMPI devel] modex receive

2016-04-28 Thread Gilles Gouaillardet
The add_procs subroutine of the btl should be called. /* I added a printf in mca_btl_tcp_add_procs and it *is* invoked */ Can you try again with --mca pml ob1 --mca pml_base_verbose 100 ? Maybe the add_procs subroutine is not invoked because openmpi uses cm instead of ob1. Cheers, Gilles

[OMPI devel] modex receive

2016-04-28 Thread dpchoudh .
Hello all I have been struggling with this issue for the last few days and thought it would be prudent to ask for help from people who have way more experience than I do. There are two questions, interrelated in my mind, but may not be so in reality. Question 2 is the issue I am struggling with, and

[OMPI devel] Modex-less operations: setting options

2015-11-10 Thread Ralph Castain
Yo folks We discussed this a bit at last week’s developer telecon, and so I’m attempting to capture the options/plans as they were discussed so that others may chime in with suggestions. Several new capabilities have been added to OMPI in recent months, all focused on exascale operations.

Re: [OMPI devel] Modex

2012-06-19 Thread George Bosilca
On Jun 13, 2012, at 06:09 , Ralph Castain wrote: > George raised something during this morning's call that I wanted to follow-up > on relating to improving our modex operation. I've been playing with an > approach that sounded similar to what he suggested, and perhaps we could > pursue it in

Re: [OMPI devel] Modex

2012-06-15 Thread Josh Hursey
iations for different job scenarios. >> >> Rich >> >> -Original Message- >> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On >> Behalf Of Ralph Castain >> Sent: Wednesday, June 13, 2012 12:10 AM >> To: Open MPI Developers

Re: [OMPI devel] Modex

2012-06-13 Thread Richard Graham
ode. Rich -Original Message- From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain Sent: Wednesday, June 13, 2012 9:09 AM To: Open MPI Developers Subject: Re: [OMPI devel] Modex ? I'm talking about how to implement it, not what level holds the

Re: [OMPI devel] Modex

2012-06-13 Thread Shamis, Pavel
> > We currently block on exchange of contact information for the BTL's when we > perform an all-to-all operation we term the "modex". Do we have to do all-to-all or allgather ? allgather should be enough ... > At the end of that operation, each process constructs a list of information > for

Re: [OMPI devel] Modex

2012-06-13 Thread Ralph Castain
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On > Behalf Of Ralph Castain > Sent: Wednesday, June 13, 2012 12:10 AM > To: Open MPI Developers > Subject: [OMPI devel] Modex > > George raised something during this morning's call that I wanted to fo

[OMPI devel] Modex

2012-06-13 Thread Ralph Castain
George raised something during this morning's call that I wanted to follow-up on relating to improving our modex operation. I've been playing with an approach that sounded similar to what he suggested, and perhaps we could pursue it in accordance with moving the BTL's to OPAL. We currently

[OMPI devel] Modex-less launch

2010-02-18 Thread Ralph Castain
Hi folks I've had a few recent inquiries about more scalable launch methods for OMPI. It rapidly became clear that I had never documented the modex-less launch operations in the OMPI trunk, and that many people were unaware of their existence. So...I finally wrote a wiki page on the subject:

Re: [OMPI devel] Modex and others

2008-11-14 Thread Jeff Squyres
Hmm. I'm not sure the BML is the right place to do this. The BML doesn't know anything about the internals of the BTLs; it's just a dispatch / multiplexer. Unfortunately, few of us are in a good place to respond at the moment -- SC is next week and we're all hosed trying to get ready for

Re: [OMPI devel] Modex and others

2008-11-13 Thread Ralph Castain
If you look at the Dec meeting wiki, you will see that we are moving quickly to a modex-less launch anyway. It won't be the default because it requires pre-discovery of the cluster's network resources (for which we will provide a tool or method), but it will help resolve some of these

Re: [OMPI devel] Modex and others

2008-11-13 Thread Leonardo Fialho
Jeff, I agree with your viewpoint, principally about the "reachability". But... Looking from the FT viewpoint, sometimes (or in some FT architectures) one wants to recover an application process on a node different from the first. In this case a new modex should be called. It's fine for

Re: [OMPI devel] Modex and others

2008-11-07 Thread Jeff Squyres
On Nov 7, 2008, at 10:18 AM, Leonardo Fialho wrote: I understand that a process needs to have the contact information to send MPI messages to other processes, and modex permits it. My question is, why not perform the contact exchange when it is necessary? For example: in a M/W

[OMPI devel] Modex and others

2008-11-07 Thread Leonardo Fialho
Hi All, I understand that a process needs to have the contact information to send MPI messages to other processes, and modex permits it. My question is, why not perform the contact exchange when it is necessary? For example: in a M/W application, the workers do not need more information

Re: [OMPI devel] Modex

2007-06-28 Thread Jeff Squyres
Awesome; ditto. On Jun 27, 2007, at 4:19 PM, Terry D. Dontje wrote: Cool this sounds good enough to me. --td Brian Barrett wrote: The function name changes are pretty obvious (s/mca_pml_base/ompi/), and I thought I'd try something new and actually document the interface in the header file

Re: [OMPI devel] Modex

2007-06-27 Thread Brian Barrett
The function name changes are pretty obvious (s/mca_pml_base/ompi/), and I thought I'd try something new and actually document the interface in the header file :). So we should be good on that front. Brian On Jun 27, 2007, at 6:38 AM, Terry D. Dontje wrote: I am ok with the following as

Re: [OMPI devel] Modex

2007-06-27 Thread Terry D. Dontje
I am ok with the following as long as we can have some sort of documentation describing what changed, like which old functions are replaced with newer functions and any description of changed assumptions. --td Brian Barrett wrote: On Jun 26, 2007, at 6:08 PM, Tim Prins wrote: Some time ago

Re: [OMPI devel] Modex

2007-06-26 Thread Brian Barrett
On Jun 26, 2007, at 6:08 PM, Tim Prins wrote: Some time ago you were working on moving the modex out of the pml and cleaning it up a bit. Is this work still ongoing? The reason I ask is that I am currently working on integrating the RSL, and would rather build on the new code rather than