Re: [OMPI devel] SM component init unload

2012-07-06 Thread George Bosilca
You're right, the code was overzealous. I fixed it by removing the parsing of the modex data completely. In any case, the collective module has another chance to deselect itself upon creation of a new communicator (that is, after the modex has completed). George On Jul 6, 2012, at 2:20,
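As a rough illustration of the "second chance" mentioned above, here is a minimal Python sketch of a per-communicator query that declines when not every peer is on the local node. It is a toy model, not Open MPI's code: `query_for_communicator`, `SM_PRIORITY`, and the `peer_is_local` list are hypothetical names, and the priority value only mirrors the `coll_sm_priority 99` setting used elsewhere in this thread.

```python
from typing import List, Optional

# Hypothetical priority; mirrors the "coll_sm_priority 99" setting from the thread.
SM_PRIORITY = 99


def query_for_communicator(peer_is_local: List[bool],
                           priority: int = SM_PRIORITY) -> Optional[int]:
    """Return a priority if the module wants to serve this communicator.

    Returning None models the module deselecting itself. In this toy model
    the only criterion is that every process in the communicator is local.
    """
    if not peer_is_local or not all(peer_is_local):
        return None  # at least one peer is remote (or no peers): decline
    return priority


# All peers local: the module offers itself with its priority.
all_local = query_for_communicator([True, True, True])
# One remote peer: the module deselects itself for this communicator.
one_remote = query_for_communicator([True, False, True])
```

Because the check runs per communicator, it can rely on locality information that only becomes available once the modex has completed.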

Re: [OMPI devel] SM component init unload

2012-07-05 Thread Ralph Castain
George: is there any reason for opening and selecting the coll framework so early in mpi_init? I'm wondering if we can move that code to the end of the procedure so we wouldn't need the locality info until later. On Jul 5, 2012, at 10:05 AM, Jeff Squyres

Re: [OMPI devel] SM component init unload

2012-07-05 Thread Jeff Squyres
Thanks George. I filed https://svn.open-mpi.org/trac/ompi/ticket/3162 about this. On Jul 4, 2012, at 5:34 AM, Juan A. Rico wrote: > Thanks to all of you for your time and early responses. > > After applying the patch, SM can be used by raising its priority. It is > enough for me (I hope so).

Re: [OMPI devel] SM component init unload

2012-07-04 Thread Juan A. Rico
Thanks to all of you for your time and early responses. After applying the patch, SM can be used by raising its priority. It is enough for me (I hope so). But it continues to fail when I specify --mca coll sm,self on the command line (with tuned too). I am not going to use this release in

Re: [OMPI devel] SM component init unload

2012-07-03 Thread Ralph Castain
Good catch, George - thanks for the detailed explanation. I think what happened here was that we changed the ordering in MPI_Init a while back - we had always had a rule about when MPI components could access remote proc info, but had grown lax about it, so at least some of the BTLs had to be

Re: [OMPI devel] SM component init unload

2012-07-03 Thread George Bosilca
Juan, Something weird is going on there. The selection mechanism for the SM coll and SM BTL should be very similar. However, the SM BTL successfully selects itself while the SM coll fails to determine that all processes are local. In the coll SM the issue is that the remote procs do not have

Re: [OMPI devel] SM component init unload

2012-07-03 Thread Ralph Castain
Okay, please try this again with r26739 or above. You can remove the rest of the "verbose" settings and the --display-map so we declutter the output. Please add "-mca orte_nidmap_verbose 20" to your cmd line. Thanks! Ralph On Tue, Jul 3, 2012 at 1:50 PM, Juan A. Rico wrote: >

Re: [OMPI devel] SM component init unload

2012-07-03 Thread Ralph Castain
Rats - no help there. I'll add some debug to the code base tonight that will tell us more about what's going on here. On Jul 3, 2012, at 1:50 PM, Juan A. Rico wrote: > Here is the output. > > [jarico@Metropolis-01 examples]$ > /home/jarico/shared/packages/openmpi-cas-dbg/bin/mpiexec

Re: [OMPI devel] SM component init unload

2012-07-03 Thread Juan A. Rico
Here is the output. [jarico@Metropolis-01 examples]$ /home/jarico/shared/packages/openmpi-cas-dbg/bin/mpiexec --bind-to-core --bynode --mca mca_base_verbose 100 --mca mca_coll_base_output 100 --mca coll_sm_priority 99 -mca hwloc_base_verbose 90 --display-map --mca mca_verbose 100 --mca

Re: [OMPI devel] SM component init unload

2012-07-03 Thread Ralph Castain
Interesting - yes, coll sm doesn't think they are on the same node for some reason. Try adding -mca grpcomm_base_verbose 5 and let's see why On Jul 3, 2012, at 1:24 PM, Juan Antonio Rico Gallego wrote: > The code I run is a simple broadcast. > > When I do not specify components to run, the

Re: [OMPI devel] SM component init unload

2012-07-03 Thread Juan Antonio Rico Gallego
The code I run is a simple broadcast. When I do not specify components to run, the output is (more verbose): [jarico@Metropolis-01 examples]$ /home/jarico/shared/packages/openmpi-cas-dbg/bin/mpiexec --mca mca_base_verbose 100 --mca mca_coll_base_output 100 --mca coll_sm_priority 99 -mca

Re: [OMPI devel] SM component init unload

2012-07-03 Thread Jeff Squyres
The issue is that the "sm" coll component only implements a few of the MPI collective operations. It is usually mixed at run-time with other coll components to fill out the rest of the MPI collective operations. So what is happening is that OMPI is determining that it doesn't have
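One way to picture the run-time mixing described above is a per-operation table that is filled from several components in priority order. The Python sketch below is only a toy model under stated assumptions: the component names (`partial`, `fallback`), their priorities, and the operation lists are hypothetical and do not claim to reflect which collectives any real Open MPI component implements.

```python
from typing import Dict, List, Optional, Set, Tuple

# A component is modeled as (priority, set of operations it provides).
Component = Tuple[int, Set[str]]


def fill_collective_table(components: Dict[str, Component],
                          required_ops: List[str]) -> Dict[str, Optional[str]]:
    """For each required operation, pick the highest-priority provider."""
    table: Dict[str, Optional[str]] = {}
    for op in required_ops:
        providers = [(prio, name)
                     for name, (prio, ops) in components.items()
                     if op in ops]
        # max() compares priority first; None marks an unfilled slot.
        table[op] = max(providers)[1] if providers else None
    return table


def missing_ops(table: Dict[str, Optional[str]]) -> List[str]:
    """Operations that no available component provides."""
    return [op for op, owner in table.items() if owner is None]


# Hypothetical operation list and components, for illustration only.
REQUIRED_OPS = ["barrier", "bcast", "reduce", "gather"]
PARTIAL_ONLY = {"partial": (99, {"barrier", "bcast"})}
MIXED = {
    "partial": (99, {"barrier", "bcast"}),
    "fallback": (10, set(REQUIRED_OPS)),
}

partial_table = fill_collective_table(PARTIAL_ONLY, REQUIRED_OPS)
mixed_table = fill_collective_table(MIXED, REQUIRED_OPS)
```

In this model, `missing_ops(partial_table)` is non-empty because the partial component alone leaves slots unfilled, while `mixed_table` takes the high-priority component where it applies and the fallback everywhere else.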

Re: [OMPI devel] SM component init unload

2012-07-03 Thread Ralph Castain
Sounds strange - the locality is definitely being set in the code. Can you run it with -mca hwloc_base_verbose 5 --display-map? Should tell us where it thinks things are running, and what locality it is recording. On Jul 3, 2012, at 11:54 AM, Juan Antonio Rico Gallego wrote: > Hello everyone.

[OMPI devel] SM component init unload

2012-07-03 Thread Juan Antonio Rico Gallego
Hello everyone. Maybe you can help me: I checked out a Subversion revision (r26725) from the developers' trunk. I configured it with: ../../onecopy/ompi-trunk/configure --prefix=/home/jarico/shared/packages/openmpi-cas-dbg --disable-shared --enable-static --enable-debug --enable-mem-profile --enable-mem-debug