You're right, the code was overzealous. I fixed it by removing the parsing of
the modex data completely. In any case, the collective module has another
chance to deselect itself upon creation of a new communicator (thus, after the
modex has completed).
George
On Jul 6, 2012, at 2:20,
George: is there any reason for opening and selecting the coll framework so
early in mpi_init? I'm wondering if we can move that code to the end of the
procedure so we wouldn't need the locality info until later.
Sent from my iPad
On Jul 5, 2012, at 10:05 AM, Jeff Squyres wrote:
Thanks George. I filed https://svn.open-mpi.org/trac/ompi/ticket/3162 about
this.
On Jul 4, 2012, at 5:34 AM, Juan A. Rico wrote:
> Thanks all of you for your time and early responses.
>
> After applying the patch, SM can be used by raising its priority. It is
> enough for me (I hope so).
Thanks all of you for your time and early responses.
After applying the patch, SM can be used by raising its priority. It is enough
for me (I hope so). But it continues to fail when I specify --mca coll sm,self
on the command line (with tuned too).
I am not going to use this release in
Good catch, George - thanks for the detailed explanation. I think what
happened here was that we changed the ordering in MPI_Init a while back -
we had always had a rule about when MPI components could access remote proc
info, but had grown lax about it, so at least some of the BTLs had to be
Juan,
Something weird is going on there. The selection mechanisms for the SM coll and
SM BTL should be very similar; however, the SM BTL successfully selects itself,
while the SM coll fails to determine that all processes are local.
In the coll SM the issue is that the remote procs do not have
Okay, please try this again with r26739 or above. You can remove the rest
of the "verbose" settings and the --display-map so we can declutter the output.
Please add "-mca orte_nidmap_verbose 20" to your cmd line.
Thanks!
Ralph
On Tue, Jul 3, 2012 at 1:50 PM, Juan A. Rico wrote:
Rats - no help there. I'll add some debug to the code base tonight that will
tell us more about what's going on here.
On Jul 3, 2012, at 1:50 PM, Juan A. Rico wrote:
> Here is the output.
>
> [jarico@Metropolis-01 examples]$
> /home/jarico/shared/packages/openmpi-cas-dbg/bin/mpiexec
Here is the output.
[jarico@Metropolis-01 examples]$
/home/jarico/shared/packages/openmpi-cas-dbg/bin/mpiexec --bind-to-core
--bynode --mca mca_base_verbose 100 --mca mca_coll_base_output 100 --mca
coll_sm_priority 99 -mca hwloc_base_verbose 90 --display-map --mca mca_verbose
100 --mca
Interesting - yes, coll sm doesn't think they are on the same node for some
reason. Try adding -mca grpcomm_base_verbose 5 and let's see why.
On Jul 3, 2012, at 1:24 PM, Juan Antonio Rico Gallego wrote:
> The code I run is a simple broadcast.
>
> When I do not specify components to run, the
The code I run is a simple broadcast.
When I do not specify components to run, the output is (more verbose):
[jarico@Metropolis-01 examples]$
/home/jarico/shared/packages/openmpi-cas-dbg/bin/mpiexec --mca mca_base_verbose
100 --mca mca_coll_base_output 100 --mca coll_sm_priority 99 -mca
The issue is that the "sm" coll component only implements a few of the MPI
collective operations. It is usually mixed at run-time with other coll
components to fill out the rest of the MPI collective operations.
So what is happening is that OMPI is determining that it doesn't have
Sounds strange - the locality is definitely being set in the code. Can you run
it with -mca hwloc_base_verbose 5 --display-map? That should tell us where it
thinks things are running, and what locality it is recording.
On Jul 3, 2012, at 11:54 AM, Juan Antonio Rico Gallego wrote:
> Hello everyone.
Hello everyone. Maybe you can help me:
I checked out a Subversion revision (r26725) of the developer trunk. I configured with:
../../onecopy/ompi-trunk/configure
--prefix=/home/jarico/shared/packages/openmpi-cas-dbg --disable-shared
--enable-static --enable-debug --enable-mem-profile --enable-mem-debug