Agreed - I was simply trying to get it to build because it broke some
developers here.
On Aug 19, 2013, at 6:03 PM, "Jeff Squyres (jsquyres)"
wrote:
> It looks like https://svn.open-mpi.org/trac/ompi/changeset/29043 is a
> stopgap, but it is still definitely wrong.
>
>
On Aug 19, 2013, at 6:07 PM, "Jeff Squyres (jsquyres)"
wrote:
> On Aug 19, 2013, at 8:02 PM, Ralph Castain wrote:
>
>> That's how it works now. My concern is with the error message scenario.
>> IIRC, Jeff's issue was that the error message only
Hi Ralph,
On 12/08/13 06:17, Ralph Castain wrote:
> 1. Slurm has no direct knowledge or visibility into the
> application procs themselves when launched by mpirun. Slurm only
> sees the ORTE daemons. I'm sure that Slurm rolls up all the
> resources
Thanks for finding r27212. It was about a year ago, and had clearly fallen out
of my cache (I have very little to do with the openib BTL these days).
Your solution isn't correct, because HAVE_IBV_LINK_LAYER_ETHERNET is defined
(or not) via this m4 macro in config/ompi_check_openfabrics.m4:
On Aug 19, 2013, at 8:02 PM, Ralph Castain wrote:
> That's how it works now. My concern is with the error message scenario. IIRC,
> Jeff's issue was that the error message only contains the hostname of the
> proc that generates it - it doesn't tell you the hostname of the
It looks like https://svn.open-mpi.org/trac/ompi/changeset/29043 is a stopgap,
but it is still definitely wrong.
The MPIT stuff does *not* compile the same way the C bindings compile. Here's
how the C bindings compile:
a) in ompi/mpi/c/profile: always compile libmpi_c_pmpi.la
b) in
That's how it works now. My concern is with the error message scenario. IIRC,
Jeff's issue was that the error message only contains the hostname of the proc
that generates it - it doesn't tell you the hostname of the remote proc. Hence,
we included that info in the proc_t.
However, IIRC we
Some networks need the name to resolve their own communication channel so they
will look it up (all based on IP). However, this is indeed not enough for all
cases. The general solution will be to set the proc hostname the first time the
peer information is looked up in the database
Hmmm...but then proc->hostname will *never* be filled in, because it is only
ever accessed in an error message - i.e., in opal_output and its variants.
If we are not going to retrieve it by default, then we need another solution
*if* we want hostnames for error messages under direct launch. If
That solution is fine with me.
-Nathan
On Tue, Aug 20, 2013 at 12:41:49AM +0200, George Bosilca wrote:
> If your offer is between quadratic and non-deterministic, I'll take the
> former.
>
> I would advocate for a middle-ground solution. Clearly document in the header
> file that the
If your offer is between quadratic and non-deterministic, I'll take the former.
I would advocate for a middle-ground solution. Clearly document in the header
file that the ompi_proc_get_hostname is __not__ safe to be used in all contexts
as it might exhibit recursive behavior due to
Understood - but George is correct in that a failure to find the hostname in
the db will create an infinite loop. Any thoughts on a reliable way to break it?
On Aug 19, 2013, at 2:52 PM, Nathan Hjelm wrote:
> It would require a db read from every rank which is what we are
It would require a db read from every rank which is what we are trying
to avoid. This scales quadratically at best on Cray systems.
-Nathan
On Mon, Aug 19, 2013 at 02:48:18PM -0700, Ralph Castain wrote:
> Yeah, I have some concerns about it too...been trying to test it out some
> more. Would be
Yeah, I have some concerns about it too...been trying to test it out some more.
Would be good to see just how much that one change makes - maybe restoring just
the hostname wouldn't have that big an impact.
I'm leery of trying to ensure we strip all the opal_output loops if we don't
find the
As a result of this patch the first decode of a peer host name might happen in
the middle of a debug message (on the first call to ompi_proc_get_hostname).
Such a behavior might generate deadlocks based on the level of output
verbosity, and has significant potential to reintroduce the recursive
> -Original Message-
> From: Steve Wise [mailto:sw...@opengridcomputing.com]
> Sent: Monday, August 19, 2013 4:02 PM
> To: 'Open MPI Developers'; 'Jeff Squyres (jsquyres)'
> Cc: 'Indranil Choudhury'
> Subject: RE: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
>
> I guess
I guess HAVE_IBV_LINK_LAYER_ETHERNET is guarding against a libibverbs that
doesn't have
IBV_LINK_LAYER_ETHERNET defined. So the proper fix, I think, is to enhance
configure to check this
and #define HAVE_IBV_LINK_LAYER_ETHERNET if it exists. Or have it check
existence of a link_layer
field
This patch fixes iwarp. dunno if it breaks RoCE though :)
[root@r9 ompi-trunk]# svn diff
Index: ompi/mca/btl/openib/connect/btl_openib_connect_oob.c
===================================================================
--- ompi/mca/btl/openib/connect/btl_openib_connect_oob.c(revision
> > I could if I had a patch/fix. :) I don't (yet) understand why
> > HAVE_IBV_LINK_LAYER_ETHERNET was
> added.
> > Can the developer who made these changes explain the intent? I think it
> > might have to do with
RoCE
> > support.
> >
>
> Seems like there should be some change to configure
> -Original Message-
> From: Steve Wise [mailto:sw...@opengridcomputing.com]
> Sent: Monday, August 19, 2013 3:25 PM
> To: 'Jeff Squyres (jsquyres)'
> Cc: 'Open MPI Developers'; 'Indranil Choudhury'
> Subject: RE: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
>
>
>
> >
> -Original Message-
> From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com]
> Sent: Monday, August 19, 2013 3:23 PM
> To: Steve Wise
> Cc: Open MPI Developers; Indranil Choudhury
> Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
>
> No need to both post to the
No need to both post to the ticket and to devel -- just pick one. :-)
Can you send a patch/fix?
On Aug 19, 2013, at 4:17 PM, Steve Wise wrote:
>> -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Steve Wise
>> Sent:
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Steve Wise
> Sent: Monday, August 19, 2013 2:42 PM
> To: 'Open MPI Developers'; 'Jeff Squyres (jsquyres)'
> Cc: 'Indranil Choudhury'
> Subject: Re: [OMPI devel] openmpi-1.7.2 fails to use the RDMACM CPC
>
I confirmed that this is a regression from 1.7.1...
I'll see if I can figure out what's going on...
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Steve Wise
> Sent: Monday, August 19, 2013 12:15 PM
> To: 'Jeff Squyres (jsquyres)'
> Cc:
> -Original Message-
> From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com]
> Sent: Monday, August 19, 2013 12:06 PM
> To: Steve Wise
> Cc:
> Subject: Re: openmpi-1.7.2 fails to use the RDMACM CPC
>
> Not offhand.
>
> Given the lack of iWARP testing in the
Because according to Martin there needed to be profiling versions of all
the MPI_T_* functions. I used the C bindings' profiling as a base for
implementing the functions for MPI_T. Not sure what went wrong but I never
tried building without profiling support.
-Nathan
On Mon, Aug 19, 2013 at
Hello,
I just tried to run openmpi-1.7.2 over chelsio's IWARP device, and it no longer
works. It appears
that 1.7.2 fails to use the RDMACM CPC. I guess it is trying to use OOB, which
is IB-specific. If
I explicitly specify the RDMACM CPC via '--mca btl_openib_cpc_include rdmacm'
then it
@Nathan --
Why is the MPIT convenience library related to whether profiling is enabled or
not?
Begin forwarded message:
> From:
> Subject: [OMPI svn-full] svn:open-mpi r29043 - in trunk: config ompi
> Date: August 19, 2013 10:48:24 AM EDT
> To:
Make it so.
:p
On Aug 19, 2013, at 10:13 AM, Ralph Castain wrote:
> yeah, i'm trying to fix it - could use your help when you quit your manager
> impersonation
>
> On Aug 19, 2013, at 7:04 AM, Jeff Squyres (jsquyres)
> wrote:
>
>> Did something
yeah, i'm trying to fix it - could use your help when you quit your manager
impersonation
On Aug 19, 2013, at 7:04 AM, Jeff Squyres (jsquyres) wrote:
> Did something happen to break the trunk recently?
>
> -
> [7:03] savbu-usnic-a:~/s/o/o/t/ompi_info ❯❯❯ make
> CC
Did something happen to break the trunk recently?
-
[7:03] savbu-usnic-a:~/s/o/o/t/ompi_info ❯❯❯ make
CC ompi_info.o
CC param.o
CC components.o
CC version.o
CCLD ompi_info
../../../ompi/.libs/libmpi.so: undefined reference to `mpit_unlock'