[OMPI devel] circular library dependence prevents static link on Solaris-10/SPARC
Testing r32448 on trunk for trac issue #4834, I encounter the following, which appears unrelated to #4834:

  CCLD     orte-info
Undefined                       first referenced
 symbol                             in file
ompi_proc_local_proc                /sandbox/hargrove/OMPI/openmpi-trunk-solaris10-sparcT2-ss12u3-v9-static/BLD/opal/.libs/libopen-pal.a(libmca_btl_sm_la-btl_sm_component.o)
ld: fatal: Symbol referencing errors. No output written to orte-info

Note that this is *static* linking.

This appears to indicate a call from OPAL to OMPI, and I am guessing this is a side-effect of the BTL move. Since OMPI contains (many) calls to OPAL, this is a circular library dependence. Unfortunately, some linkers process their arguments strictly left-to-right. Thus, if this dependence is not eliminated, one may need "-lmpi -lopen-pal -lmpi" (or similar) to resolve it.

-Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
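To make the left-to-right point above concrete, here is a toy model in C of a linker that scans each static archive once, in command-line order, and extracts only the objects that define a currently undefined symbol. This is a sketch under stated assumptions, not how any particular linker is implemented: the archive contents are invented for illustration, and every symbol name except ompi_proc_local_proc (taken from the error above) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One object file inside a static archive (hypothetical contents). */
typedef struct {
    const char *archive;  /* library the object lives in, as in -l<archive> */
    const char *defines;  /* symbol this object defines */
    const char *needs;    /* symbol this object references, or NULL */
} member_t;

/* Only ompi_proc_local_proc comes from the report; the rest is assumed. */
static const member_t members[] = {
    { "mpi",      "ompi_api_entry",       "opal_btl_sm_open"     },
    { "open-pal", "opal_btl_sm_open",     "ompi_proc_local_proc" },
    { "mpi",      "ompi_proc_local_proc", NULL                   },
};
#define NUM_MEMBERS (sizeof members / sizeof members[0])

/* Index of name in list[0..n), or -1 if absent. */
static int find(const char *const list[], size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0)
            return (int)i;
    return -1;
}

/* True if an already extracted object defines name. */
static bool already_defined(const bool loaded[], const char *name)
{
    for (size_t m = 0; m < NUM_MEMBERS; m++)
        if (loaded[m] && strcmp(members[m].defines, name) == 0)
            return true;
    return false;
}

/* Number of symbols still undefined after one left-to-right pass over
 * the archives named in order[]; the program itself references root. */
int unresolved_after_link(const char *root,
                          const char *const order[], size_t n)
{
    const char *undefined[NUM_MEMBERS + 1];
    size_t num_undefined = 0;
    bool loaded[NUM_MEMBERS] = { false };

    undefined[num_undefined++] = root;

    for (size_t i = 0; i < n; i++) {
        bool pulled;
        do {  /* rescan this one archive until nothing new is extracted */
            pulled = false;
            for (size_t m = 0; m < NUM_MEMBERS; m++) {
                if (loaded[m] || strcmp(members[m].archive, order[i]) != 0)
                    continue;
                int u = find(undefined, num_undefined, members[m].defines);
                if (u < 0)
                    continue;  /* nobody has asked for this object yet */
                loaded[m] = true;
                pulled = true;
                /* The definition satisfies the pending reference ... */
                undefined[u] = undefined[--num_undefined];
                /* ... and the object's own reference may become pending. */
                const char *need = members[m].needs;
                if (need != NULL && !already_defined(loaded, need) &&
                    find(undefined, num_undefined, need) < 0) {
                    assert(num_undefined < NUM_MEMBERS + 1);
                    undefined[num_undefined++] = need;
                }
            }
        } while (pulled);
    }
    return (int)num_undefined;
}
```

In this model, calling unresolved_after_link("ompi_api_entry", ...) with the order { "mpi", "open-pal" } leaves ompi_proc_local_proc unresolved, because the object defining it is passed over before anything references it. Repeating the first archive, { "mpi", "open-pal", "mpi" }, resolves everything, which mirrors the "-lmpi -lopen-pal -lmpi" workaround.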
Re: [OMPI devel] v1.8.2 still held up...
On Thu, Aug 7, 2014 at 8:03 PM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:

> > * static linking failure - Gilles has posted a proposed fix, but
> > somebody needs to approve and CMR it. Please see:
> > https://svn.open-mpi.org/trac/ompi/ticket/4834
>
> Jeff made a better fix (r32447) to which i added a minor correction
> (r32448).
> as far as i am concerned, i am fine with approving #4841
>
> that being said, per Jeff's commit log :
> "This needs to soak for a day or two on the trunk before moving to the
> v1.8 branch"
>
> so you might want to wait a bit ...

I trust Jeff's judgment on the waiting (or not), but can report that, except for an unrelated issue on Solaris-10/SPARC (email coming soon), the changes in r32447+r32448 resolve the issue on all the OSes I test.

-Paul
Re: [OMPI devel] v1.8.2 still held up...
On Thu, Aug 7, 2014 at 8:03 PM, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:

> > * Siegmar reports another alignment issue on Sparc
> > http://www.open-mpi.org/community/lists/users/2014/07/24869.php
>
> Is there any chance r32449 fixes the issue ?
>
> i found the problem on Solaris/x86_64 but i have no way to test it on a
> Solaris/sparc box.

I have Solaris-10/SPARC, just as Siegmar reports using. However, I don't have gcc-4.9.0 and doubt I can build it myself. I will see if I can reproduce the problem with 1.8.2rc2 or rc3. If so, then I'll give r32449 a try.

-Paul
Re: [OMPI devel] v1.8.2 still held up...
Ralph and all,

> * static linking failure - Gilles has posted a proposed fix, but somebody
> needs to approve and CMR it. Please see:
> https://svn.open-mpi.org/trac/ompi/ticket/4834

Jeff made a better fix (r32447) to which i added a minor correction (r32448).
as far as i am concerned, i am fine with approving #4841

that being said, per Jeff's commit log :
"This needs to soak for a day or two on the trunk before moving to the v1.8 branch"

so you might want to wait a bit ...

> * Siegmar reports another alignment issue on Sparc
> http://www.open-mpi.org/community/lists/users/2014/07/24869.php

Is there any chance r32449 fixes the issue ?

i found the problem on Solaris/x86_64 but i have no way to test it on a Solaris/sparc box.

Cheers,

Gilles
Re: [OMPI devel] v1.8.2 still held up...
On Aug 7, 2014, at 1:55 PM, Ralph Castain wrote:

> * static linking failure - Gilles has posted a proposed fix, but somebody
> needs to approve and CMR it. Please see:
> https://svn.open-mpi.org/trac/ompi/ticket/4834

Sorry for the hold up. I just replied on 4834; I'm working on a new patch now.

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI devel] v1.8.2 still held up...
Ralph,

I will hopefully be able to test Gilles's patch for 4834 on applicable systems today or tomorrow. So, I can soon answer whether the patch fixes all the problems I reported.

However, I cannot speak at all to the desirability of the approach relative to the build infrastructure. I think Jeff may be best qualified to make that judgement.

-Paul

On Thu, Aug 7, 2014 at 10:55 AM, Ralph Castain wrote:

> Hey folks
>
> I *really* need your help to get this release out the door. It remains
> stuck on two things:
>
> * static linking failure - Gilles has posted a proposed fix, but somebody
>   needs to approve and CMR it. Please see:
>   https://svn.open-mpi.org/trac/ompi/ticket/4834
>
> * fixes to coll/ml that expanded to fixing page alignment in general -
>   someone needs to review/approve it:
>   https://svn.open-mpi.org/trac/ompi/ticket/4826
>
> We also have three outstanding issues that may not make 1.8.2:
>
> * MPI-I/O issues - looks like ROMIO needs some patches, and OMPIO may have
>   an issue:
>   http://www.open-mpi.org/community/lists/users/2014/08/24934.php
>
> * Siegmar reports another alignment issue on Sparc
>   http://www.open-mpi.org/community/lists/users/2014/07/24869.php
>
> * Siegmar reports an issue that looks like it relates to handling of
>   boolean MCA params:
>   http://www.open-mpi.org/community/lists/users/2014/08/24944.php
>
> Can someone *please* help out with these?
> Ralph
Re: [OMPI devel] v1.8.2 still held up...
Hi Ralph,

I'll review 4826 as proxy for hjelmn. I'm just checking that it builds on my system before saying okay.

Howard

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, August 07, 2014 11:55 AM
To: Open MPI Developers
Subject: [OMPI devel] v1.8.2 still held up...

Hey folks

I *really* need your help to get this release out the door. It remains stuck on two things:

* static linking failure - Gilles has posted a proposed fix, but somebody needs to approve and CMR it. Please see:
  https://svn.open-mpi.org/trac/ompi/ticket/4834

* fixes to coll/ml that expanded to fixing page alignment in general - someone needs to review/approve it:
  https://svn.open-mpi.org/trac/ompi/ticket/4826

We also have three outstanding issues that may not make 1.8.2:

* MPI-I/O issues - looks like ROMIO needs some patches, and OMPIO may have an issue:
  http://www.open-mpi.org/community/lists/users/2014/08/24934.php

* Siegmar reports another alignment issue on Sparc
  http://www.open-mpi.org/community/lists/users/2014/07/24869.php

* Siegmar reports an issue that looks like it relates to handling of boolean MCA params:
  http://www.open-mpi.org/community/lists/users/2014/08/24944.php

Can someone *please* help out with these?
Ralph
[OMPI devel] v1.8.2 still held up...
Hey folks

I *really* need your help to get this release out the door. It remains stuck on two things:

* static linking failure - Gilles has posted a proposed fix, but somebody needs to approve and CMR it. Please see:
  https://svn.open-mpi.org/trac/ompi/ticket/4834

* fixes to coll/ml that expanded to fixing page alignment in general - someone needs to review/approve it:
  https://svn.open-mpi.org/trac/ompi/ticket/4826

We also have three outstanding issues that may not make 1.8.2:

* MPI-I/O issues - looks like ROMIO needs some patches, and OMPIO may have an issue:
  http://www.open-mpi.org/community/lists/users/2014/08/24934.php

* Siegmar reports another alignment issue on Sparc
  http://www.open-mpi.org/community/lists/users/2014/07/24869.php

* Siegmar reports an issue that looks like it relates to handling of boolean MCA params:
  http://www.open-mpi.org/community/lists/users/2014/08/24944.php

Can someone *please* help out with these?
Ralph
Re: [OMPI devel] trunk compilation errors in jenkins
Ralph and George,

attached are two patches :
- heterogeneous.v1.patch : a cleanup of the previous patch
- heterogeneous.v2.patch : a new patch based on Ralph's suggestion.
  i made the minimal changes to move jobid and vpid into the OPAL layer.

Cheers,

Gilles

On 2014/08/07 11:27, Ralph Castain wrote:
> Are we maybe approaching this from the wrong direction? I ask because we had
> to do some gyrations in the pmix framework to work around the difference in
> naming schemes between OPAL and the rest of the code base, and now we have
> more gyrations here.
>
> Given that the MPI and RTE layers both rely on the structured form of the
> name, what about if we just mimic that down in OPAL? I think we could perhaps
> do this in a way that still allows someone to overlay it with a 64-bit
> unstructured identifier if they want, but that would put the extra work on
> their side. In other words, we make it easy to work with the other parts of
> our own code base, acknowledging that those wanting to do something else may
> have to do some extra work.
>
> I ask because every resource manager out there assigns each process a jobid
> and vpid in some form of integer format. So we have to absorb that
> information in {jobid, vpid} format regardless of what we may want to do
> internally. What we now have to do is immediately convert that into the
> unstructured form for OPAL (where we take it in via PMI), then convert it
> back to structured form when passing it up to ORTE so it can be handed to
> OMPI, and then convert it back to unstructured form every time either OMPI or
> ORTE accesses the OPAL layer.
>
> Seems awfully convoluted and error prone. Simplifying things for ourselves
> might make more sense.
>
>
> On Aug 6, 2014, at 1:21 PM, George Bosilca wrote:
>
>> Gilles,
>>
>> This looks right. It is really unfortunate that we have to change the
>> definition of orte_process_name_t for big endian architectures, but I don't
>> think there is a way around.
>>
>> Regarding your patch I have two comments:
>> 1. There is a flagrant lack of comments ... especially on the ORTE side
>> 2. at the OPAL level we are really implementing a htonll, and I really think
>> we should stick to the POSIX prototype (aka. returning the changed value
>> instead of doing things in place).
>>
>> George.
>>
>>
>>
>> On Wed, Aug 6, 2014 at 7:02 AM, Gilles Gouaillardet
>> wrote:
>> Ralph and George,
>>
>> here is attached a patch that fixes the heterogeneous support without the
>> abstraction violation.
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On 2014/08/06 9:40, Gilles Gouaillardet wrote:
>>> hummm
>>>
>>> i intentionally did not swap the two 32 bits (!)
>>>
>>> from the top level, what we have is :
>>>
>>> typedef struct {
>>>    union {
>>>       uint64_t opal;
>>>       struct {
>>>          uint32_t jobid;
>>>          uint32_t vpid;
>>>       } orte;
>>>    };
>>> } meta_process_name_t;
>>>
>>> OPAL is agnostic about jobid and vpid.
>>> jobid and vpid are set in ORTE/MPI and OPAL is used only
>>> to transport the 64 bits
>>> /* opal_process_name_t and orte_process_name_t are often cast into each
>>> other */
>>> at ORTE/MPI level, jobid and vpid are set individually
>>> /* e.g. we do *not* do something like opal = jobid | (vpid<<32) */
>>> this is why everything works fine on homogeneous clusters regardless of
>>> endianness.
>>>
>>> now in a heterogeneous cluster, things get a bit trickier ...
>>>
>>> i was initially unhappy with my commit and i think i found out why :
>>> this is an abstraction violation !
>>> the two 32 bits are not swapped by OPAL because this is what is expected by
>>> ORTE/OMPI.
>>>
>>> now i'd like to suggest the following lightweight approach :
>>>
>>> at OPAL, use #if protected htonll/ntohll
>>> (e.g. swap the two 32bits)
>>>
>>> do the trick at the ORTE level :
>>>
>>> simply replace
>>>
>>> struct orte_process_name_t {
>>>     orte_jobid_t jobid;
>>>     orte_vpid_t vpid;
>>> };
>>>
>>> with
>>>
>>> #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
>>> struct orte_process_name_t {
>>>     orte_vpid_t vpid;
>>>     orte_jobid_t jobid;
>>> };
>>> #else
>>> struct orte_process_name_t {
>>>     orte_jobid_t jobid;
>>>     orte_vpid_t vpid;
>>> };
>>> #endif
>>>
>>>
>>> so we keep OPAL agnostic about how the uint64_t is really used at the upper
>>> level.
>>> another option is to make OPAL aware of jobid and vpid but this is a bit
>>> more heavyweight imho.
>>>
>>> i'll try this today and make sure it works.
>>>
>>> any thoughts ?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>>
>>> On Wed, Aug 6, 2014 at 8:17 AM, Ralph Castain wrote:
>>>
>>> Ah yes, so it is - sorry I missed that last test :-/
>>>
>>> On Aug 5, 2014, at 10:50 AM, George Bosilca wrote:
>>>
>>> The code committed by Gilles is correctly protected for big endian
>>> (https://svn.open-mpi.org/trac/ompi/changeset/32425). I was merely pointing
>>> out that I think he should also swap the 2 32 bits in his impleme