[OMPI devel] Warnings in sctp BTL
Hi folks

Not sure who is maintaining the SCTP BTL, but I found the following warnings when building tonight:

btl_sctp_frag.c: In function `mca_btl_sctp_frag_large_send':
btl_sctp_frag.c:179: warning: int format, different type arg (arg 3)
btl_sctp_frag.c:179: warning: int format, different type arg (arg 5)
btl_sctp_frag.c: In function `mca_btl_sctp_frag_send':
btl_sctp_frag.c:303: warning: int format, different type arg (arg 3)
btl_sctp_frag.c:303: warning: int format, different type arg (arg 5)
btl_sctp_endpoint.c: In function `mca_btl_sctp_endpoint_recv_connect_ack':
btl_sctp_endpoint.c:841: warning: too few arguments for format
btl_sctp_proc.c: In function `mca_btl_sctp_proc_create':
btl_sctp_proc.c:147: warning: int format, different type arg (arg 2)

Could you please clean this up?

Thanks!
Ralph
Re: [OMPI devel] Warnings in sctp BTL
hey Ralph,

At UBC, we are trying to find a new student who can maintain the SCTP BTL. Unfortunately, it has not been maintained since the progress engine overhaul a while ago. At the moment, this is still on the TODO list. I hope to get to this myself if no student is found.

It was my impression that the SCTP BTL wasn't included in any release by default. I hope that this is still the case.

brad

On Wed, May 13, 2009 at 8:15 PM, Ralph Castain wrote:
> Hi folks
>
> Not sure who is maintaining the SCTP BTL, but I found the following warnings
> when building tonight:
> [...]
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] Warnings in sctp BTL
On May 14, 2009, at 1:14 AM, Brad Penoff wrote:

> At UBC, we are trying to find a new student who can maintain the SCTP
> BTL. Unfortunately, it has not been maintained since the progress
> engine overhaul a while ago. At the moment, this is still on the TODO
> list. I hope to get to this myself, if no student is found.

Thanks; that would be most useful.

Ralph -- did these messages come in due to the opal_attribute changes from last night?

> It was my impression that the SCTP BTL wasn't included in any release
> by default. I hope that this is still the case

Correct; it is .ompi_ignore'd in the v1.3 SVN tree.

-- 
Jeff Squyres
Cisco Systems
Re: [OMPI devel] Warnings in sctp BTL
I'm not entirely sure, as I'm unclear about when this component would attempt to build. I was building the latest trunk on a new (to me) system last night (Jeff's cluster) when I saw the warnings.

Jeff: have you seen them before on your cluster?

On Thu, May 14, 2009 at 7:22 AM, Jeff Squyres wrote:
> Ralph -- did these messages come in due to the opal_attribute changes from
> last night?
Re: [OMPI devel] Warnings in sctp BTL
Hi Jeff,

On Thursday 14 May 2009 09:22:18 am Jeff Squyres wrote:
> Ralph -- did these messages come in due to the opal_attribute changes
> from last night?

They certainly are due to adding the __opal_attribute_format__ changes. A patch similar to the one applied to btl_tcp_frag should be applied here...

Thanks,
Rainer
-- 
Rainer Keller, PhD  Tel: +1 (865) 241-6293
Oak Ridge National Lab  Fax: +1 (865) 241-4811
PO Box 2008 MS 6164  Email: kel...@ornl.gov
Oak Ridge, TN 37831-2008  AIM/Skype: rusraink
Re: [OMPI devel] Warnings in sctp BTL
Yes, I do. All of them are on BTL_ERROR lines; I think these came in last night with the opal attribute updates. Looking at last night's MTT, those attribute changes turned up a LOT of warnings in various BTLs... Doh!

On May 14, 2009, at 9:34 AM, Ralph Castain wrote:

> I'm not entirely sure as I'm unclear as to when this component would attempt
> to build. I was building the latest trunk on a new (to me) system last night
> (Jeff's cluster) when I saw the warnings.
>
> Jeff: have you seen them before on your cluster?

-- 
Jeff Squyres
Cisco Systems
Re: [OMPI devel] [OMPI svn] svn:open-mpi r21234
Hey Ralph,

On Wednesday 13 May 2009 10:54:43 pm Ralph Castain wrote:
> This generated a bunch of warnings - the "z" length modifier is not a
> generally supported option, which is why we do not use it.

I see. You compile with -pedantic?

> btl_tcp_frag.c: In function ‘mca_btl_tcp_frag_send’:
> btl_tcp_frag.c:115: warning: ISO C90 does not support the ‘z’ printf
> length modifier
> ...

See below.

> btl_tcp_component.c: In function ‘mca_btl_tcp_component_create_listen’:
> btl_tcp_component.c:682: warning: too many arguments for format

This is the nice part of the attribute fix: it shows mistakes like this... Fixed...

> Could you please fix this? If you want to deal with the size_t 32/64
> bit issues, there is another little macro thingy that we created for
> just that purpose (someone else here undoubtedly remembers it).

Well, does anyone know the "little macro thingy"? ;-) I failed to find it... However, in other parts of the code base, we just use "%lu" and cast to unsigned long... Thoughts?

Thanks,
Rainer
Re: [OMPI devel] [OMPI svn] svn:open-mpi r21234
On May 14, 2009, at 9:57 AM, Rainer Keller wrote:

>> This generated a bunch of warnings - the "z" length modifier is not a
>> generally supported option, which is why we do not use it.
>
> I see. You compile with -pedantic?

Ya, configure adds that automatically if you configure with --enable-picky.

>> btl_tcp_component.c: In function ‘mca_btl_tcp_component_create_listen’:
>> btl_tcp_component.c:682: warning: too many arguments for format
>
> This is the nice part of the attribute fix: it shows mistakes like this...
> Fixed...

Yep; it is good to expose all of these. Annoying to fix, but they should be fixed. :-)

>> Could you please fix this? If you want to deal with the size_t 32/64
>> bit issues, there is another little macro thingy that we created for
>> just that purpose (someone else here undoubtedly remembers it).
>
> Well, does anyone know the "little macro thingy"? ;-) I failed to find it...
> However, in other parts of the code base, we just use "%lu" and cast to
> unsigned long...

Unfortunately, I think that's the best we ever came up with:

https://svn.open-mpi.org/trac/ompi/wiki/PrintfCodes

-- 
Jeff Squyres
Cisco Systems
[OMPI devel] Build failures on trunk? r21235
All,

After an svn update earlier, I'm getting build failures on the trunk. I've tried the usual, including a full clean checkout, and am still getting the errors.

I'm not doing anything special other than a VPATH build, and this same tree built last week; it's just the update that appears to have broken things.

The configure line used was

~/code/OpenMPI/ompi-trunk-tes/trunk/configure --enable-mpirun-prefix-by-default --prefix /mnt/home/debian/ashley/code/OpenMPI/install/

and I'm using the tree at http://svn.open-mpi.org/svn/ompi/trunk; I hope this is the correct one.

This is the error the build fails with:

/bin/sh ../../../libtool --tag=CXX --mode=link g++ -O3 -DNDEBUG -finline-functions -pthread -export-dynamic -o ompi_info components.o ompi_info.o output.o param.o version.o ../../../ompi/libmpi.la -lnsl -lutil -lm
libtool: link: g++ -O3 -DNDEBUG -finline-functions -pthread -o .libs/ompi_info components.o ompi_info.o output.o param.o version.o -Wl,--export-dynamic ../../../ompi/.libs/libmpi.so -lnsl -lutil -lm -pthread -Wl,-rpath -Wl,/mnt/home/debian/ashley/code/OpenMPI/install/lib
../../../ompi/.libs/libmpi.so: undefined reference to `opal_maffinity_setup'
../../../ompi/.libs/libmpi.so: undefined reference to `opal_paffinity_alone'
../../../ompi/.libs/libmpi.so: undefined reference to `opal_paffinity_base_slot_list'
collect2: ld returned 1 exit status
make[2]: *** [ompi_info] Error 1
make[2]: Leaving directory `/mnt/memfs/openmpi/ompi/tools/ompi_info'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/mnt/memfs/openmpi/ompi'
make: *** [all-recursive] Error 1
ashley@alpha:/mnt/memfs/openmpi$

I can provide more information if requested, although as I say I don't think I'm doing anything out of the ordinary.

Ashley Pittman
[OMPI devel] OMPI 1.3 branch
Hi folks

I encourage people to please look at your MTT outputs. As we are preparing to roll the 1.3.3 release, I am seeing a lot of problems on the branch:

1. timeouts, coming in two forms: (a) MPI_Abort hanging, and (b) collectives hanging (this is mostly on Solaris)

2. segfaults - mostly on sif, but occasionally elsewhere

3. daemon failed to report back - this was only on sif

We will need to correct many of these for the release - unless they prove to be due to trivial errors, I don't see how we will be ready to roll release candidates next week.

So let's please start taking a look at these?!

Ralph
Re: [OMPI devel] Build failures on trunk? r21235
Hmm; odd. I'm not getting these errors. Just to be sure, I did a VPATH build and still am not getting these errors... :-\

Are those symbols publicly available in libopen-pal.so?

It does seem pretty weird that your libtool link line didn't pick up libopen-rte.so and libopen-pal.so...? What version of LT are you using?

On May 14, 2009, at 10:28 AM, Ashley Pittman wrote:

> After a svn update earlier I'm getting build failures on the trunk, [...]
>
> libtool: link: g++ -O3 -DNDEBUG -finline-functions -pthread -o .libs/ompi_info components.o ompi_info.o output.o param.o version.o -Wl,--export-dynamic ../../../ompi/.libs/libmpi.so -lnsl -lutil -lm -pthread -Wl,-rpath -Wl,/mnt/home/debian/ashley/code/OpenMPI/install/lib
> ../../../ompi/.libs/libmpi.so: undefined reference to `opal_maffinity_setup'
> [...]

-- 
Jeff Squyres
Cisco Systems
Re: [OMPI devel] Build failures on trunk? r21235
Libtool is 2.2.6. I use Debian unstable, so it's normally fairly up to date; now that I think of it, I suppose it's not impossible that a Debian update has broken things.

I normally build in memfs for speed and have just rebooted my machine; a full rebuild has failed again with the same errors.

All three symbols are shown as B according to nm, so they should be available.

Actually, further testing shows it's user error again: if I remove the current install, then the build succeeds. It must have been picking up libopen-pal from the install location rather than from the current build.

Ashley Pittman

On Thu, 2009-05-14 at 11:50 -0400, Jeff Squyres wrote:
> Hmm; odd. I'm not getting these errors. Just to be sure, I did a
> VPATH build and still am not getting these errors... :-\
>
> Are those symbols publicly available in libopen-pal.so?
>
> It does seem pretty weird that your libtool link line didn't pick up
> libopen-rte.so and libopen-pal.so...? What version of LT are you using?
Re: [OMPI devel] OMPI 1.3 branch
Ralph Castain wrote:
> Hi folks
>
> I encourage people to please look at your MTT outputs. As we are preparing
> to roll the 1.3.3 release, I am seeing a lot of problems on the branch:
>
> 1. timeouts, coming in two forms: (a) MPI_Abort hanging, and (b)
> collectives hanging (this is mostly on Solaris)

Can you clarify, or send me a link to whatever makes you believe (b) is mostly on Solaris? Looking at last night's Sun MTT 1.3 nightly runs, I see 47 timeouts on Linux and 24 timeouts on Solaris. That doesn't constitute "mostly Solaris" to me. Also, how are you determining that these timeouts are collective-based? I have a theory that they are, but I don't have a clear smoking gun as of yet.

I've been looking at some collective hangs and segvs. These seem to happen across different platforms and OSes (Linux and Solaris). I've been finding them really hard to reproduce: I ran MPI_Allreduce_loc_c on three clusters for 2 days without a hang or segv. I am really concerned whether we'll even be able to get this to fail with debugging on. I have not been able to get a core file, or time on a hung run, in order to get more information.

> 2. segfaults - mostly on sif, but occasionally elsewhere
>
> 3. daemon failed to report back - this was only on sif
>
> We will need to correct many of these for the release - unless it proves
> to be due to trivial errors, I don't see how we will be ready to roll
> release candidates next week.
>
> So let's please start taking a look at these?!

I've actually been looking at ours, though I have not been extremely vocal. I was hoping to get more info on our timeouts before requesting help.

> Ralph
Re: [OMPI devel] OMPI 1.3 branch
On Thu, May 14, 2009 at 10:47 AM, Terry Dontje wrote:

> Ralph Castain wrote:
>
>> 1. timeouts, coming in two forms: (a) MPI_Abort hanging, and (b)
>> collectives hanging (this is mostly on Solaris)
>>
> Can you clarify or send me a link that makes you believe b is mostly
> solaris. Looking at last night's Sun's MTT 1.3 nightly runs I see 47
> timeouts on Linux and 24 timeouts on Solaris. That doesn't constitute
> mostly Solaris to me. Also how are you determining these timeouts are
> Collective based? I have a theory they are but I don't have a clear smoking
> gun as of yet.

I looked at this MTT report, which showed it hanging in a whole bunch of collective tests:

http://www.open-mpi.org/mtt/index.php?limit===_drilldowns=_scale=_scale=_subtitle=_graphs=_go=_cookies==test_run_start_timestamp=2009-05-13+15%3A15%3A25+-+2009-05-14+15%3A15%3A25_platform_hardware=^x86_64%24_platform_hardware=show_os_name=^Linux%24_os_name=show_mpi_name=^ompi-nightly-v1.3%24_mpi_name=show_mpi_version=^1.3.3a1r21173%24_mpi_version=show_suite_name=all_suite_name=show_test_name=all_test_name=hide_np=all_np=show_full_command=_full_command=show_http_username=^sun%24_http_username=show_local_username=all_local_username=hide_platform_name=^burl-ct-v20z-10%24_platform_name=show=Detail=test_run_result=_rt_os_version=_os_version=_platform_type=_platform_type=_hostname=_hostname=_compiler_name=_compiler_name=_compiler_version=_compiler_version=_vpath_mode=_vpath_mode=_endian=_endian=_bitness=_bitness=_configure_arguments=_exit_value=_exit_value=_exit_signal=_exit_signal=_duration=_duration=_client_serial=_client_serial=_result_message=_result_stdout=_result_stderr=_environment=_description=_launcher=_launcher=_resource_mgr=_resource_mgr=_network=_network=_parameters=_parameters==summary

When I look at the hangs on other systems, they are in non-collective tests. I'm not sure what that really means, though - it was just an observation based on this one set of tests.

> I've actually been looking at ours though I have not been extremely
> vocal. I was hoping to get more info on our timeouts before requesting
> help.

No problem - I wasn't pointing a finger at anyone in particular. I just wanted to highlight that the branch is not in great shape, since we had talked on the telecon about trying to do a release next week.
Re: [OMPI devel] Build failures on trunk? r21235
Hmm. This may not be pilot error. I build OMPI with a pre-installed OMPI all the time and they don't conflict during the build (i.e., the OMPI being built always uses the libopen-rte and libopen-pal from the build tree, not the install tree). Here are my link lines for ompi_info:

/bin/sh ../../../libtool --tag=CXX --mode=link g++ -g -Wall -Wundef -Wno-long-long -finline-functions -pthread -export-dynamic -o ompi_info components.o ompi_info.o output.o param.o version.o ../../../ompi/libmpi.la -lnsl -lutil -lm
libtool: link: g++ -g -Wall -Wundef -Wno-long-long -finline-functions -pthread -o .libs/ompi_info components.o ompi_info.o output.o param.o version.o -Wl,--export-dynamic ../../../ompi/.libs/libmpi.so /users/jsquyres/svn/ompi/orte/.libs/libopen-rte.so /users/jsquyres/svn/ompi/opal/.libs/libopen-pal.so -ldl -lnsl -lutil -lm -pthread -Wl,-rpath -Wl,/home/jsquyres/bogus/lib

Notice that libopen-rte.so and libopen-pal.so are explicitly mentioned by absolute path name. Yours weren't. I wonder why...?

On May 14, 2009, at 12:41 PM, Ashley Pittman wrote:

> Actually further testing shows it's user error again, if I remove the
> current install then the build succeeds, it must have been picking up
> the libopen-pal from the install location rather than from the current
> build.

-- 
Jeff Squyres
Cisco Systems
Re: [OMPI devel] Build failures on trunk? r21235
Hello,

Ashley, did you rebootstrap with Debian's Libtool? They set link_all_deplibs=no in their Libtool, which changes some things and can cause issues like this. It can't hurt to open a Debian bug report about it (targeted against libtool) so they know this issue exists.

Can you try working around it by setting link_all_deplibs to "yes", then rebuilding all the libraries? Like this, done in the top build directory with your current build tree:

  find . -name libtool | xargs \
    sed -i 's/^\(link_all_deplibs=\).*/\1yes/'
  find . -name \*.la | xargs ./libtool --mode=clean rm -f
  make

If that does not work, then I'd be very interested in what the failure looks like at that point.

A more permanent workaround could be in Open MPI to list each library that is used *directly* by some other library as a dependency. Sigh. Or fix Debian Libtool.

Cheers,
Ralf

* Jeff Squyres wrote on Thu, May 14, 2009 at 07:28:47PM CEST:
> Notice that libopen-rte.so and libopen-pal.so are explicitly mentioned
> by absolute path name. Yours weren't. I wonder why...?
Re: [OMPI devel] Build failures on trunk? r21235
On May 14, 2009, at 1:46 PM, Ralf Wildenhues wrote:

> A more permanent workaround could be in OpenMPI to list each library
> that is used *directly* by some other library as a dependency. Sigh.

We actually took pains to *not* do that; we *used* to do that and explicitly took it out. :-\ IIRC, it had something to do with dlopen'ing libmpi.so...?

> Or fix Debian Libtool.

That sounds better to me, but I'm admittedly a little biased. :-)

-- 
Jeff Squyres
Cisco Systems
Re: [OMPI devel] Build failures on trunk? r21235
Hello,

* Jeff Squyres wrote on Thu, May 14, 2009 at 07:56:24PM CEST:
> On May 14, 2009, at 1:46 PM, Ralf Wildenhues wrote:
>
>> A more permanent workaround could be in OpenMPI to list each library
>> that is used *directly* by some other library as a dependency. Sigh.
>
> We actually took pains to *not* do that; we *used* to do that and
> explicitly took it out. :-\ IIRC, it had something to do with
> dlopen'ing libmpi.so...?

Admittedly, I didn't look at Open MPI in detail before writing my previous reply. So it would be nice to know the outcome of the workaround anyway (I do have a Debian system here, but with different Libtool versions and little time); there could also be another genuine bug hiding there.

A dlopen problem sounds like a Debian Libtool issue, though, and one worthy of a Debian bug report (because they do not intend that to fail).

Thanks,
Ralf
Re: [OMPI devel] Build failures on trunk? r21235
On Thu, 14 May 2009, Jeff Squyres wrote:

> On May 14, 2009, at 1:46 PM, Ralf Wildenhues wrote:
>
>> A more permanent workaround could be in OpenMPI to list each library
>> that is used *directly* by some other library as a dependency. Sigh.
>
> We actually took pains to *not* do that; we *used* to do that and
> explicitly took it out. :-\ IIRC, it had something to do with
> dlopen'ing libmpi.so...?

Actually, I think that was something else. Today, libopen-rte.la lists libopen-pal.la as a dependency and libmpi.la lists libopen-rte.la. I had removed the dependency of libmpi.la on libopen-pal.la because it was causing libopen-pal.so to be listed twice by libtool, which was causing problems.

It would be a trivial fix to change the Makefiles to make libmpi.la depend on libopen-pal.la as well as libopen-rte.la.

Brian
Re: [OMPI devel] Build failures on trunk? r21235
Hi Brian,

* Brian W. Barrett wrote on Thu, May 14, 2009 at 08:22:58PM CEST:
>
> Actually, I think that was something else. Today, libopen-rte.la lists
> libopen-pal.la as a dependency and libmpi.la lists libopen-rte.la. I had
> removed the dependency of libmpi.la on libopen-pal.la because it was
> causing libopen-pal.so to be listed twice by libtool, which was causing
> problems.

That's weird, and shouldn't happen (the problems, that is). Do you have a pointer for them?

Thanks,
Ralf
Re: [OMPI devel] Build failures on trunk? r21235
On May 14, 2009, at 2:22 PM, Brian W. Barrett wrote:

>> We actually took pains to *not* do that; we *used* to do that and explicitly
>> took it out. :-\ IIRC, it had something to do with dlopen'ing libmpi.so...?
>
> Actually, I think that was something else. Today, libopen-rte.la lists
> libopen-pal.la as a dependency and libmpi.la lists libopen-rte.la. I had
> removed the dependency of libmpi.la on libopen-pal.la because it was causing
> libopen-pal.so to be listed twice by libtool, which was causing problems.
>
> It would be a trivial fix to change the Makefiles to make libmpi.la to depend
> on libopen-pal.la as well as libopen-rte.la.

Ah -- am I thinking of us removing libmpi (etc.) from the DSOs?

-- 
Jeff Squyres
Cisco Systems
Re: [OMPI devel] Build failures on trunk? r21235
On Thu, 14 May 2009, Ralf Wildenhues wrote: > Hi Brian, > * Brian W. Barrett wrote on Thu, May 14, 2009 at 08:22:58PM CEST: >> Actually, I think that was something else. Today, libopen-rte.la lists libopen-pal.la as a dependency and libmpi.la lists libopen-rte.la. I had removed the dependency of libmpi.la on libopen-pal.la because it was causing libopen-pal.so to be listed twice by libtool, which was causing problems. > That's weird, and shouldn't happen (the problems, that is). Do you have a pointer for them? I don't - it was many moons ago. And it very likely was when we were in that (evil) period where we were using LT2 before it was released as stable. So it's completely possible we were seeing a transient bug which is long since gone. Brian
Re: [OMPI devel] Build failures on trunk? r21235
On Thu, 14 May 2009, Jeff Squyres wrote: > On May 14, 2009, at 2:22 PM, Brian W. Barrett wrote: >>> We actually took pains to *not* do that; we *used* to do that and explicitly took it out. :-\ IIRC, it had something to do with dlopen'ing libmpi.so...? >> Actually, I think that was something else. Today, libopen-rte.la lists libopen-pal.la as a dependency and libmpi.la lists libopen-rte.la. I had removed the dependency of libmpi.la on libopen-pal.la because it was causing libopen-pal.so to be listed twice by libtool, which was causing problems. >> It would be a trivial fix to change the Makefiles to make libmpi.la depend on libopen-pal.la as well as libopen-rte.la. > Ah -- am I thinking of us removing libmpi (etc.) from the DSOs? I think so. And that's a change we definitely don't want to undo. Brian
Re: [OMPI devel] Build failures on trunk? r21235
While we're talking about build failures ... I haven't been able to build any of the 1.3.x releases on my OS X machines: OS X 10.5.6 (Leopard) on Intel Macs. Attached is the configure command and the failure from last night's development tarball, openmpi-1.3.3a1r21223.tar.gz. 1.2.x builds fine. - Bryan -- Bryan Lally, la...@lanl.gov 505.667.9954 CCS-2 Los Alamos National Laboratory Los Alamos, New Mexico
Re: [OMPI devel] Build failures on trunk? r21235
Did you mean to attach something? FWIW, I can configure/build on Leopard just fine...? I'm using the compilers from hpc.sf.net, though. I haven't tried recently with the native Leopard compilers. -- Jeff Squyres Cisco Systems
Re: [OMPI devel] Build failures on trunk? r21235
Argh. This time with attachment attached ... Bryan Lally wrote: > While we're talking about build failures ... I haven't been able to build any of the 1.3.x releases on my OS X machines. OS X 10.5.6 (Leopard) on Intel macs. Attached is the configure command and the failure from last night's development tarball, openmpi-1.3.3a1r21223.tar.gz. 1.2.x builds fine. - Bryan -- Bryan Lally, la...@lanl.gov 505.667.9954 CCS-2 Los Alamos National Laboratory Los Alamos, New Mexico

./configure \
    --prefix=/usr/local/openmpi-1.3.3x \
    --disable-mpi-f77 \
    --disable-mpi-f90 \
    --disable-mpi-profile

make

...

Making all in tools/orte-iof
/bin/sh ../../../libtool --tag=CC --mode=link gcc -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -fvisibility=hidden -export-dynamic -o orte-iof orte-iof.o ../../../orte/libopen-rte.la -lutil
libtool: link: gcc -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -fvisibility=hidden -o orte-iof orte-iof.o ../../../orte/.libs/libopen-rte.a /Users/lally/Software/openmpi-1.3.3a1r21223/opal/.libs/libopen-pal.a -lutil
Undefined symbols:
  "_orte_iof", referenced from:
      _orte_iof$non_lazy_ptr in orte-iof.o
  "_orte_routed", referenced from:
      _orte_routed$non_lazy_ptr in libopen-rte.a(hnp_contact.o)
      _orte_routed$non_lazy_ptr in libopen-rte.a(rml_base_contact.o)
ld: symbol(s) not found
collect2: ld returned 1 exit status
make[2]: *** [orte-iof] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1
Re: [OMPI devel] Build failures on trunk? r21235
Blast - wish I could remember, but I did see that once before and now can't remember the fix. I can build non-tarballs just fine on my Mac, though, so it could be a problem with the tarball not picking something up.
Re: [OMPI devel] Build failures on trunk? r21235
Jeff Squyres wrote: > Did you mean to attach something? Yeah, oops. I can't count how many times I've done that. > FWIW, I can configure/build on Leopard just fine...? I'm using the compilers from hpc.sf.net, though. I haven't tried recently with the native Leopard compilers. This was with the native Leopard compilers. - Bryan -- Bryan Lally, la...@lanl.gov 505.667.9954 CCS-2 Los Alamos National Laboratory Los Alamos, New Mexico
Re: [OMPI devel] Build failures on trunk? r21235
On Thu, 2009-05-14 at 19:46 +0200, Ralf Wildenhues wrote: > Hello, > > Ashley, did you rebootstrap with Debian's Libtool? I'm not sure I understand the question; I did a fresh checkout and re-ran ./autogen.sh, if that's what you mean. > They enable link_all_deplibs=no in their Libtool That appears to be the case. > which changes some > things and can cause issues like this. Can't hurt to open a Debian > bug report about it (targeted against libtool) so they know this issue > exists. > > Can you try working around it by setting link_all_deplibs to "yes", > then rebuilding all the libraries? Like this, done in the top build > directory with your current build tree: > find . -name libtool | xargs \ > sed -i 's/^\(link_all_deplibs=\).*/\1yes/' > find . -name \*.la | xargs ./libtool --mode=clean rm -f > make Moving back into the install dir, which luckily I still had lying around, and re-compiling did work, so I assume you are correct. > If that does not work, then I'd be very interested in what the failure > would look like at that point. > > A more permanent workaround could be in OpenMPI to list each library > that is used *directly* by some other library as a dependency. Sigh. Would it be this, or would it be listing libraries which are used directly by some other library and are distributed as part of OpenMPI? That sounds slightly more sensible when you phrase it like that. > Or fix Debian Libtool. My naive view here is that link_all_deplibs=no sounds like a sensible default, as the linker should do the right thing if they aren't named. It sounds to me like Brian's suggestion of stating a dependency from libmpi.la to libopen-pal.la might have more miles in it. That still doesn't explain, however, why my link line showed neither library being linked while Geoff sees both. I'll keep the code here lying around in case you want me to perform further tests. Ashley,
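Ashley's refinement of Ralf's suggestion (declare a dependency only on libraries that are used directly *and* shipped with Open MPI) amounts to a simple consistency check. The Python sketch below is hypothetical: the function missing_direct_deps and all of its data are illustrative assumptions, and in a real tree the "direct uses" would have to be gathered some other way (for example by inspecting each library's undefined symbols), which is not shown. It assumes, for illustration, that libmpi calls into libopen-pal directly.

```python
# Hypothetical check: every in-tree library that a library uses directly
# should also be declared as one of its .la dependencies.

IN_TREE = {"libmpi", "libopen-rte", "libopen-pal"}

# .la dependencies as currently declared.
DECLARED = {
    "libmpi": ["libopen-rte"],
    "libopen-rte": ["libopen-pal"],
    "libopen-pal": [],
}

# ASSUMPTION: the libraries each one references symbols from directly.
# "libutil" stands in for an external system library (cf. -lutil).
DIRECT_USES = {
    "libmpi": ["libopen-rte", "libopen-pal"],
    "libopen-rte": ["libopen-pal", "libutil"],
    "libopen-pal": [],
}


def missing_direct_deps(direct_uses, declared, in_tree):
    """Map each library to the in-tree libraries it uses but does not declare."""
    missing = {}
    for lib, used in direct_uses.items():
        gaps = sorted(
            dep for dep in used
            if dep in in_tree and dep not in declared.get(lib, [])
        )
        if gaps:
            missing[lib] = gaps
    return missing


print(missing_direct_deps(DIRECT_USES, DECLARED, IN_TREE))
# {'libmpi': ['libopen-pal']}
```

Restricting the check to in-tree libraries mirrors Ashley's distinction; external libraries such as -lutil are left to whatever the build already passes for them.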
Re: [OMPI devel] Build failures on trunk? r21235
Hmm. I just did a build with the SF.net compilers and then a second build of openmpi-1.3.3a1r21223 with the native Leopard compilers. The Leopard build failed deep in VT, though, with some obscure C++ STL-looking error -- but OMPI itself built fine. You can compile OMPI without VT with --enable-contrib-no-build=vt. I'll send a note to the VT guys. -- Jeff Squyres Cisco Systems
Re: [OMPI devel] Build failures on trunk? r21235
On May 14, 2009, at 3:14 PM, Jeff Squyres (jsquyres) wrote: > The Leopard build failed deep in VT, though, with some obscure C++ STL-looking error -- but OMPI itself built fine. You can compile OMPI without VT with --enable-contrib-no-build=vt. I'll send a note to the VT guys. I take that back -- I just did 3 more builds and was unable to get the VT build to fail. That's not good. :-( I did do a parallel build -- perhaps that wonked something up in VT there. Unfortunately, I don't have any of the build logs, though -- so I don't have anything to report... -- Jeff Squyres Cisco Systems
Re: [OMPI devel] Build failures on trunk? r21235
Jeff Squyres wrote: > I take that back -- I just did 3 more builds and was unable to get the VT build to fail. That's not good. :-( And I never get that far - I still fail in tools/orte-iof, the same way. I tried removing Apple's mpi.h from /usr/include, but that wasn't it. This is a very stock OS X box, using Apple's tools, including gcc (4.0.1), libtool and the linker. - Bryan -- Bryan Lally, la...@lanl.gov 505.667.9954 CCS-2 Los Alamos National Laboratory Los Alamos, New Mexico