[OMPI devel] Warnings in sctp BTL

2009-05-14 Thread Ralph Castain

Hi folks

Not sure who is maintaining the SCTP BTL, but I found the following  
warnings when building tonight:


btl_sctp_frag.c: In function `mca_btl_sctp_frag_large_send':
btl_sctp_frag.c:179: warning: int format, different type arg (arg 3)
btl_sctp_frag.c:179: warning: int format, different type arg (arg 5)
btl_sctp_frag.c: In function `mca_btl_sctp_frag_send':
btl_sctp_frag.c:303: warning: int format, different type arg (arg 3)
btl_sctp_frag.c:303: warning: int format, different type arg (arg 5)
btl_sctp_endpoint.c: In function `mca_btl_sctp_endpoint_recv_connect_ack':
btl_sctp_endpoint.c:841: warning: too few arguments for format
btl_sctp_proc.c: In function `mca_btl_sctp_proc_create':
btl_sctp_proc.c:147: warning: int format, different type arg (arg 2)
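
For reference, this class of warning usually means a size_t (or another
64-bit type) is being passed where the format string expects an int.  A
minimal illustration of the diagnostic -- not the actual btl_sctp code,
just a hypothetical sketch:

    #include <stdio.h>
    #include <stddef.h>

    int main(void)
    {
        size_t done  = 42;    /* e.g., bytes sent so far */
        size_t total = 4096;  /* e.g., total fragment length */

        /* "%d" expects int; passing size_t arguments here is what
         * provokes "warning: int format, different type arg". */
        printf("sent %d of %d bytes\n", done, total);
        return 0;
    }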

Could you please clean this up?

Thanks!
Ralph



Re: [OMPI devel] Warnings in sctp BTL

2009-05-14 Thread Brad Penoff
hey Ralph,

At UBC, we are trying to find a new student who can maintain the SCTP
BTL.  Unfortunately, it has not been maintained since the progress
engine overhaul a while ago.  At the moment, this is still on the TODO
list.  I hope to get to this myself, if no student is found.

It was my impression that the SCTP BTL wasn't included in any release
by default.  I hope that this is still the case.

brad

On Wed, May 13, 2009 at 8:15 PM, Ralph Castain  wrote:
> Hi folks
>
> Not sure who is maintaining the SCTP BTL, but I found the following warnings
> when building tonight:
>
> btl_sctp_frag.c: In function `mca_btl_sctp_frag_large_send':
> btl_sctp_frag.c:179: warning: int format, different type arg (arg 3)
> btl_sctp_frag.c:179: warning: int format, different type arg (arg 5)
> btl_sctp_frag.c: In function `mca_btl_sctp_frag_send':
> btl_sctp_frag.c:303: warning: int format, different type arg (arg 3)
> btl_sctp_frag.c:303: warning: int format, different type arg (arg 5)
> btl_sctp_endpoint.c: In function `mca_btl_sctp_endpoint_recv_connect_ack':
> btl_sctp_endpoint.c:841: warning: too few arguments for format
> btl_sctp_proc.c: In function `mca_btl_sctp_proc_create':
> btl_sctp_proc.c:147: warning: int format, different type arg (arg 2)
>
> Could you please clean this up?
>
> Thanks!
> Ralph
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>


Re: [OMPI devel] Warnings in sctp BTL

2009-05-14 Thread Jeff Squyres

On May 14, 2009, at 1:14 AM, Brad Penoff wrote:


At UBC, we are trying to find a new student who can maintain the SCTP
BTL.  Unfortunately, it is has not been maintained since the progress
engine overhaul a while ago.  At the moment, this is still on the TODO
list.  I hope to get to this myself, if no student is found.



Thanks; that would be most useful.

Ralph -- did these messages come in due to the opal_attribute changes  
from last night?



It was my impression that the SCTP BTL wasn't included in any release
by default.  I hope that this is still the case



Correct; it is .ompi_ignore'd in the v1.3 SVN tree.

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Warnings in sctp BTL

2009-05-14 Thread Ralph Castain
I'm not entirely sure, since I don't know when this component would attempt
to build. I was building the latest trunk on a new (to me) system last night
(Jeff's cluster) when I saw the warnings.

Jeff: have you seen them before on your cluster?


On Thu, May 14, 2009 at 7:22 AM, Jeff Squyres  wrote:

> On May 14, 2009, at 1:14 AM, Brad Penoff wrote:
>
>  At UBC, we are trying to find a new student who can maintain the SCTP
>> BTL.  Unfortunately, it is has not been maintained since the progress
>> engine overhaul a while ago.  At the moment, this is still on the TODO
>> list.  I hope to get to this myself, if no student is found.
>>
>>
> Thanks; that would be most useful.
>
> Ralph -- did these messages come in due to the opal_attribute changes from
> last night?
>
>  It was my impression that the SCTP BTL wasn't included in any release
>> by default.  I hope that this is still the case
>>
>>
> Correct; it is .ompi_ignore'd in the v1.3 SVN tree.
>
> --
> Jeff Squyres
> Cisco Systems
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>


Re: [OMPI devel] Warnings in sctp BTL

2009-05-14 Thread Rainer Keller
Hi Jeff,
On Thursday 14 May 2009 09:22:18 am Jeff Squyres wrote:
> Ralph -- did these messages come in due to the opal_attribute changes
> from last night?
They are certainly due to the __opal_attribute_format__ changes.

A patch similar to the one for btl_tcp_frag should be applied...
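
Roughly speaking, the attribute lets the compiler type-check the variadic
arguments of our output macros against the format string.  A minimal
sketch of the mechanism -- the real __opal_attribute_format__ / BTL_ERROR
definitions differ in detail, this is just an assumed stand-in:

    #include <stdarg.h>
    #include <stdio.h>
    #include <stddef.h>

    /* Stand-in for an opal_output/BTL_ERROR-style helper, marked with
     * GCC's printf format attribute (which is the mechanism the OMPI
     * macro wraps, modulo portability guards). */
    void my_output(const char *fmt, ...)
        __attribute__((format(printf, 1, 2)));

    void my_output(const char *fmt, ...)
    {
        va_list ap;
        va_start(ap, fmt);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
    }

    void frag_progress(size_t bytes_sent)
    {
        /* Before the attribute was added, this mismatch compiled
         * silently; with it, GCC flags the size_t argument against
         * the "%d" conversion. */
        my_output("frag sent %d bytes\n", bytes_sent);
    }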

Thanks,
Rainer
-- 

Rainer Keller, PhD  Tel: +1 (865) 241-6293
Oak Ridge National Lab  Fax: +1 (865) 241-4811
PO Box 2008 MS 6164   Email: kel...@ornl.gov
Oak Ridge, TN 37831-2008    AIM/Skype: rusraink




Re: [OMPI devel] Warnings in sctp BTL

2009-05-14 Thread Jeff Squyres
Yes, I do.  All of them are on BTL_ERROR lines; I think these came in  
last night with the opal attribute updates.


Looking at last night's MTT, those attribute changes turned up a LOT  
of warnings in various BTLs...  Doh!



On May 14, 2009, at 9:34 AM, Ralph Castain wrote:

I'm not entirely sure as I'm unclear as to when this component would  
attempt to build. I was building the latest trunk on a new (to me)  
system last night (Jeff's cluster) when I saw the warnings.


Jeff: have you seen them before on your cluster?


On Thu, May 14, 2009 at 7:22 AM, Jeff Squyres   
wrote:

On May 14, 2009, at 1:14 AM, Brad Penoff wrote:

At UBC, we are trying to find a new student who can maintain the SCTP
BTL.  Unfortunately, it is has not been maintained since the progress
engine overhaul a while ago.  At the moment, this is still on the TODO
list.  I hope to get to this myself, if no student is found.


Thanks; that would be most useful.

Ralph -- did these messages come in due to the opal_attribute  
changes from last night?


It was my impression that the SCTP BTL wasn't included in any release
by default.  I hope that this is still the case


Correct; it is .ompi_ignore'd in the v1.3 SVN tree.

--
Jeff Squyres
Cisco Systems

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [OMPI svn] svn:open-mpi r21234

2009-05-14 Thread Rainer Keller
Hey Ralph,
On Wednesday 13 May 2009 10:54:43 pm Ralph Castain wrote:
> This generated a bunch of warnings - the "z" length modifier is not a
> generally supported option, which is why we do not use it.
I see you compile with -pedantic?

> btl_tcp_frag.c: In function ‘mca_btl_tcp_frag_send’:
> btl_tcp_frag.c:115: warning: ISO C90 does not support the ‘z’ printf
> length modifier
> ...
see below.

> btl_tcp_component.c: In function ‘mca_btl_tcp_component_create_listen’:
> btl_tcp_component.c:682: warning: too many arguments for format
This is the nice part of the attribute-fix: showing mistakes like this...
Fixed...


> Could you please fix this? If you want to deal with the size_t 32/64
> bit issues, there is another little macro thingy that we created for
> just that purpose (someone else here undoubtedly remembers it).
Well, does anyone know the "little macro thingy" ;-)
I failed to find it...

However, in other parts of the code base, we just use "%lu" and cast to 
unsigned long...
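
For concreteness, a sketch of that idiom next to the C99 alternative
(which is exactly what trips the ISO C90 'z' warnings under -pedantic);
hypothetical code, not taken from the tree:

    #include <stdio.h>
    #include <stddef.h>

    void report(size_t cnt)
    {
        /* C99 way: correct, but -pedantic in C90 mode complains that
         * ISO C90 does not support the 'z' length modifier. */
        printf("count = %zu\n", cnt);

        /* Cast-to-unsigned-long idiom used elsewhere in the code base:
         * valid C90 and works on 32- and 64-bit platforms, as long as
         * the value fits in an unsigned long. */
        printf("count = %lu\n", (unsigned long) cnt);
    }
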
Thoughts?

Thanks,
Rainer
-- 

Rainer Keller, PhD  Tel: +1 (865) 241-6293
Oak Ridge National Lab  Fax: +1 (865) 241-4811
PO Box 2008 MS 6164   Email: kel...@ornl.gov
Oak Ridge, TN 37831-2008    AIM/Skype: rusraink





Re: [OMPI devel] [OMPI svn] svn:open-mpi r21234

2009-05-14 Thread Jeff Squyres

On May 14, 2009, at 9:57 AM, Rainer Keller wrote:

> This generated a bunch of warnings - the "z" length modifier is not a
> generally supported option, which is why we do not use it.
I see You compile with -pedantic?



Ya, configure adds that automatically if you --enable-picky.

> btl_tcp_component.c: In function ‘mca_btl_tcp_component_create_listen’:
> btl_tcp_component.c:682: warning: too many arguments for format
This is the nice part of the attribute-fix: showing mistakes like this...
Fixed...



Yep; it is good to expose all of these.  Annoying to fix, but they  
should be fixed.  :-)



> Could you please fix this? If you want to deal with the size_t 32/64
> bit issues, there is another little macro thingy that we created for
> just that purpose (someone else here undoubtedly remembers it).
Well, does anyone now the "little macro thingy" ;-)
I failed to find it...

However, in other parts of the code base, we just use "%lu" and cast to
unsigned long...



Unfortunately, I think that's the best we ever came up with:

https://svn.open-mpi.org/trac/ompi/wiki/PrintfCodes
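
For anyone who hasn't looked at that page: the idea is essentially the
same as the PRI* macros in C99's <inttypes.h> -- hide the platform-specific
conversion behind a macro so callers never hard-code the width.  A generic
sketch using the standard macros, not the OMPI-specific ones:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t bytes = 123456789ULL;

        /* PRIu64 expands to the right conversion ("llu", "lu", ...)
         * for this platform, so the format string stays portable. */
        printf("bytes = %" PRIu64 "\n", bytes);
        return 0;
    }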

--
Jeff Squyres
Cisco Systems




[OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Ashley Pittman

All,

After a svn update earlier I'm getting build failures on the trunk.  I've
tried the usual, including a full clean checkout, and am still getting the
errors.

I'm not doing anything special other than a VPATH build, and this same
tree built last week; it's just the update that appears to have broken
things.

The configure line used was
~/code/OpenMPI/ompi-trunk-tes/trunk/configure
--enable-mpirun-prefix-by-default
--prefix /mnt/home/debian/ashley/code/OpenMPI/install/ and I'm using the
tree at http://svn.open-mpi.org/svn/ompi/trunk, I hope this is the
correct one.

This is the error the build fails with:

/bin/sh ../../../libtool --tag=CXX   --mode=link g++  -O3 -DNDEBUG
-finline-functions -pthread  -export-dynamic   -o ompi_info components.o
ompi_info.o output.o param.o version.o ../../../ompi/libmpi.la -lnsl
-lutil -lm 
libtool: link: g++ -O3 -DNDEBUG -finline-functions -pthread
-o .libs/ompi_info components.o ompi_info.o output.o param.o version.o
-Wl,--export-dynamic  ../../../ompi/.libs/libmpi.so -lnsl -lutil -lm
-pthread -Wl,-rpath -Wl,/mnt/home/debian/ashley/code/OpenMPI/install/lib
../../../ompi/.libs/libmpi.so: undefined reference to
`opal_maffinity_setup'
../../../ompi/.libs/libmpi.so: undefined reference to
`opal_paffinity_alone'
../../../ompi/.libs/libmpi.so: undefined reference to
`opal_paffinity_base_slot_list'
collect2: ld returned 1 exit status
make[2]: *** [ompi_info] Error 1
make[2]: Leaving directory `/mnt/memfs/openmpi/ompi/tools/ompi_info'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/mnt/memfs/openmpi/ompi'
make: *** [all-recursive] Error 1
ashley@alpha:/mnt/memfs/openmpi$ 

I can provide more information if requested although as I say I don't
think I'm doing anything out of the ordinary.

Ashley Pittman,



[OMPI devel] OMPI 1.3 branch

2009-05-14 Thread Ralph Castain
Hi folks

I encourage people to please look at your MTT outputs. As we are preparing
to roll the 1.3.3 release, I am seeing a lot of problems on the branch:

1. timeouts, coming in two forms: (a) MPI_Abort hanging, and (b) collectives
hanging (this is mostly on Solaris)

2. segfaults - mostly on sif, but occasionally elsewhere

3. daemon failed to report back - this was only on sif

We will need to correct many of these for the release - unless these prove to
be due to trivial errors, I don't see how we will be ready to roll release
candidates next week.

So let's please start taking a look at these?!

Ralph


Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Jeff Squyres
Hmm; odd.  I'm not getting these errors.  Just to be sure, I did a  
VPATH build and still am not getting these errors...  :-\


Are those symbols publicly available in libopen-pal.so?

It does seem pretty weird that your libtool link line didn't pick up  
libopen-rte.so and libopen-pal.so...?  What version of LT are you using?



On May 14, 2009, at 10:28 AM, Ashley Pittman wrote:



All,

After a svn update earlier I'm getting build failures on the trunk,  
I've
tried the usual including a full clean checkout and am still getting  
the

errors.

I'm not doing anything special other than a VPATH build and this same
tree build last week, it's just the update that appears to have broken
things.

The configure line used was
~/code/OpenMPI/ompi-trunk-tes/trunk/configure
--enable-mpirun-prefix-by-default
--prefix /mnt/home/debian/ashley/code/OpenMPI/install/ and I'm using  
the

tree at http://svn.open-mpi.org/svn/ompi/trunk, I hope this is the
correct one.

This is the error the build fails with:

/bin/sh ../../../libtool --tag=CXX   --mode=link g++  -O3 -DNDEBUG
-finline-functions -pthread  -export-dynamic   -o ompi_info  
components.o

ompi_info.o output.o param.o version.o ../../../ompi/libmpi.la -lnsl
-lutil -lm
libtool: link: g++ -O3 -DNDEBUG -finline-functions -pthread
-o .libs/ompi_info components.o ompi_info.o output.o param.o version.o
-Wl,--export-dynamic  ../../../ompi/.libs/libmpi.so -lnsl -lutil -lm
-pthread -Wl,-rpath -Wl,/mnt/home/debian/ashley/code/OpenMPI/install/ 
lib

../../../ompi/.libs/libmpi.so: undefined reference to
`opal_maffinity_setup'
../../../ompi/.libs/libmpi.so: undefined reference to
`opal_paffinity_alone'
../../../ompi/.libs/libmpi.so: undefined reference to
`opal_paffinity_base_slot_list'
collect2: ld returned 1 exit status
make[2]: *** [ompi_info] Error 1
make[2]: Leaving directory `/mnt/memfs/openmpi/ompi/tools/ompi_info'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/mnt/memfs/openmpi/ompi'
make: *** [all-recursive] Error 1
ashley@alpha:/mnt/memfs/openmpi$

I can provide more information if requested although as I say I don't
think I'm doing anything out of the ordinary.

Ashley Pittman,

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Ashley Pittman

Libtool is 2.2.6.  I use Debian unstable so it's normally fairly
up-to-date; I suppose it's not impossible that a Debian update has
broken things, now that I think of it.

I normally build in memfs for speed and have just rebooted my machine;
a full rebuild has failed again with the same errors.

All three symbols are shown as B according to nm so they should be
available.

Actually, further testing shows it's user error again: if I remove the
current install then the build succeeds, so it must have been picking up
the libopen-pal from the install location rather than from the current
build.

Ashley Pittman,

On Thu, 2009-05-14 at 11:50 -0400, Jeff Squyres wrote:
> Hmm; odd.  I'm not getting these errors.  Just to be sure, I did a  
> VPATH build and still am not getting these errors...  :-\
> 
> Are those symbols publicly available in libopen-pal.so?
> 
> It does seem pretty weird that your libtool link line didn't pick up  
> libopen-rte.so and libopen-pal.so...?  What version of LT are you using?
> 
> 
> On May 14, 2009, at 10:28 AM, Ashley Pittman wrote:
> 
> >
> > All,
> >
> > After a svn update earlier I'm getting build failures on the trunk,  
> > I've
> > tried the usual including a full clean checkout and am still getting  
> > the
> > errors.
> >
> > I'm not doing anything special other than a VPATH build and this same
> > tree build last week, it's just the update that appears to have broken
> > things.
> >
> > The configure line used was
> > ~/code/OpenMPI/ompi-trunk-tes/trunk/configure
> > --enable-mpirun-prefix-by-default
> > --prefix /mnt/home/debian/ashley/code/OpenMPI/install/ and I'm using  
> > the
> > tree at http://svn.open-mpi.org/svn/ompi/trunk, I hope this is the
> > correct one.
> >
> > This is the error the build fails with:
> >
> > /bin/sh ../../../libtool --tag=CXX   --mode=link g++  -O3 -DNDEBUG
> > -finline-functions -pthread  -export-dynamic   -o ompi_info  
> > components.o
> > ompi_info.o output.o param.o version.o ../../../ompi/libmpi.la -lnsl
> > -lutil -lm
> > libtool: link: g++ -O3 -DNDEBUG -finline-functions -pthread
> > -o .libs/ompi_info components.o ompi_info.o output.o param.o version.o
> > -Wl,--export-dynamic  ../../../ompi/.libs/libmpi.so -lnsl -lutil -lm
> > -pthread -Wl,-rpath -Wl,/mnt/home/debian/ashley/code/OpenMPI/install/ 
> > lib
> > ../../../ompi/.libs/libmpi.so: undefined reference to
> > `opal_maffinity_setup'
> > ../../../ompi/.libs/libmpi.so: undefined reference to
> > `opal_paffinity_alone'
> > ../../../ompi/.libs/libmpi.so: undefined reference to
> > `opal_paffinity_base_slot_list'
> > collect2: ld returned 1 exit status
> > make[2]: *** [ompi_info] Error 1
> > make[2]: Leaving directory `/mnt/memfs/openmpi/ompi/tools/ompi_info'
> > make[1]: *** [all-recursive] Error 1
> > make[1]: Leaving directory `/mnt/memfs/openmpi/ompi'
> > make: *** [all-recursive] Error 1
> > ashley@alpha:/mnt/memfs/openmpi$
> >
> > I can provide more information if requested although as I say I don't
> > think I'm doing anything out of the ordinary.
> >
> > Ashley Pittman,
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 



Re: [OMPI devel] OMPI 1.3 branch

2009-05-14 Thread Terry Dontje

Ralph Castain wrote:

Hi folks

I encourage people to please look at your MTT outputs. As we are 
preparing to roll the 1.3.3 release, I am seeing a lot of problems on 
the branch:


1. timeouts, coming in two forms: (a) MPI_Abort hanging, and (b) 
collectives hanging (this is mostly on Solaris)


Can you clarify or send me a link that makes you believe (b) is mostly
Solaris?  Looking at last night's Sun MTT 1.3 nightly runs I see 47
timeouts on Linux and 24 timeouts on Solaris.  That doesn't constitute
mostly Solaris to me.  Also, how are you determining these timeouts are
collective-based?  I have a theory they are, but I don't have a clear
smoking gun as of yet.


I've been looking at some collective hangs and segv's.  These seem to
happen across different platforms and OSes (Linux and Solaris).  I've been
finding it really hard to reproduce.  I ran MPI_Allreduce_loc_c on
three clusters for 2 days without a hang or segv.  I am really concerned
whether we'll even be able to get this to fail with debugging on.

I have not been able to get a core or time with a hung run in order to 
get more information. 

2. segfaults - mostly on sif, but occasionally elsewhere

3. daemon failed to report back - this was only on sif

We will need to correct many of these for the release - unless it 
proves to be due to trivial errors, I don't see how we will be ready 
to roll release candidates next week.


So let's please start taking a look at these?!

I've actually been looking at ours though I have not been extremely 
vocal.  I was hoping to get more info on our timeouts before requesting 
help.

Ralph



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
  




Re: [OMPI devel] OMPI 1.3 branch

2009-05-14 Thread Ralph Castain
On Thu, May 14, 2009 at 10:47 AM, Terry Dontje  wrote:

> Ralph Castain wrote:
>
>> Hi folks
>>
>> I encourage people to please look at your MTT outputs. As we are preparing
>> to roll the 1.3.3 release, I am seeing a lot of problems on the branch:
>>
>> 1. timeouts, coming in two forms: (a) MPI_Abort hanging, and (b)
>> collectives hanging (this is mostly on Solaris)
>>
>>  Can you clarify or send me a link that makes you believe b is mostly
> solaris.  Looking at last night's Sun's MTT 1.3 nightly runs I see 47
> timeouts on Linux and 24 timeouts on Solaris.  That doesn't constitute
> mostly Solaris to me.  Also how are you determining these timeouts are
> Collective based?  I have a theory they are but I don't have a clear smoking
> gun as of yet.


I looked at this MTT report, which showed it hanging in a whole bunch of
collective tests:

http://www.open-mpi.org/mtt/index.php?limit===_drilldowns=_scale=_scale=_subtitle=_graphs=_go=_cookies==test_run_start_timestamp=2009-05-13+15%3A15%3A25+-+2009-05-14+15%3A15%3A25_platform_hardware=
^x86_64%24_platform_hardware=show_os_name=^Linux%24_os_name=show_mpi_name=^ompi-nightly-v1.3%24_mpi_name=show_mpi_version=^1.3.3a1r21173%24_mpi_version=show_suite_name=all_suite_name=show_test_name=all_test_name=hide_np=all_np=show_full_command=_full_command=show_http_username=^sun%24_http_username=show_local_username=all_local_username=hide_platform_name=^burl-ct-v20z-10%24_platform_name=show=Detail=test_run_result=_rt_os_version=_os_version=_platform_type=_platform_type=_hostname=_hostname=_compiler_name=_compiler_name=_compiler_version=_compiler_version=_vpath_mode=_vpath_mode=_endian=_endian=_bitness=_bitness=_configure_arguments=_exit_value=_exit_value=_exit_signal=_exit_signal=_duration=_duration=_client_serial=_client_serial=_result_message=_result_stdout=_result_stderr=_environment=_description=_launcher=_launcher=_resource_mgr=_resource_mgr=_network=_network=_parameters=_parameters==summary

When I look at the hangs on other systems, they are in non-collective tests.
I'm not sure what that really means, though - it was just an observation
based on this one set of tests.


>
>
> I've been looking at some collective hangs and segv's.  These seem to
> happen across different platform and OS (Linux and Solaris).  I've been
> finding it really hard to reproduce.  I ran MPI_Allreduce_loc_c on a three
> clusters for 2 days without a hang or segv.  I am really concerned whether
> we'll even be able to get this to fail with debugging on.
> I have not been able to get a core or time with a hung run in order to get
> more information.
>
>> 2. segfaults - mostly on sif, but occasionally elsewhere
>>
>> 3. daemon failed to report back - this was only on sif
>>
>> We will need to correct many of these for the release - unless it proves
>> to be due to trivial errors, I don't see how we will be ready to roll
>> release candidates next week.
>>
>> So let's please start taking a look at these?!
>>
>>  I've actually been looking at ours though I have not been extremely
> vocal.  I was hoping to get more info on our timeouts before requesting
> help.


No problem - I wasn't pointing a finger at anyone in particular. Just wanted
to highlight that the branch is not in great shape since we had talked on
the telecon about trying to do a release next week.



>  Ralph
>>
>> 
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>


Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Jeff Squyres
Hmm.  This may not be pilot error.  I build OMPI with a pre-installed  
OMPI all the time and they don't conflict during the build (i.e., the  
building OMPI always uses the libopen-rte and libopen-pal from the  
build tree, not the install tree).  Here are my link lines for ompi_info:


/bin/sh ../../../libtool --tag=CXX   --mode=link g++  -g -Wall -Wundef -Wno-long-long -finline-functions -pthread  -export-dynamic   -o ompi_info components.o ompi_info.o output.o param.o version.o ../../../ompi/libmpi.la -lnsl  -lutil -lm
libtool: link: g++ -g -Wall -Wundef -Wno-long-long -finline-functions -pthread -o .libs/ompi_info components.o ompi_info.o output.o param.o version.o -Wl,--export-dynamic  ../../../ompi/.libs/libmpi.so /users/jsquyres/svn/ompi/orte/.libs/libopen-rte.so /users/jsquyres/svn/ompi/opal/.libs/libopen-pal.so -ldl -lnsl -lutil -lm -pthread -Wl,-rpath -Wl,/home/jsquyres/bogus/lib


Notice that libopen-rte.so and libopen-pal.so are explicitly mentioned
by absolute path name.  Yours weren't.  I wonder why...?



On May 14, 2009, at 12:41 PM, Ashley Pittman wrote:



Libtool is 2.2.6.  I use debian unstable so it's normally fairly
up-to-date, I suppose it's not impossible that a debian update has
broken things now that I think of it.

I normally build in memfs for speed and have just rebooted my machine
now, a full rebuild has failed again with the same errors.

All three symbols are shown as B according to nm so they should be
available.

Actually further testing shows it's user error again, if I remove the
current install then the build succeeds, it must have been pickings up
the libopen-pal from the install location rather than from the current
build.

Ashley Pittman,

On Thu, 2009-05-14 at 11:50 -0400, Jeff Squyres wrote:
> Hmm; odd.  I'm not getting these errors.  Just to be sure, I did a
> VPATH build and still am not getting these errors...  :-\
>
> Are those symbols publicly available in libopen-pal.so?
>
> It does seem pretty weird that your libtool link line didn't pick up
> libopen-rte.so and libopen-pal.so...?  What version of LT are you  
using?

>
>
> On May 14, 2009, at 10:28 AM, Ashley Pittman wrote:
>
> >
> > All,
> >
> > After a svn update earlier I'm getting build failures on the  
trunk,

> > I've
> > tried the usual including a full clean checkout and am still  
getting

> > the
> > errors.
> >
> > I'm not doing anything special other than a VPATH build and this  
same
> > tree build last week, it's just the update that appears to have  
broken

> > things.
> >
> > The configure line used was
> > ~/code/OpenMPI/ompi-trunk-tes/trunk/configure
> > --enable-mpirun-prefix-by-default
> > --prefix /mnt/home/debian/ashley/code/OpenMPI/install/ and I'm  
using

> > the
> > tree at http://svn.open-mpi.org/svn/ompi/trunk, I hope this is the
> > correct one.
> >
> > This is the error the build fails with:
> >
> > /bin/sh ../../../libtool --tag=CXX   --mode=link g++  -O3 -DNDEBUG
> > -finline-functions -pthread  -export-dynamic   -o ompi_info
> > components.o
> > ompi_info.o output.o param.o version.o ../../../ompi/libmpi.la - 
lnsl

> > -lutil -lm
> > libtool: link: g++ -O3 -DNDEBUG -finline-functions -pthread
> > -o .libs/ompi_info components.o ompi_info.o output.o param.o  
version.o
> > -Wl,--export-dynamic  ../../../ompi/.libs/libmpi.so -lnsl -lutil  
-lm
> > -pthread -Wl,-rpath -Wl,/mnt/home/debian/ashley/code/OpenMPI/ 
install/

> > lib
> > ../../../ompi/.libs/libmpi.so: undefined reference to
> > `opal_maffinity_setup'
> > ../../../ompi/.libs/libmpi.so: undefined reference to
> > `opal_paffinity_alone'
> > ../../../ompi/.libs/libmpi.so: undefined reference to
> > `opal_paffinity_base_slot_list'
> > collect2: ld returned 1 exit status
> > make[2]: *** [ompi_info] Error 1
> > make[2]: Leaving directory `/mnt/memfs/openmpi/ompi/tools/ 
ompi_info'

> > make[1]: *** [all-recursive] Error 1
> > make[1]: Leaving directory `/mnt/memfs/openmpi/ompi'
> > make: *** [all-recursive] Error 1
> > ashley@alpha:/mnt/memfs/openmpi$
> >
> > I can provide more information if requested although as I say I  
don't

> > think I'm doing anything out of the ordinary.
> >
> > Ashley Pittman,
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Ralf Wildenhues
Hello,

Ashley, did you rebootstrap with Debian's Libtool?

They enable link_all_deplibs=no in their Libtool, which changes some
things and can cause issues like this.  Can't hurt to open a Debian
bug report about it (targeted against libtool) so they know this issue
exists.

Can you try working around it by setting link_all_deplibs to "yes",
then rebuilding all the libraries?  Like this, done in the top build
directory with your current build tree:
  find . -name libtool | xargs \
    sed -i 's/^\(link_all_deplibs=\).*/\1yes/'
  find . -name \*.la | xargs ./libtool --mode=clean rm -f
  make

If that does not work, then I'd be very interested in what the failure
would look like at that point.

A more permanent workaround could be in OpenMPI to list each library
that is used *directly* by some other library as a dependency.  Sigh.
Or fix Debian Libtool.

Cheers,
Ralf

* Jeff Squyres wrote on Thu, May 14, 2009 at 07:28:47PM CEST:
> Hmm.  This may not be pilot error.  I build OMPI with a pre-installed  
> OMPI all the time and they don't conflict during the build (i.e., the  
> building OMPI always uses the libopen-rte and libopen-pal from the build 
> tree, not the install tree).  Here's my link lines for ompi_info:
>
> /bin/sh ../../../libtool --tag=CXX   --mode=link g++  -g -Wall -Wundef  
> -Wno-long-long -finline-functions -pthread  -export-dynamic   -o  
> ompi_info components.o ompi_info.o output.o param.o version.o ../../../ 
> ompi/libmpi.la -lnsl  -lutil -lm
> libtool: link: g++ -g -Wall -Wundef -Wno-long-long -finline-functions - 
> pthread -o .libs/ompi_info components.o ompi_info.o output.o param.o  
> version.o -Wl,--export-dynamic  ../../../ompi/.libs/libmpi.so /users/ 
> jsquyres/svn/ompi/orte/.libs/libopen-rte.so /users/jsquyres/svn/ompi/ 
> opal/.libs/libopen-pal.so -ldl -lnsl -lutil -lm -pthread -Wl,-rpath - 
> Wl,/home/jsquyres/bogus/lib
>
> Notice that libopen-rte.os and libopen-pal.so are explicitly mentioned  
> by absolute path name.  Yours weren't.  I wonder why...?
>
>
> On May 14, 2009, at 12:41 PM, Ashley Pittman wrote:
>
>>
>> Libtool is 2.2.6.  I use debian unstable so it's normally fairly
>> up-to-date, I suppose it's not impossible that a debian update has
>> broken things now that I think of it.
>>
>> I normally build in memfs for speed and have just rebooted my machine
>> now, a full rebuild has failed again with the same errors.
>>
>> All three symbols are shown as B according to nm so they should be
>> available.
>>
>> Actually further testing shows it's user error again, if I remove the
>> current install then the build succeeds, it must have been pickings up
>> the libopen-pal from the install location rather than from the current
>> build.
>>
>> Ashley Pittman,


Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Jeff Squyres

On May 14, 2009, at 1:46 PM, Ralf Wildenhues wrote:


A more permanent workaround could be in OpenMPI to list each library
that is used *directly* by some other library as a dependency.  Sigh.



We actually took pains to *not* do that; we *used* to do that and  
explicitly took it out.  :-\  IIRC, it had something to do with  
dlopen'ing libmpi.so...?



Or fix Debian Libtool.




That sounds better to me, but I'm admittedly a little biased.  :-)

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Ralf Wildenhues
Hello,

* Jeff Squyres wrote on Thu, May 14, 2009 at 07:56:24PM CEST:
> On May 14, 2009, at 1:46 PM, Ralf Wildenhues wrote:
>
>> A more permanent workaround could be in OpenMPI to list each library
>> that is used *directly* by some other library as a dependency.  Sigh.
>
> We actually took pains to *not* do that; we *used* to do that and  
> explicitly took it out.  :-\  IIRC, it had something to do with  
> dlopen'ing libmpi.so...?

Admittedly, I didn't look at Open MPI in detail before writing my
previous reply.  So it would be nice to know the outcome of the
workaround anyway (I do have a Debian here, but different Libtool
versions and little time); there could also be another genuine bug
hiding there.  Dlopening sounds like a Debian Libtool issue though,
and one worthy of a Debian bug report (because that is not intended
by them to fail).

Thanks,
Ralf


Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Brian W. Barrett

On Thu, 14 May 2009, Jeff Squyres wrote:


On May 14, 2009, at 1:46 PM, Ralf Wildenhues wrote:


A more permanent workaround could be in OpenMPI to list each library
that is used *directly* by some other library as a dependency.  Sigh.


We actually took pains to *not* do that; we *used* to do that and explicitly 
took it out.  :-\  IIRC, it had something to do with dlopen'ing libmpi.so...?


Actually, I think that was something else.  Today, libopen-rte.la lists 
libopen-pal.la as a dependency and libmpi.la lists libopen-rte.la.  I had 
removed the dependency of libmpi.la on libopen-pal.la because it was 
causing libopen-pal.so to be listed twice by libtool, which was causing 
problems.


It would be a trivial fix to change the Makefiles to make libmpi.la
depend on libopen-pal.la as well as libopen-rte.la.


Brian


Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Ralf Wildenhues
Hi Brian,

* Brian W. Barrett wrote on Thu, May 14, 2009 at 08:22:58PM CEST:
>
> Actually, I think that was something else.  Today, libopen-rte.la lists  
> libopen-pal.la as a dependency and libmpi.la lists libopen-rte.la.  I had 
> removed the dependency of libmpi.la on libopen-pal.la because it was  
> causing libopen-pal.so to be listed twice by libtool, which was causing  
> problems.

That's weird, and shouldn't happen (the problems, that is).  Do you have
a pointer for them?

Thanks,
Ralf


Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Jeff Squyres

On May 14, 2009, at 2:22 PM, Brian W. Barrett wrote:

> We actually took pains to *not* do that; we *used* to do that and explicitly
> took it out.  :-\  IIRC, it had something to do with dlopen'ing libmpi.so...?


Actually, I think that was something else.  Today, libopen-rte.la lists
libopen-pal.la as a dependency and libmpi.la lists libopen-rte.la.  I had
removed the dependency of libmpi.la on libopen-pal.la because it was
causing libopen-pal.so to be listed twice by libtool, which was causing
problems.

It would be a trivial fix to change the Makefiles to make libmpi.la to
depend on libopen-pal.la as well as libopen-rte.la.




Ah -- am I thinking of us removing libmpi (etc.) from the DSOs?

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Brian W. Barrett

On Thu, 14 May 2009, Ralf Wildenhues wrote:


Hi Brian,

* Brian W. Barrett wrote on Thu, May 14, 2009 at 08:22:58PM CEST:


Actually, I think that was something else.  Today, libopen-rte.la lists
libopen-pal.la as a dependency and libmpi.la lists libopen-rte.la.  I had
removed the dependency of libmpi.la on libopen-pal.la because it was
causing libopen-pal.so to be listed twice by libtool, which was causing
problems.


That's weird, and shouldn't happen (the problems, that is).  Do you have
a pointer for them?


I don't - it was many moons ago.  And it very likely was when we were in 
that (evil) period where we were using LT2 before it was released as 
stable.  So it's completely possible we were seeing a transient bug which 
is long since gone.


Brian


Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Brian W. Barrett

On Thu, 14 May 2009, Jeff Squyres wrote:


On May 14, 2009, at 2:22 PM, Brian W. Barrett wrote:

We actually took pains to *not* do that; we *used* to do that and 
explicitly
took it out.  :-\  IIRC, it had something to do with dlopen'ing 
libmpi.so...?


Actually, I think that was something else.  Today, libopen-rte.la lists
libopen-pal.la as a dependency and libmpi.la lists libopen-rte.la.  I had
removed the dependency of libmpi.la on libopen-pal.la because it was
causing libopen-pal.so to be listed twice by libtool, which was causing
problems.

It would be a trivial fix to change the Makefiles to make libmpi.la to
depend on libopen-pal.la as well as libopen-rte.la.


Ah -- am I thinking of us removing libmpi (etc.) from the DSOs?


I think so.  And that's a change we definitely don't want to undo.

Brian


Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Bryan Lally

While we're talking about build failures ...

I haven't been able to build any of the 1.3.x releases on my OS X 
machines.  OS X 10.5.6 (Leopard) on Intel Macs.  Attached is the 
configure command and the failure from last night's development tarball, 
openmpi-1.3.3a1r21223.tar.gz.  1.2.x builds fine.


- Bryan

--
Bryan Lally, la...@lanl.gov
505.667.9954
CCS-2
Los Alamos National Laboratory
Los Alamos, New Mexico


Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Jeff Squyres

Did you mean to attach something?

FWIW, I can configure/build on Leopard just fine...?  I'm using the  
compilers from hpc.sf.net, though.  I haven't tried recently with the  
native Leopard compilers.



On May 14, 2009, at 2:38 PM, Bryan Lally wrote:


While we're talking about build failures ...

I haven't been able to build any of the 1.3.x releases on my OS X  
machines.  OS X 10.5.6 (Leopard) on Intel macs.  Attached is the  
configure command and the failure from last night's development  
tarball, openmpi-1.3.3a1r21223.tar.gz.  1.2.x builds fine.


- Bryan

--
Bryan Lally, la...@lanl.gov
505.667.9954
CCS-2
Los Alamos National Laboratory
Los Alamos, New Mexico
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Bryan Lally

Argh.  This time with attachment attached ...

Bryan Lally wrote:

While we're talking about build failures ...

I haven't been able to build any of the 1.3.x releases on my OS X 
machines.  OS X 10.5.6 (Leopard) on Intel macs.  Attached is the 
configure command and the failure from last night's development tarball, 
openmpi-1.3.3a1r21223.tar.gz.  1.2.x builds fine.


- Bryan



--
Bryan Lally, la...@lanl.gov
505.667.9954
CCS-2
Los Alamos National Laboratory
Los Alamos, New Mexico
./configure\
--prefix=/usr/local/openmpi-1.3.3x \
--disable-mpi-f77  \
--disable-mpi-f90  \
--disable-mpi-profile

make

...

Making all in tools/orte-iof
/bin/sh ../../../libtool --tag=CC   --mode=link gcc  -O3 -DNDEBUG 
-finline-functions -fno-strict-aliasing  -fvisibility=hidden  -export-dynamic   
-o orte-iof orte-iof.o ../../../orte/libopen-rte.la -lutil  
libtool: link: gcc -O3 -DNDEBUG -finline-functions -fno-strict-aliasing 
-fvisibility=hidden -o orte-iof orte-iof.o  ../../../orte/.libs/libopen-rte.a 
/Users/lally/Software/openmpi-1.3.3a1r21223/opal/.libs/libopen-pal.a -lutil
Undefined symbols:
  "_orte_iof", referenced from:
  _orte_iof$non_lazy_ptr in orte-iof.o
  "_orte_routed", referenced from:
  _orte_routed$non_lazy_ptr in libopen-rte.a(hnp_contact.o)
  _orte_routed$non_lazy_ptr in libopen-rte.a(rml_base_contact.o)
ld: symbol(s) not found
collect2: ld returned 1 exit status
make[2]: *** [orte-iof] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1


Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Ralph Castain
Blast - wish I could remember, but I did see that once before and now can't
remember the fix. I can build non-tarballs just fine on my Mac, though, so
it could be a problem with the tarball not picking something up.



On Thu, May 14, 2009 at 12:41 PM, Bryan Lally  wrote:

> Argh.  This time with attachment attached ...
>
>
> Bryan Lally wrote:
>
>> While we're talking about build failures ...
>>
>> I haven't been able to build any of the 1.3.x releases on my OS X
>> machines.  OS X 10.5.6 (Leopard) on Intel macs.  Attached is the configure
>> command and the failure from last night's development tarball,
>> openmpi-1.3.3a1r21223.tar.gz.  1.2.x builds fine.
>>
>>- Bryan
>>
>
>
> --
> Bryan Lally, la...@lanl.gov
> 505.667.9954
> CCS-2
> Los Alamos National Laboratory
> Los Alamos, New Mexico
>
> ./configure\
>--prefix=/usr/local/openmpi-1.3.3x \
>--disable-mpi-f77  \
>--disable-mpi-f90  \
>--disable-mpi-profile
>
> make
>
> ...
>
> Making all in tools/orte-iof
> /bin/sh ../../../libtool --tag=CC   --mode=link gcc  -O3 -DNDEBUG
> -finline-functions -fno-strict-aliasing  -fvisibility=hidden
>  -export-dynamic   -o orte-iof orte-iof.o ../../../orte/libopen-rte.la-lutil
> libtool: link: gcc -O3 -DNDEBUG -finline-functions -fno-strict-aliasing
> -fvisibility=hidden -o orte-iof orte-iof.o
>  ../../../orte/.libs/libopen-rte.a
> /Users/lally/Software/openmpi-1.3.3a1r21223/opal/.libs/libopen-pal.a -lutil
> Undefined symbols:
>  "_orte_iof", referenced from:
>  _orte_iof$non_lazy_ptr in orte-iof.o
>  "_orte_routed", referenced from:
>  _orte_routed$non_lazy_ptr in libopen-rte.a(hnp_contact.o)
>  _orte_routed$non_lazy_ptr in libopen-rte.a(rml_base_contact.o)
> ld: symbol(s) not found
> collect2: ld returned 1 exit status
> make[2]: *** [orte-iof] Error 1
> make[1]: *** [all-recursive] Error 1
> make: *** [all-recursive] Error 1
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>


Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Bryan Lally

Jeff Squyres wrote:

Did you mean to attach something?


yeah, oops.  I can't count how many times I've done that.

FWIW, I can configure/build on Leopard just fine...?  I'm using the 
compilers from hpc.sf.net, though.  I haven't tried recently with the 
native Leopard compilers.


This was with the native Leopard compilers.

- Bryan

--
Bryan Lally, la...@lanl.gov
505.667.9954
CCS-2
Los Alamos National Laboratory
Los Alamos, New Mexico


Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Ashley Pittman
On Thu, 2009-05-14 at 19:46 +0200, Ralf Wildenhues wrote:
> Hello,
> 
> Ashley, did you rebootstrap with Debian's Libtool?

I'm not sure I understand the question; I did a fresh checkout and
re-ran ./autogen.sh, if that's what you mean.

> They enable link_all_deplibs=no in their Libtool

That appears to be the case.

> which changes some
> things and can cause issues like this.  Can't hurt to open a Debian
> bug report about it (targeted against libtool) so they know this issue
> exists.
> 
> Can you try working around it by setting link_all_deplibs to "yes",
> then rebuilding all the libraries?  Like this, done in the top build
> directory with your current build tree:
>   find . -name libtool | xargs \
> sed -i 's/^\(link_all_deplibs=\).*//'
>   find . -name \*.la | xargs ./libtool --mode=clean rm -f
>   make

Moving the install dir (which luckily I still had lying around) back into
place and re-compiling did work, so I assume you are correct.

> If that does not work, then I'd be very interested in what the failure
> would look at that point.
> 
> A more permanent workaround could be in OpenMPI to list each library
> that is used *directly* by some other library as a dependency.  Sigh.

Would it be this, or would it be listing libraries which are used
directly by some other library and are distributed as part of OpenMPI?
Sounds slightly more sensible when you phrase it like that.

> Or fix Debian Libtool.

My naive view here is that link_all_deplibs=no sounds like a sensible
default as the linker should do the right thing if they aren't named.
It sounds to me like Brian's suggestion of stating a dependency from
libmpi.la to libopen-pal.la might have more miles in it.

That still doesn't explain why my link line didn't show either being
linked while Jeff sees both, however.

I'll keep the code here lying around in case you want me to perform
further tests.

Ashley,



Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Jeff Squyres
Hmm.  I just did a build with both the SF.net compilers and then a 2nd  
build with the native Leopard compilers of openmpi-1.3.3a1r21223.


The Leopard build failed deep in VT, though, with some obscure C++
STL-looking error -- but OMPI itself built fine.  You can compile OMPI
without VT with --enable-contrib-no-build=vt.


I'll send a note to the VT guys.



On May 14, 2009, at 3:07 PM, Bryan Lally wrote:


Jeff Squyres wrote:
> Did you mean to attach something?

yeah, oops.  I can't count how many times I've done that

> FWIW, I can configure/build on Leopard just fine...?  I'm using the
> compilers from hpc.sf.net, though.  I haven't tried recently with  
the

> native Leopard compilers.

This was with the native Leopard compilers.

- Bryan

--
Bryan Lally, la...@lanl.gov
505.667.9954
CCS-2
Los Alamos National Laboratory
Los Alamos, New Mexico
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Jeff Squyres

On May 14, 2009, at 3:14 PM, Jeff Squyres (jsquyres) wrote:

The Leopard build failed deep in VT, though, with some obscure C++  
STL-

looking error -- but OMPI itself built fine.  You can compile OMPI
without VT with --enable-contrib-no-build=vt.

I'll send a note to the VT guys.



I take that back -- I just did 3 more builds and was unable to get the  
VT build to fail.  That's not good.  :-(


I did do a parallel build -- perhaps that wonked something up in VT  
there.  Unfortunately, I don't have any of the build logs, though --  
so I don't have anything to report...


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Build failures on trunk? r21235

2009-05-14 Thread Bryan Lally

Jeff Squyres wrote:

I take that back -- I just did 3 more builds and was unable to get the 
VT build to fail.  That's not good.  :-(


And I'm never getting there - I still fail in tools/orte-iof, same way.

I tried removing Apple's mpi.h in /usr/include, but that wasn't it.

This is a very stock OS X box.  Apple's tools, including gcc (4.0.1), 
libtool and the linker.


- Bryan

--
Bryan Lally, la...@lanl.gov
505.667.9954
CCS-2
Los Alamos National Laboratory
Los Alamos, New Mexico