Re: [OMPI devel] oshmem enabled by default

2014-08-04 Thread Paul Hargrove
Since "disabled by default" is just part of a macro argument we can say
anything we want.
I propose the following:

Index: config/oshmem_configure_options.m4
===
--- config/oshmem_configure_options.m4  (revision 32424)
+++ config/oshmem_configure_options.m4  (working copy)
@@ -22,7 +22,7 @@
 AC_MSG_CHECKING([if want oshmem])
 AC_ARG_ENABLE([oshmem],
   [AC_HELP_STRING([--enable-oshmem],
-  [Enable building the OpenSHMEM interface (disabled by default)])],
+  [Enable building the OpenSHMEM interface (available on Linux only, where it is enabled by default)])],
   [oshmem_arg_given=yes],
   [oshmem_arg_given=no])
 if test "$oshmem_arg_given" = "yes"; then


-Paul




On Mon, Aug 4, 2014 at 7:34 PM, Gilles Gouaillardet <
gilles.gouaillar...@iferc.org> wrote:

>  Paul,
>
> this is a bit trickier ...
>
> on a Linux platform oshmem is built by default,
> on a non Linux platform, oshmem is *not* built by default.
>
> so the configure message (disabled by default) is correct on non Linux
> platform, and incorrect on Linux platform ...
>
> i do not know what should be done, here are some options :
> - have a different behaviour on Linux vs non Linux platforms (by the way,
> does autotools support this ?)
> - disable by default, provide only the --enable-oshmem option (so
> configure aborts if --enable-oshmem is given on non Linux platforms)
> - provide only the --disable-oshmem option, useful only on Linux
> platforms. on non Linux platforms do not build oshmem and this is not an
> error
> - other ?
>
> Cheers,
>
> Gilles
>
> r31155 | rhc | 2014-03-20 05:32:15 +0900 (Thu, 20 Mar 2014) | 5 lines
>
> As per the thread on ticket #4399, OSHMEM does not support non-Linux
> platforms. So provide a check for Linux and error out if --enable-oshmem is
> given on a non-supported platform. If no OSHMEM option is given (enable or
> disable), then don't attempt to build OSHMEM unless we are on a Linux
> platform. Default to building if we are on Linux for now, pending the
> outcome of the Debian situation.
>
>
> On 2014/08/05 6:41, Paul Hargrove wrote:
>
> In both trunk and 1.8.2rc3 the behavior is to enable oshmem by default.
>
> In the 1.8.2rc3 tarball the configure help output matches the behavior.
> HOWEVER, in the trunk the configure help output still says oshmem is
> DISabled by default.
>
> {~/OMPI/ompi-trunk}$ svn info | grep "Revision"
> Revision: 32422
> {~/OMPI/ompi-trunk}$ ./configure --help | grep -A1 'enable-oshmem '
>   --enable-oshmem Enable building the OpenSHMEM interface (disabled by
>   default)
>
> -Paul
>
>
> On Thu, Jul 24, 2014 at 2:09 PM, Ralph Castain wrote:
>
>
>  Actually, it already is set correctly - the help message was out of date,
> so I corrected that.
>
> On Jul 24, 2014, at 10:58 AM, Marco Atzeri wrote:
>
>
>  On 24/07/2014 15:52, Ralph Castain wrote:
>
>  Oshmem should be enabled by default now
>
>  Ok,
> so please reverse the configure switch
>
>  --enable-oshmem Enable building the OpenSHMEM interface
>
>  (disabled by default)
>
>  I will test enabling it in the meantime.
>
> Regards
> Marco
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15254.php
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] oshmem enabled by default

2014-08-04 Thread Gilles Gouaillardet
Paul,

this is a bit trickier ...

on a Linux platform oshmem is built by default,
on a non Linux platform, oshmem is *not* built by default.

so the configure message (disabled by default) is correct on non Linux
platform, and incorrect on Linux platform ...

i do not know what should be done, here are some options :
- have a different behaviour on Linux vs non Linux platforms (by the
way, does autotools support this ?)
- disable by default, provide only the --enable-oshmem option (so
configure aborts if --enable-oshmem is given on non Linux platforms)
- provide only the --disable-oshmem option, useful only on Linux
platforms. on non Linux platforms do not build oshmem and this is not an
error
- other ?

Cheers,

Gilles

r31155 | rhc | 2014-03-20 05:32:15 +0900 (Thu, 20 Mar 2014) | 5 lines

As per the thread on ticket #4399, OSHMEM does not support non-Linux
platforms. So provide a check for Linux and error out if --enable-oshmem
is given on a non-supported platform. If no OSHMEM option is given
(enable or disable), then don't attempt to build OSHMEM unless we are on
a Linux platform. Default to building if we are on Linux for now,
pending the outcome of the Debian situation.
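
The behaviour described in the r31155 log message can be sketched in plain
shell. This is only an illustration under stated assumptions: the function
name oshmem_decision and its two arguments are hypothetical, and the actual
check in Open MPI's configure machinery may differ in detail.

```shell
# Hypothetical helper (not taken from the Open MPI tree).
# $1 = user choice: "yes" (--enable-oshmem), "no" (--disable-oshmem),
#      or "" when neither option was given
# $2 = host OS string, e.g. the value autoconf stores in $host_os
# Prints one of: build, skip, error
oshmem_decision() {
    choice=$1
    host_os=$2

    case $host_os in
        linux*) on_linux=yes ;;
        *)      on_linux=no ;;
    esac

    case $choice in
        yes)
            # Explicit request: only honoured on Linux, an error elsewhere.
            if test "$on_linux" = yes; then echo build; else echo error; fi ;;
        no)
            # Explicit opt-out is always accepted.
            echo skip ;;
        *)
            # No option given: build on Linux, silently skip elsewhere.
            if test "$on_linux" = yes; then echo build; else echo skip; fi ;;
    esac
}

oshmem_decision ""  linux-gnu      # prints "build"
oshmem_decision ""  darwin13.3.0   # prints "skip"
oshmem_decision yes freebsd10.0    # prints "error"
```

Seen this way, a single static help string cannot be right on both kinds of
platform unless it mentions the Linux-only default explicitly.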


On 2014/08/05 6:41, Paul Hargrove wrote:
> In both trunk and 1.8.2rc3 the behavior is to enable oshmem by default.
>
> In the 1.8.2rc3 tarball the configure help output matches the behavior.
> HOWEVER, in the trunk the configure help output still says oshmem is
> DISabled by default.
>
> {~/OMPI/ompi-trunk}$ svn info | grep "Revision"
> Revision: 32422
> {~/OMPI/ompi-trunk}$ ./configure --help | grep -A1 'enable-oshmem '
>   --enable-oshmem Enable building the OpenSHMEM interface (disabled by
>   default)
>
> -Paul
>
>
> On Thu, Jul 24, 2014 at 2:09 PM, Ralph Castain  wrote:
>
>> Actually, it already is set correctly - the help message was out of date,
>> so I corrected that.
>>
>> On Jul 24, 2014, at 10:58 AM, Marco Atzeri  wrote:
>>
>>> On 24/07/2014 15:52, Ralph Castain wrote:
>>>> Oshmem should be enabled by default now
>>> Ok,
>>> so please reverse the configure switch
>>>
>>>  --enable-oshmem Enable building the OpenSHMEM interface
>>> (disabled by default)
>>> I will test enabling it in the meantime.
>>>
>>> Regards
>>> Marco
>>>
>>>
>>>



[OMPI devel] minor atomics nit

2014-08-04 Thread Paul Hargrove
Running "make dist" on trunk I see:

--> Generating assembly for "SPARC" "default-.text-.globl-:--.L-#-1-0-1-0-0"
Could not open ../../../opal/asm/base/SPARC.asm: No such file or directory

Which is apparently because the following lines were never removed
from opal/asm/asm-data.txt

# default compile mode on Solaris.  Evil.  equiv to about Sparc v8
SPARC   default-.text-.globl-:--.L-#-1-0-1-0-0  sparc-solaris

README is clear about having dropped support for SPARC < v8plus.


-Paul
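
A stale entry of this kind could be caught with a small check like the one
below. It is a sketch, assuming that the first whitespace-separated field of
each non-comment line in asm-data.txt names the architecture and that the
generator then opens base/<ARCH>.asm, which is what the error message above
suggests. The function name missing_asm_bases is hypothetical.

```shell
# Hypothetical check (not part of the Open MPI build system).
# $1 = path to the data file (e.g. opal/asm/asm-data.txt)
# $2 = directory expected to hold <ARCH>.asm (e.g. opal/asm/base)
# Prints every architecture name that has an entry but no base file.
missing_asm_bases() {
    data_file=$1
    base_dir=$2

    # Skip comment and blank lines, keep the first field, de-duplicate.
    awk '!/^[ \t]*(#|$)/ { print $1 }' "$data_file" | sort -u |
    while read -r arch; do
        test -f "$base_dir/$arch.asm" || echo "$arch"
    done
}

# Example invocation from the top of a source tree:
#   missing_asm_bases opal/asm/asm-data.txt opal/asm/base
```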



[OMPI devel] [vt] --with-openmpi-inside configure argument

2014-08-04 Thread Paul Hargrove
I noticed that Open MPI is passing
--with-openmpi-inside=1.7
in the arguments passed to
ompi/contrib/vt/vt/configure
and
ompi/contrib/vt/vt/extlib/otf/configure

The extlib/otf case just tests if the value is set, but the top-level
vt/configure is checking for the specific string "1.7":

# Check whether we are inside Open MPI package
inside_openmpi="no"
AC_ARG_WITH(openmpi-inside, [],
[
    AS_IF([test x"$withval" = "xyes" -o x"$withval" = "x1.7"],
    [
        inside_openmpi="$withval"
        CPPFLAGS="-DINSIDE_OPENMPI $CPPFLAGS"

        # Set FC to F77 if Open MPI version < 1.7
        AS_IF([test x"$withval" = "xyes" -a x"$FC" = x -a x"$F77" != x],
        [FC="$F77"])
    ])
])

That logic looks a bit fragile with respect to any future changes.
Specifically the inner AS_IF is true for the desired condition "version <
1.7" only because the outer AS_IF currently ensures the only possible
values of "$withval" are "yes" and "1.7".

-Paul
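
One way to make the "version < 1.7" intent explicit, rather than implied by
the outer test, is to classify the argument first. The sketch below is plain
shell and only illustrative: the function name classify_openmpi_inside is
hypothetical, and treating the bare value "yes" as "older than 1.7" is an
assumption drawn from the comment in the fragment above.

```shell
# Hypothetical helper (not part of the VampirTrace configure script).
# $1 = value given to --with-openmpi-inside
# Prints one of: pre-1.7, 1.7-or-later, invalid
classify_openmpi_inside() {
    withval=$1

    case $withval in
        yes)
            # Assumption: a bare "yes" comes from an Open MPI older than 1.7.
            echo pre-1.7
            return 0 ;;
        *.*)
            major=${withval%%.*}     # text before the first dot
            rest=${withval#*.}       # text after the first dot
            minor=${rest%%.*} ;;     # second component only
        *)
            major=$withval
            minor=0 ;;
    esac

    # Anything that is not a plain non-negative integer is rejected.
    case $major in ''|*[!0-9]*) echo invalid; return 0 ;; esac
    case $minor in ''|*[!0-9]*) echo invalid; return 0 ;; esac

    if test "$major" -lt 1 || { test "$major" -eq 1 && test "$minor" -lt 7; }; then
        echo pre-1.7
    else
        echo 1.7-or-later
    fi
}

classify_openmpi_inside yes    # prints "pre-1.7"
classify_openmpi_inside 1.7    # prints "1.7-or-later"
classify_openmpi_inside 1.8.2  # prints "1.7-or-later"
```

With a classification like this, the FC fallback would be conditioned on the
"pre-1.7" result and would keep working if a later Open MPI passed, say, 1.9.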



Re: [OMPI devel] [1.8.2rc3] static linking fails on linux when not building ROMIO

2014-08-04 Thread Ralph Castain
My thought was to post initially as a blocker, pending a discussion with Jeff 
at tomorrow's telecon. If he thinks this is something we can fix in some 
central point (thus catching it everywhere), then it could be quick and worth 
doing. However, I'm skeptical as I tried to do that in the most obvious place, 
and it failed (could be operator error).

Will let you know tomorrow. Truly appreciate your digging on this!
Ralph

On Aug 4, 2014, at 3:50 PM, Paul Hargrove  wrote:

> Ralph and Jeff,
> 
> I've been digging and find the problem is wider than just the one library and 
> has manifestations specific to FreeBSD, NetBSD and Solaris.  I am adding new 
> info to the ticket as I unearth it.
> 
> Additionally, it appears this existed in 1.8, 1.8.1 and in the 1.7 series as 
> well.
> So, I would suggest this NOT be a blocker for a 1.8.2 release.
> 
> Of course I am willing to provide testing if you still want to push for a 
> quick resolution.
> 
> -Paul
> 
> 
> On Mon, Aug 4, 2014 at 1:27 PM, Ralph Castain  wrote:
> Okay, I filed a blocker on this for 1.8.2 and assigned it to Jeff. I took a 
> crack at fixing it, but came up short :-(
> 
> 
> On Aug 3, 2014, at 10:46 PM, Paul Hargrove  wrote:
> 
>> I've identified the difference between the platform that does link libutil 
>> and the one that does not.
>> 
>> 1) libutil is linked (as an OMPI dependency) only on the working system:
>> 
>> Working system:
>> $ grep 'checking for .* LIBS' configure.out
>> checking for OPAL LIBS... -lm -lpciaccess -ldl 
>> checking for ORTE LIBS... -lm -lpciaccess -ldl -ltorque 
>> checking for OMPI LIBS... -lm -lpciaccess -ldl -ltorque -lrt -lnsl -lutil
>> 
>> NON-working system:
>> $ grep 'checking for .* LIBS' configure.out
>> checking for OPAL LIBS... -lm -ldl 
>> checking for ORTE LIBS... -lm -ldl -ltorque 
>> checking for OMPI LIBS... -lm -ldl -ltorque 
>> 
>> So, the working system that does link libutil is doing so as an OMPI 
>> dependency.
>> However it is also needed for opal (only caller of openpty is 
>> opal/util/opal_pty.c).
>> 
>> 2) Only the working system is building ROMIO:
>> 
>> Comparing the 'checking if * can compile' lines of configure output shows 
>> only ONE difference:
>> 
>>  checking if MCA component fs:ufs can compile... yes
>>  checking if MCA component fs:pvfs2 can compile... no
>>  checking if MCA component io:ompio can compile... yes
>> -checking if MCA component io:romio can compile... no
>> +checking if MCA component io:romio can compile... yes
>>  checking if MCA component mpool:grdma can compile... yes
>>  checking if MCA component mpool:sm can compile... yes
>>  checking if MCA component mpool:udreg can compile... no
>> 
>> So, it appears that *if* ROMIO is configured in, then "-lutil" gets added to 
>> OMPI_WRAPPER_EXTRA_LIBS.
>> This masks the fact that it is missing from OPAL_WRAPPER_EXTRA_LIBS.
>> 
>> 
>> I have confirmed that I can reproduce the static linking failure by adding 
>> --disable-io-romio to the configure options of the system that worked 
>> previously.
>> 
>> So, I update my report (and the email subject line) to:
>>Static linking fails on Linux when not building ROMIO
>> 
>> -Paul
>> 
>> 
>> 
>> On Sun, Aug 3, 2014 at 6:22 PM, Paul Hargrove  wrote:
>> Hmm,
>> 
>> On a different Linux/x86-64 host things work as expected with '-lutil' 
>> linked explicitly:
>> 
>> $ ./INST/bin/mpicc -showme BLD/examples/hello_c.c 
>> pgcc BLD/examples/hello_c.c 
>> -I/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/include
>>  -L/opt/torque/4.2.7.h1/lib -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib 
>> -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib -Wl,-rpath 
>> -Wl,/opt/torque/4.2.7.h1/lib -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib 
>> -Wl,-rpath 
>> -Wl,/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/lib
>>  
>> -L/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/lib
>>  -lmpi -lopen-rte -lopen-pal -lm -lpciaccess -ldl -ltorque -lrt -lnsl -lutil
>> 
>> Searching for relevant differences now...
>> 
>> -Paul
>> 
>> 
>> On Sun, Aug 3, 2014 at 4:58 PM, Paul Hargrove  wrote:
>> 
>> I've configured the 1.8.2rc3 tarball with "--enable-static --disable-shared" 
>> on a fairly standard Linux/x86-64 platform.  While there are no problems on 
>> the same platform w/o these configure flags, with them I cannot link any 
>> application codes.
>> 
>> $ mpicc -g hello_c.c -o hello_c
>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib/libopen-pal.a(opal_pty.o):
>>  In function `opal_openpty':
>> opal_pty.c:(.text+0x1): undefined reference to `openpty'
>> 
>> I checked "man openpty" and the manpage says to link with '-lutil'.
>> The '-showme' does not show libutil:
>> 
>> $ mpicc -showme hello_c.c
>> gcc hello_c.c 
>> 

Re: [OMPI devel] [1.8.2rc3] static linking fails on linux when not building ROMIO

2014-08-04 Thread Paul Hargrove
Ralph and Jeff,

I've been digging and find the problem is wider than just the one library
and has manifestations specific to FreeBSD, NetBSD and Solaris.  I am
adding new info to the ticket as I unearth it.

Additionally, it appears this existed in 1.8, 1.8.1 and in the 1.7 series
as well.
So, I would suggest this NOT be a blocker for a 1.8.2 release.

Of course I am willing to provide testing if you still want to push for a
quick resolution.

-Paul
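
For anyone testing other platforms, the symptom quoted below (a wrapper link
line that lacks -lutil) can be checked mechanically. This is a minimal sketch
in plain shell; the function name link_line_has_lib is hypothetical, and it
only inspects a link line passed in as text, such as the output of
"mpicc -showme".

```shell
# Hypothetical helper (not part of Open MPI).
# $1 = a compiler/linker command line as a single string
# $2 = library name without the -l prefix, e.g. "util"
# Prints "yes" if -l<name> appears as a separate word, otherwise "no".
link_line_has_lib() {
    # Fold newlines and tabs into spaces so wrapped output still matches.
    line=$(printf '%s' "$1" | tr '\n\t' '  ')
    case " $line " in
        *" -l$2 "*) echo yes ;;
        *)          echo no ;;
    esac
}

# Example invocation against an installed wrapper compiler:
#   link_line_has_lib "$(mpicc -showme hello_c.c)" util
```

On the working system in the quoted report this would print "yes"; on the
failing static build it would print "no".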


On Mon, Aug 4, 2014 at 1:27 PM, Ralph Castain  wrote:

> Okay, I filed a blocker on this for 1.8.2 and assigned it to Jeff. I took
> a crack at fixing it, but came up short :-(
>
>
> On Aug 3, 2014, at 10:46 PM, Paul Hargrove  wrote:
>
> I've identified the difference between the platform that does link libutil
> and the one that does not.
>
> 1) libutil is linked (as an OMPI dependency) only on the working system:
>
> Working system:
> $ grep 'checking for .* LIBS' configure.out
> checking for OPAL LIBS... -lm -lpciaccess -ldl
> checking for ORTE LIBS... -lm -lpciaccess -ldl -ltorque
> checking for OMPI LIBS... -lm -lpciaccess -ldl -ltorque -lrt -lnsl -lutil
>
> NON-working system:
> $ grep 'checking for .* LIBS' configure.out
> checking for OPAL LIBS... -lm -ldl
> checking for ORTE LIBS... -lm -ldl -ltorque
> checking for OMPI LIBS... -lm -ldl -ltorque
>
> So, the working system that does link libutil is doing so as an OMPI
> dependency.
> However it is also needed for opal (only caller of openpty is
> opal/util/opal_pty.c).
>
> 2) Only the working system is building ROMIO:
>
> Comparing the 'checking if * can compile' lines of configure output shows
> only ONE difference:
>
>  checking if MCA component fs:ufs can compile... yes
>  checking if MCA component fs:pvfs2 can compile... no
>  checking if MCA component io:ompio can compile... yes
> -checking if MCA component io:romio can compile... no
> +checking if MCA component io:romio can compile... yes
>  checking if MCA component mpool:grdma can compile... yes
>  checking if MCA component mpool:sm can compile... yes
>  checking if MCA component mpool:udreg can compile... no
>
> So, it appears that *if* ROMIO is configured in, then "-lutil" gets added
> to OMPI_WRAPPER_EXTRA_LIBS.
> This masks the fact that it is missing from OPAL_WRAPPER_EXTRA_LIBS.
>
>
> I have confirmed that I can reproduce the static linking failure by adding
> --disable-io-romio to the configure options of the system that worked
> previously.
>
> So, I update my report (and the email subject line) to:
>Static linking fails on Linux when not building ROMIO
>
> -Paul
>
>
>
> On Sun, Aug 3, 2014 at 6:22 PM, Paul Hargrove  wrote:
>
>> Hmm,
>>
>> On a different Linux/x86-64 host things work as expected with '-lutil'
>> linked explicitly:
>>
>> $ ./INST/bin/mpicc -showme BLD/examples/hello_c.c
>> pgcc BLD/examples/hello_c.c
>> -I/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/include
>> -L/opt/torque/4.2.7.h1/lib -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib
>> -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib -Wl,-rpath
>> -Wl,/opt/torque/4.2.7.h1/lib -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib
>> -Wl,-rpath
>> -Wl,/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/lib
>> -L/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/lib
>> -lmpi -lopen-rte -lopen-pal -lm -lpciaccess -ldl -ltorque -lrt -lnsl -lutil
>>
>> Searching for relevant differences now...
>>
>> -Paul
>>
>>
>> On Sun, Aug 3, 2014 at 4:58 PM, Paul Hargrove  wrote:
>>
>>>
>>> I've configured the 1.8.2rc3 tarball with "--enable-static
>>> --disable-shared" on a fairly standard Linux/x86-64 platform.  While there
>>> are no problems on the same platform w/o these configure flags, with them I
>>> cannot link any application codes.
>>>
>>> $ mpicc -g hello_c.c -o hello_c
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib/libopen-pal.a(opal_pty.o):
>>> In function `opal_openpty':
>>> opal_pty.c:(.text+0x1): undefined reference to `openpty'
>>>
>>> I checked "man openpty" and the manpage says to link with '-lutil'.
>>> The '-showme' does not show libutil:
>>>
>>> $ mpicc -showme hello_c.c
>>> gcc hello_c.c
>>> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/include
>>> -pthread -L/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
>>> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
>>> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
>>> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
>>> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
>>> -Wl,/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib
>>> -Wl,--enable-new-dtags
>>> -L/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib
>>> -lmpi -lopen-rte -lopen-pal -lm -ldl -ltorque -libverbs -lrdmacm
>>>
>>>
>>> It looks like configure is doing the right thing 

[OMPI devel] oshmem enabled by default

2014-08-04 Thread Paul Hargrove
In both trunk and 1.8.2rc3 the behavior is to enable oshmem by default.

In the 1.8.2rc3 tarball the configure help output matches the behavior.
HOWEVER, in the trunk the configure help output still says oshmem is
DISabled by default.

{~/OMPI/ompi-trunk}$ svn info | grep "Revision"
Revision: 32422
{~/OMPI/ompi-trunk}$ ./configure --help | grep -A1 'enable-oshmem '
  --enable-oshmem Enable building the OpenSHMEM interface (disabled by
  default)

-Paul


On Thu, Jul 24, 2014 at 2:09 PM, Ralph Castain  wrote:

> Actually, it already is set correctly - the help message was out of date,
> so I corrected that.
>
> On Jul 24, 2014, at 10:58 AM, Marco Atzeri  wrote:
>
> > On 24/07/2014 15:52, Ralph Castain wrote:
> >> Oshmem should be enabled by default now
> >
> > Ok,
> > so please reverse the configure switch
> >
> >  --enable-oshmem Enable building the OpenSHMEM interface
> > (disabled by default)
> >
> > I will test enabling it in the meantime.
> >
> > Regards
> > Marco
> >
> >
> >





Re: [OMPI devel] [1.8.2rc3] static linking fails on linux when not building ROMIO

2014-08-04 Thread Ralph Castain
Okay, I filed a blocker on this for 1.8.2 and assigned it to Jeff. I took a 
crack at fixing it, but came up short :-(


On Aug 3, 2014, at 10:46 PM, Paul Hargrove  wrote:

> I've identified the difference between the platform that does link libutil 
> and the one that does not.
> 
> 1) libutil is linked (as an OMPI dependency) only on the working system:
> 
> Working system:
> $ grep 'checking for .* LIBS' configure.out
> checking for OPAL LIBS... -lm -lpciaccess -ldl 
> checking for ORTE LIBS... -lm -lpciaccess -ldl -ltorque 
> checking for OMPI LIBS... -lm -lpciaccess -ldl -ltorque -lrt -lnsl -lutil
> 
> NON-working system:
> $ grep 'checking for .* LIBS' configure.out
> checking for OPAL LIBS... -lm -ldl 
> checking for ORTE LIBS... -lm -ldl -ltorque 
> checking for OMPI LIBS... -lm -ldl -ltorque 
> 
> So, the working system that does link libutil is doing so as an OMPI 
> dependency.
> However it is also needed for opal (only caller of openpty is 
> opal/util/opal_pty.c).
> 
> 2) Only the working system is building ROMIO:
> 
> Comparing the 'checking if * can compile' lines of configure output shows 
> only ONE difference:
> 
>  checking if MCA component fs:ufs can compile... yes
>  checking if MCA component fs:pvfs2 can compile... no
>  checking if MCA component io:ompio can compile... yes
> -checking if MCA component io:romio can compile... no
> +checking if MCA component io:romio can compile... yes
>  checking if MCA component mpool:grdma can compile... yes
>  checking if MCA component mpool:sm can compile... yes
>  checking if MCA component mpool:udreg can compile... no
> 
> So, it appears that *if* ROMIO is configured in, then "-lutil" gets added to 
> OMPI_WRAPPER_EXTRA_LIBS.
> This masks the fact that it is missing from OPAL_WRAPPER_EXTRA_LIBS.
> 
> 
> I have confirmed that I can reproduce the static linking failure by adding 
> --disable-io-romio to the configure options of the system that worked 
> previously.
> 
> So, I update my report (and the email subject line) to:
>Static linking fails on Linux when not building ROMIO
> 
> -Paul
> 
> 
> 
> On Sun, Aug 3, 2014 at 6:22 PM, Paul Hargrove  wrote:
> Hmm,
> 
> On a different Linux/x86-64 host things work as expected with '-lutil' linked 
> explicitly:
> 
> $ ./INST/bin/mpicc -showme BLD/examples/hello_c.c 
> pgcc BLD/examples/hello_c.c 
> -I/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/include
>  -L/opt/torque/4.2.7.h1/lib -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib 
> -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib -Wl,-rpath 
> -Wl,/opt/torque/4.2.7.h1/lib -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib 
> -Wl,-rpath 
> -Wl,/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/lib
>  
> -L/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/lib
>  -lmpi -lopen-rte -lopen-pal -lm -lpciaccess -ldl -ltorque -lrt -lnsl -lutil
> 
> Searching for relevant differences now...
> 
> -Paul
> 
> 
> On Sun, Aug 3, 2014 at 4:58 PM, Paul Hargrove  wrote:
> 
> I've configured the 1.8.2rc3 tarball with "--enable-static --disable-shared" 
> on a fairly standard Linux/x86-64 platform.  While there are no problems on 
> the same platform w/o these configure flags, with them I cannot link any 
> application codes.
> 
> $ mpicc -g hello_c.c -o hello_c
> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib/libopen-pal.a(opal_pty.o):
>  In function `opal_openpty':
> opal_pty.c:(.text+0x1): undefined reference to `openpty'
> 
> I checked "man openpty" and the manpage says to link with '-lutil'.
> The '-showme' does not show libutil:
> 
> $ mpicc -showme hello_c.c
> gcc hello_c.c 
> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/include
>  -pthread -L/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath 
> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath 
> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath 
> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath 
> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath 
> -Wl,/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib
>  -Wl,--enable-new-dtags 
> -L/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib
>  -lmpi -lopen-rte -lopen-pal -lm -ldl -ltorque -libverbs -lrdmacm
> 
> 
> It looks like configure is doing the right thing on some level, but failing 
> to add '-lutil' to the appropriate list of libs (OPAL_WRAPPER_EXTRA_LIBS?):
> 
> 
> == Library and Function tests
> 
> checking if we need -lutil for openpty... yes
> checking for openpty... yes
> 
> 
> -Paul
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> 

Re: [OMPI devel] opal_config_bottom.h question again

2014-08-04 Thread Ralph Castain
Yeah, I think Howard's commit isn't correct. We shouldn't have to put 
opal_config.h after the system includes. I think the real problem is that you 
aren't supposed to directly include the system malloc.h as that hoses the 
entire memory interceptor stuff.

Howard: can you try reverting your commit and just removing the system malloc.h 
as per Adrian's patch? I think that will correctly solve the problem.
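
For context: the errors quoted below arise because opal_config_bottom.h
defines malloc, calloc, realloc and free as function-like macros, so the
prototypes in a later-included system malloc.h get macro-expanded into
invalid declarations. A quick way to look for other direct includes of the
system header is sketched here in plain shell; the function name
direct_malloc_includes is hypothetical and the check is only a textual scan,
not a full preprocessor analysis.

```shell
# Hypothetical lint helper (not part of the Open MPI tree).
# $1 = path to a C source or header file
# Prints the line number of every direct "#include <malloc.h>" directive.
direct_malloc_includes() {
    awk '/^[ \t]*#[ \t]*include[ \t]*<malloc\.h>/ { print NR }' "$1"
}

# Example invocation from the top of a source tree:
#   direct_malloc_includes opal/mca/mpool/base/mpool_base_frame.c
```

An empty result means the file does not pull in the system malloc.h directly;
indirect inclusion through other headers would need a different check.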


On Aug 4, 2014, at 9:55 AM, Adrian Reber  wrote:

> And with the following change I can get it to compile again:
> 
> diff --git a/opal/mca/mpool/base/mpool_base_frame.c 
> b/opal/mca/mpool/base/mpool_base_frame.c
> index c1b044b..f94b8a5 100644
> --- a/opal/mca/mpool/base/mpool_base_frame.c
> +++ b/opal/mca/mpool/base/mpool_base_frame.c
> @@ -21,12 +21,10 @@
> #include "opal_config.h"
> 
> #include 
> +#include 
> #ifdef HAVE_UNISTD_H
> #include <unistd.h>
> #endif  /* HAVE_UNISTD_H */
> -#ifdef HAVE_MALLOC_H
> -#include <malloc.h>
> -#endif
> 
> #include "opal/mca/mca.h"
> #include "opal/mca/base/base.h"
> diff --git a/opal/util/malloc.h b/opal/util/malloc.h
> index db5a4d0..efeaf98 100644
> --- a/opal/util/malloc.h
> +++ b/opal/util/malloc.h
> @@ -21,7 +21,7 @@
> #ifndef OPAL_MALLOC_H
> #define OPAL_MALLOC_H
> 
> -#include "opal_config.h"
> +#include 
> #include 
> 
> /*
> 
> 
> On Mon, Aug 04, 2014 at 06:39:13PM +0200, Adrian Reber wrote:
>> I can confirm this on Fedora 20 with gcc 4.8.3.
>> 
>> Running ./configure without any options gives me the same error.
>> 
>> On Mon, Aug 04, 2014 at 04:24:29PM +, Pritchard Jr., Howard wrote:
>>> Hi Ralph,
>>> 
>>> Nope, that doesn't fix the problem I'm hitting.  I tried to build the ompi
>>> trunk on a system with a much older gcc compiler (4.4.7) and it compiled :)!
>>> But I'd like to be able to compile ompi with a newer gcc like the one on my
>>> openSUSE 13.1 box.
>>> 
>>> The preprocessor is pulling in the system malloc.h and that's where things 
>>> blow up:
>>> 
>>>  CC   base/mpool_base_frame.lo
>>> In file included from ../../../opal/include/opal_config.h:2750:0,
>>> from base/mpool_base_frame.c:21:
>>> ../../../opal/include/opal_config_bottom.h:381:38: error: expected 
>>> declaration specifiers or '...' before '(' token
>>> #define malloc(size) opal_malloc((size), __FILE__, __LINE__)
>>>  ^
>>> In file included from base/mpool_base_frame.c:28:0:
>>> /usr/include/malloc.h:38:1: error: expected declaration specifiers or '...' 
>>> before string constant
>>> extern void *malloc (size_t __size) __THROW __attribute_malloc__ __wur;
>>> ^
>>> /usr/include/malloc.h:38:1: error: expected declaration specifiers or '...' 
>>> before numeric constant
>>> In file included from ../../../opal/include/opal_config.h:2750:0,
>>> from base/mpool_base_frame.c:21:
>>> ../../../opal/include/opal_config_bottom.h:385:48: error: expected 
>>> declaration specifiers or '...' before '(' token
>>> #define calloc(nmembers, size) opal_calloc((nmembers), (size), 
>>> __FILE__, __LINE__)
>>>^
>>> ../../../opal/include/opal_config_bottom.h:385:60: error: expected 
>>> declaration specifiers or '...' before '(' token
>>> #define calloc(nmembers, size) opal_calloc((nmembers), (size), 
>>> __FILE__, __LINE__)
>>>^
>>> In file included from base/mpool_base_frame.c:28:0:
>>> /usr/include/malloc.h:41:1: error: expected declaration specifiers or '...' 
>>> before string constant
>>> extern void *calloc (size_t __nmemb, size_t __size)
>>> ^
>>> /usr/include/malloc.h:41:1: error: expected declaration specifiers or '...' 
>>> before numeric constant
>>> In file included from ../../../opal/include/opal_config.h:2750:0,
>>> from base/mpool_base_frame.c:21:
>>> ../../../opal/include/opal_config_bottom.h:389:45: error: expected 
>>> declaration specifiers or '...' before '(' token
>>> #define realloc(ptr, size) opal_realloc((ptr), (size), __FILE__, 
>>> __LINE__)
>>> ^
>>> ../../../opal/include/opal_config_bottom.h:389:52: error: expected 
>>> declaration specifiers or '...' before '(' token
>>> #define realloc(ptr, size) opal_realloc((ptr), (size), __FILE__, 
>>> __LINE__)
>>>^
>>> In file included from base/mpool_base_frame.c:28:0:
>>> /usr/include/malloc.h:49:1: error: expected declaration specifiers or '...' 
>>> before string constant
>>> extern void *realloc (void *__ptr, size_t __size)
>>> ^
>>> /usr/include/malloc.h:49:1: error: expected declaration specifiers or '...' 
>>> before numeric constant
>>> In file included from ../../../opal/include/opal_config.h:2750:0,
>>> from base/mpool_base_frame.c:21:
>>> ../../../opal/include/opal_config_bottom.h:393:33: error: expected 
>>> declaration specifiers or '...' before '(' token
>>> #define free(ptr) 

Re: [OMPI devel] opal_config_bottom.h question again

2014-08-04 Thread Adrian Reber
And with the following change I can get it to compile again:

diff --git a/opal/mca/mpool/base/mpool_base_frame.c 
b/opal/mca/mpool/base/mpool_base_frame.c
index c1b044b..f94b8a5 100644
--- a/opal/mca/mpool/base/mpool_base_frame.c
+++ b/opal/mca/mpool/base/mpool_base_frame.c
@@ -21,12 +21,10 @@
 #include "opal_config.h"

 #include 
+#include 
 #ifdef HAVE_UNISTD_H
 #include <unistd.h>
 #endif  /* HAVE_UNISTD_H */
-#ifdef HAVE_MALLOC_H
-#include <malloc.h>
-#endif

 #include "opal/mca/mca.h"
 #include "opal/mca/base/base.h"
diff --git a/opal/util/malloc.h b/opal/util/malloc.h
index db5a4d0..efeaf98 100644
--- a/opal/util/malloc.h
+++ b/opal/util/malloc.h
@@ -21,7 +21,7 @@
 #ifndef OPAL_MALLOC_H
 #define OPAL_MALLOC_H

-#include "opal_config.h"
+#include 
 #include 

 /*


On Mon, Aug 04, 2014 at 06:39:13PM +0200, Adrian Reber wrote:
> I can confirm this on Fedora 20 with gcc 4.8.3.
> 
> Running ./configure without any options gives me the same error.
> 
> On Mon, Aug 04, 2014 at 04:24:29PM +, Pritchard Jr., Howard wrote:
> > Hi Ralph,
> > 
> > Nope, that doesn't fix the problem I'm hitting.  I tried to build the ompi
> > trunk on a system with a much older gcc compiler (4.4.7) and it compiled :)!
> > But I'd like to be able to compile ompi with a newer gcc like the one on my
> > openSUSE 13.1 box.
> > 
> > The preprocessor is pulling in the system malloc.h and that's where things 
> > blow up:
> > 
> >   CC   base/mpool_base_frame.lo
> > In file included from ../../../opal/include/opal_config.h:2750:0,
> >  from base/mpool_base_frame.c:21:
> > ../../../opal/include/opal_config_bottom.h:381:38: error: expected 
> > declaration specifiers or '...' before '(' token
> > #define malloc(size) opal_malloc((size), __FILE__, __LINE__)
> >   ^
> > In file included from base/mpool_base_frame.c:28:0:
> > /usr/include/malloc.h:38:1: error: expected declaration specifiers or '...' 
> > before string constant
> > extern void *malloc (size_t __size) __THROW __attribute_malloc__ __wur;
> > ^
> > /usr/include/malloc.h:38:1: error: expected declaration specifiers or '...' 
> > before numeric constant
> > In file included from ../../../opal/include/opal_config.h:2750:0,
> >  from base/mpool_base_frame.c:21:
> > ../../../opal/include/opal_config_bottom.h:385:48: error: expected 
> > declaration specifiers or '...' before '(' token
> > #define calloc(nmembers, size) opal_calloc((nmembers), (size), 
> > __FILE__, __LINE__)
> > ^
> > ../../../opal/include/opal_config_bottom.h:385:60: error: expected 
> > declaration specifiers or '...' before '(' token
> > #define calloc(nmembers, size) opal_calloc((nmembers), (size), 
> > __FILE__, __LINE__)
> > ^
> > In file included from base/mpool_base_frame.c:28:0:
> > /usr/include/malloc.h:41:1: error: expected declaration specifiers or '...' 
> > before string constant
> > extern void *calloc (size_t __nmemb, size_t __size)
> > ^
> > /usr/include/malloc.h:41:1: error: expected declaration specifiers or '...' 
> > before numeric constant
> > In file included from ../../../opal/include/opal_config.h:2750:0,
> >  from base/mpool_base_frame.c:21:
> > ../../../opal/include/opal_config_bottom.h:389:45: error: expected 
> > declaration specifiers or '...' before '(' token
> > #define realloc(ptr, size) opal_realloc((ptr), (size), __FILE__, 
> > __LINE__)
> >  ^
> > ../../../opal/include/opal_config_bottom.h:389:52: error: expected 
> > declaration specifiers or '...' before '(' token
> > #define realloc(ptr, size) opal_realloc((ptr), (size), __FILE__, 
> > __LINE__)
> > ^
> > In file included from base/mpool_base_frame.c:28:0:
> > /usr/include/malloc.h:49:1: error: expected declaration specifiers or '...' 
> > before string constant
> > extern void *realloc (void *__ptr, size_t __size)
> > ^
> > /usr/include/malloc.h:49:1: error: expected declaration specifiers or '...' 
> > before numeric constant
> > In file included from ../../../opal/include/opal_config.h:2750:0,
> >  from base/mpool_base_frame.c:21:
> > ../../../opal/include/opal_config_bottom.h:393:33: error: expected 
> > declaration specifiers or '...' before '(' token
> > #define free(ptr) opal_free((ptr), __FILE__, __LINE__)
> >  ^
> > In file included from base/mpool_base_frame.c:28:0:
> > /usr/include/malloc.h:53:1: error: expected declaration specifiers or '...' 
> > before string constant
> > extern void free (void *__ptr) __THROW;
> > ^
> > /usr/include/malloc.h:53:1: error: expected declaration specifiers or '...' 
> > before numeric constant
> > 
> > 
> > 
> > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
> > Sent: Monday, August 04, 2014 10:09 AM
> > 

Re: [OMPI devel] opal_config_bottom.h question again

2014-08-04 Thread Adrian Reber
I can confirm this on Fedora 20 with gcc 4.8.3.

Running ./configure without any options gives me the same error.


Re: [OMPI devel] opal_config_bottom.h question again

2014-08-04 Thread Pritchard Jr., Howard
Hi Ralph,

Nope, that doesn't fix the problem I'm hitting.  I tried to build the ompi trunk
on a system with a much older gcc compiler (4.4.7) and it compiled :)!  But
I'd like to be able to compile ompi with a newer gcc like the one on my openSUSE
13.1 box.

The preprocessor is pulling in the system malloc.h and that's where things blow up:

  CC   base/mpool_base_frame.lo
In file included from ../../../opal/include/opal_config.h:2750:0,
 from base/mpool_base_frame.c:21:
../../../opal/include/opal_config_bottom.h:381:38: error: expected declaration specifiers or '...' before '(' token
#define malloc(size) opal_malloc((size), __FILE__, __LINE__)
  ^
In file included from base/mpool_base_frame.c:28:0:
/usr/include/malloc.h:38:1: error: expected declaration specifiers or '...' before string constant
extern void *malloc (size_t __size) __THROW __attribute_malloc__ __wur;
^
/usr/include/malloc.h:38:1: error: expected declaration specifiers or '...' before numeric constant
In file included from ../../../opal/include/opal_config.h:2750:0,
 from base/mpool_base_frame.c:21:
../../../opal/include/opal_config_bottom.h:385:48: error: expected declaration specifiers or '...' before '(' token
#define calloc(nmembers, size) opal_calloc((nmembers), (size), __FILE__, __LINE__)
^
../../../opal/include/opal_config_bottom.h:385:60: error: expected declaration specifiers or '...' before '(' token
#define calloc(nmembers, size) opal_calloc((nmembers), (size), __FILE__, __LINE__)
^
In file included from base/mpool_base_frame.c:28:0:
/usr/include/malloc.h:41:1: error: expected declaration specifiers or '...' before string constant
extern void *calloc (size_t __nmemb, size_t __size)
^
/usr/include/malloc.h:41:1: error: expected declaration specifiers or '...' before numeric constant
In file included from ../../../opal/include/opal_config.h:2750:0,
 from base/mpool_base_frame.c:21:
../../../opal/include/opal_config_bottom.h:389:45: error: expected declaration specifiers or '...' before '(' token
#define realloc(ptr, size) opal_realloc((ptr), (size), __FILE__, __LINE__)
 ^
../../../opal/include/opal_config_bottom.h:389:52: error: expected declaration specifiers or '...' before '(' token
#define realloc(ptr, size) opal_realloc((ptr), (size), __FILE__, __LINE__)
^
In file included from base/mpool_base_frame.c:28:0:
/usr/include/malloc.h:49:1: error: expected declaration specifiers or '...' before string constant
extern void *realloc (void *__ptr, size_t __size)
^
/usr/include/malloc.h:49:1: error: expected declaration specifiers or '...' before numeric constant
In file included from ../../../opal/include/opal_config.h:2750:0,
 from base/mpool_base_frame.c:21:
../../../opal/include/opal_config_bottom.h:393:33: error: expected declaration specifiers or '...' before '(' token
#define free(ptr) opal_free((ptr), __FILE__, __LINE__)
 ^
In file included from base/mpool_base_frame.c:28:0:
/usr/include/malloc.h:53:1: error: expected declaration specifiers or '...' before string constant
extern void free (void *__ptr) __THROW;
^
/usr/include/malloc.h:53:1: error: expected declaration specifiers or '...' before numeric constant
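
For readers following the log: the three error flavors ('(' token, string constant, numeric constant) are what one would expect when a function-like macro named malloc is already defined at the point where a system prototype for malloc is parsed. Below is a minimal sketch of the safe ordering. The demo_ names are hypothetical stand-ins, not the real OPAL symbols.

```c
/* System headers first, so their prototypes are parsed before any macro
 * named malloc exists. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical debug allocator standing in for opal_malloc(). */
static size_t demo_alloc_count = 0;

static void *demo_malloc(size_t size, const char *file, int line)
{
    (void) file;
    (void) line;
    demo_alloc_count++;
    return malloc(size);   /* macro not defined yet: this is the libc malloc */
}

/* From here on, every call written as malloc(n) is rerouted.
 * If a header declaring
 *     extern void *malloc (size_t __size);
 * were included AFTER this point, the prototype would expand to
 *     extern void *demo_malloc((size_t __size), "/usr/include/malloc.h", 38);
 * which is not a valid declaration: hence the complaints about '(',
 * a string constant and a numeric constant. */
#define malloc(size) demo_malloc((size), __FILE__, __LINE__)

/* Ordinary code after the macro is unaffected apart from the rerouting. */
static char *demo_dup(const char *s)
{
    char *copy;

    assert(s != NULL);
    copy = malloc(strlen(s) + 1);   /* expands to demo_malloc(...) */
    if (copy != NULL) {
        strcpy(copy, s);
    }
    return copy;
}
```

Calling demo_dup("abc") allocates through demo_malloc() once; the result is released with the ordinary free().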




Re: [OMPI devel] opal_config_bottom.h question again

2014-08-04 Thread Ralph Castain
I believe the issue is actually in opal/util/malloc.h, Howard. I noticed this 
while looking around this weekend - someone included opal_config.h in the 
malloc.h file even though it explicitly says "DON'T DO THIS"  in that header 
file.

#ifndef OPAL_MALLOC_H
#define OPAL_MALLOC_H

#include "opal_config.h"
#include 

/*
 * THIS FILE CANNOT INCLUDE ANY OTHER OPAL HEADER FILES!!!
 *
 * It is included via .  Hence, it should not
 * include ANY other files, nor should it include "opal_config.h".
 *
 */

Don't know why someone did that, but you might see if removing it fixes your problem.





[OMPI devel] opal_config_bottom.h question again

2014-08-04 Thread Pritchard Jr., Howard
Hi Folks,

As I said last week, I'm noticing now that on my openSUSE 13.1 system with
gcc 4.8.1, when I do a fresh checkout of trunk ompi and try to build without
any configure options,

mpool_base_frame.c

does not compile.

The reason is a conflict between opal_config_bottom.h and the contents of
malloc.h, which on my system is pulled in by the preprocessor.

If I undefine HAVE_MALLOC_H in this file, the code compiles fine.
Alternatively, one can move the malloc.h include ahead of the opal_config.h
include and things work.  A third option is to add the
OPAL_DISABLE_ENABLE_MEM_DEBUG define, as in mpool_base_lookup.c, and the
compile problem similarly goes away.
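
The third workaround (a per-file opt-out define) can be sketched as follows. This assumes the config header guards its malloc macro with a check of that define; the DEMO_ names are hypothetical and the guard shape is an assumption, not a copy of opal_config_bottom.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Per-file opt-out, set before the "config" section below is seen.
 * Hypothetical name, modeled on OPAL_DISABLE_ENABLE_MEM_DEBUG. */
#define DEMO_DISABLE_MEM_DEBUG 1

/* ---- assumed shape of a config header ---- */
#define DEMO_ENABLE_MEM_DEBUG 1

static void *demo_dbg_malloc(size_t size, const char *file, int line)
{
    (void) file;
    (void) line;
    assert(size > 0);
    return malloc(size);
}

#if DEMO_ENABLE_MEM_DEBUG && !defined(DEMO_DISABLE_MEM_DEBUG)
#define malloc(size) demo_dbg_malloc((size), __FILE__, __LINE__)
#endif
/* ---- end of config section ---- */

/* A prototype like the ones in a system malloc.h. It only parses because
 * the opt-out kept the malloc macro from being defined. */
extern void *malloc(size_t size);

/* Reports whether the rerouting macro is in effect in this file. */
static int demo_macro_is_active(void)
{
#ifdef malloc
    return 1;
#else
    return 0;
#endif
}
```

With the opt-out in place demo_macro_is_active() returns 0; deleting the DEMO_DISABLE_MEM_DEBUG line turns the prototype into a compile error of the kind reported in this thread.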

I'd like to check in a fix for this.  I'd prefer to just move the std include
files ahead of the opal_config.h include.  I'd like to do this today unless
someone objects.

I'm somewhat surprised I'm the only one seeing this, though.

Howard


-
Howard Pritchard
HPC-5
Los Alamos National Laboratory




[OMPI devel] canceling buffered send request with pml/cm

2014-08-04 Thread Yossi Etigin
Hi,

It seems impossible to cancel buffered sends with pml/cm.

On the one hand, pml/cm completes the buffered send immediately (MCA_PML_CM_HVY_SEND_REQUEST_START):
if(OMPI_SUCCESS == ret &&                                                  \
   sendreq->req_send.req_send_mode == MCA_PML_BASE_SEND_BUFFERED) {        \
    sendreq->req_send.req_base.req_ompi.req_status.MPI_ERROR = 0;          \
    ompi_request_complete(&(sendreq)->req_send.req_base.req_ompi, true);   \
}

So, if the user does Bsend()/Cancel()/Wait()/Test_cancelled(), the Wait() is
a no-op.
Therefore, when mtl_cancel() is called, it has to either cancel or guarantee
completion *immediately*; otherwise the result of Test_cancelled() is
undefined.
However, it is not always possible to cancel immediately, because we first need
to make sure the peer has not matched the message yet (for example, with mtl mxm).

IMHO it's wrong for pml_cm to complete a buffered send immediately.
What do you think?

--Yossi


[OMPI devel] 1.8.2rc3 cosmetic issues in configure

2014-08-04 Thread Paul Hargrove
It looks like four instances of AC_MSG_CHECKING are missing an
AC_MSG_RESULT or have other configure macros improperly nested between the
two:

checking for epoll support... checking for epoll_ctl... yes
yes
checking for working epoll library interface... yes
yes

checking if user requested CMA build... checking --with-knem value... simple ok (unspecified)

checking if user requested CMA build... checking if MCA component btl:vader can compile... yes

checking orte configuration args... checking if MCA component dpm:orte can compile... yes

-Paul


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] [1.8.2rc3] static linking fails on linux when not building ROMIO

2014-08-04 Thread Paul Hargrove
I've identified the difference between the platform that does link libutil
and the one that does not.

1) libutil is linked (as an OMPI dependency) only on the working system:

Working system:
$ grep 'checking for .* LIBS' configure.out
checking for OPAL LIBS... -lm -lpciaccess -ldl
checking for ORTE LIBS... -lm -lpciaccess -ldl -ltorque
checking for OMPI LIBS... -lm -lpciaccess -ldl -ltorque -lrt -lnsl -lutil

NON-working system:
$ grep 'checking for .* LIBS' configure.out
checking for OPAL LIBS... -lm -ldl
checking for ORTE LIBS... -lm -ldl -ltorque
checking for OMPI LIBS... -lm -ldl -ltorque

So, the working system that does link libutil is doing so as an OMPI
dependency.
However, it is also needed for OPAL (the only caller of openpty is
opal/util/opal_pty.c).

2) Only the working system is building ROMIO:

Comparing the 'checking if * can compile' lines of configure output shows
only ONE difference:

 checking if MCA component fs:ufs can compile... yes
 checking if MCA component fs:pvfs2 can compile... no
 checking if MCA component io:ompio can compile... yes
-checking if MCA component io:romio can compile... no
+checking if MCA component io:romio can compile... yes
 checking if MCA component mpool:grdma can compile... yes
 checking if MCA component mpool:sm can compile... yes
 checking if MCA component mpool:udreg can compile... no

So, it appears that *if* ROMIO is configured in, then "-lutil" gets added
to OMPI_WRAPPER_EXTRA_LIBS.
This masks the fact that it is missing from OPAL_WRAPPER_EXTRA_LIBS.


I have confirmed that I can reproduce the static linking failure by adding
--disable-io-romio to the configure options of the system that worked
previously.

So, I update my report (and the email subject line) to:
   Static linking fails on Linux when not building ROMIO

-Paul



On Sun, Aug 3, 2014 at 6:22 PM, Paul Hargrove  wrote:

> Hmm,
>
> On a different Linux/x86-64 host things work as expected with '-lutil'
> linked explicitly:
>
> $ ./INST/bin/mpicc -showme BLD/examples/hello_c.c
> pgcc BLD/examples/hello_c.c
> -I/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/include
> -L/opt/torque/4.2.7.h1/lib -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib
> -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib -Wl,-rpath
> -Wl,/opt/torque/4.2.7.h1/lib -Wl,-rpath -Wl,/opt/torque/4.2.7.h1/lib
> -Wl,-rpath
> -Wl,/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/lib
> -L/scratch/scratchdirs/hargrove/OMPI/openmpi-1.8.2rc3-linux-x86_64-pgi-14.1/INST/lib
> -lmpi -lopen-rte -lopen-pal -lm -lpciaccess -ldl -ltorque -lrt -lnsl -lutil
>
> Searching for relevant differences now...
>
> -Paul
>
>
> On Sun, Aug 3, 2014 at 4:58 PM, Paul Hargrove  wrote:
>
>>
>> I've configured the 1.8.2rc3 tarball with "--enable-static
>> --disable-shared" on a fairly standard Linux/x86-64 platform.  While there
>> are no problems on the same platform w/o these configure flags, with them I
>> cannot link any application codes.
>>
>> $ mpicc -g hello_c.c -o hello_c
>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib/libopen-pal.a(opal_pty.o):
>> In function `opal_openpty':
>> opal_pty.c:(.text+0x1): undefined reference to `openpty'
>>
>> I checked "man openpty" and the manpage says to link with '-lutil'.
>> The '-showme' does not show libutil:
>>
>> $ mpicc -showme hello_c.c
>> gcc hello_c.c
>> -I/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/include
>> -pthread -L/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
>> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
>> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
>> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
>> -Wl,/usr/syscom/opt/torque/4.1.4/lib -Wl,-rpath
>> -Wl,/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib
>> -Wl,--enable-new-dtags
>> -L/global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.8.2rc3-linux-x86_64-static/INST/lib
>> -lmpi -lopen-rte -lopen-pal -lm -ldl -ltorque -libverbs -lrdmacm
>>
>>
>> It looks like configure is doing the right thing on some level, but
>> failing to add '-lutil' to the appropriate list of libs
>> (OPAL_WRAPPER_EXTRA_LIBS?):
>>
>>
>> 
>> == Library and Function tests
>>
>> 
>> checking if we need -lutil for openpty... yes
>> checking for openpty... yes
>>
>>
>> -Paul
>>

Re: [OMPI devel] 1.8.2rc3 now out

2014-08-04 Thread Gilles Gouaillardet
Fixed in r32409: %d and %s were swapped in an MLERROR (printf-like) call.

Gilles
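
As a side note, this kind of swap can usually be caught at compile time when the printf-like function carries the GCC/Clang format attribute. A minimal sketch follows; demo_log and demo_report are hypothetical names and this is not the actual MLERROR definition.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* The format attribute is a GCC/Clang extension; expand to nothing elsewhere. */
#if defined(__GNUC__)
#define DEMO_PRINTF_LIKE(fmt_idx, args_idx) \
    __attribute__((format(printf, fmt_idx, args_idx)))
#else
#define DEMO_PRINTF_LIKE(fmt_idx, args_idx)
#endif

/* Hypothetical printf-like logger: argument 3 is the format string and the
 * variadic arguments start at position 4. */
static int demo_log(char *out, size_t out_len, const char *fmt, ...)
    DEMO_PRINTF_LIKE(3, 4);

static int demo_log(char *out, size_t out_len, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(out != NULL && out_len > 0);
    va_start(ap, fmt);
    n = vsnprintf(out, out_len, fmt, ap);
    va_end(ap);
    return n;
}

static int demo_report(char *out, size_t out_len, int rank, const char *what)
{
    /* Correct pairing: %d with the int, %s with the string.
     * Passing (what, rank) instead would be undefined behavior; with the
     * attribute above, -Wformat can flag it during compilation. */
    return demo_log(out, out_len, "rank %d: %s", rank, what);
}
```

demo_report(buf, sizeof buf, 3, "mmap failed") writes "rank 3: mmap failed" into buf.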

On 2014/08/02 11:07, Gilles Gouaillardet wrote:
> Paul,
>
> about the second point :
> mmap is called with the MAP_FIXED flag. before the fix, the
> requested address was not aligned on a page boundary and hence
> mmap failed.
> the mmap failure was immediately handled, but for some reason
> i have not fully investigated yet, this failure was not correctly propagated,
> leading to a SIGSEGV later in lmngr_register (if i remember correctly)
>
> i will add this to my todo list : investigate why the error is not correctly
> propagated and handled.
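
A minimal sketch of the two points under discussion: rounding the requested address up to a page boundary, and checking the mmap result instead of carrying on with a bad pointer. The demo_ names are hypothetical and this is not the coll/ml code. The expectation that an unaligned MAP_FIXED request fails is based on Linux behavior (EINVAL).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* MAP_ANONYMOUS under strict -std= modes on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Query the page size under whichever sysconf name the platform offers. */
static long demo_page_size(void)
{
#if defined(_SC_PAGESIZE)
    return sysconf(_SC_PAGESIZE);
#elif defined(_SC_PAGE_SIZE)
    return sysconf(_SC_PAGE_SIZE);
#else
    return -1;   /* caller has to pick another source */
#endif
}

/* Round addr up to the next multiple of page (page must be a power of two). */
static uintptr_t demo_align_up(uintptr_t addr, uintptr_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return (addr + page - 1) & ~(page - 1);
}

/* Returns 0 on success, or the errno of the failed mmap call. */
static int demo_map_fixed(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED) {
        return errno;   /* report the failure to the caller */
    }
    return 0;
}

/* Returns 0 if the unaligned request failed and the aligned one succeeded,
 * 1 otherwise, -1 if the setup itself failed. */
static int demo_run(void)
{
    long page = demo_page_size();
    size_t len;
    char *base;
    int unaligned_rc, aligned_rc;
    uintptr_t aligned;

    if (page <= 0) {
        return -1;
    }
    len = 2 * (size_t) page;
    /* Reserve a region we own, so MAP_FIXED only ever replaces our own pages. */
    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return -1;
    }
    unaligned_rc = demo_map_fixed(base + 1, (size_t) page);
    aligned = demo_align_up((uintptr_t) (base + 1), (uintptr_t) page);
    aligned_rc = demo_map_fixed((void *) aligned, (size_t) page);
    munmap(base, len);
    return (unaligned_rc != 0 && aligned_rc == 0) ? 0 : 1;
}
```

The demo only uses MAP_FIXED inside a region it mapped itself, because MAP_FIXED silently replaces whatever was mapped at the target address.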
>
> Cheers,
>
> Gilles
>
> On Sat, Aug 2, 2014 at 6:05 AM, Paul Hargrove  wrote:
>
>> Regarding review of the coll/ml fix:
>>
>> While the fix Gilles worked out overnight proved sufficient on
>> Solaris/SPARC, Linux/PPC64 and Linux/IA64, I had two concerns:
>>
>> 1) As I already voiced on the list, I am concerned with the portability of
>> _SC_PAGESIZE vs _SC_PAGE_SIZE (vs getpagesize()).
>>
>> 2) Though I have not tried to trace the code, the fact that fixing the
>> alignment prevents a SEGV strongly suggests that there was a mmap (or
>> something else sensitive to page size) call failing.  So, there should
>> probably be a check added for failure of that call to produce a cleaner
>> failure than SEGV.
>>
>> Just my USD 0.02.
>> -Paul
>>
>>
>> On Fri, Aug 1, 2014 at 6:39 AM, Ralph Castain  wrote:
>>
>>> Okay, I fixed those two and will release rc4 once the coll/ml fix has
>>> been reviewed. Thanks
>>>
>>> On Aug 1, 2014, at 2:46 AM, Mike Dubman  wrote:
>>>
>>> Also, latest commit into openib (origin/v1.8
>>> https://svn.open-mpi.org/trac/ompi/changeset/32391) broke something:
>>>
>>> *11:45:01* + timeout -s SIGSEGV 3m /scrap/jenkins/workspace/OMPI-vendor/label/hpctest/ompi_install1/bin/mpirun -np 8 -mca pml ob1 -mca btl self,openib /scrap/jenkins/workspace/OMPI-vendor/label/hpctest/ompi_install1/examples/hello_usempi
>>> *11:45:01* --
>>> *11:45:01* WARNING: There are more than one active ports on host 'hpctest', but the
>>> *11:45:01* default subnet GID prefix was detected on more than one of these
>>> *11:45:01* ports.  If these ports are connected to different physical IB
>>> *11:45:01* networks, this configuration will fail in Open MPI.  This version of
>>> *11:45:01* Open MPI requires that every physically separate IB subnet that is
>>> *11:45:01* used between connected MPI processes must have different subnet ID
>>> *11:45:01* values.
>>> *11:45:01*
>>> *11:45:01* Please see this FAQ entry for more details:
>>> *11:45:01*
>>> *11:45:01*   http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid
>>> *11:45:01*
>>> *11:45:01* NOTE: You can turn off this warning by setting the MCA parameter
>>> *11:45:01*   btl_openib_warn_default_gid_prefix to 0.
>>> *11:45:01* --
>>> *11:45:01* --
>>> *11:45:01* WARNING: No queue pairs were defined in the btl_openib_receive_queues
>>> *11:45:01* MCA parameter.  At least one queue pair must be defined.  The
>>> *11:45:01* OpenFabrics (openib) BTL will therefore be deactivated for this run.
>>> *11:45:01*
>>> *11:45:01*   Local host: hpctest
>>> *11:45:01* --
>>> *11:45:01* --
>>> *11:45:01* At least one pair of MPI processes are unable to reach each other for
>>> *11:45:01* MPI communications.  This means that no Open MPI device has indicated
>>> *11:45:01* that it can be used to communicate between these processes.  This is
>>> *11:45:01* an error; Open MPI requires that all MPI processes be able to reach
>>> *11:45:01* each other.  This error can sometimes be the result of forgetting to
>>> *11:45:01* specify the "self" BTL.
>>> *11:45:01*
>>> *11:45:01*   Process 1 ([[55281,1],1]) is on host: hpctest
>>> *11:45:01*   Process 2 ([[55281,1],0]) is on host: hpctest
>>> *11:45:01*   BTLs attempted: self
>>> *11:45:01*
>>> *11:45:01* Your MPI job is now going to abort; sorry.
>>> *11:45:01* --
>>> *11:45:01* --
>>> *11:45:01* MPI_INIT has failed because at least one MPI process is unreachable
>>> *11:45:01* from another.  This *usually* means that an underlying communication
>>> *11:45:01* plugin -- such as a BTL or an MTL -- has either not loaded or not
>>> *11:45:01* allowed itself to be used.  Your MPI job will now abort.
>>> *11:45:01*
>>> *11:45:01* You may wish to try to narrow down the problem;
>>> *11:45:01*
>>> *11:45:01*  * Check the output of ompi_info to 

Re: [OMPI devel] OMPI devel] trunk warnings on x86

2014-08-04 Thread Gilles Gouaillardet
Paul and all,

before r32408, the environment/abort test from the ibm test suite
crashed with SIGSEGV.

there is no more crash after the fix :-)

that being said, i experience some (random) hangs on my VM :
--mca btl tcp,self => no hang
--mca btl sm,self or --mca btl vader,self => hang about 25% of the time
--mca btl scif,self => always hang

only the mpirun process remains and is hanging.

i will try to debug this, and i welcome any help !

Cheers,

Gilles

On 2014/08/04 11:57, Gilles Gouaillardet wrote:
> Paul,
>
> i confirm the ampersand was missing and this was a bug
> /* a similar bug was fixed by Ralph in r32357 */
>
> i committed r32408 in order to fix these three bugs.
>
> i also took the liberty of replacing the OMPI_CAST_RTE_NAME macro
> with an inline function (only in debug mode) in order to get a
> compiler warning on both 32-bit and 64-bit architectures in this case :
>
> #if OPAL_ENABLE_DEBUG
> static inline orte_process_name_t *
> OMPI_CAST_RTE_NAME(opal_process_name_t * name);
> #else
> #define OMPI_CAST_RTE_NAME(a) ((orte_process_name_t*)(a))
> #endif
>
> Cheers,
>
> Gilles
>
> On 2014/08/03 14:49, Gilles GOUAILLARDET wrote:
>> Paul,
>>
>> imho, the root cause is a missing ampersand.
>>
>> I will only be able to double check this tomorrow
>>
>> Cheers,
>>
>> Gilles
>>
>> Ralph Castain  wrote:
>>> Arg - that raises an interesting point. This is a pointer to a 64-bit 
>>> number. Will uintptr_t resolve that problem on such platforms?
>>>
>>>
>>> On Aug 2, 2014, at 8:12 PM, Paul Hargrove  wrote:
>>>
>>>
>>> Looks like on a 32-bit platform a (uintptr_t) cast is desired in the
>>> OMPI_CAST_RTE_NAME() macro.
>>>
>>> Warnings from the current trunk tarball attributable to the missing cast
>>> include:
>>>
>>>
>>> /home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/runtime/ompi_mpi_abort.c:89: warning: cast to pointer from integer of different size
>>>
>>> /home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/runtime/ompi_mpi_abort.c:97: warning: cast to pointer from integer of different size
>>>
>>> /home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/mca/pml/bfo/pml_bfo_failover.c:1417: warning: cast to pointer from integer of different size
>>>
>>>
>>> -Paul
>>>
>>>