Re: [OMPI devel] [EXTERNAL] Open MPI v4.0.1: Process is hanging inside MPI_Init() when debugged with TotalView

2019-11-12 Thread Larry Baker via devel
"allowing us to weakly synchronize two threads" concerns me if the 
synchronization is important or must be reliable.  I do not understand how 
volatile alone provides reliable synchronization without a mechanism to order 
visible changes to memory.  If the flag(s) in question are supposed to 
indicate some state has changed in this weakly synchronized behavior, without 
proper memory barriers, there is no guarantee that memory changes will be 
viewed by the two threads in the same order they were issued.  It is quite 
possible that the updated state that is flagged as being "good" or "done" or 
whatever will not yet be visible across multiple cores, even though the updated 
flag indicator may have become visible.  Only if the flag itself is the data 
can this work, it seems to me.  If it is a flag that something has been 
completed, volatile is not sufficient to guarantee the corresponding changes in 
state will be visible.  I have had such experience from code that used volatile 
as a proxy for memory barriers.  I was told "it has never been a problem".  
Rare events can, and do, occur.  In my case, it did after over 3 years running 
the code without interruption.  I doubt anyone had ever run the code for such a 
long sample interval.  We found out because we missed recording an important 
earthquake a week after the race condition was tripped.  Murphy's law triumphs 
again. :)
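
A minimal sketch of the distinction drawn above, assuming a C11 compiler with
<stdatomic.h>.  The names payload, ready, publish() and try_consume() are
hypothetical and are not taken from the Open MPI source.  The flag is written
with release ordering and read with acquire ordering, which is what makes the
data written before the flag visible to a reader that sees the flag set; a
plain volatile flag gives no such ordering guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names, for illustration only; not Open MPI code. */
static int payload;        /* ordinary data guarded by the flag */
static atomic_bool ready;  /* the "done" flag; static storage, so it starts false */

/* Writer thread: store the data first, then publish the flag.
 * The release store keeps the payload write from being reordered after it. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader thread: returns true and copies the payload once the flag is seen.
 * The acquire load pairs with the release store, so a reader that observes
 * ready == true also observes the payload written before the flag was set. */
bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

In use, one thread would call publish() once and another would poll
try_consume() until it returns true.  If the flag itself were the only data
exchanged, the ordering would not matter, which is the one case noted above
where a bare flag can work.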

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



> On 12 Nov 2019, at 1:05:31 PM, George Bosilca via devel wrote:
> 
> If the issue was some kind of memory consistency problem between threads, then 
> printing that variable in the context of the debugger would show the value of 
> debugger_event_active being false.
> 
> volatile is not a memory barrier, it simply forces a load for each access of 
> the data, allowing us to weakly synchronize two threads, as long as we do not 
> expect the synchronization to be immediate.
> 
> Anyway, good to see that the issue has been solved.
> 
>   George.
> 
> 
> On Tue, Nov 12, 2019 at 2:25 PM John DelSignore via devel 
> <devel@lists.open-mpi.org> wrote:
> Hi Austen,
> 
> Thanks for the reply. What I am seeing is consistent with your thought, in 
> that when I see the hang, one or more processes did not have a flag updated. 
> I don't understand how the Open MPI code works well enough to say if it is a 
> memory barrier problem or not. It almost looks like an event delivery or 
> dropped event problem to me.
> The place in the MPI_init() code where the MPI processes hang and the number 
> of "hung" processes seems to vary from run to run. In some cases the 
> processes are waiting for an event or waiting for a fence (whatever that is).
> I did the following run today, which shows that it can hang waiting for an 
> event that apparently was not generated or was dropped:
> 
> 1. Started TV on mpirun: totalview -args mpirun -np 4 ./cpi
> 2. Ran the mpirun process until it hit the MPIR_Breakpoint() event.
> 3. TV attached to all four of the MPI processes and left all five processes 
>    stopped.
> 4. Continued all of the processes/threads and let them run freely for about 60 
>    seconds. They should have run to completion in that amount of time.
> 5. Halted all of the processes. I included an aggregated backtrace of all of the 
>    processes below.
> In this particular run, all four MPI processes were waiting in 
> ompi_rte_wait_for_debugger() in rte_orte_module.c at line 196, which is:
> 
> /* let the MPI progress engine run while we wait for debugger release */
> OMPI_WAIT_FOR_COMPLETION(debugger_event_active);
> 
> I don't know how that is supposed to work, but I can clearly see that 
> debugger_event_active was true in all of the processes, even though TV set 
> MPIR_debug_gate to 1:
> d1.<> f {2.1 3.1 4.1 5.1} p debugger_event_active
> Thread 2.1:
>  debugger_event_active = true (1)
> Thread 3.1:
>  debugger_event_active = true (1)
> Thread 4.1:
>  debugger_event_active = true (1)
> Thread 5.1:
>  debugger_event_active = true (1)
> d1.<> f {2.1 3.1 4.1 5.1} p MPIR_debug_gate
> Thread 2.1:
>  MPIR_debug_gate = 0x0001 (1)
> Thread 3.1:
>  MPIR_debug_gate = 0x0001 (1)
> Thread 4.1:
>  MPIR_debug_gate = 0x0001 (1)
> Thread 5.1:
>  MPIR_debug_gate = 0x0001 (1)
> d1.<> 
> 
> I think the _release_fn() function in rte_orte_module.c is supposed to set 
> debugger_event_active to false, but that apparently did not happen in this 
> case. So, AFAICT, the reason debugger_event_active would not be set to false 
> is that the event was never delivered, so the _release_fn() function was 
> never called. If that's the case, then the lack of a memory barrier is 
> probably a mo

Re: [OMPI devel] 3.1.2: Datatype errors and segfault in MPI_Allgatherv

2018-11-01 Thread Larry Baker via devel
Things that read like they should be unsigned look suspicious to me:

nbElems -909934592
count -1819869184
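
A small sketch of why these values look suspicious, assuming (this is only a
guess, not something the dump confirms) that the fields are 32-bit signed
integers holding counts that exceeded 2^31-1.  The helper names are
hypothetical.  Reinterpreted as unsigned 32-bit values, the two numbers become
3385032704 and 2475098112.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a 32-bit signed value as unsigned.  Conversion to an unsigned
 * type is well defined in C: the result is the value modulo 2^32. */
uint32_t as_unsigned32(int32_t v)
{
    return (uint32_t)v;
}

/* Nonzero when a 64-bit element count still fits in a signed 32-bit field. */
int fits_in_int32(uint64_t n)
{
    return n <= (uint64_t)INT32_MAX;
}

/* Narrow a count to 32 bits, refusing values that would not survive. */
int32_t checked_narrow(uint64_t n)
{
    assert(fits_in_int32(n));
    return (int32_t)n;
}
```

A count of 3385032704 would fail the fits_in_int32() check, which is the kind
of guard that keeps a large element count from turning negative.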

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov




> On Nov 1, 2018, at 10:34 PM, Ben Menadue wrote:
> 
> Hi,
> 
> I haven’t heard back from the user yet, but I just put this example together 
> which works on 1, 2, and 3 ranks but fails for 4. Unfortunately it needs a 
> fair amount of memory, about 14.3GB per process, so I was running it with 
> -map-by ppr:1:node.
> 
> It doesn’t fail with the segfault as the user’s code does, but it does 
> SIGABRT:
> 
> 16:12 bjm900@r4320 MPI_TESTS > mpirun -mca pml ob1 -mca coll ^fca,hcoll 
> -map-by ppr:1:node -np 4 ./a.out
> [r4450:11544] ../../../../../opal/datatype/opal_datatype_pack.h:53
>   Pointer 0x2bb7ceedb010 size 131040 is outside 
> [0x2b9ec63cb010,0x2bad1458b010] for
>   base ptr 0x2b9ec63cb010 count 1 and data 
> [r4450:11544] Datatype 0x145fe90[] size 3072000 align 4 id 0 length 7 
> used 6
> true_lb 0 true_ub 6144000 (true_extent 6144000) lb 0 ub 6144000 
> (extent 6144000)
> nbElems -909934592 loops 4 flags 104 (committed )-c-GD--[---][---]
>contain OPAL_FLOAT4:* 
> --C[---][---]OPAL_LOOP_S 192 times the next 2 elements extent 
> 8000
> --C---P-D--[---][---]OPAL_FLOAT4 count 2000 disp 0xaba95 
> (4608000) blen 0 extent 4 (size 8000)
> --C[---][---]OPAL_LOOP_E prev 2 elements first elem displacement 
> 4608000 size of data 8000
> --C[---][---]OPAL_LOOP_S 192 times the next 2 elements extent 
> 8000
> --C---P-D--[---][---]OPAL_FLOAT4 count 2000 disp 0x0 (0) blen 0 
> extent 4 (size 8000)
> --C[---][---]OPAL_LOOP_E prev 2 elements first elem displacement 
> 0 size of data 8000
> ---G---[---][---]OPAL_LOOP_E prev 6 elements first elem displacement 
> 4608000 size of data 655228928
> Optimized description 
> -cC---P-DB-[---][---] OPAL_UINT1 count -1819869184 disp 0xaba95 
> (4608000) blen 1 extent 1 (size 1536000)
> -cC---P-DB-[---][---] OPAL_UINT1 count -1819869184 disp 0x0 (0) blen 1 
> extent 1 (size 1536000)
> ---G---[---][---]OPAL_LOOP_E prev 2 elements first elem displacement 
> 4608000 
> [r4450:11544] *** Process received signal ***
> [r4450:11544] Signal: Aborted (6)
> [r4450:11544] Signal code:  (-6)
> 
> Cheers,
> Ben
> 
> ___
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel


Re: [OMPI devel] [2.1.2rc3] libevent SEGV on FreeBSD/amd64

2017-08-30 Thread Larry Baker
Paul,

> (gdb) print base->nactivequeues
> $3 = 106201992

seems like an extraordinarily large number to me.  I don't know what the 
implications of the --enable-debug clang option are.  Any chance the 
SEGFAULT is a debugging trap when an uninitialized value is encountered?

The other thought I had is an alignment trap if, for example, nactivequeues is 
a 64-bit int but is not 64-bit aligned.  As far as I can tell, nactivequeues is 
a plain int.  But, what that is on FreeBSD/amd64, I do not know.

Should there be more information in dmesg or a system log file with the trap 
code so you can identify whether it is an instruction fetch (VERY unlikely), an 
operand fetch, or a store that caused the trap?
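
A rough sketch of how the alignment question could be checked, assuming C11
<stdalign.h>.  struct demo_base is a hypothetical stand-in and is not
libevent's struct event_base; is_aligned() is likewise an illustrative helper.
Converting a pointer to uintptr_t is implementation-defined, though on common
flat-address platforms the result is the address.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in; NOT libevent's struct event_base. */
struct demo_base {
    char tag;
    int  nactivequeues;
};

/* Returns nonzero when the address p is a multiple of the given alignment. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)p % alignment) == 0;
}
```

In a debugger session the same test amounts to printing the address of
base->nactivequeues and checking it against the alignment of its type; for a
member the compiler laid out itself, the check is expected to pass.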

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



> On 30 Aug 2017, at 3:17:05 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> 
> I am testing the 2.1.2rc3 tarball on FreeBSD-11.1, configured with
>--prefix=[...] --enable-debug CC=clang CXX=clang++ --disable-mpi-fortran 
> --with-hwloc=/usr/local
> 
> The CC/CXX settings are to use the system default compilers (rather than 
> gcc/g++ in /usr/local/bin).
> The --with-hwloc is to avoid issue #3992 
> <https://github.com/open-mpi/ompi/issues/3992> (though I have not determined 
> if that impacts this RC).
> 
> When running ring_c I get a SEGV from orterun, for which a gdb backtrace is 
> given below.
> The one surprising thing (highlighted) in the backtrace is that both the RHS 
> and LHS of the assignment appear to be valid memory locations.
> So, if the backtrace is accurate then I am at a loss as to why a SEGV occurs.
> 
> -Paul
> 
> 
> Program terminated with signal 11, Segmentation fault.
> [...]
> #0  opal_libevent2022_event_assign (ev=0x8065482c0, base= out>, fd=,
> events=2, callback=, arg=0x0)
> at 
> /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/event/libevent2022/libevent/event.c:1779
> 1779        ev->ev_pri = base->nactivequeues / 2;
> (gdb) print base->nactivequeues
> $3 = 106201992
> (gdb) print ev->ev_pri
> $4 = 0 '\0'
> (gdb) where
> #0  opal_libevent2022_event_assign (ev=0x8065482c0, base= out>, fd=,
> events=2, callback=, arg=0x0)
> at 
> /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/event/libevent2022/libevent/event.c:1779
> #1  0x0008062e1fd2 in pmix_start_progress_thread ()
> at 
> /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/pmix/pmix112/pmix/src/util/progress_threads.c:83
> #2  0x0008063047e4 in PMIx_server_init (module=0x806545be8, 
> info=0x802e16a00, ninfo=2)
> at 
> /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/pmix/pmix112/pmix/src/server/pmix_server.c:310
> #3  0x0008062c12f6 in pmix1_server_init (module=0x800b106a0, 
> info=0x7fffe290)
> at 
> /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/opal/mca/pmix/pmix112/pmix1_server_south.c:140
> #4  0x000800889f43 in pmix_server_init ()
> at 
> /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/orted/pmix/pmix_server.c:261
> #5  0x000803e22d87 in rte_init ()
> at 
> /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/mca/ess/hnp/ess_hnp_module.c:666
> #6  0x00080084a45e in orte_init (pargc=0x7fffe988, 
> pargv=0x7fffe980, flags=4)
> at 
> /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/runtime/orte_init.c:226
> #7  0x004046a4 in orterun (argc=7, argv=0x7fffea18)
> at 
> /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/tools/orterun/orterun.c:831
> #8  0x00403bc2 in main (argc=7, argv=0x7fffea18)
> at 
> /home/phargrov/OMPI/openmpi-2.1.2rc3-freebsd11-amd64/openmpi-2.1.2rc3/orte/tools/orterun/main.c:13
> 
> 
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov 
> <mailto:phhargr...@lbl.gov>
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Re: [OMPI devel] Missing support for 2 types in MPI_Sizeof()

2016-04-15 Thread Larry Baker
Be careful what you wish for.

I remember looking at this issue a while ago, but I can't remember why or how I 
ran into it.  I do remember convincing myself that the MPI standard was correct 
in restricting SIZEOF to numeric types.  For one thing, a character variable 
type is a string container in Fortran, while in C it is a single character.  
What would be the correct interpretation for SIZEOF in Fortran?  The maximum 
length?  The TRIM'd length?  What would be the correct interpretation in C?  1? 
 strlen()?  strlen()+1?  The size of a character itself may not be the same on 
either end of an MPI connection if, for example, one program is compiled using 
8-bit characters and the other is compiled using 16-bit characters.  
Interchanging strings opens up a can of worms.  As far as logical, there is no 
C logical type.  In Fortran, while the size of a logical variable is specified 
as a "storage unit" (the same as an integer), the representation of true and 
false is unspecified, and, thus, is processor dependent.  On VAXes, only a 
single bit matters; the instruction set supports this logical data type.  (In 
C, though there is no logical data type, the C standard does specify 0=false 
and 1=true for the result of relational and logical operators and 0=false and 
not 0=true for logical operator operands.)  This means it is problematic to 
exchange logical data between Fortran programs (C makes no sense, since there 
is no logical data type) when different compilers (part of what Fortran calls a 
processor) are used.
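
The ambiguities listed above can be made concrete with a short C sketch.  The
struct and function names are hypothetical and chosen only for illustration.
The first part shows that one string buffer has four defensible "sizes"; the
second contrasts the C rule for operands (nonzero is true) with a low-bit rule
like the VAX one described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The candidate "sizes" of one C string buffer. */
struct string_sizes {
    size_t element;   /* sizeof(char), always 1 */
    size_t capacity;  /* bytes in the container */
    size_t length;    /* strlen(): characters before the terminator */
    size_t with_nul;  /* strlen() + 1: bytes needed to copy the string */
};

struct string_sizes measure(const char *buf, size_t capacity)
{
    assert(buf != NULL);
    struct string_sizes s;
    s.element  = sizeof(char);
    s.capacity = capacity;
    s.length   = strlen(buf);
    s.with_nul = s.length + 1;
    return s;
}

/* C convention for operands: any nonzero value is true. */
int c_truth(int v) { return v != 0; }

/* Low-bit convention: only the least significant bit is examined. */
int low_bit_truth(int v) { return v & 1; }
```

For char buf[16] = "abc", measure(buf, sizeof buf) gives 1, 16, 3 and 4 for
the four fields, and the value 2 is true under c_truth() but false under
low_bit_truth().  Two programs that disagree on either choice would
misinterpret each other's data even though both are internally consistent.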

Better to find out what discussions took place in the MPI standards committee 
before adding extensions to SIZEOF.  They may very well have good reasons to 
avoid character and logical data, as I concluded. 

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 15 Apr 2016, at 5:34 AM, Jeff Squyres (jsquyres) wrote:

> Nadia --
> 
> I believe that the character and logical types are not in this script already 
> because the description of MPI_SIZEOF in MPI-3.1 says that the input choice 
> buffer parameter is:
> 
> IN x a Fortran variable of numeric intrinsic type (choice)
> 
> As I understand it (and my usual disclaimer here: I am *not* a Fortran 
> expert), CHARACTER and LOGICAL types are not numeric in Fortran.
> 
> However, we could add such interfaces as an extension.
> 
> I just checked MPICH 3.2, and they *do* include MPI_SIZEOF interfaces for 
> CHARACTER and LOGICAL, but they are missing many of the other MPI_SIZEOF 
> interfaces that we have in OMPI.  Meaning: OMPI and MPICH already diverge 
> wildly on MPI_SIZEOF.  :-\
> 
> I guess I don't have a strong opinion here.  If you file a PR for this patch, 
> I won't object.  :-)
> 
> 
>> On Apr 15, 2016, at 3:22 AM, DERBEY, NADIA <nadia.der...@atos.net> wrote:
>> 
>> Hi,
>> 
>> The following trivial example doesn't compile because of 2 missing types 
>> in the MPI_SIZEOF subroutines (in mpi_sizeof.f90).
>> 
>> [derbeyn@btp0 test]$ cat mpi_sizeof.f90
>>  program main
>> !use mpi
>>  include 'mpif.h'
>> 
>>  integer ierr, sz, mpisize
>>  real r1
>>  integer i1
>>  character ch1
>>  logical l1
>> 
>>  call MPI_INIT(ierr)
>>  call MPI_SIZEOF(r1, sz, ierr)
>>  call MPI_SIZEOF(i1, sz, ierr)
>>  call MPI_SIZEOF(l1, sz, ierr)
>>  call MPI_SIZEOF(ch1, sz, ierr)
>>  call MPI_FINALIZE(ierr)
>> 
>>  end
>> [derbeyn@btp0 test]$ mpif90 -o mpi_sizeof mpi_sizeof.f90
>> mpi_sizeof.f90(14): error #6285: There is no matching specific 
>> subroutine for this generic subroutine call.   [MPI_SIZEOF]
>>  call MPI_SIZEOF(ch1, sz, ierr)
>> -^
>> mpi_sizeof.f90(15): error #6285: There is no matching specific
>> subroutine for this generic subroutine call.   [MPI_SIZEOF]
>>  call MPI_SIZEOF(l1, sz, ierr)
>> -^
>> compilation aborted for mpi_sizeof.f90 (code 1)
>> 
>> 
>> This problem happens both on master and v2.x. The following patch seems
>> to solve the issue:
>> 
>> diff --git a/ompi/mpi/fortran/base/gen-mpi-sizeof.pl
>> b/ompi/mpi/fortran/base/gen-mpi-sizeof.pl
>> index 5ea3dca3..a2a99924 100755
>> --- a/ompi/mpi/fortran/base/gen-mpi-sizeof.pl
>> +++ b/ompi/mpi/fortran/base/gen-mpi-sizeof.pl
>> @@ -145,6 +145,9 @@ sub generate {
>>   # Main
>> 
>> #
>> 
>> +queue_sub("character", "char", "character_kinds");
>> +queue_sub("logical", "logical", "log

Re: [OMPI devel] mpif.h on Intel build when run with OMPI_FC=gfortran

2016-03-03 Thread Larry Baker
Dave,

Both Gilles and Chris raise important points.  You really cannot expect to mix 
modules from two different Fortran compilers in a single executable.  There is 
no requirement placed on a compiler by the Fortran standard for what object 
language it should use, how the information in modules is made available across 
compilation units, or the procedure calling conventions.  This makes me wonder, 
as you do, what the point is of the OMPI_CC and OMPI_FC environment variables?  
I do think that Intel has tried to make their objects interoperable with GCC 
objects.  That is a link-time issue.  You are encountering compile-time issues. 
 Gilles says whatever mpif-sizeof.h was intended to define, it cannot be done 
in gfortran.  Even if mpif-sizeof.h generated for an Intel compiler was 
standard-conforming (so the errors you encountered are not show stoppers), I'm 
not sure you would be able to get past the incompatibility between the internal 
formats used by each compiler to store module definitions and declarations for 
later USE by another compilation unit.  I think your expectations cannot be 
fulfilled because of the compilers, not because of OpenMPI.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 3 Mar 2016, at 6:39 PM, Dave Turner wrote:

> Gilles,
> 
> I don't see the point of having the OMPI_CC and OMPI_FC environment
> variables at all if you're saying that we shouldn't expect them to work.  I 
> actually do think they work fine if you do a GNU build and use them to
> specify the Intel compilers.  I also think it works fine when you do an
> Intel build and compile with gcc.  So to me it just looks like that one
> include file is the problem.
> 
>   Dave
> 
> On Thu, Mar 3, 2016 at 8:02 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> Dave,
> 
> you should not expect anything when mixing Fortran compilers
> (and to be on the safe side, you might not expect much when mixing C/C++ 
> compilers too,
> for example, if you built ompi with intel and use gcc for your app, gcc might 
> complain about unresolved symbols from the intel runtime)
> 
> if you compile OpenMPI with gfortran 4.8.5, the automatically generated 
> mpif-sizeof.h contains
> 
> ! Sad panda.
> !
> ! This compiler does not support the Right Stuff to enable MPI_SIZEOF.
> ! Specifically: we need support for the INTERFACE keyword,
> ! ISO_FORTRAN_ENV, and the STORAGE_SIZE() intrinsic on all types.
> ! Apparently, this compiler does not support both of those things, so
> ! this file will be (effecitvely) blank (i.e., we didn't bother
> ! generating the necessary stuff for MPI_SIZEOF because the compiler
> ! doesn't support
> ! it).
> !
> ! If you want support for MPI_SIZEOF, please use a different Fortran
> ! compiler to build Open MPI.
> 
> intel fortran compilers have the right stuff, so mpif-sizeof.h is usable, and 
> you get something very different.
> 
> Cheers,
> 
> Gilles
> 
> 
> On 3/4/2016 10:17 AM, Dave Turner wrote:
>> 
>>  My understanding is that OpenMPI built with either Intel or
>> GNU compilers should be able to use the other compilers using the
>> OMPI_CC and OMPI_FC environmental variables.
>>  For OpenMPI-1.8.8 built with Intel compilers, if you try to
>> compile any code that includes mpif.h using OMPI_FC=gfortran it
>> fails.  The Intel build creates mpi-sizeof.h that dimensions
>> arrays to more than 6 dimensions which gfortran cannot handle.
>> The example below illustrates this.
>>  I wasn't able to find any other reports like this on the
>> web, and I don't see any way of specifying a path to an alternate
>> mpif.h include file.  This looks to be a bug to me, but please let
>> me know if I missed a config flag somewhere.
>> 
>>Dave Turner
>> 
>> 
>> 
>> Selene cat bugtest.F
>> ! Program to illustrate bug when OpenMPI is compiled with Intel
>> !compilers but run using OMPI_FC=gfortran.
>> 
>>   PROGRAM BUGTEST
>> 
>>   INCLUDE "mpif.h"
>> 
>>   END
>> Selene cat go
>> #!/bin/bash
>> 
>> 
>> echo "Compile test using default ifort"
>> 
>> mpifort --version
>> mpifort bugtest.F -o bugtest_ifort
>> 
>> 
>> echo "Compile test using gfortan when OpenMPI was compiled with ifort/icc"
>> 
>> export OMPI_FC=gfortran
>> 
>> mpifort --version
>> mpifort bugtest.F -o bugtest_gfortran
>> 
>> 
>> Selene ./go
>> Compile test using default ifort
>> ifort (IFORT) 15.0.3 20150407
>> Copyright (C) 1985-2015 Intel Corporation.  All rights reserved.
>> 
>> Compile test 

Re: [OMPI devel] mpif.h on Intel build when run with OMPI_FC=gfortran

2016-03-03 Thread Larry Baker
Gilles,

Before anyone takes my opinion as gospel, I should note that I only checked my 
copies of the Fortran 90 Handbook (Adams et al.) and the Fortran 2003 Handbook 
(Adams et al.).  If more than seven dimensions is permitted by Fortran 2008 or 
Fortran 15, I stand corrected.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 3 Mar 2016, at 6:11 PM, Gilles Gouaillardet wrote:

> Larry,
> 
> currently, OpenMPI generates mpif-sizeof.h with up to 15 dimensions with intel 
> compilers, but up to 7 dimensions with "recent" gcc (for example gcc 5.2 and 
> higher)
> 
> so i guess the logic behind this is "give the compiler all it can handle", so 
> if intel somehow "extended" the standard to support up to 15 dimensions,  
> then OpenMPI generates mpif-sizeof.h with up to 15 dimensions.
> /* otherwise, you could use 10 dimensions arrays in your application, as long 
> as they are not an argument of MPI_SIZEOF */
> 
> Jeff, can you please comment on that ?
> 
> Cheers,
> 
> Gilles
> 
> On 3/4/2016 10:43 AM, Larry Baker wrote:
>> I have never tried to mix compilers like Dave mentions.  In any event, the 
>> Fortran standard specifies no more than seven dimensions are allowed in an 
>> array declaration.  I'm puzzled why OpenMPI would generate code that 
>> violates the Fortran standard?
>> 
>> Larry Baker
>> US Geological Survey
>> 650-329-5608
>> ba...@usgs.gov
>> 
>> 
>> 
>> On 3 Mar 2016, at 5:17 PM, Dave Turner wrote:
>> 
>>> 
>>>  My understanding is that OpenMPI built with either Intel or
>>> GNU compilers should be able to use the other compilers using the
>>> OMPI_CC and OMPI_FC environmental variables.
>>>  For OpenMPI-1.8.8 built with Intel compilers, if you try to
>>> compile any code that includes mpif.h using OMPI_FC=gfortran it
>>> fails.  The Intel build creates mpi-sizeof.h that dimensions
>>> arrays to more than 6 dimensions which gfortran cannot handle.
>>> The example below illustrates this.
>>>  I wasn't able to find any other reports like this on the
>>> web, and I don't see any way of specifying a path to an alternate
>>> mpif.h include file.  This looks to be a bug to me, but please let
>>> me know if I missed a config flag somewhere.
>>> 
>>>Dave Turner
>>> 
>>> 
>>> 
>>> Selene cat bugtest.F
>>> ! Program to illustrate bug when OpenMPI is compiled with Intel
>>> !compilers but run using OMPI_FC=gfortran.
>>> 
>>>   PROGRAM BUGTEST
>>> 
>>>   INCLUDE "mpif.h"
>>> 
>>>   END
>>> Selene cat go
>>> #!/bin/bash
>>> 
>>> 
>>> echo "Compile test using default ifort"
>>> 
>>> mpifort --version
>>> mpifort bugtest.F -o bugtest_ifort
>>> 
>>> 
>>> echo "Compile test using gfortan when OpenMPI was compiled with ifort/icc"
>>> 
>>> export OMPI_FC=gfortran
>>> 
>>> mpifort --version
>>> mpifort bugtest.F -o bugtest_gfortran
>>> 
>>> 
>>> Selene ./go
>>> Compile test using default ifort
>>> ifort (IFORT) 15.0.3 20150407
>>> Copyright (C) 1985-2015 Intel Corporation.  All rights reserved.
>>> 
>>> Compile test using gfortan when OpenMPI was compiled with ifort/icc
>>> GNU Fortran (Gentoo 4.9.3 p1.4, pie-0.6.4) 4.9.3
>>> Copyright (C) 2015 Free Software Foundation, Inc.
>>> 
>>> GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
>>> You may redistribute copies of GNU Fortran
>>> under the terms of the GNU General Public License.
>>> For more information about these matters, see the file named COPYING
>>> 
>>> mpif-sizeof.h:75.48:
>>> Included at mpif.h:61:
>>> Included at bugtest.F:6:
>>> 
>>>   COMPLEX(REAL128), DIMENSION(1,1,1,1,1,1,1,*)::x
>>> 1
>>> Error: Array specification at (1) has more than 7 dimensions
>>> mpif-sizeof.h:82.48:
>>> Included at mpif.h:61:
>>> Included at bugtest.F:6:
>>> 
>>>   COMPLEX(REAL128), DIMENSION(1,1,1,1,1,1,1,1,*)::x
>>> 1
>>> Error: Array specification at (1) has more than 7 dimensions
>>> mpif-sizeof.h:89.48:
>>> Included at mpif.h:61:
>>> Included at bugtest.F:6:
>>> 
>>> 

Re: [OMPI devel] mpif.h on Intel build when run with OMPI_FC=gfortran

2016-03-03 Thread Larry Baker
I have never tried to mix compilers like Dave mentions.  In any event, the 
Fortran standard specifies no more than seven dimensions are allowed in an 
array declaration.  I'm puzzled why OpenMPI would generate code that violates 
the Fortran standard?

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 3 Mar 2016, at 5:17 PM, Dave Turner wrote:

> 
>  My understanding is that OpenMPI built with either Intel or
> GNU compilers should be able to use the other compilers using the
> OMPI_CC and OMPI_FC environmental variables.
>  For OpenMPI-1.8.8 built with Intel compilers, if you try to
> compile any code that includes mpif.h using OMPI_FC=gfortran it
> fails.  The Intel build creates mpi-sizeof.h that dimensions
> arrays to more than 6 dimensions which gfortran cannot handle.
> The example below illustrates this.
>  I wasn't able to find any other reports like this on the
> web, and I don't see any way of specifying a path to an alternate
> mpif.h include file.  This looks to be a bug to me, but please let
> me know if I missed a config flag somewhere.
> 
>Dave Turner
> 
> 
> 
> Selene cat bugtest.F
> ! Program to illustrate bug when OpenMPI is compiled with Intel
> !compilers but run using OMPI_FC=gfortran.
> 
>   PROGRAM BUGTEST
> 
>   INCLUDE "mpif.h"
> 
>   END
> Selene cat go
> #!/bin/bash
> 
> 
> echo "Compile test using default ifort"
> 
> mpifort --version
> mpifort bugtest.F -o bugtest_ifort
> 
> 
> echo "Compile test using gfortan when OpenMPI was compiled with ifort/icc"
> 
> export OMPI_FC=gfortran
> 
> mpifort --version
> mpifort bugtest.F -o bugtest_gfortran
> 
> 
> Selene ./go
> Compile test using default ifort
> ifort (IFORT) 15.0.3 20150407
> Copyright (C) 1985-2015 Intel Corporation.  All rights reserved.
> 
> Compile test using gfortan when OpenMPI was compiled with ifort/icc
> GNU Fortran (Gentoo 4.9.3 p1.4, pie-0.6.4) 4.9.3
> Copyright (C) 2015 Free Software Foundation, Inc.
> 
> GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
> You may redistribute copies of GNU Fortran
> under the terms of the GNU General Public License.
> For more information about these matters, see the file named COPYING
> 
> mpif-sizeof.h:75.48:
> Included at mpif.h:61:
> Included at bugtest.F:6:
> 
>   COMPLEX(REAL128), DIMENSION(1,1,1,1,1,1,1,*)::x
> 1
> Error: Array specification at (1) has more than 7 dimensions
> mpif-sizeof.h:82.48:
> Included at mpif.h:61:
> Included at bugtest.F:6:
> 
>   COMPLEX(REAL128), DIMENSION(1,1,1,1,1,1,1,1,*)::x
> 1
> Error: Array specification at (1) has more than 7 dimensions
> mpif-sizeof.h:89.48:
> Included at mpif.h:61:
> Included at bugtest.F:6:
> 
>   COMPLEX(REAL128), DIMENSION(1,1,1,1,1,1,1,1,1,*)::x
> 1
> Error: Array specification at (1) has more than 7 dimensions
> 
> [ More of the same errors have been clipped ]
> 
> 
> -- 
> Work: davetur...@ksu.edu (785) 532-7791
>  2219 Durland, Manhattan KS  66502
> Home:drdavetur...@gmail.com
>   cell: (785) 770-5929
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/03/18671.php



Re: [OMPI devel] 16 byte real in Fortran

2015-10-14 Thread Larry Baker
The INTEGER*n, LOGICAL*n, REAL*n, etc., syntax has never been legal Fortran.  
Fortran originally had only INTEGER, REAL, DOUBLE PRECISION, and COMPLEX 
numeric types.  Fortran 90 added the notion of a KIND of numeric, but left 
unspecified the mapping of numeric KINDs to processor-specific storage.  KIND 
can be thought of as an opaque identifier.  There is no requirement, for 
example that KIND n means a variable occupies n bytes of storage, though this 
is commonly done.  (As is the association of KIND=1 to REAL and KIND=2 to 
DOUBLE PRECISION.)  Instead, the language provides portable means of specifying 
the desired behavior of an available KIND, such as digits of precision.  
Unfortunately, when marshalling data for interchange, bits matter—the number 
and their meaning.  High-level languages don't support such concepts very well. 
Starting with C99 (Section 7.18.1), C forces the compiler implementation to 
define macros for supported integer widths (in bits).  However, like Fortran, 
there is no requirement that any exact number of bits be supported (Section 
7.18.1.1); the standard only requires integer types with a minimum of 8, 16, 
32, and 64 bits (Section 7.18.1.2).  Nothing is said at all about 
floating-point data types and the correspondence with the integer types.  This 
is what APIs like OpenMPI have to struggle with in the real world.
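
A short sketch of the C99 <stdint.h> situation described above.  The function
names are hypothetical.  The minimum-width types (int_least8_t through
int_least64_t) are always present, while an exact-width type such as int32_t
is optional and can be detected through its limit macro.  The floating-point
sizes are simply reported, since the standard does not fix them.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Storage width in bits of an object occupying nbytes on this implementation. */
unsigned bits_of(size_t nbytes)
{
    return (unsigned)(nbytes * CHAR_BIT);
}

/* 1 if the optional exact-width type int32_t exists here, else 0. */
int have_exact_int32(void)
{
#ifdef INT32_MAX
    return 1;
#else
    return 0;
#endif
}

/* Print what this compiler actually provides; results vary by platform. */
void report_widths(void)
{
    assert(bits_of(sizeof(int_least32_t)) >= 32);
    printf("int_least32_t: %u bits, exact int32_t available: %d\n",
           bits_of(sizeof(int_least32_t)), have_exact_int32());
    printf("float: %u bits, double: %u bits, long double: %u bits\n",
           bits_of(sizeof(float)), bits_of(sizeof(double)),
           bits_of(sizeof(long double)));
}
```

The long double line is the one most likely to differ between compilers, which
is the same portability problem a library faces when it must match a Fortran
16-byte real to a C type.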

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 14 Oct 2015, at 3:38 PM, Jeff Squyres (jsquyres) wrote:

> On Oct 14, 2015, at 5:53 PM, Vladimír Fuka <vladimir.f...@gmail.com> wrote:
>> 
>>> As that ticket notes if REAL*16 <> long double Open MPI should be
>>> disabling reductions on MPI_REAL16. I can take a look and see if I can
>>> determine why that is not working as expected.
>> 
>> Does it really need to be just disabled when the `real(real128)` is
>> actually equivalent to c_long_double? Wouldn't making the explicit
>> interfaces to MPI_Send and others to accept `real(real128)` make more
>> sense? As I wrote in the stackoverflow post, the MPI standard (3.1,
>> pages 628 and 674) is not very clear if MPI_REAL16 corresponds to
>> real*16 or real(real128) if these differ, but making it correspond to
>> real(real128) might be reasonable.
> 
> As I understand it, real*16 is not a real type -- it's a commonly-used type 
> and supported by many (all?) compilers, but it's not actually defined in the 
> Fortran spec.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/10/18170.php



Re: [OMPI devel] Odd master build failure with Studio 12.4 on Linux w/ -m32

2015-02-27 Thread Larry Baker
Gack!  Can't type.  compiler definition (FF, FC, LD) should have been compiler 
definition (CC, FC, LD).

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 27 Feb 2015, at 10:39 AM, Larry Baker wrote:

> On 27 Feb 2015, at 10:14 AM, Jeff Squyres (jsquyres) wrote:
> 
>> Yes, you do need to specify -m32 in CFLAGS and --with-wrapper*, because you 
>> may well want to build OMPI with one set of flags and build MPI applications 
>> with a different set of flags.  Hence, the wrappers don't contain all the 
>> CFLAGS used to build OMPI, for example.
> 
> 
> I think of -m32 and -m64 as really selecting different compilers.  (Which 
> might actually be true under the covers.)  When I use the -m flag, I 
> generally add it to the compiler definition (FF, FC, LD), not the XXFLAGS.  
> If one follows that convention, will OpenMPI always pass the -m flag on to 
> the wrapper scripts so the addition of --with-wrapper* is not necessary?  It 
> would have to handle the embedded blank properly, which may be tricky.
> 
> Larry Baker
> US Geological Survey
> 650-329-5608
> ba...@usgs.gov
> 



Re: [OMPI devel] Odd master build failure with Studio 12.4 on Linux w/ -m32

2015-02-27 Thread Larry Baker
On 27 Feb 2015, at 10:14 AM, Jeff Squyres (jsquyres) wrote:

> Yes, you do need to specify -m32 in CFLAGS and --with-wrapper*, because you 
> may well want to build OMPI with one set of flags and build MPI applications 
> with a different set of flags.  Hence, the wrappers don't contain all the 
> CFLAGS used to build OMPI, for example.


I think of -m32 and -m64 as really selecting different compilers.  (Which might 
actually be true under the covers.)  When I use the -m flag, I generally add it 
to the compiler definition (FF, FC, LD), not the XXFLAGS.  If one follows that 
convention, will OpenMPI always pass the -m flag on to the wrapper scripts so 
the addition of --with-wrapper* is not necessary?  It would have to handle the 
embedded blank properly, which may be tricky.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



Re: [OMPI devel] Fortran issue

2015-02-20 Thread Larry Baker
Good grins.  Thanks Paul.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 20 Feb 2015, at 11:49 AM, Paul Hargrove wrote:

> 
>INTEGER JEFF(3)
>DATA JEFF/4HMR. ,4HFORT,3HRAN/
> 
> If you can understand that, you should probably pretend you can't :-)
> 
> -Paul [who has actually used Computed GO TO and Arithmetic IF]
> 
> On Fri, Feb 20, 2015 at 11:28 AM, Larry Baker <ba...@usgs.gov> wrote:
> Excellent, Mr. Fortran.  Thank you.
> 
> By the way, I meant to write Branch ON Low Bit Set/Clear.
> 
> Larry Baker
> US Geological Survey
> 650-329-5608
> ba...@usgs.gov
> 
> 
> 
> On 20 Feb 2015, at 11:22 AM, Jeff Squyres (jsquyres) wrote:
> 
>> On Feb 20, 2015, at 2:12 PM, Larry Baker <ba...@usgs.gov> wrote:
>>> 
>>> Beware, this has/may not always be the case.  This is due to C's historical 
>>> confusion/misuse of integers as boolean data types.  On VAX hardware, the 
>>> low bit was the only significant part of a Fortran LOGICAL data type, owing 
>>> to the architectural support (Branch of Low Bit Set/Clear) for the low bit 
>>> in a status word meaning success/failure.  I doubt anyone uses VAXes and 
>>> MPI, so this is not likely to cause users problems.
>> 
>> Note that this comment was referring to two things:
>> 
>> 1. 0/1 array index issues
>> 2. .true./.false. issues
>> 
>> We actually check for the value of .true. in configure, and use that 
>> everywhere.  I believe this particular portion of the code simply looks for 
>> .false.==(C int)0, and .true. is anything else.  That was deemed good enough 
>> because this portion of the code is simply *checking* for true/false.  Where 
>> we *assign* true/false in the Fortran boolean sense, we use the value 
>> determined by configure.
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2015/02/17012.php
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/02/17013.php
> 
> 
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/02/17014.php



Re: [OMPI devel] Fortran issue

2015-02-20 Thread Larry Baker
On 20 Feb 2015, at 3:09 AM, Gilles Gouaillardet wrote:

> George,
> 
> this is correctly handled in ompi_testany_f :
> 
> /* Increment index by one for fortran conventions.  Note that
>all Fortran compilers have FALSE==0; we just need to check
>for any nonzero value (because TRUE is not always 1) */
> 
> Cheers,
> 
> Gilles


Beware, this has/may not always be the case.  This is due to C's historical 
confusion/misuse of integers as boolean data types.  On VAX hardware, the low 
bit was the only significant part of a Fortran LOGICAL data type, owing to the 
architectural support (Branch of Low Bit Set/Clear) for the low bit in a status 
word meaning success/failure.  I doubt anyone uses VAXes and MPI, so this is 
not likely to cause users problems.

See http://h71000.www7.hp.com/doc/82final/6443/6443pro_026.html:

> 8.3 Logical Data Representations
> 
> Logical data can be one, two, four, or eight bytes in length.
> 
> The default size used for a LOGICAL data declaration without a kind parameter 
> (or size specifier) is LOGICAL (KIND=4) (same as LOGICAL*4), unless you do 
> one of the following:
> 
>   • Explicitly declare the length of a LOGICAL declaration by using a 
> kind parameter, such as LOGICAL (KIND=4). HP Fortran provides intrinsic 
> LOGICAL kinds of 1, 2, 4, and 8. Each LOGICAL kind number corresponds to 
> number of bytes used by that intrinsic representation. 
> You can also use a size specifier, such as LOGICAL*4, but be aware this is an 
> extension to the Fortran 90 standard.
> 
>   • Use the FORTRAN command /INTEGER_SIZE=nn qualifier to control the 
> size of default (without a kind parameter or size specifier) LOGICAL and 
> INTEGER declarations (see Section 2.3.26).
> To improve performance, avoid using 2-byte or 1-byte logical declarations 
> (see Chapter 5).
> 
> Intrinsic LOGICAL*1 or LOGICAL (KIND=1) values are stored in a single byte.
> 
> Logical (intrinsic) values can also be stored in the following sizes of 
> contiguous bytes starting on an arbitrary byte boundary:
> 
>   • Two bytes (LOGICAL (KIND=2) or LOGICAL*2)
>   • Four bytes (LOGICAL (KIND=4) or LOGICAL*4)
>   • Eight bytes (LOGICAL (KIND=8) or LOGICAL*8)
> 
> The low-order bit determines whether the logical value is true or false. 
> Logical variables can also be interpreted as integer data (an extension to 
> the Fortran 90 standard). For example, in addition to having logical values 
> .TRUE. and .FALSE., LOGICAL*1 data can also have values in the range -128 to 
> 127.
> 
> LOGICAL*1, LOGICAL*2, LOGICAL*4, and LOGICAL*8 data representations appear in 
> Figure 8-5.
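
To make the difference concrete, here is a minimal C sketch (not Open MPI's code) of the two truth conventions discussed above: "any nonzero value is .TRUE." versus "only the low-order bit decides".  Both function names are hypothetical, and a 4-byte LOGICAL is assumed.

```c
#include <stdint.h>

/* Convention assumed by the comment in ompi_testany_f:
 * .FALSE. is 0, anything else counts as .TRUE. */
static int logical_is_true_nonzero(int32_t fortran_logical)
{
    return fortran_logical != 0;
}

/* Convention described in the HP Fortran manual quoted above:
 * only the low-order bit is significant. */
static int logical_is_true_lowbit(int32_t fortran_logical)
{
    return (int)(fortran_logical & 1);
}
```

A value of 2 is true under the first test and false under the second; a value with all bits set is true under both.  The two tests disagree only for nonzero even values.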

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



Re: [OMPI devel] OMPI devel] [1.8.4rc2] orterun SEGVs on Solaris-10/SPARC

2014-12-12 Thread Larry Baker
On 12 Dec 2014, at 5:22 PM, Paul Hargrove wrote:

> HOWEVER, while the patch catches the "%u" case, there are plenty of potential 
> ways to hit the same problem if, for instance, one uses "%zu" for size_t.  
> Additionally, I've already noted that the code for "%ld", "%lx", "%lX", "%lf" 
> are all currently incorrect.


Not sure if it is applicable, but C99 has an <inttypes.h> header which 
#include's <stdint.h> and provides additional capabilities, such as 
printf()/scanf() format macros for the types defined in <stdint.h>.
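
As a rough illustration of those format macros (a sketch only; format_counts and its arguments are made up for this example and are not Open MPI code):

```c
#include <inttypes.h>
#include <stdio.h>

/* Format two fixed-width integers without guessing whether the
 * underlying types are int, long, or long long on this platform. */
static int format_counts(char *buf, size_t len, uint32_t nnuma, uint64_t nbytes)
{
    return snprintf(buf, len, "%" PRIu32 "N:%" PRIu64 "B", nnuma, nbytes);
}
```

Note that these macros cover the <stdint.h> types only.  size_t is a separate case, handled in C99 by the "z" length modifier ("%zu") that Paul mentions.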

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



Re: [OMPI devel] OMPI devel] [1.8.4rc2] orterun SEGVs on Solaris-10/SPARC

2014-12-12 Thread Larry Baker
Or, slightly modified using a defensive coding style:

>   return 1 + vsnprintf(dummy, sizeof( dummy ), fmt, ap);

if you like sizeof() [which I prefer].  if you like sizeof:

>   return 1 + vsnprintf(dummy, sizeof dummy, fmt, ap);
> 
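
For completeness, a self-contained sketch of the same idea.  The va_copy and the check for a negative return are my additions (assumptions, not part of Paul's proposal), and needed_size is a hypothetical wrapper that exists only to exercise the function.

```c
#include <stdarg.h>
#include <stdio.h>

/* Let vsnprintf itself compute the formatted length, so every
 * conversion it understands (%u, %zu, %ld, ...) is handled. */
static int guess_strlen(const char *fmt, va_list ap)
{
    char dummy[1];
    va_list ap2;
    int n;

    va_copy(ap2, ap);            /* leave the caller's va_list untouched */
    n = vsnprintf(dummy, sizeof(dummy), fmt, ap2);
    va_end(ap2);

    return (n < 0) ? -1 : n + 1; /* +1 for the trailing '\0' */
}

/* Hypothetical variadic wrapper, for illustration only. */
static int needed_size(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = guess_strlen(fmt, ap);
    va_end(ap);
    return n;
}
```

This relies on the C99 return-value behavior described in the man page excerpt quoted below; a pre-C99 vsnprintf that returns -1 on truncation would need a different approach.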


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 12 Dec 2014, at 5:22 PM, Paul Hargrove wrote:

> OK, applying my attached patch (based on Gilles's observation) resolved the 
> problem!
> So I fully expect Ralph's plan to use "%d" to also resolve this.
> 
> HOWEVER, while the patch catches the "%u" case, there are plenty of potential 
> ways to hit the same problem if, for instance, one uses "%zu" for size_t.  
> Additionally, I've already noted that the code for "%ld", "%lx", "%lX", "%lf" 
> are all currently incorrect.
> 
> So, I ask: "Why isn't guess_strlen() just implemented as follows?"
> 
> /* From man vsnprintf:
>  *The functions snprintf and vsnprintf do not write more  than
>  * size  bytes (including the trailing '\0').  If the output was truncated
>  * due to this limit then the return value is  the  number  of  characters
>  * (not  including the trailing '\0') which would have been written to the
>  * final string if enough space had been available. 
>  */
> static int guess_strlen(const char *fmt, va_list ap)
> { 
>   char dummy[1];
>   return 1 + vsnprintf(dummy, 1, fmt, ap);
> }
> 
> 
> BTW: I do see some messages like "select: Interrupted system call" which I 
> assume are related to the timeout code (and thus the subject of a different 
> thread).
> 
> 
> -Paul 
> 
> On Fri, Dec 12, 2014 at 3:14 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> Thanks, Gilles!
> 
> I was looking at that same code just now and completely missed the lack of a 
> case for '%u' (and '%lu').  I will add one now and see if that resolves the 
> problem
> 
> 
> -Paul
> 
> On Fri, Dec 12, 2014 at 3:10 PM, Gilles Gouaillardet 
> <gilles.gouaillar...@gmail.com> wrote:
> Ralph,
> 
> I cannot find a case for the %u format in guess_strlen
> And since the default does not invoke va_arg()
> it seems strlen is invoked on nnuma instead of arch
> 
> Makes sense ?
> 
> Cheers,
> 
> Gilles
> 
> Ralph Castain <r...@open-mpi.org> wrote:
> Afraid I’m drawing a blank, Paul - I can’t see how we got to a bad address 
> down there. This is at the beginning of orte_init, so there are no threads 
> running nor has anything much happened.
> 
> Do you have any suggestions?
> 
> 
>> On Dec 12, 2014, at 9:02 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>> 
>> Ralph,
>> 
>> The "arch" variable looks fine:
>> Current function is opal_hwloc_base_get_topo_signature
>>  2134nnuma, nsocket, nl3, nl2, nl1, ncore, nhwt, arch);
>> (dbx) print arch
>> arch = 0x1001700a0 "sun4v"
>> 
>> And so is "fmt":
>> 
>> Current function is opal_asprintf
>>   194   length = opal_vasprintf(ptr, fmt, ap);
>> (dbx) print fmt
>> fmt = 0x7eeada98 "%uN:%uS:%uL3:%uL2:%uL1:%uC:%uH:%s"
>> 
>> However, things have gone bad in guess_strlen():
>> 
>> Current function is guess_strlen
>>71   len += (int)strlen(sarg);
>> (dbx) print sarg
>> sarg = 0x2 ""
>> 
>> -Paul
>> 
>> On Fri, Dec 12, 2014 at 2:24 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> Hmmm….this is really odd. I actually do have a protection for that arch 
>> value being NULL, and you are in the code section when it isn’t.
>> 
>> Do you still have the core file around? If so, can you print out the value 
>> of the “arch” variable? It would be in the 
>> opal_hwloc_base_get_topo_signature level.
>> 
>> I’m wondering if that value has been hosed, and the problem is memory 
>> corruption somewhere.
>> 
>> 
>>> On Dec 11, 2014, at 8:56 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>> Thanks Paul - I will post a fix for this tomorrow. Looks like Sparc isn’t 
>>> returning an architecture type for some reason, and I didn’t protect 
>>> against it.
>>> 
>>> 
>>>> On Dec 11, 2014, at 7:39 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>> 
>>>> Backtrace for the Solaris-10/SPARC SEGV appears below.
>>>> I've changed the subject line to distinguish this from the earlier report.
>>>> 
>>>> -Paul
>>>> 
>>>> program te

Re: [OMPI devel] still supporting pgi?

2014-12-11 Thread Larry Baker
On 11 Dec 2014, at 2:12 PM, Paul Hargrove wrote:

> I believe Larry Baker of USGS is also a PGI user (in production, rather than 
> just testing as I do). 


That is correct.

Although we are running a rather old Rocks cluster kit (CentOS based) which is 
so old that we cannot run the latest PGI releases.  Some time after the first 
of the year I plan to update Rocks and PGI and Intel and Oracle and GNU.  I'm 
giving up on PathScale and AMD/Open64.  I have already updated all the cluster 
firmware.  I just get side tracked.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at execution on an embedded ARM Linux kernel version 3.15.0

2014-11-25 Thread Larry Baker
Allan,

If you can still boot the old embedded system, a lot of times the config 
parameters are saved as /proc/config.gz.  You can at least then compare the two 
configs.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 25 Nov 2014, at 11:11 AM, Allan Wu wrote:

> Thanks Paul! Unfortunately '/boot' is not available in my embedded linux, and 
> I do not have the configuration file for the old kernel since it is provided 
> as is. However, I have the new kernel configuration since I compiled it 
> myself. Would it be helpful if I provide you the .config file when I compile 
> the kernel? It maybe quite painful to look through that file though. Is there 
> any other way that I can obtain the configuration? 
> 
> I checked my config for the new kernel, and UNIX-domain sockets and Sys V IPC 
> are both enabled in the build. Are there any other possibilities I can check?
> 
> Thanks,
> Di
> 
> --
> Di Wu (Allan)
> PhD student, VAST Laboratory,
> Department of Computer Science, UC Los Angeles
> Email: al...@cs.ucla.edu
> 
> On Tue, Nov 25, 2014 at 10:45 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> Allan,
> 
> A likely possibility is that some important kernel feature (that Open MPI 
> assumes is present) is missing.
> That includes not only "kernel modules" as you mention, but also features 
> configured in (or out) of the base kernel.
> For instance, some embedded kernels omit UNIX-domain sockets and SysV IPC 
> support.
> 
> If you can send me (preferably off-list) the kernel config files for the old 
> and new kernels I may be able to spot something.
> If present, you are looking for /boot/config-[VERSION]
> 
> -Paul
> 
> On Tue, Nov 25, 2014 at 10:25 AM, Allan Wu <al...@cs.ucla.edu> wrote:
> I'm sorry I forgot to change the subject when I reply to the digest issue. 
> Please find my original email below. 
> 
> Regards,
> Di
> 
> On Tue, Nov 25, 2014 at 10:19 AM, Allan Wu <al...@cs.ucla.edu> wrote:
> Thanks Ralph for the reply. Sorry about the log file, I think I forgot to put 
> an extension to the file. Please find a new one attached with this email. 
> 
> I'm sorry for not enough debugging information, but 'ompi_info' and 
> '--debug-devel' are the only ways I know for collecting information, are 
> there any other things I can try to provide more info?
> 
> When I execute 'mpirun --debug-devel -np 1 ./helloworld', all the output is 
> the logging information in my last email. It got stuck at  "[fpga1:00718] 
> tmp: /tmp", and nothing from my helloworld program is printed out to the 
> screen. So I think it is mpirun failing to start my executable, not failing 
> to terminate.
> 
> I was wondering if this has anything to do with my newer kernel version, 
> since it works well in the old case. 
> 
> Thanks,
> --
> Di Wu (Allan)
> PhD student, VAST Laboratory,
> Department of Computer Science, UC Los Angeles
> Email: al...@cs.ucla.edu
> 
> 
> Date: Tue, 25 Nov 2014 07:29:51 -0800
> From: Ralph Castain <r...@open-mpi.org>
> To: Open MPI Developers <de...@open-mpi.org>
> Subject: Re: [OMPI devel] OpenMPI v1.8 and v1.8.3 mpirun hangs at
> execution   on an embedded ARM Linux kernel version 3.15.0
> Message-ID: <898cb117-f6a6-4569-89c3-49b75d65b...@open-mpi.org>
> Content-Type: text/plain; charset="utf-8"
> 
> I don’t know what you put in that log file, but it was an executable and I’m 
> not feeling that trusting :-)
> 
> I’m afraid there isn’t enough debug output there to really tell anything. 
> From what little I can see, I’m guessing that the application ran fine and 
> you got the usual “hello” output and the helloworld process exited safely - 
> is that correct? And so it is solely mpirun that is failing to cleanly 
> terminate?
> 
> 
> > On Nov 24, 2014, at 11:24 PM, Allan Wu <al...@cs.ucla.edu> wrote:
> >
> > Hello everyone,
> >
> > I have cross-compiled OpenMPI for an embedded ARM Linux. Everything works 
> > fine for my system based on Linux 3.8.0. I have previously submitted a post 
> > related to my compilation, which can be found here: 
> > http://www.open-mpi.org/community/lists/devel/2014/04/14440.php 
> > <http://www.open-mpi.org/community/lists/devel/2014/04/14440.php>. When I 
> > recently upgraded my Linux kernel to 3.15.0, mpirun begins to stuck at even 
> > the helloworld program. The program consists only simple APIs: MPI_Init, 
> > MPI_Comm_size, MPI_Comm_rank, MPI_Finalize. The problem occurs even at 
> > 'mpirun -np 1 ./helloworld', and below are the output with --debug-devel 
> > (before it got stuck):
> > [fpga1:00716] sess_dir_finaliz

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-29 Thread Larry Baker
PGI 14.7 is VERY new -- I just received the announcement on Sunday.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 29 Jul 2014, at 4:25 PM, Paul Hargrove wrote:

> I have license for PGI and installations of 14.1 and 14.4
> I will see what I can do today in terms of testing.
> 
> -Paul
> 
> 
> On Tue, Jul 29, 2014 at 4:23 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
> wrote:
> Tetsuya --
> 
> I am unable to test with the PGI compiler -- I don't have a license.  I was 
> hoping that LANL would be able to test today, but I don't think they got to 
> it.
> 
> Can you send more details?
> 
> E.g., can you send all the stuff listed on 
> http://www.open-mpi.org/community/help/ for 1.8 and 1.8.2rc2 for the 14.7 
> compiler?
> 
> I'm *guessing* that we've done something new in the changes since 1.8 that 
> PGI doesn't support, and we need to disable that something (hopefully while 
> not needing to disable the entire mpi_f08 bindings...).
> 
> 
> 
> On Jul 28, 2014, at 11:43 PM, tmish...@jcity.maeda.co.jp wrote:
> 
> >
> > Hi folks,
> >
> > I tried to build openmpi-1.8.2rc2 with PGI-14.7 and execute a sample
> > program. Then, it causes linking error:
> >
> > [mishima@manage work]$ cat test.f
> >  program hello_world
> >  use mpi_f08
> >  implicit none
> >
> >  type(MPI_Comm) :: comm
> >  integer :: myid, npes, ierror
> >  integer :: name_length
> >  character(len=MPI_MAX_PROCESSOR_NAME) :: processor_name
> >
> >  call mpi_init(ierror)
> >  comm = MPI_COMM_WORLD
> >  call MPI_Comm_rank(comm, myid, ierror)
> >  call MPI_Comm_size(comm, npes, ierror)
> >  call MPI_Get_processor_name(processor_name, name_length, ierror)
> >  write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)')
> > +"Process", myid, "of", npes, "is on", trim(processor_name)
> >  call MPI_Finalize(ierror)
> >
> >  end program hello_world
> >
> > [mishima@manage work]$ mpif90 test.f -o test.ex
> > /tmp/pgfortran65ZcUeoncoqT.o: In function `.C1_283':
> > test.f:(.data+0x6c): undefined reference to `mpi_f08_interfaces_callbacks_'
> > test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
> > test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
> > test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'
> >
> > So, I did some more tests with previous version of PGI and
> > openmpi-1.8. The results are summarized as follows:
> >
> >                    PGI13.10                         PGI14.7
> > openmpi-1.8        OK                               OK
> > openmpi-1.8.2rc2   configure sets use_f08_mpi:no    link error
> >
> > Regards,
> > Tetsuya Mishima
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post: 
> > http://www.open-mpi.org/community/lists/devel/2014/07/15303.php
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15335.php
> 
> 
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15336.php



Re: [OMPI devel] oshmem fixed sized type support

2014-05-05 Thread Larry Baker
Jeff,

The stdint.h header is Section 7.18 Integer types <stdint.h> in the C99 
standard (I can mail you a PDF copy if you like).  It says

> 7.18.1.1 Exact-width integer types
> 
> 1 The typedef name intN_t designates a signed integer type with width N , 
> no padding bits, and a two’s complement representation. Thus, int8_t denotes 
> a signed integer type with a width of exactly 8 bits.
> 
> 2 The typedef name uintN_t designates an unsigned integer type with width 
> N . Thus, uint24_t denotes an unsigned integer type with a width of exactly 
> 24 bits.
> 
> 3 These types are optional. However, if an implementation provides 
> integer types with widths of 8, 16, 32, or 64 bits, no padding bits, and (for 
> the signed types) that have a two’s complement representation, it shall 
> define the corresponding typedef names.

If OpenMPI requires C99 conformance, stdint.h will be there, but you will have 
to check for any width (u)intN_t's you care about by name.  Since these are 
typedefs, I am not sure how that might be done in CPP; a configure step might 
be required.
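
One possibility, offered as a sketch rather than a tested recipe: C99 7.18 also says that <stdint.h> defines the associated limit macros (INT8_MAX, INT64_MAX, and so on) only for the types the implementation actually provides, so the preprocessor can test for those macros even though it cannot see the typedefs.  The enum constants and function name below are hypothetical.

```c
#include <stdint.h>

enum { HAVE_INT8 = 1, HAVE_INT16 = 2, HAVE_INT32 = 4, HAVE_INT64 = 8 };

/* Report which exact-width signed types this implementation provides,
 * based on whether the matching limit macro is defined. */
static int exact_width_mask(void)
{
    int mask = 0;
#ifdef INT8_MAX
    mask |= HAVE_INT8;
#endif
#ifdef INT16_MAX
    mask |= HAVE_INT16;
#endif
#ifdef INT32_MAX
    mask |= HAVE_INT32;
#endif
#ifdef INT64_MAX
    mask |= HAVE_INT64;
#endif
    return mask;
}
```

In a header, the same #ifdef tests could guard declarations that use the corresponding type.  A configure check may still be the safer route for compilers that are not fully C99 conformant.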

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 5 May 2014, at 2:46 PM, Jeff Squyres (jsquyres) wrote:

> Josh --
> 
> Is this the Right fix?
> 
> I ask because we check for <stdint.h> in configure.  I'm sure it's always 
> there for Linux, but is it *always* there?  Indeed, are all the fixed size 
> types always guaranteed to be available?
> 
> 
> 
> On May 2, 2014, at 12:14 PM, svn-commit-mai...@open-mpi.org wrote:
> 
>> Author: jladd (Joshua Ladd)
>> Date: 2014-05-02 12:14:05 EDT (Fri, 02 May 2014)
>> New Revision: 31605
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/31605
>> 
>> Log:
>> Adding missing include for OSHMEM changes necessary to support Java bindings.
>> 
>> Text files modified: 
>>  trunk/oshmem/include/shmem.h.in | 2 +-  
>> 
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>> 
>> Modified: trunk/oshmem/include/shmem.h.in
>> ==
>> --- trunk/oshmem/include/shmem.h.in  Fri May  2 10:28:45 2014(r31604)
>> +++ trunk/oshmem/include/shmem.h.in  2014-05-02 12:14:05 EDT (Fri, 02 May 
>> 2014)  (r31605)
>> @@ -14,7 +14,7 @@
>> 
>> 
>> #include <stddef.h> /* include for ptrdiff_t */
>> -
>> +#include <stdint.h> /* include for fixed width types */
>> #if defined(c_plusplus) || defined(__cplusplus)
>> #include <complex>
>> #define OSHMEM_COMPLEX_TYPE(type)    std::complex<type>
>> ___
>> svn-full mailing list
>> svn-f...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/05/14685.php



Re: [OMPI devel] 1.7.4rc: MPI_F08_INTERFACES_CALLBACKS build failure with PathScale 4.0.12.1

2014-01-22 Thread Larry Baker
My copy of the Fortran 2003 Standard (Adams, et al., The Fortran 2003 Handbook), 
says Fortran Names (incl. procedures, section 3.2.2) are permitted to be up to 
63 characters.  This is not phrased as a requirement, though.  It could be that 
a conforming processor could restrict this to fewer characters, e.g., if the 
linker/loader does not support that many characters in an external symbol.
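
A small C sketch of the kind of scan Jeff describes below, for anyone who wants to repeat it against a different limit.  The function name is hypothetical; 31 is the Fortran 90/95 name-length limit and is used in the usage note only as an assumed threshold for an older compiler.

```c
#include <stddef.h>
#include <string.h>

/* Count the names that are longer than `limit` characters. */
static size_t count_long_names(const char *const names[], size_t n, size_t limit)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++) {
        if (strlen(names[i]) > limit) {
            count++;
        }
    }
    return count;
}
```

With a limit of 31, both ompi_type_create_indexed_block_f (32 characters) and ompi_type_create_hindexed_block_f (33) are flagged; with the Fortran 2003 limit of 63, neither is.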

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 22 Jan 2014, at 8:50 AM, Jeff Squyres (jsquyres) wrote:

> On Jan 21, 2014, at 11:49 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> 
>> Looks like we may be getting closer, but are not quite there:
>> 
>>  PPFC mpi-f08.lo
>>   BIND(C, name="ompi_type_create_hindexed_block_f")
>>^
>> pathf95-1690 pathf95: ERROR OMPI_TYPE_CREATE_HINDEXED_BLOCK_F, File = 
>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.7-latest-linux-x86_64-pathcc-4.0/openmpi-1.7.4rc2r30361/ompi/mpi/fortran/use-mpi-f08/mpi-f-interfaces-bind.h,
>>  Line = 605, Column = 17
>>  NAME= specifier in BIND clause requires scalar character constant
> 
> Wow.  Pulling on this thread turned up a whole pile of bugs :-\, including 
> several other names that are >=32 characters:
> 
> Found long name: ompi_type_create_indexed_block_f (32)
> Found long name: ompi_type_create_hindexed_block_f (33)
> Found long name: pompi_type_create_indexed_block_f (33)
> Found long name: pompi_type_create_hindexed_block_f (34)
> Found long name: pompi_file_get_position_shared_f (32)
> Found long name: pompi_file_write_ordered_begin_f (32)
> 
> Can you do me a favor and cd into ompi/mpi/fortran/use-mpi-f08 and try to 
> manually "make type_create_indexed_block_f08.lo" and see if it also 
> complains?  That's a 32 character name -- let's see if the limit is >=32 or 
> >=33...
> 
>> pathf95-1044 pathf95: INTERNAL OMPI_COMM_CREATE_KEYVAL_F, File = 
>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-1.7-latest-linux-x86_64-pathcc-4.0/openmpi-1.7.4rc2r30361/ompi/mpi/fortran/use-mpi-f08/mpi-f-interfaces-bind.h,
>>  Line = 1242, Column = 38
>>  Internal : Unexpected ATP_PGM_UNIT in check_interoperable_pgm_unit()
>> make[2]: *** [mpi-f08.lo] Error 1
>> make[2]: Leaving directory 
>> `/global/scratch2/sd/hargrove/OMPI/openmpi-1.7-latest-linux-x86_64-pathcc-4.0/BLD/ompi/mpi/fortran/use-mpi-f08'
>> 
>> The first error appears likely to be due to the 33-character name for the C 
>> binding.
>> Not sure if that is a limitation allowed by the fortran spec, or an 
>> arbitrary limitation in this compiler.
>> 
>> The "Internal" may be a show-stopper (not OMPI's fault), unless it goes away 
>> once the prior error is resolved.
> 
> I'll ask Pathscale; thanks.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] Warnings from pgcc-13.10

2014-01-17 Thread Larry Baker
Paul,

From what I can see in the arguments to OPAL_OUTPUT_VERBOSE() in line 356 at 
https://bitbucket.org/ompiteam/ompi-svn-mirror/src/f48eeda443104a64dc89e4f5fab4c940e44d8615/opal/mca/db/hash/db_hash.c,
 this is the same PGI bug I reported 22 Jul 2010, which was assigned TPR 17139.
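
A minimal sketch of the construct involved, reduced from the NetCDF case in the report quoted below.  The function names and the fixed-size buffer are hypothetical, and whether a particular pgcc release warns on it is not something this sketch establishes.

```c
#include <stdio.h>

/* Stand-in for f77varncid(): returns a const char* built from a name. */
static const char *var_id_name(const char *base)
{
    static char buf[64];

    snprintf(buf, sizeof buf, "%s_id", base);
    return buf;
}

/* A string literal in one arm of ?: and a const char* in the other.
 * As I read the standard, the result type is const char* and no cast
 * should be needed. */
static const char *attr_target(const char *var)
{
    return (var == NULL) ? "NF_GLOBAL" : var_id_name(var);
}
```

The (char *) casts in the report were added only to silence the warning on this kind of expression.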


> Customer information:
> 
> Larry Baker
> US Geological Survey
> 650-329-5608
> ba...@usgs.gov
> 
> Product: 2183-WS
> PIN: 507549
> 
> Problem description:
> 
> I am trying to track down the warnings that occur when compiling the UCAR 
> NetCDF package with PGI compilers.  I have found a case that gcc does not 
> warn about, but pgcc does.  If I understand the code and the C (1990) 
> standard, I think pgcc is complaining too much.
> 
> You can reproduce the warnings by downloading the UCAR NetCDF source package, 
> netcdf.tar.gz, from http://www.unidata.ucar.edu/software/netcdf/.  Assuming 
> you download it to /usr/local/src, here are the commands that illustrate the 
> warnings:
> 
> # cd /usr/local/src
> # tar -xzf netcdf.tar.gz
> # cd netcdf-4.1.1
> # ./configure >/dev/null 2>&1
> # cd ncgen
> # pgcc -DHAVE_CONFIG_H -I. -I.. -I../fortran   -I.. -I../libsrc 
> -I../libsrc-g -O2 -tp amd64 -Msignextend -DNO_PGI_OFFSET -c -o genf77.o 
> genf77.c
> PGC-W-0095-Type cast required for this conversion (genf77.c: 498)
> PGC-W-0095-Type cast required for this conversion (genf77.c: 511)
> PGC/x86-64 Linux 10.3-0: compilation completed with warnings
> 
> To eliminate the warnings, I had to modify the two source lines to cast the 
> result from static const char* f77varncid() as (char *):
> 
>>/* Use the specialized put_att_XX routines if possible*/
>>switch (basetype->typ.typecode) {
>>case NC_BYTE:
>>case NC_SHORT:
>>case NC_INT:
>>case NC_FLOAT:
>>case NC_DOUBLE:
>>f77attrify(asym,code);
>>codedump(code);
>>bbClear(code);
>>bbprintf0(stmt,"stat = nf_put_att_%s(ncid, %s, %s, %s, %lu, %sval)\n",
>>nfstype(basetype->typ.typecode),
>>(asym->att.var == NULL?"NF_GLOBAL"
>>  :(char *) f77varncid(asym->att.var)),   
>> <--- here
>>f77escapifyname(asym->name),
>>nftype(basetype->typ.typecode),
>>len,
>>ncftype(basetype->typ.typecode));
>>codedump(stmt);
>>break;
>> 
>>case NC_CHAR:
>>len = bbLength(code);
>>f77quotestring(code);
>>bbprintf0(stmt,"stat = nf_put_att_text(ncid, %s, %s, %lu, ",
>>(asym->att.var == NULL?"NF_GLOBAL"
>>  :(char *)f77varncid(asym->att.var)),   
>> <--- and here
>>f77escapifyname(asym->name),
>>(len==0?1:len));
>>codedump(stmt);
>>codedump(code);
>>codeline(")");
>>break;
> 
> Here is static const char* f77varncid():
> 
>> /* Compute the name for a given var's id*/
>> /* Watch out: the result is a static*/
>> static const char*
>> f77varncid(Symbol* vsym)
>> {
>>const char* tmp1;
>>char* vartmp;
>>tmp1 = f77name(vsym);
>>vartmp = poolalloc(strlen(tmp1)+strlen("_id")+1);
>>strcpy(vartmp,tmp1);
>>strcat(vartmp,"_id");
>>return vartmp;
>> }
> 
> There are other lines in genf77.c that use f77varncid() in a print statement, 
> so the warnings do not occur every time f77varncid() provides a string for %s:
> 
>>if (nvars > 0) {
>>f77skip();
>>f77comment("variable ids");
>>for(ivar = 0; ivar < nvars; ivar++) {
>>Symbol* vsym = (Symbol*)listget(vardefs,ivar);
>>bbprintf0(stmt,"integer %s;\n", f77varncid(vsym));
>>codedump(stmt);
>>}
> 
> The warnings occur in the only two instances where f77varncid() is used in a 
> conditional expression.  In both cases, the second operand is a literal 
> string, e.g., "NF_GLOBAL".  I would have thought that a (static const char*) 
> and a string literal would be compatible types.  I experimented with a (const 
> char *) cast instead of a (char *) cast, but that does not work.  I think it 
> should.
> 
> I admit, I have an old copy of the C standard — from 1990.  But, here's my 
> interpretation of what it says about this:
> 
> • 6.1.4 String literals, says string literals are converted to

Re: [OMPI devel] Warnings from pgcc-13.10

2014-01-17 Thread Larry Baker
Paul, Ralph,

I had several issues in 2010 with PGI pgcc being overly picky about type 
mismatches.  Attached are my e-mails from that time.  I was working on NetCDF 
and OpenMPI.  In the OpenMPI report (17 Aug 2010), I found problems in 
conditional expressions.  The last e-mail in the thread from PGI said they 
fixed the bugs in the 12.10 release.  But, that e-mail (14 Dec 2012) only cites 
TPRs 17185 and 17186, not my earlier TPR 17139.  I have not revisited these 
issues since then, so I don't know if that old bug is still around and is what 
is biting you.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 17 Jan 2014, at 8:56 AM, Ralph Castain wrote:

--- Begin Message ---

Customer information:

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

Product: 2183-WS
PIN: 507549

Problem description:

I am trying to track down the warnings that occur when compiling the  
UCAR NetCDF package with PGI compilers.  I have found a case that gcc  
does not warn about, but pgcc does.  If I understand the code and the  
C (1990) standard, I think pgcc is complaining too much.


You can reproduce the warnings by downloading the UCAR NetCDF source  
package, netcdf.tar.gz, from http://www.unidata.ucar.edu/software/netcdf/.  
Assuming you download it to /usr/local/src, here are the commands  
that illustrate the warnings:


# cd /usr/local/src
# tar -xzf netcdf.tar.gz
# cd netcdf-4.1.1
# ./configure >/dev/null 2>&1
# cd ncgen
# pgcc -DHAVE_CONFIG_H -I. -I.. -I../fortran   -I.. -I../libsrc 
-I../libsrc-g -O2 -tp amd64 -Msignextend -DNO_PGI_OFFSET -c -o genf77.o 
genf77.c

PGC-W-0095-Type cast required for this conversion (genf77.c: 498)
PGC-W-0095-Type cast required for this conversion (genf77.c: 511)
PGC/x86-64 Linux 10.3-0: compilation completed with warnings

To eliminate the warnings, I had to modify the two source lines to  
cast the result from static const char* f77varncid() as (char *):



/* Use the specialized put_att_XX routines if possible*/
switch (basetype->typ.typecode) {
case NC_BYTE:
case NC_SHORT:
case NC_INT:
case NC_FLOAT:
case NC_DOUBLE:
f77attrify(asym,code);
codedump(code);
bbClear(code);
bbprintf0(stmt,"stat = nf_put_att_%s(ncid, %s, %s, %s, %lu, %sval)\n",
nfstype(basetype->typ.typecode),
(asym->att.var == NULL?"NF_GLOBAL"
  :(char *) f77varncid(asym->att.var)),   <--- here

f77escapifyname(asym->name),
nftype(basetype->typ.typecode),
len,
ncftype(basetype->typ.typecode));
codedump(stmt);
break;

case NC_CHAR:
len = bbLength(code);
f77quotestring(code);
bbprintf0(stmt,"stat = nf_put_att_text(ncid, %s, %s, %lu, ",
(asym->att.var == NULL?"NF_GLOBAL"
  :(char *)f77varncid(asym->att.var)),   <--- and here

f77escapifyname(asym->name),
(len==0?1:len));
codedump(stmt);
codedump(code);
codeline(")");
break;


Here is static const char* f77varncid():


/* Compute the name for a given var's id*/
/* Watch out: the result is a static*/
static const char*
f77varncid(Symbol* vsym)
{
const char* tmp1;
char* vartmp;
tmp1 = f77name(vsym);
vartmp = poolalloc(strlen(tmp1)+strlen("_id")+1);
strcpy(vartmp,tmp1);
strcat(vartmp,"_id");
return vartmp;
}


There are other lines in genf77.c that use f77varncid() in a print  
statement, so the warnings do not occur every time f77varncid()  
provides a string for %s:



if (nvars > 0) {
f77skip();
f77comment("variable ids");
for(ivar = 0; ivar < nvars; ivar++) {
Symbol* vsym = (Symbol*)listget(vardefs,ivar);
bbprintf0(stmt,"integer %s;\n", f77varncid(vsym));
codedump(stmt);
}


The warnings occur in the only two instances where f77varncid() is  
used in a conditional expression.  In both cases, the second operand  
is a literal string, e.g., "NF_GLOBAL".  I would have thought that a  
(static const char*) and a string literal would be compatible types.   
I experimented with a (const char *) cast instead of a (char *) cast,  
but that does not work.  I think it should.
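
A minimal sketch of the case in question, written as plain standard C.  It is 
only an illustration of the reading of 6.3.15 given in this report, not NetCDF 
code: pick_name() is a made-up helper standing in for the conditional 
expression in genf77.c, and its var_name argument stands in for the result of 
f77varncid().

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of NetCDF).  The second operand is a
 * string literal (an array of char that decays to char *) and the third
 * is a const char *.  Under the conditional-operator rules the result is
 * a pointer to char carrying the qualifiers of both operands, that is,
 * const char *, so no cast should be needed here.
 */
const char *pick_name(const char *var_name)
{
    return (var_name == NULL) ? "NF_GLOBAL" : var_name;
}
```

If this reading is right, a compiler that accepts the function above without 
a diagnostic should also accept the two bbprintf0() calls without the 
(char *) casts.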


I admit, I have an old copy of the C standard — from 1990.  But,  
here's my interpretation of what it says about this:


• 6.1.4 String literals, says string literals are converted to an  
array of type char.  If the program attempts to modify a string  
literal, the behavior is undefined.  It does not say that the type has  
the const type qualifier (though, you would think it should).


• 6.3.15 Conditional operator, says if the second and third operands  
are p

Re: [OMPI devel] 1.7.4rc2r30168 - PGI F08 failure

2014-01-09 Thread Larry Baker
I wonder if the reason PGI V10.x does not use mpi_f08 bindings is that old PGI 
compiler version number parsing error.  (Unless, of course, PGI V11.x or 
V12.x do use mpi_f08 bindings.)  In that old (autoconf?) bug, decisions were 
made about features supported on PGI compilers by parsing the version number of 
the compiler.  Trouble was, only the first digit was examined, leading to PGI 
V10.x, V11.x, V12.x, ..., all being parsed as V1.  My recollection is that some 
C++ code was affected.
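
To make the failure mode concrete, here is a small C sketch.  It is not taken 
from any configure script; both function names are made up for illustration.  
The first looks only at the first character of the version string, the second 
reads the whole leading number.

```c
#include <stdlib.h>

/* Hypothetical helpers, not taken from any configure script. */

/* Buggy approach: look at the first character only, so "10.3" gives 1. */
int major_from_first_digit(const char *version)
{
    return version[0] - '0';
}

/* Read the whole leading number instead, so "10.3" gives 10. */
int major_from_leading_number(const char *version)
{
    return (int) strtol(version, NULL, 10);
}
```

With the first function, "10.3", "11.1" and "12.10" all come out as major 
version 1, which is the kind of misclassification described above.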

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 9 Jan 2014, at 4:35 PM, Paul Hargrove wrote:

> My attempts to build the current 1.7.4rc tarball with versions 8.0 and 9.0 of 
> the Portland Group compilers fails miserably on compilation of 
> mpi-f08-types.F90.
> 
> I am sort of surprised by the attempt to build Fortran 2008 support w/ such 
> old compilers.
> I think that fact itself is the real bug here, right? 
> 
> With pgi-10.0 I see configure say:
> checking if building Fortran 'use mpi' bindings... yes
> checking if building Fortran 'use mpi_f08' bindings... no
> 
> But pgi-8.0 and 9.0 both get identified as "good" compilers.
> 
> pgi-9.0:
> checking if building Fortran 'use mpi' bindings... yes
> checking if Fortran compiler supports BIND(C)... yes
> checking if Fortran compiler supports BIND(C) with LOGICAL params... yes
> checking if Fortran compiler supports optional arguments... yes
> checking if Fortran compiler supports private... no
> checking if Fortran compiler supports abstract... yes
> checking if Fortran compiler supports asynchronous... no
> checking if Fortran compiler supports procedure... no
> checking size of Fortran type(test_mpi_handle)... 4
> checking Fortran compiler F08 assumed rank syntax... not cached; checking
> checking for Fortran compiler support of TYPE(*), DIMENSION(..)... no
> checking Fortran compiler F08 assumed rank syntax... no
> checking which mpi_f08 implementation to build... "good" compiler, no array 
> subsections
> configure: WARNING: Temporary development override: forcing the use of F08 
> wrappers
> checking if building Fortran 'use mpi_f08' bindings... yes
> 
> pgi-8.0 (almost, but not quite, the same):
> checking if building Fortran 'use mpi' bindings... yes
> checking if Fortran compiler supports BIND(C)... yes
> checking if Fortran compiler supports BIND(C) with LOGICAL params... yes
> checking if Fortran compiler supports optional arguments... yes
> checking if Fortran compiler supports private... no
> checking if Fortran compiler supports abstract... no
> checking if Fortran compiler supports asynchronous... no
> checking if Fortran compiler supports procedure... no
> checking size of Fortran type(test_mpi_handle)... 4
> checking Fortran compiler F08 assumed rank syntax... not cached; checking
> checking for Fortran compiler support of TYPE(*), DIMENSION(..)... no
> checking Fortran compiler F08 assumed rank syntax... no
> checking which mpi_f08 implementation to build... "good" compiler, no array 
> subsections
> configure: WARNING: Temporary development override: forcing the use of F08 
> wrappers
> checking if building Fortran 'use mpi_f08' bindings... yes
> 
> The bzip2-compressed config.log files for pgi-8.0 and 9.0 are attached.
> 
> -Paul 
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] http://newscenter.lbl.gov/wp-content/uploads/2008/07/yelick-berkeleyview-july081.pdf

2013-10-23 Thread Larry Baker
Ralph, et al.,

Kathy Yelick is featured in this month's ACM Member News sidebar (p. 17).  It 
also turns out she will be giving a lecture at SC13.

http://sc13.supercomputing.org/content/sc13-feature-acm-athena-lecturer-katherine-yelick

Check it out.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 6 Jun 2013, at 11:07 AM, Larry Baker wrote:

> Ralph,
> 
> I mentioned at dinner how much I like listening to UCB Prof. Kathy Yelick's 
> lectures.  Here's some slides I found of hers.  She is also associate lab 
> director of NERSC at LBNL.
> 
> http://newscenter.lbl.gov/wp-content/uploads/2008/07/yelick-berkeleyview-july081.pdf
> 
> http://www.lanl.gov/orgs/hpc/salishan/pdfs/Salishan%20slides/Yelick.pdf
> 
> http://www.hpcsw.org/presentations/wed/yelick.pdf
> 
> Here's the UCB web page about their view of HPC/Parallel computing.
> 
> http://view.eecs.berkeley.edu/wiki/Main_Page
> 
> I think the one-way (DMA) messaging model is very important, especially when 
> it can be hidden and automatically (and correctly) done by the compiler 
> (requires new languages?).
> 
> Any time you get a chance to hear her (or invite her to speak) you will enjoy 
> it.  The work they are doing there will inform you to benefit your own work.
> 
> Larry Baker
> US Geological Survey
> 650-329-5608
> ba...@usgs.gov
> 
> 
> 



Re: [OMPI devel] How to deal with F90 mpi.mod with single stack and multiple compiler suites?

2013-08-22 Thread Larry Baker
Chris,

.mod files are compiler-specific, and may even be version-specific.  You may, 
however, be lucky enough to compile the Fortran interface definitions with 
ifort and supply that mpi.mod to ifort, even though the actual code was 
compiled with gfortran.  I have never tried that -- we build separate 
development trees for each OpenMPI version/compiler product/compiler version.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 22 Aug 2013, at 6:50 AM, Jeff Squyres (jsquyres) wrote:

> Sadly, probably not. :(. You'll probably have the same problem with c++, too. 
> 
> There *may* be compatibility command line options for ifort/icpc to make them 
> link compatible w gfortran/g++, but I've never had much faith in them. 
> 
> Sent from my phone. No type good. 
> 
> On Aug 22, 2013, at 2:24 AM, "Christopher Samuel" <sam...@unimelb.edu.au> 
> wrote:
> 
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA1
>> 
>> Hi folks,
>> 
>> We've got what we thought would be a fairly standard OMPI (1.6.5)
>> install which is a single install built with GCC and then setting the
>> appropriate variables to use the Intel compilers when someone loads
>> our "intel" module:
>> 
>> $ module show intel
>> [...]
>> setenv   OMPI_CC icc
>> setenv   OMPI_CXX icpc
>> setenv   OMPI_F77 ifort
>> setenv   OMPI_FC ifort
>> setenv   OMPI_CFLAGS -xHOST -O3 -mkl=sequential
>> setenv   OMPI_FFLAGS -xHOST -O3 -mkl=sequential
>> setenv   OMPI_FCFLAGS -xHOST -O3 -mkl=sequential
>> setenv   OMPI_CXXFLAGS -xHOST -O3 -mkl=sequential
>> 
>> This works wonderfully, *except* when our director attempted to build
>> an F90 program with the Intel compilers that fails to build because
>> the mpi.mod F90 module was produced with gfortran rather than the
>> Intel compilers. :-(
>> 
>> Is there any way to avoid having to do parallel installs of OMPI with
>> GCC and Intel compilers just to have two different versions of these
>> files?
>> 
>> My brief googling hasn't indicated anything, and I don't see anything
>> in the mpif90 manual page (though I have to admit I've had to rush to
>> try and get this done before I need to leave for the day). :-(
>> 
>> cheers,
>> Chris
>> - -- 
>> Christopher SamuelSenior Systems Administrator
>> VLSCI - Victorian Life Sciences Computation Initiative
>> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>> http://www.vlsci.org.au/  http://twitter.com/vlsci
>> 
>> -BEGIN PGP SIGNATURE-
>> Version: GnuPG v1.4.11 (GNU/Linux)
>> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>> 
>> iEYEARECAAYFAlIVrpIACgkQO2KABBYQAh/GAQCggQGnc18kSfMcGle3a3pWZGgD
>> UQ8AoIz61uuOPj+TFJwSYMTaAtUBLk3J
>> =yJ6J
>> -END PGP SIGNATURE-



Re: [OMPI devel] June OMPI developer's meeting

2013-05-10 Thread Larry Baker
Aurelien,

Your hotel may have an SFO shuttle service, and there are airport vans that 
make the rounds to the various hotels.  When you have an itinerary, let us 
know.  I live on the coast pretty much due W of SFO, and I work in Menlo Park.  
If our schedules match, I could probably shuttle you at least part of the way.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 10 May 2013, at 8:07 AM, Ralph Castain wrote:

> It's doable, if you have patience. You have to take CalTrain from the airport 
> to San Jose - see:
> 
> http://www.bart.gov/docs/visitorguide0610.pdf
> 
> Basically, you take BART from SFO to the Millbrae CalTrain station, then take 
> the southbound train to San Jose.
> 
> 
> On May 10, 2013, at 7:42 AM, Aurélien Bouteiller <boute...@icl.utk.edu> wrote:
> 
>> I will be attending. 
>> 
>> Can some local chime in and tell me how practical it is to land in San 
>> Francisco and use public transportation to go to San Jose? Plane schedule to 
>> San Jose directly is not very flexible. 
>> 
>> Aurelien 
>> 
>> 
>> 
>> Le 7 mai 2013 à 15:19, Larry Baker <ba...@usgs.gov> a écrit :
>> 
>>> On 6 May 2013, at 11:14 AM, Jeff Squyres (jsquyres) wrote:
>>> 
>>>> We typically do something informally scheduled on the day of, or somesuch 
>>>> (e.g., around 4pm people start wondering aloud what we should do for 
>>>> dinner :-) ).  But if there is interest for others to attend, we can 
>>>> probably set up something ahead of time.
>>> 
>>> This option will work best for me.  All I need is an e-mail notice of where 
>>> and when within 30 minutes or so of the reservation time (depending on the 
>>> traffic on 101 :) ).
>>> 
>>> Larry Baker
>>> US Geological Survey
>>> 650-329-5608
>>> ba...@usgs.gov
>>> 
>>> 
>>> 
>> 
>> --
>> * Dr. Aurélien Bouteiller
>> * Researcher at Innovative Computing Laboratory
>> * University of Tennessee
>> * 1122 Volunteer Boulevard, suite 309b
>> * Knoxville, TN 37996
>> * 865 974 9375
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 



Re: [OMPI devel] June OMPI developer's meeting

2013-05-07 Thread Larry Baker
On 6 May 2013, at 11:14 AM, Jeff Squyres (jsquyres) wrote:

> We typically do something informally scheduled on the day of, or somesuch 
> (e.g., around 4pm people start wondering aloud what we should do for dinner 
> :-) ).  But if there is interest for others to attend, we can probably set up 
> something ahead of time.


This option will work best for me.  All I need is an e-mail notice of where and 
when within 30 minutes or so of the reservation time (depending on the traffic 
on 101 :) ).

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov





Re: [OMPI devel] June OMPI developer's meeting

2013-04-30 Thread Larry Baker
Jeff,

I would like to try to get down there (from Menlo Park) to meet and thank the 
OpenMPI developers, but I doubt I would be a very useful participant.  Will 
there be a social event that I could attend?  Maybe there is one at the MPI 
Forum?  Because of the US budget sequestration, I cannot spend any money, 
unless I do it on my own (in which case I have to bend over backwards to 
obscure my government affiliation -- like no mention of USGS on my registration 
or name badge).

Thanks,

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 30 Apr 2013, at 2:58 PM, Jeff Squyres (jsquyres) wrote:

> We are having a developer's meeting right before the MPI Forum meeting in 
> June, 2013.  All are welcome to attend.  Two primary topics for this meeting 
> will be: a) progress since the UTK meeting on moving the BTLs down to OPAL, 
> and b) asynchronous progress.
> 
> *** PLEASE SIGN UP ON THE WIKI PAGE SO THAT I CAN GET YOU A VISITOR'S BADGE 
> AND WIFI ***
> 
> Here's the wiki page:
> 
>https://svn.open-mpi.org/trac/ompi/wiki/Jun13Meeting
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 



Re: [OMPI devel] [OMPI svn] svn:open-mpi r27601 - trunk

2012-11-15 Thread Larry Baker
Nathan,

I'm no Perl programmer AT ALL, so take this for what it is worth ($0).

This RE correctly parses a version no. either at the beginning of the string or 
preceded by a white-space character.

> $ echo "10.4.2" | perl -E 'while (<>) { if ( m/(^|\s)((\d+\.)+\d+)/ ) { 
> $version = $2 ; print $version, "\n" ; last } }'
> 10.4.2


I modified the RE from the last one I saw you post ($version =~ 
m/\s([\d\.]+\w?)/m;) to allow for multiple digit fields, to remove the "m" 
modifier ($ is gone now), and to only allow digits in the last field of the 
version number.

I don't know all the contortions of version strings you are trying to match, 
i.e., why you allowed any alphanumeric (including _) for the last field.  This 
one will match all digits with an optional single letter suffix at the end 
(i.e., must be at a word boundary).

> $ echo "10.4.2A" | perl -E 'while (<>) { if ( 
> m/(^|\s)((\d+\.)+\d+([a-zA-Z]\b)?)/ ) { $version = $2 ; print $version, "\n" 
> ; last } }'
> 10.4.2A

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 15 Nov 2012, at 8:42 AM, Hjelm, Nathan T wrote:

> Committed as r27615. Let me know if there are any more issues.
> 
> -Nathan
> 
> 
> From: devel-boun...@open-mpi.org [devel-boun...@open-mpi.org] on behalf of 
> Ralph Castain [r...@open-mpi.org]
> Sent: Thursday, November 15, 2012 8:53 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r27601 - trunk
> 
> Looks fine to me. I would only add one further refinement - I think we should 
> check m4, but add a check in autogen.pl so that if we get nothing useful back 
> from -v (or whatever), then output a warning that we couldn't validate the 
> version and assume it is okay.
> 
> I believe the tool will return a non-zero status if the option isn't 
> supported, so we should be able to do this - yes?
> 
> 
> On Nov 15, 2012, at 7:48 AM, "Hjelm, Nathan T" <hje...@lanl.gov> wrote:
> 
>> Since the version of m4 that comes with Solaris likely works with all our 
>> .m4 files and there is no way to check the version (no --version, -v, -V, or 
>> anything from what I can tell) I guess we have no choice but to not check 
>> the m4 version.
>> 
>> flex on the other hand we can check. How about this for the new regex (for 
>> reference the old one is $version =~ m/\s([\d\w\.]+)$/m; -- matching a 
>> version at the end of the line):
>> 
>> $version =~ m/\s([\d\.]+\w?)/m;
>> 
>> It works with Apple's flex and still works with glibtoolize, autoconf, and 
>> automake.
>> 
>>  Searching for autoconf
>>Found autoconf version 2.69; checking version...
>>  Found version component 2 -- need 2
>>  Found version component 69 -- need 65
>>==> ACCEPTED
>>  Searching for libtoolize
>> libtoolize not found
>>  Searching for glibtoolize
>>Found glibtoolize version 2.4.2; checking version...
>>  Found version component 2 -- need 2
>>  Found version component 4 -- need 2
>>==> ACCEPTED
>>  Searching for automake
>>Found automake version 1.12.2; checking version...
>>  Found version component 1 -- need 1
>>  Found version component 12 -- need 11
>>==> ACCEPTED
>>  Searching for flex
>>Found flex version 2.5.35; checking version...
>>  Found version component 2 -- need 2
>>  Found version component 5 -- need 5
>>  Found version component 35 -- need 35
>>==> ACCEPTED
>>  Searching for m4
>>Found m4 version 1.4.6; checking version...
>>  Found version component 1 -- need 1
>>  Found version component 4 -- need 4
>>  Found version component 6 -- need 16
>>==> Too low!  Skipping this version
>>  Searching for gm4
>>    Found gm4 version 1.4.16; checking version...
>>  Found version component 1 -- need 1
>>  Found version component 4 -- need 4
>>  Found version component 16 -- need 16
>>==> ACCEPTED
>> 
>> 
>> -Nathan
>> 
>> 
>> From: devel-boun...@open-mpi.org [devel-boun...@open-mpi.org] on behalf of 
>> Paul Hargrove [phhargr...@lbl.gov]
>> Sent: Wednesday, November 14, 2012 7:37 PM
>> To: Larry Baker
>> Cc: Open MPI Developers
>> Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r27601 - trunk
>> 
>> Larry,
>> 
>> I just wanted to speak up quickly to be sure nobody used your example to 
>> "fix" the Mac OS problem and thereby break Solaris instead.  No personal 
>> att

Re: [OMPI devel] [OMPI svn] svn:open-mpi r27601 - trunk

2012-11-14 Thread Larry Baker
Ralph,

> Ick - usually tools support some kind of version option. :-(


In the olden days, -V (big V) was a reasonably standard request for version no. 
(little v being verbosity).  The GNU command line parsing added the wordy 
options preceded by double dashes.  Unfortunately, gcc does not follow this 
convention (aargh).

> savaii:~ baker$ gcc -V
> gcc-4.2: argument to `-V' is missing

> savaii:~ baker$ gcc -v
> Using built-in specs.
> Target: i686-apple-darwin10
> Configured with: /var/tmp/gcc/gcc-5666.3~6/src/configure --disable-checking 
> --enable-werror --prefix=/usr --mandir=/share/man 
> --enable-languages=c,objc,c++,obj-c++ 
> --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib 
> --build=i686-apple-darwin10 --program-prefix=i686-apple-darwin10- 
> --host=x86_64-apple-darwin10 --target=i686-apple-darwin10 
> --with-gxx-include-dir=/include/c++/4.2.1
> Thread model: posix
> gcc version 4.2.1 (Apple Inc. build 5666) (dot 3)

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



Re: [OMPI devel] [OMPI svn] svn:open-mpi r27601 - trunk

2012-11-14 Thread Larry Baker
Paul,

1) I wasn't trying to solve the --version issue, only the parsing of the 
response.
2) I assumed from the initial e-mail that the broken parser was in a Perl 
script.  I'm not a Perl person, so I wrote the example regular expression 
parser in sed.

These commands were done on my Mac OS X 10.6 system.  I have no idea where the 
apps came from.  I know the sed, at least, does not recognize regular 
expressions documented for GNU sed (such as \< \> for begin/end word).  Maybe 
it is a BSD sed?

I was just trying to illustrate how to fix the broken parsing of Ralph's "flex 
--version".  Assuming the RE parser I wrote is satisfactory, it would have to 
be adapted to fit in the framework, i.e., it has to be portable.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 14 Nov 2012, at 5:41 PM, Paul Hargrove wrote:

> On Wed, Nov 14, 2012 at 6:26 PM, Larry Baker <ba...@usgs.gov> wrote:
> m4 --version | sed -n -E -e 
> '1s/^.*[^A-Za-z0-9_-]?([0-9]+[.][0-9]+[.][0-9]+)[^A-Za-z0-9_-]?.*$/\1/p'
> 
>  
> There are STILL problems with this approach as it is TWICE specific to GNU 
> software:
> 
> 1) M4 on OpenBSD (maybe others) doesn't support a "--version" flag:
> $ m4 --version | sed -n -E -e 
> '1s/^.*[^A-Za-z0-9_-]?([0-9]+[.][0-9]+[.][0-9]+)[^A-Za-z0-9_-]?.*$/\1/p'
> m4: unknown option -- -
> usage: m4 [-gPs] [-Dname[=value]] [-d flags] [-I dirname] [-o filename]
> [-t macro] [-Uname] [file ...]
> 
> 2) sed on Solaris (maybe others) doesn't support a "-E" flag:
> $ m4 --version | sed -n -E -e 
> '1s/^.*[^A-Za-z0-9_-]?([0-9]+[.][0-9]+[.][0-9]+)[^A-Za-z0-9_-]?.*$/\1/p'
> /bin/sed: illegal option -- E
> 
> -Paul
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> 



Re: [OMPI devel] [OMPI svn] svn:open-mpi r27601 - trunk

2012-11-14 Thread Larry Baker
Ralph,

Try sed -n -E -e 
'1s/^.*[^A-Za-z0-9_-]?([0-9]+[.][0-9]+[.][0-9]+)[^A-Za-z0-9_-]?.*$/\1/p' (or 
its equivalent in Perl).

-n = Don't print out lines that do not match the pattern
-E = Tells sed to recognize +
pattern = <digits>.<digits>.<digits> (no attempt to rule out nonsense like 
0.0.0)

> savaii:~ baker$ m4 --version
> GNU M4 1.4.6
> Copyright (C) 2006 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> 
> Written by Rene' Seindal.
> 
> savaii:~ baker$ m4 --version | sed -n -E -e 
> '1s/^.*[^A-Za-z0-9_-]?([0-9]+[.][0-9]+[.][0-9]+)[^A-Za-z0-9_-]?.*$/\1/p'
> 1.4.6

> savaii:~ baker$ gcc --version
> i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
> Copyright (C) 2007 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> 
> savaii:~ baker$ gcc --version | sed -n -E -e 
> '1s/^.*[^A-Za-z0-9_-]?([0-9]+[.][0-9]+[.][0-9]+)[^A-Za-z0-9_-]?.*$/\1/p'
> 4.2.1

> savaii:~ baker$ flex --version
> flex 2.5.35
> 
> savaii:~ baker$ flex --version | sed -n -E -e 
> '1s/^.*[^A-Za-z0-9_-]?([0-9]+[.][0-9]+[.][0-9]+)[^A-Za-z0-9_-]?.*$/\1/p'
> 2.5.35

To match Ralph's sample failure:

> savaii:~ baker$ echo "flex 2.5.35 Apple(flex-31)" | sed -n -E -e 
> '1s/^.*[^A-Za-z0-9_-]?([0-9]+[.][0-9]+[.][0-9]+)[^A-Za-z0-9_-]?.*$/\1/p'
> 2.5.35
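
For comparison, a sketch of the same kind of extraction written in portable C, 
with no dependence on sed or Perl options.  This is an assumption-laden 
illustration, not the autogen.pl parser: extract_version() is a made-up name, 
and it accepts any number of dot-separated digit fields (two or more) that 
start the line or follow white space, which is slightly looser than the 
three-field sed pattern above.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the first dotted version number found in
 * `line` (digits, then one or more ".digits" groups) into `out`.
 * The number must start the line or follow white space.
 * Returns 1 on success, 0 if nothing matched or `out` is too small.
 */
int extract_version(const char *line, char *out, size_t outsize)
{
    size_t i, n = strlen(line);

    for (i = 0; i < n; i++) {
        size_t j, groups = 0;

        if (!isdigit((unsigned char) line[i]))
            continue;
        if (i > 0 && !isspace((unsigned char) line[i - 1]))
            continue;

        /* Leading digit field. */
        j = i;
        while (isdigit((unsigned char) line[j]))
            j++;

        /* One or more ".digits" groups. */
        while (line[j] == '.' && isdigit((unsigned char) line[j + 1])) {
            j++;
            while (isdigit((unsigned char) line[j]))
                j++;
            groups++;
        }

        if (groups == 0)
            continue;
        if (j - i + 1 > outsize)
            return 0;
        memcpy(out, line + i, j - i);
        out[j - i] = '\0';
        return 1;
    }
    return 0;
}
```

Requiring white space (or the start of the line) before the first digit is 
what keeps the "4" in "GNU M4" and the "4.2.1" embedded in 
"i686-apple-darwin10-gcc-4.2.1" from being picked up ahead of the real 
version field.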

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 14 Nov 2012, at 3:26 PM, Ralph Castain wrote:

> Sorry Nathan - I had to revert this out as it broke builds on Mac ML. The 
> problem is that the find_and_check parser looks for parens to find the 
> version number, expecting something like this:
> 
> $ m4 --version
> m4 (GNU M4) 1.4.16
> 
> or this:
> 
> $ gcc --version
> i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1
> 
> However, on Mac ML, you get this for flex:
> 
> $ flex --version
> flex 2.5.35 Apple(flex-31)
> 
> And so the parser incorrectly rejects the flex version. We'll have to come up 
> with a more robust way of getting version numbers so we can do this test.
> 
> 
> 
> On Nov 12, 2012, at 11:28 PM, svn-commit-mai...@open-mpi.org wrote:
> 
>> Author: hjelmn (Nathan Hjelm)
>> Date: 2012-11-13 02:28:10 EST (Tue, 13 Nov 2012)
>> New Revision: 27601
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/27601
>> 
>> Log:
>> enforce minimum flex version in autogen.pl
>> 
>> Text files modified: 
>> trunk/autogen.pl | 4 
>> 1 files changed, 4 insertions(+), 0 deletions(-)
>> 
>> Modified: trunk/autogen.pl
>> ==
>> --- trunk/autogen.pl Tue Nov 13 02:21:10 2012(r27600)
>> +++ trunk/autogen.pl 2012-11-13 02:28:10 EST (Tue, 13 Nov 2012)  (r27601)
>> @@ -56,11 +56,13 @@
>> my $ompi_automake_version = "1.11.1";
>> my $ompi_autoconf_version = "2.65";
>> my $ompi_libtool_version = "2.2.6b";
>> +my $ompi_flex_version = "2.5.35";
>> 
>> # Search paths
>> my $ompi_autoconf_search = "autoconf";
>> my $ompi_automake_search = "automake";
>> my $ompi_libtoolize_search = "libtoolize;glibtoolize";
>> +my $ompi_flex_search = "flex";
>> 
>> # One-time setup
>> my $username;
>> @@ -797,6 +799,7 @@
>>   GNU Autoconf: $ompi_autoconf_version
>>   GNU Automake: $ompi_automake_version
>>   GNU Libtool: $ompi_libtool_version
>> +Flex: $ompi_flex_version
>> =\n";
>>   my_exit(1);
>> }
>> @@ -1015,6 +1018,7 @@
>> find_and_check("autoconf", $ompi_autoconf_search, $ompi_autoconf_version);
>> find_and_check("libtool", $ompi_libtoolize_search, $ompi_libtool_version);
>> find_and_check("automake", $ompi_automake_search, $ompi_automake_version);
>> +find_and_check("flex", $ompi_flex_search, $ompi_flex_version);
>> 
>> #---
>> 
>> ___
>> svn mailing list
>> s...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
> 
> 




Re: [OMPI devel] r27078 and OMPI build

2012-08-24 Thread Larry Baker
Nice catch!

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 24 Aug 2012, at 4:55 PM, Paul Hargrove wrote:

> OK, I have a vanilla configure+make running on both SPARC/Solaris-10 and 
> AMD64/Solaris-11.
> I am using the 12.3 Oracle compilers in both cases to match the original 
> report.
> I'll post the results when they complete.
> 
> In the meantime, I took a quick look at the code and have a pretty reasonable 
> guess as to the cause.
> Looking at ompi/mca/coll/ml/coll_ml.h I see:
> 
>827  int mca_coll_ml_memsync_intra(mca_coll_ml_module_t *module, int 
> bank_index);
> [...]
>996  static inline __opal_attribute_always_inline__
>997  int 
> mca_coll_ml_buffer_recycling(mca_coll_ml_collective_operation_progress_t 
> *ml_request)
>998  {
> [...]
>   1023  rc = mca_coll_ml_memsync_intra(ml_module, 
> ml_memblock->memsync_counter);
> [...]
>   1041  }
> 
> Based on past experience w/ the Sun/Oracle compilers on another project (See 
> http://bugzilla.hcs.ufl.edu/cgi-bin/bugzilla3/show_bug.cgi?id=193 ), I 
> suspect that this static-inline-always function is being emitted by the 
> compiler in every object which includes this header even if they don't call 
> it..  The call on line 1023 then results in the undefined reference to 
> mca_coll_ml_memsync_intra.  Basically it is not safe for an inline function 
> in a header to call an extern function that isn't available to every object 
> that includes the header REGARDLESS of whether the object invokes the inline 
> function or not.
> 
> -Paul
> 
> 
> 
> On Fri, Aug 24, 2012 at 4:40 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Oracle uses an abysmally complicated configure line, but nearly all of it is 
> irrelevant to the problem here. For this, I would suggest just doing a 
> vanilla ./configure - if the component gets pulled into libmpi, then we know 
> there is a problem.
> 
> Thanks!
> 
> Just FYI: here is their actual configure line, just in case you spot 
> something problematic:
> 
> CC=cc CXX=CC F77=f77 FC=f90  --with-openib  --enable-openib-connectx-xrc  
> --without-udapl 
> --disable-openib-ibcm  --enable-btl-openib-failover   --without-dtrace  
> --enable-heterogeneous
> --enable-cxx-exceptions --enable-shared --enable-orterun-prefix-by-default 
> --with-sge
> --enable-mpi-f90 --with-mpi-f90-size=small  --disable-peruse --disable-state 
> --disable-mpi-thread-multiple   --disable-debug  --disable-mem-debug  
> --disable-mem-profile 
> CFLAGS="-xtarget=ultra3 -m32 -xarch=sparcvis2 -xprefetch -xprefetch_level=2 
> -xvector=lib -Qoption
> cg -xregs=no%appl -xdepend=yes -xbuiltin=%all -xO5"  
> CXXFLAGS="-xtarget=ultra3 -m32
> -xarch=sparcvis2 -xprefetch -xprefetch_level=2 -xvector=lib -Qoption cg 
> -xregs=no%appl -xdepend=yes
> -xbuiltin=%all -xO5 -Bstatic -lCrun -lCstd -Bdynamic"  
> FFLAGS="-xtarget=ultra3 -m32 -xarch=sparcvis2
> -xprefetch -xprefetch_level=2 -xvector=lib -Qoption cg -xregs=no%appl 
> -stackvar -xO5" 
> FCFLAGS="-xtarget=ultra3 -m32 -xarch=sparcvis2 -xprefetch -xprefetch_level=2 
> -xvector=lib -Qoption
> cg -xregs=no%appl -stackvar -xO5"  
> --prefix=/workspace/euloh/hpc/mtt-scratch/burl-ct-t2k-3/ompi-tarball-testing/installs/JA08/install
>  
> --mandir=${prefix}/man  --bindir=${prefix}/bin  --libdir=${prefix}/lib 
> --includedir=${prefix}/include   
> --with-tm=/ws/ompi-tools/orte/torque/current/shared-install32 
> --enable-contrib-no-build=vt --with-package-string="Oracle Message Passing 
> Toolkit "
> --with-ident-string="@(#)RELEASE VERSION 1.9openmpi-1.5.4-r1.9a1r27092"
> 
> and the error he gets is:
> 
> make[2]: Entering directory
> `/workspace/euloh/hpc/mtt-scratch/burl-ct-t2k-3/ompi-tarball-testing/mpi-install/s3rI/src/openmpi-1.9a1r27092/ompi/tools/ompi_info'
>   CCLD ompi_info
> Undefined first referenced
>  symbol   in file
> mca_coll_ml_memsync_intra   ../../../ompi/.libs/libmpi.so
> ld: fatal: symbol referencing errors. No output written to .libs/ompi_info
> make[2]: *** [ompi_info] Error 2
> make[2]: Leaving directory
> `/workspace/euloh/hpc/mtt-scratch/burl-ct-t2k-3/ompi-tarball-testing/mpi-install/s3rI/src/openmpi-1.9a1r27092/ompi/tools/ompi_info'
> make[1]: *** [install-recursive] Error 1
> make[1]: Leaving directory
> `/workspace/euloh/hpc/mtt-scratch/burl-ct-t2k-3/ompi-tarball-testing/mpi-install/s3rI/src/openmpi-1.9a1r27092/ompi'
> make: *** [install-recursive] Error 1
> 
> On Aug 24, 2012, at 4:30 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> 
>> I have access to a few different Solaris machines and can of

Re: [OMPI devel] [OMPI svn-docs] svn:open-mpi-tests r2002 - trunk/ibm/collective

2012-07-11 Thread Larry Baker
The value of i is exactly as it would be in C for the value of a loop control 
variable at loop exit.  (As opposed to being undefined, which is what it used 
to be.)  This dates from Fortran-77.
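
A small C sketch of the same behavior, for anyone who wants to see it without 
a Fortran compiler.  The function name loop_exit_value is made up for 
illustration; the loop mirrors "do i = 1, n".

```c
/*
 * C analogue of the Fortran loop "do i = 1, n": returns the value the
 * loop control variable holds after the loop has finished.
 */
int loop_exit_value(int n)
{
    int i;

    for (i = 1; i <= n; i++) {
        /* loop body would go here */
    }
    return i;   /* n + 1 for n >= 0, not n */
}
```

For n = 100 the loop ends with i equal to 101, so passing i as the count to 
MPI_Waitall after the loop asks for one more request than was posted.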

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



On 11 Jul 2012, at 10:44 AM, Jeff Squyres wrote:

> Ya, I saw Brian's commit, too.
> 
> Ah, I see what happens -- i is actually 101, not 100.  Frackin' Fortran...
> 
> 
> On Jul 11, 2012, at 12:57 PM, Eugene Loh wrote:
> 
>> Brian caught it.  I simply applied the change to the other ibarrier_f* 
>> tests.  With this and your "remove bozo debug statements" (+ sleeps) 
>> putbacks (26768/trunk and 26769/v1.7), I'm hoping our ibarrier_f* MTT 
>> time-outs will disappear.
>> 
>> On 7/11/2012 9:26 AM, Jeff Squyres wrote:
>>> I thought i would be 100 at the end of that do loop.
>>> 
>>> $%#@#@$% Fortran.  :-(
>>> 
>>> 
>>> On Jul 11, 2012, at 12:25 PM,<svn-commit-mai...@open-mpi.org>  wrote:
>>> 
>>>> Author: eugene (Eugene Loh)
>>>> Date: 2012-07-11 12:25:09 EDT (Wed, 11 Jul 2012)
>>>> New Revision: 2002
>>>> 
>>>> Log:
>>>> Apply the "right value when calling waitall" fix to
>>>> all ibm/collective/ibarrier_f* tests.
>>>> 
>>>> Text files modified:
>>>>  trunk/ibm/collective/ibarrier_f.f90   | 4 ++--
>>>>  trunk/ibm/collective/ibarrier_f08.f90 | 3 ++-
>>>>  trunk/ibm/collective/ibarrier_f90.f90 | 3 ++-
>>>>  3 files changed, 6 insertions(+), 4 deletions(-)
>>>> 
>>>> Modified: trunk/ibm/collective/ibarrier_f.f90
>>>> ==
>>>> --- trunk/ibm/collective/ibarrier_f.f90Wed Jul 11 12:03:04 2012
>>>> (r2001)
>>>> +++ trunk/ibm/collective/ibarrier_f.f902012-07-11 12:25:09 EDT (Wed, 
>>>> 11 Jul 2012)  (r2002)
>>>> @@ -31,6 +31,7 @@
>>>> ! Comments may be sent to:
>>>> !Richard Treumann
>>>> !treum...@kgn.ibm.com
>>>> +! Copyright (c) 2012  Oracle and/or its affiliates.
>>>> 
>>>>  program ibarrier
>>>>  implicit none
>>>> @@ -57,8 +58,7 @@
>>>>  do i = 1, 100
>>>> call MPI_Ibarrier(MPI_COMM_WORLD, req(i), ierr)
>>>>  end do
>>>> -  i = 100
>>>> -  call MPI_Waitall(i, req, statuses, ierr)
>>>> +  call MPI_Waitall(100, req, statuses, ierr)
>>>> 
>>>>  call MPI_Barrier(MPI_COMM_WORLD, ierr)
>>>>  call MPI_Finalize(ierr)
>>>> 
>>>> Modified: trunk/ibm/collective/ibarrier_f08.f90
>>>> ==
>>>> --- trunk/ibm/collective/ibarrier_f08.f90  Wed Jul 11 12:03:04 2012
>>>> (r2001)
>>>> +++ trunk/ibm/collective/ibarrier_f08.f90  2012-07-11 12:25:09 EDT (Wed, 
>>>> 11 Jul 2012)  (r2002)
>>>> @@ -31,6 +31,7 @@
>>>> ! Comments may be sent to:
>>>> !Richard Treumann
>>>> !treum...@kgn.ibm.com
>>>> +! Copyright (c) 2012  Oracle and/or its affiliates.
>>>> 
>>>>  program ibarrier
>>>>  use mpi_f08
>>>> @@ -56,7 +57,7 @@
>>>>  do i = 1, 100
>>>> call MPI_Ibarrier(MPI_COMM_WORLD, req(i))
>>>>  end do
>>>> -  call MPI_Waitall(i, req, MPI_STATUSES_IGNORE)
>>>> +  call MPI_Waitall(100, req, MPI_STATUSES_IGNORE)
>>>> 
>>>>  call MPI_Barrier(MPI_COMM_WORLD)
>>>>  call MPI_Finalize()
>>>> 
>>>> Modified: trunk/ibm/collective/ibarrier_f90.f90
>>>> ==
>>>> --- trunk/ibm/collective/ibarrier_f90.f90  Wed Jul 11 12:03:04 2012
>>>> (r2001)
>>>> +++ trunk/ibm/collective/ibarrier_f90.f90  2012-07-11 12:25:09 EDT (Wed, 
>>>> 11 Jul 2012)  (r2002)
>>>> @@ -31,6 +31,7 @@
>>>> ! Comments may be sent to:
>>>> !Richard Treumann
>>>> !treum...@kgn.ibm.com
>>>> +! Copyright (c) 2012  Oracle and/or its affiliates.
>>>> 
>>>>  program ibarrier
>>>>  use mpi
>>>> @@ -57,7 +58,7 @@
>>>>  do i = 1, 100
>>>> call MPI_Ibarrier(MPI_COMM_WORLD, req(i), ierr)
>>>>  end do
>>>> -  call MPI_Waitall(i, req, statuses, ierr)
>>>> +  call MPI_Waitall(100, req, statuses, ierr)
>>>> 
>>>>  call MPI_Barrier(MPI_COMM_WORLD, ierr)
>>>>  call MPI_Finalize(ierr)
>>>> ___
>>>> svn-docs-full mailing list
>>>> svn-docs-f...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn-docs-full
>>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] MPI_Cart_coords_f segv with Intel compiler

2012-05-24 Thread Larry Baker
Terry,

What you are seeing is a bug in the vectorizer in the Intel 2011.6.233 release. 
 We've talked about this before.  You should probably remove that compiler from 
your system(s).  I think the new release of OpenMPI describes this problem, but 
does not stop it from occurring.  I wrote a patch for ptmalloc2/malloc.c for 
OpenMPI 1.4.3 which automatically adjusts the optimization level for 
_int_malloc(), which is where the bug occurs.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

-- Start of Patch --
--- opal/mca/memory/ptmalloc2/malloc.c.original 2010-04-13 10:30:26.0 -0700
+++ opal/mca/memory/ptmalloc2/malloc.c  2011-11-04 15:01:37.0 -0700
@@ -2,6 +2,17 @@
 /* Copyright (c) 2010  Cisco Systems, Inc.  All rights reserved.
  */
 
+/* With Intel Composer XE V12.1.0, release 2011.6.233, any launch   */
+/* fails, even before main(), due to a bug in the vectorizer (see   */
+/* https://svn.open-mpi.org/trac/ompi/changeset/25290).  The fix is */
+/* to disable vectorization by reducing the optimization level to   */
+/* -O1 for _int_malloc().  The only reliable method to identify */
+/* release 2011.6.233 is the predefined __INTEL_COMPILER_BUILD_DATE */
+/* macro, which will have the value 20110811 (Linux, Windows, and   */
+/* Mac OS X).  (The predefined __INTEL_COMPILER macro is nonsense,  */
+/* , and both the 2011.6.233 and 2011.7.256 releases identify   */
+/* themselves as V12.1.0 from the -v command line option.)  */
+
 #define OPAL_DISABLE_ENABLE_MEM_DEBUG 1
 #include "opal_config.h"
 
@@ -3945,6 +3956,12 @@
   -- malloc --
 */
 
+#ifdef __INTEL_COMPILER_BUILD_DATE
+#if __INTEL_COMPILER_BUILD_DATE == 20110811
+#pragma GCC optimization_level 1
+#endif
+#endif
+
 Void_t*
 _int_malloc(mstate av, size_t bytes)
 {
-- End of Patch --
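
The guard used in the patch can also be tried in isolation.  The sketch below 
only reports whether the translation unit was built by the affected release.  
The names AFFECTED_INTEL_BUILD_DATE and built_with_affected_intel are 
illustrative assumptions, not part of OpenMPI; the date value is the one 
quoted in the patch comment above.

```c
/* Build date the patch comment gives for release 2011.6.233. */
#define AFFECTED_INTEL_BUILD_DATE 20110811

/* Hypothetical helper: 1 if compiled by the affected Intel release,
 * 0 for any other compiler (including non-Intel ones, where the
 * __INTEL_COMPILER_BUILD_DATE macro is simply not defined). */
int built_with_affected_intel(void)
{
#if defined(__INTEL_COMPILER_BUILD_DATE) && \
    __INTEL_COMPILER_BUILD_DATE == AFFECTED_INTEL_BUILD_DATE
    return 1;
#else
    return 0;
#endif
}
```

Testing for the macro before comparing it keeps the check quiet on compilers 
that do not define it, which is what the nested #ifdef/#if in the patch does.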

On 24 May 2012, at 6:54 AM, TERRY DONTJE wrote:

> I am seeing several Cart Fortran tests (like MPI_Cart_coords_f) segv in 
> opal_memory_ptmalloc2_int_free when OMPI trunk is compiled with icc 12.1.0 
> for 64 bit on linux.   Just wondering if anyone has seen anything similar to 
> this with a different version of icc.  Other non-Intel compilers seem to not 
> exhibit this issue.
> 
> -- 
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com
> 
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



[OMPI devel] Clam AntiVirus

2012-03-22 Thread Larry Baker
I was reading the FAQs for the ClamAV anti-virus program (included on  
Mac OS X) at http://www.clamav.net/lang/en/faq/faq-upgrade/.  At the  
end is a note that caught my eye about problem compilers.


ClamAV supports a wide variety of compilers, hardware and operating  
systems. Our core compiler is gcc with Linux on 32 and 64 bit Intel  
platforms, though we also test using other compilers, including  
Sun’s C compiler, Microsoft’s Visual Studio, Intel’s C compiler,  
LLVM-GCC, and others. To date we have only found one compiler that  
we do not support, GCC version 4.0.0 to 4.0.1 inclusive. We have  
found that version of the compiler produces incorrect code on all of  
the platforms and operating systems on which we have tested it.  
ClamAV will not work using that compiler and you MUST switch to an  
alternative, such as GCC3.4 or GCC4.1. Please contact your vendor  
for further information. Please refer to gcc’s bugzilla for further  
information. If you want to see a proof of why gcc 4.0.1 generates  
wrong code for the kernel read the relevant article on kerneltrap.  
More information about this bug is also available in our bugzilla .  
Our configure scripts will detect if your compiler is affected by  
this bug and refuse to generate a non working binary with the  
following error message: your compiler has gcc PR26763-2 bug, use a  
different compiler . If you are on MacOS X, you can try an  
alternative compiler, LLVM-GCC4.2-2.2, which has official binaries  
available

Last update: Apr 15, 2010



Two things occurred to me that might be appropriate for OpenMPI:

1) If GCC4.0.1 is as bad as it sounds, that might be worth mentioning  
in the OpenMPI README.
2) OpenMPI might borrow ClamAV's configuration logic to recognize GCC  
4.0.1 (and, maybe the faulty Intel V12.x compiler) and balk.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



Re: [OMPI devel] Collective communications may be abend when it use over 2GiB buffer

2012-03-05 Thread Larry Baker

George,

I think Yuki's interpretation is correct.


The following is one of the suspicious parts.
(Many similar code in ompi/coll/tuned/*.c)

--- in ompi/coll/tuned/coll_tuned_allgather.c (V1.4.X's trunk)---
398    tmprecv = (char*) rbuf + rank * rcount * rext;
-

if this condition is met, "rank * rcount" is overflowed.
So, we fixed it tentatively like following:
(cast int to size_t)
--- in ompi/coll/tuned/coll_tuned_allgather.c --
398    tmprecv = (char*) rbuf + (size_t)rank * rcount * rext;



Based on my understanding of the C standard this operation should be  
done on the most extended type, in this particular case the one of  
the rext (ptrdiff_t). Thus I would say the displacement should be  
correctly computed.


In my copy of C99, section 6.5 Expressions says "the order of  
evaluation of subexpressions and the order in which side effects take  
place are both unspecified."  There is a footnote 71 that "specifies  
the precedence of operators in the evaluation of an expression, which  
is the same as the order of the major subclauses of this subclause,  
highest precedence first."  It is the footnote that implies  
multiplication (6.5.5 Multiplicative operators) has higher precedence  
than addition (6.5.6 Additive operators) in the expression "(char*)  
rbuf + rank * rcount * rext".  But, the main text states that there is  
no ordering of the subexpression "rank * rcount * rext".  When the  
compiler chooses to evaluate "rank * rcount" first, the overflow  
described by Yuki can result.  I think you are correct that the  
subexpression will get promoted to (ptrdiff_t), but that is not quite  
the same thing.
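
A small C sketch of the arithmetic in question.  Whatever order the operands  
are evaluated in, the grammar groups "rank * rcount * rext" as  
"(rank * rcount) * rext", so the first product is formed in int before the  
wider type of rext comes into play.  The function names below are  
illustrative only and are not from coll_tuned_allgather.c; the second helper  
checks the int product in a wider type rather than actually overflowing.

```c
#include <limits.h>
#include <stddef.h>

/* Byte offset with the first operand widened, as in Yuki's tentative
 * fix: every multiplication is then carried out in size_t. */
size_t offset_widened(int rank, int rcount, ptrdiff_t rext)
{
    return (size_t)rank * rcount * rext;
}

/* Returns 1 if the int product rank * rcount, which the unpatched
 * expression computes first, would not fit in an int. */
int int_product_overflows(int rank, int rcount)
{
    long long p = (long long)rank * (long long)rcount;

    return p > INT_MAX || p < INT_MIN;
}
```

With rank = 64 and rcount = 67108864 (64 Mi elements), for example, the int  
product does not fit in a 32-bit int, while the widened form is fine on a  
platform with a 64-bit size_t.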


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



Re: [OMPI devel] 1.5 supported systems

2012-02-29 Thread Larry Baker

Paul,

Thanks.  As it happened, I just had someone come to me yesterday  
looking for help compiling OpenMPI on a Mac using the Intel  
compilers.  I compiled and validated the 1.5.3 distribution on my Mac  
(10.5.8) using Intel V11.1 (not the latest).  He has the V11.1  
compilers on his Mac, which is running 10.6.  I'll be sure to report  
any problems we run into.


I do not have time to systematically go through my 1.4.3 patches  
against the 1.5.5 code, but I did just look at configure in the  
nightly 1.5.5rc3 candidate.  I don't remember which platform/compiler  
caused me to fix this, but I still see a logic problem in enabling  
support for Fortran<->C data marshaling.  (It was probably a make  
check failure.)  In configure, I added extra checks to make sure that  
the data formats are the same between the Fortran and C compilers.   
The 1.5.5 configure still has the incomplete tests.


This is the snippet of my patch to the 1.4.3 configure with the added  
logic (the line numbers will be wrong, but you get the idea):



@@ -47269,7 +47277,7 @@
 # there are some places in the code where we have to have  
*something*.


 cat >>confdefs.h <<_ACEOF
-#define OMPI_HAVE_FORTRAN_REAL16 $ofc_have_type
+#define OMPI_HAVE_FORTRAN_REAL16 ( $ofc_have_type &&  
OMPI_REAL16_MATCHES_C )

 _ACEOF


@@ -50744,7 +50752,7 @@
 # there are some places in the code where we have to have  
*something*.


 cat >>confdefs.h <<_ACEOF
-#define OMPI_HAVE_FORTRAN_COMPLEX32 $ofc_have_type
+#define OMPI_HAVE_FORTRAN_COMPLEX32 ( $ofc_have_type &&  
OMPI_REAL16_MATCHES_C )

 _ACEOF


@@ -58335,7 +58343,7 @@


 cat >>confdefs.h <<_ACEOF
-#define OMPI_HAVE_F90_REAL16 $ofc_have_type
+#define OMPI_HAVE_F90_REAL16 ( $ofc_have_type &&  
OMPI_REAL16_MATCHES_C )

 _ACEOF


@@ -60152,7 +60160,7 @@


 cat >>confdefs.h <<_ACEOF
-#define OMPI_HAVE_F90_COMPLEX32 $ofc_have_type
+#define OMPI_HAVE_F90_COMPLEX32 ( $ofc_have_type &&  
OMPI_REAL16_MATCHES_C )

 _ACEOF
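
To show what the combined definition does once it reaches C code, here is a  
minimal sketch.  The values 1 and 0 stand in for whatever configure would  
substitute, and have_fortran_real16 is a made-up name for illustration.

```c
/* Stand-ins for what configure would write into confdefs.h; the values
 * 1 and 0 below are chosen for illustration only. */
#define OMPI_REAL16_MATCHES_C 0
#define OMPI_HAVE_FORTRAN_REAL16 ( 1 && OMPI_REAL16_MATCHES_C )

/* Hypothetical probe: reports whether REAL16 support would be compiled in. */
int have_fortran_real16(void)
{
#if OMPI_HAVE_FORTRAN_REAL16
    return 1;
#else
    return 0;
#endif
}
```

Here the Fortran compiler has the type but its format does not match C, so  
the macro evaluates to 0 and the support is left out.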





Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 28 Feb 2012, at 10:05 PM, Paul Hargrove wrote:


I went ahead and tried Intel's latest compilers for MacOS 10.6.
They don't yet support MacOS 10.7.

All looks good w/ these compilers and the 1.5.5rc3 tarball.
I think this testing is too preliminary to consider this a  
"supported" compiler.


-Paul

On Wed, Feb 22, 2012 at 6:57 PM, Paul H. Hargrove  
<phhargr...@lbl.gov> wrote:

I have NOT been running Intel's compilers on Macs, only on Linux.
I *tried* PGI's compilers on MacOS, but that was a flop.
I have used Clang (comes w/ XCode 4.2) on MacOS, and that works for  
me but is not extensively tested.


-Paul

On 2/22/2012 6:13 PM, Larry Baker wrote:


Paul,

Haven't you been running Intel compilers on OS X?

Also, do we have specifics about which gcc's on Mac OS X?  I have  
(OS X 10.5.8):



savaii:~ baker$ ls -l /usr/bin/gcc*
lrwxr-xr-x  1 root  wheel   7 Oct  2  2009 /usr/bin/gcc -> gcc-4.0
-r-xr-xr-x  1 root  wheel  258368 Feb 19  2008 /usr/bin/gcc-3.3
-rwxr-xr-x  1 root  wheel   93088 Feb  5  2009 /usr/bin/gcc-4.0
-rwxr-xr-x  1 root  wheel  105680 Apr 27  2009 /usr/bin/gcc-4.2



savaii:~ baker$ ls -l /usr/bin/cc*
lrwxr-xr-x  1 root  wheel  7 Oct  2  2009 /usr/bin/cc -> gcc-4.0



savaii:~ baker$ ls /Developer/usr/llvm-gcc-4.2/bin/*cc*
/Developer/usr/llvm-gcc-4.2/bin/i686-apple-darwin9-llvm-gcc-4.2
/Developer/usr/llvm-gcc-4.2/bin/llvm-gcc-4.2
/Developer/usr/llvm-gcc-4.2/bin/powerpc-apple-darwin9-llvm-gcc-4.2



Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 22 Feb 2012, at 5:55 PM, Paul H. Hargrove wrote:

Folks at Oracle should decide, but I suspect "Solaris 10" should  
be updated to "Solaris 10 and 11", or just "11".


-Paul

On 2/22/2012 2:44 PM, Jeffrey Squyres wrote:
Please verify this list of supported systems for the v1.5.5  
release:


- The run-time systems that are currently supported are:
  - rsh / ssh
  - LoadLeveler
  - PBS Pro, Open PBS, Torque
  - Platform LSF (v7.0.2 and later)
  - SLURM
  - Cray XT-3, XT-4, and XT-5
  - Oracle Grid Engine (OGE) 6.1, 6.2 and open source Grid Engine
  - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008)

- Systems that have been tested are:
  - Linux (various flavors/distros), 32 bit, with gcc, and Oracle
Solaris Studio 12
  - Linux (various flavors/distros), 64 bit (x86), with gcc, Absoft,
Intel, Portland, and Oracle Solaris Studio 12 compilers (*)
  - OS X (10.5, 10.6, 10.7), 32 and 64 bit (x86_64), with gcc and
Absoft compilers (*)
  - Oracle Solaris 10, 32 and 64 bit (SPARC, i386, x86_64), with
Oracle Solaris Studio 12

  (*) Be sure to read the Compiler Notes, below.

- Other systems have been lightly (but not fully) tested:
  - Other 64 bit platforms (e.g., Linux on PPC64)
  - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008);
see the

Re: [OMPI devel] 1.5 supported systems

2012-02-23 Thread Larry Baker
I'm not yet using the Mac OS X LLVM compilers.  I have been under the  
impression that LLVM compilers are not GNU compilers.  However, given  
the names llvm-gcc-x.x, I guess they are some sort of hybrid.  (gcc  
front end, LLVM backend?)  I agree with Jeff's point about not getting  
too specific about gcc version numbers unless there are known  
problems.  However, if someone told me that gcc was supported, I would  
not automatically assume that meant llvm-gcc.  As Paul showed us, the  
"gcc" command on Mac OS X 10.7 is a soft link to an llvm-gcc compiler,  
not a gcc compiler.  When we say that "gcc" is supported, is that  
intended to mean the command or the compiler?  I would assume it meant  
the latter.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 23 Feb 2012, at 4:44 AM, Jeffrey Squyres wrote:

I don't think I want to get specific about the gcc versions on any  
platform, unless we know that they *don't* work.  There's too many  
versions and variants of gcc out there to have an inclusive list --  
I'd rather have an *exclusive* list.



On Feb 23, 2012, at 3:39 AM, Paul Hargrove wrote:


And here is the 10.7 machine as promised:

ProductName:Mac OS X
ProductVersion: 10.7.3
BuildVersion:   11D50b
lrwxr-xr-x  1 root  wheel  12 Oct 27 14:01 /usr/bin/gcc -> llvm-gcc-4.2


-Paul

On Wed, Feb 22, 2012 at 7:44 PM, Paul H. Hargrove  
<phhargr...@lbl.gov> wrote:
I can get exact info from my MacOS 10.7 machine later, but its gcc  
is llvm-gcc-4.2 IIRC.

Here are my 10.5 and 10.6:

ProductName:Mac OS X
ProductVersion: 10.5.8
BuildVersion:   9L31a
powerpc
lrwxr-xr-x  1 root  wheel   7 Nov  1  2008 /usr/bin/gcc -> gcc-4.0
-r-xr-xr-x  1 root  wheel  258368 Feb 19  2008 /usr/bin/gcc-3.3
-rwxr-xr-x  1 root  wheel   93088 Jul 17  2008 /usr/bin/gcc-4.0
-rwxr-xr-x  1 root  wheel  105680 May 18  2008 /usr/bin/gcc-4.2

ProductName:Mac OS X
ProductVersion: 10.5.8
BuildVersion:   9L30
i386
lrwxr-xr-x  1 root  wheel  7 Nov  8  2007 /usr/bin/gcc -> gcc-4.0
-rwxr-xr-x  1 root  wheel  93072 Sep 23  2007 /usr/bin/gcc-4.0

ProductName:Mac OS X
ProductVersion: 10.6.8
BuildVersion:   10K549
i386
lrwxr-xr-x  1 root  wheel   7 Sep 29  2009 /usr/bin/gcc -> gcc-4.2
-rwxr-xr-x  1 root  wheel   97392 May 18  2009 /usr/bin/gcc-4.0
-rwxr-xr-x  1 root  wheel  166128 May 18  2009 /usr/bin/gcc-4.2


On 2/22/2012 6:13 PM, Larry Baker wrote:

Paul,

Haven't you been running Intel compilers on OS X?

Also, do we have specifics about which gcc's on Mac OS X?  I have  
(OS X 10.5.8):



savaii:~ baker$ ls -l /usr/bin/gcc*
lrwxr-xr-x  1 root  wheel   7 Oct  2  2009 /usr/bin/gcc -> gcc-4.0
-r-xr-xr-x  1 root  wheel  258368 Feb 19  2008 /usr/bin/gcc-3.3
-rwxr-xr-x  1 root  wheel   93088 Feb  5  2009 /usr/bin/gcc-4.0
-rwxr-xr-x  1 root  wheel  105680 Apr 27  2009 /usr/bin/gcc-4.2



savaii:~ baker$ ls -l /usr/bin/cc*
lrwxr-xr-x  1 root  wheel  7 Oct  2  2009 /usr/bin/cc -> gcc-4.0



savaii:~ baker$ ls /Developer/usr/llvm-gcc-4.2/bin/*cc*
/Developer/usr/llvm-gcc-4.2/bin/i686-apple-darwin9-llvm-gcc-4.2
/Developer/usr/llvm-gcc-4.2/bin/llvm-gcc-4.2
/Developer/usr/llvm-gcc-4.2/bin/powerpc-apple-darwin9-llvm-gcc-4.2


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 22 Feb 2012, at 5:55 PM, Paul H. Hargrove wrote:

Folks at Oracle should decide, but I suspect "Solaris 10" should  
be updated to "Solaris 10 and 11", or just "11".


-Paul

On 2/22/2012 2:44 PM, Jeffrey Squyres wrote:
Please verify this list of supported systems for the v1.5.5  
release:


- The run-time systems that are currently supported are:
 - rsh / ssh
 - LoadLeveler
 - PBS Pro, Open PBS, Torque
 - Platform LSF (v7.0.2 and later)
 - SLURM
 - Cray XT-3, XT-4, and XT-5
 - Oracle Grid Engine (OGE) 6.1, 6.2 and open source Grid Engine
 - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008)

- Systems that have been tested are:
 - Linux (various flavors/distros), 32 bit, with gcc, and Oracle
   Solaris Studio 12
 - Linux (various flavors/distros), 64 bit (x86), with gcc, Absoft,
   Intel, Portland, and Oracle Solaris Studio 12 compilers (*)
 - OS X (10.5, 10.6, 10.7), 32 and 64 bit (x86_64), with gcc and
   Absoft compilers (*)
 - Oracle Solaris 10, 32 and 64 bit (SPARC, i386, x86_64), with
   Oracle Solaris Studio 12

 (*) Be sure to read the Compiler Notes, below.

- Other systems have been lightly (but not fully) tested:
 - Other 64 bit platforms (e.g., Linux on PPC64)
 - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008);
   see the README.WINDOWS file.



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

Re: [OMPI devel] 1.5 supported systems

2012-02-22 Thread Larry Baker

Paul,

Haven't you been running Intel compilers on OS X?

Also, do we have specifics about which gcc's on Mac OS X?  I have (OS  
X 10.5.8):



savaii:~ baker$ ls -l /usr/bin/gcc*
lrwxr-xr-x  1 root  wheel   7 Oct  2  2009 /usr/bin/gcc -> gcc-4.0
-r-xr-xr-x  1 root  wheel  258368 Feb 19  2008 /usr/bin/gcc-3.3
-rwxr-xr-x  1 root  wheel   93088 Feb  5  2009 /usr/bin/gcc-4.0
-rwxr-xr-x  1 root  wheel  105680 Apr 27  2009 /usr/bin/gcc-4.2



savaii:~ baker$ ls -l /usr/bin/cc*
lrwxr-xr-x  1 root  wheel  7 Oct  2  2009 /usr/bin/cc -> gcc-4.0



savaii:~ baker$ ls /Developer/usr/llvm-gcc-4.2/bin/*cc*
/Developer/usr/llvm-gcc-4.2/bin/i686-apple-darwin9-llvm-gcc-4.2
/Developer/usr/llvm-gcc-4.2/bin/llvm-gcc-4.2
/Developer/usr/llvm-gcc-4.2/bin/powerpc-apple-darwin9-llvm-gcc-4.2



Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 22 Feb 2012, at 5:55 PM, Paul H. Hargrove wrote:

Folks at Oracle should decide, but I suspect "Solaris 10" should be  
updated to "Solaris 10 and 11", or just "11".


-Paul

On 2/22/2012 2:44 PM, Jeffrey Squyres wrote:

Please verify this list of supported systems for the v1.5.5 release:

- The run-time systems that are currently supported are:
  - rsh / ssh
  - LoadLeveler
  - PBS Pro, Open PBS, Torque
  - Platform LSF (v7.0.2 and later)
  - SLURM
  - Cray XT-3, XT-4, and XT-5
  - Oracle Grid Engine (OGE) 6.1, 6.2 and open source Grid Engine
  - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008)

- Systems that have been tested are:
  - Linux (various flavors/distros), 32 bit, with gcc, and Oracle
Solaris Studio 12
  - Linux (various flavors/distros), 64 bit (x86), with gcc, Absoft,
Intel, Portland, and Oracle Solaris Studio 12 compilers (*)
  - OS X (10.5, 10.6, 10.7), 32 and 64 bit (x86_64), with gcc and
Absoft compilers (*)
  - Oracle Solaris 10, 32 and 64 bit (SPARC, i386, x86_64), with
Oracle Solaris Studio 12

  (*) Be sure to read the Compiler Notes, below.

- Other systems have been lightly (but not fully) tested:
  - Other 64 bit platforms (e.g., Linux on PPC64)
  - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008);
see the README.WINDOWS file.



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] openmpi-1.5.5rc1: 2nd gmake dependence (mostly VT)

2011-12-20 Thread Larry Baker
> I am pretty sure a literal "rm -rf" should be fine.

Not necessarily.  I'm not at work.  But I think either -f or -r might not be 
legal on all Unixes (Tru64 Unix?  AIX?).

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov




On Dec 20, 2011, at 5:19 PM, Paul H. Hargrove wrote:

> For the first time I tried "make clean" on FreeBSD and found /another/ 
> GNU-vs-Berkeley Make problem.
> 
> The problem is use of $(RM) in several Makefile.am's (see below for list).
> The only non-VT instance (ompi_info/Makefile.am) occurs in openmpi-1.4.5rc1 
> as well.
> 
> $(RM) is a predefined variable in GNU Make, not provided by Berkeley Make (or 
> by Automake for that matter).
> I am pretty sure a literal "rm -rf" should be fine.
> 
> -Paul
> 
>> $ find openmpi-1.5.5rc1 -name Makefile.am | xargs grep -w RM
>> openmpi-1.5.5rc1/ompi/tools/ompi_info/Makefile.am:  test -z 
>> "$(OMPI_CXX_TEMPLATE_REPOSITORY)" || $(RM) -rf 
>> $(OMPI_CXX_TEMPLATE_REPOSITORY)
>> openmpi-1.5.5rc1/ompi/contrib/vt/vt/extlib/otf/tests/hello/Makefile.am: 
>> $(RM) *.otf *.def *.events *.marker *.otf.z *.def.z *.events.z *.marker.z
>> openmpi-1.5.5rc1/ompi/contrib/vt/vt/extlib/otf/tests/generic_streams-mpi/Makefile.am:
>>$(RM) *.otf *.def *.events *.marker *.otf.z *.def.z *.events.z *.marker.z
>> openmpi-1.5.5rc1/ompi/contrib/vt/vt/extlib/otf/tests/generic_streams/Makefile.am:
>>$(RM) *.otf *.def *.events *.marker *.otf.z *.def.z *.events.z 
>> *.marker.z
>> openmpi-1.5.5rc1/ompi/contrib/vt/vt/extlib/otf/tests/progress/Makefile.am:   
>>$(RM) *.otf *.def *.events *.marker *.otf.z *.def.z *.events.z *.marker.z
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> HPC Research Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [BUG?] OpenMPI with openib on SPARC64: Signal: Bus error (10)

2011-11-23 Thread Larry Baker
I think of -m32 (and -m64) as really selecting a different compiler.   
My practice is to put those flags in the compiler/linker environment  
variables.  For example:



# ./configure >configure.log 2>&1 \
   --prefix=/usr/local/openmpi --with-sge \
   CC="gcc -m32" \
   CFLAGS="-g -O3" \
   CXX="g++ -m32" \
   CXXFLAGS="-g -O3" \
   FC="gfortran -m32" \
   FCFLAGS="-g -O3" \
   F77="gfortran -m32" \
   FFLAGS="-g -O3"
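
One way to confirm which ABI a given CC setting actually selects is to  
compile a tiny probe with it.  A minimal sketch follows; the function name  
pointer_bits is illustrative and not part of OpenMPI.

```c
#include <limits.h>

/* Width of a data pointer in bits for the ABI this file was compiled
 * for: typically 32 with "gcc -m32" and 64 with "gcc -m64". */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}
```

Building the same probe with each of the C and C++ compiler settings is a  
quick check that all of them agree before starting a long configure and make.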


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 23 Nov 2011, at 10:52 AM, TERRY DONTJE wrote:


On 11/23/2011 1:45 PM, Lukas Razik wrote:


TERRY DONTJE <terry.don...@oracle.com> wrote

Can you build OMPI as a 32 bit library and see if that works any  
better?
So you mean I shall leave the whole OFED stack as 64 bit and build  
only openmpi as 32 bit?
I believe the OFED user libraries will need to be 32 bit also or the  
32 bit MPI libraries will not be able to use them.

How must I configure openmpi that it'll be definitely built as 32bit?
You need to change the CFLAGS, CXXFLAGS, FFLAGS and FCFLAGS in the  
configure line such that you replace "-m64" with "-m32", or just add  
"-m32" if "-m64" is not there.

Regards,
Lukas



--

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] PGI error invoked when svnversion is unavailable

2011-11-15 Thread Larry Baker
Tom,

This is because the code in OpenMPI presumes macros will be expanded in 
pragmas, but that is not required by the C standard.  (See my e-mails below 
from last year with PGI, TPR 17186.)  I fixed OpenMPI 1.4.3 configure in the 
attached patch.  My patch also disables inline assembly for PGI C++, the same 
as for PGI C.  (Something similar may also have to be done to solve Eugene's 
asm statement warnings on Solaris 11.)  It also fixes detection of support for 
marshaling Fortran REAL16 and COMPLEX32 data types.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 15 Nov 2011, at 12:49 PM, Thomas Rothrock CTR SMDC SimCtr/GaN Corporation wrote:

I am building on a separate (unnetworked) system than the one I check out
SVN sources from, thus subversion was never installed on this system and the
"svnversion" command is unavailable.  After configure, this eventually
results in OPAL_IDENT_STRING getting set to an empty string ("").  This
seems to invoke an odd error in the Portland Group (PGI) C compiler (pgcc),
such that

	#pragma ident ""

results in:

	PGC-F-0010-File write error occurred (temporary pragma .s file)

which is a bit misleading and took me a while to track down the problem.
My testing has shown that the C++ compiler (pgCC) does not fail with the
same error (or any error at all) and completes, but pgcc fails this case in
at least all versions since 8.0-6 and probably earlier.  I have filed a
support request with PGI to see what they say about it, but of course this
does nothing for current and older versions.  My quick workaround was to
just install subversion so that the empty string never gets set to begin
with.  Ultimately though should OPAL_IDENT_STRING be ending up empty when
the "svnversion" command is not available?

---
  Tom Rothrock
  US Army Space & Missile Defense Command Simulation Center
  256-955-3382 (DSN 645)
  FAX 256-955-1231
  Main SimCtr Phone:  256-955-3750
---
  This email capability is supported by Department of Defense
  systems and is subject to monitoring.  Please refrain from
  using this address for non-Government purposes.
---
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

Larry Baker wrote:

Dave,

I too read the C99 standard and found that macro substitution is not required 
in pragmas.  I complained to the OpenMPI folks that they are relying on a 
non-standard feature.  The reason I brought this to your attention is to point 
out that pgcc and pgCC behave differently (puzzling), neither one describes 
their behavior, and ident is not documented as a recognized pragma.  Please 
make sure the documentation is updated to describe whatever the behavior is 
for pgcc and pgCC.

Thanks,

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

Larry,

Thank you. I have added your remarks to the FTO, and included documentation 
into its resolution.

dave

On Oct 7, 2010, at 10:46 AM, PGI Technical Support wrote:

Larry,

An update to TPR 17186.

=
For c99, it is not required to perform macro replacement in pragmas.
However, there are a few exceptions in pgc, such as within 'omp', 'pgi' & 
'acc' pragmas.

c99 does define a method which effects replacement within pragmas; the
method uses the _Pragma operator; e.g.,

#define FOO "foo"
#define IDENT(x) _Pragma("ident") x
IDENT(FOO)

will generate

#pragma ident "foo"

We will add allowing macro replacement within
'#pragma ident'  in our 11.0 release.
==

regards,
dave

PGI Technical Support wrote:

TPR 17186.

thanks again,
dave

Larry Baker wrote:

Customer information:

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

Product: 2183-WS
PIN: 507549

Problem description:

pgcc issues the warning Pragma ignored – string expected after #pragma ident 
when compiling openmpi-1.4.2 from http://www.open-mpi.org.

The source of this problem is that OpenMPI #defines the string it wants to use 
in a #pragma ident instead of using a literal string value.  However, pgcc 
does not perform macro substitution on #pragma ident statements.  Curiously, 
pgCC does!  This is not documented anywhere.  Also, #pragma ident is not 
listed as a recognized pragma, even though it seems to be properly compiled 
into the ELF object file.  It would be consistent with gcc and pgCC if pgcc 
would perform macro substitution in pragmas.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

-- 
Dave Borer    Customer Service Manager, The Portland Group
email    dave.bo...@st.com
phone    (503)-431-7113

-- 
Dave Borer	Customer Service Manager, The Portland Group
email		dave.bo...@st.com
phone		(503)-431-7113
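
The _Pragma operator mentioned in the PGI reply takes a single string 
literal, so the usual C99 idiom builds that string with the # operator after 
the macro argument has been expanded.  What follows is a sketch of that 
idiom, not PGI's or OpenMPI's code.  The names DO_PRAGMA, IDENT, IDENT_TEXT, 
STRINGIFY_PRAGMA, USE_IDENT_PRAGMA and ident_pragma_text are all illustrative 
assumptions.

```c
#define FOO "foo"

/* Two-level expansion: the argument of IDENT / IDENT_TEXT is
 * macro-expanded first, then # turns the resulting tokens into one
 * string literal. */
#define STRINGIFY_PRAGMA(x) #x
#define DO_PRAGMA(x) _Pragma(#x)
#define IDENT(s) DO_PRAGMA(ident s)
#define IDENT_TEXT(s) STRINGIFY_PRAGMA(ident s)

/* USE_IDENT_PRAGMA is a hypothetical switch; define it only for
 * compilers that accept "#pragma ident". */
#ifdef USE_IDENT_PRAGMA
IDENT(FOO)   /* behaves like: #pragma ident "foo" */
#endif

/* Returns the text that IDENT(FOO) hands to _Pragma, for inspection. */
const char *ident_pragma_text(void)
{
    return IDENT_TEXT(FOO);
}
```

Because the expansion happens before stringification, the pragma sees the 
literal "foo" even on a compiler that does no macro replacement inside 
#pragma lines.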


Re: [OMPI devel] make check fails for Intel 2011.6.233 (OpenMPI 1.4.3)

2011-11-08 Thread Larry Baker
The good news is that the issue reported in R25290 is fixed in the latest Intel compilers release (2011.7.256).  The bad news is that both the 2011.6.233 and 2011.7.256 releases identify themselves as V12.1.0 from the command line.  (I reported this bug to Intel already.)  They can only be reliably distinguished using the predefined __INTEL_COMPILER_BUILD_DATE macro.  I verified that the build dates for all three compilers we have -- Linux, Mac OS X, and Windows -- are the same.I developed a more targeted patch (attached) for OpenMPI 1.4.3 opal/mca/memory/ptmalloc2/malloc.c which disables vectorization for _int_malloc() only if an Intel compiler with the 2011.6.233 release build date is found (__INTEL_COMPILER_BUILD_DATE == 20110811).  This patch could presumably make its way into all the copies of opal/mca/memory/ptmalloc2/malloc.c in the various versions of OpenMPI that are still being maintained. Larry BakerUS Geological Survey650-329-5608ba...@usgs.gov On 17 Oct 2011, at 8:18 PM, George Bosilca wrote:Larry,Sorry for not updating this thread. The issue was identified and fixed by Rainer in r25290 (https://svn.open-mpi.org/trac/ompi/changeset/25290). Please read the comments and the linked thread on the Intel forum for more info about.I couldn't find a trace of this being fixed in the 1.4 series, so I would wait upgrading until this issue gets resolved.  Thanks,    george.On Oct 17, 2011, at 23:00 , Larry Baker wrote:George,I have not had time to look over the 1.4.3 make check failure for Intel 2011.6.233 compilers.  Have you?I had planned to get 1.4.3 compiled on all six of our compilers using the latest compiler releases.  I was putting off upgrading to 1.4.4 or 1.5.x until after that to minimize the number of things that could go wrong.  Do you recommend otherwise? 
Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 7 Oct 2011, at 6:46 PM, George Bosilca wrote:

The may_alias attribute was part of a forward-looking attribute
checking, at a time where few compiler supported them. This explains
why they are not widely used in the library itself. Moreover, as they
do not affect the compilation itself (as your test highlights this is
not the issue with the icc 2011.6.233 compiler), there is no urge to
remove the may_alias support.

I just got that particular version of the compiler installed on one of
our machines. I'll give it a try over the weekend.

  george.

On Oct 7, 2011, at 20:21 , Larry Baker wrote:

The test for the __may_alias__ attribute uses the following short code
snippet:

int * p_value __attribute__ ((__may_alias__));
int
main ()
{
  ;
  return 0;
}

Indeed, for Intel 2011 compilers prior to 2011.6.233, this results in a
warning:

[root@hydra openmpi-1.4.3]# module load compilers/intel/2011.5.220
[root@hydra openmpi-1.4.3]# icc -c may_alias_test.c
may_alias_test.c(123): warning #1292: attribute "__may_alias__" ignored
  int * p_value __attribute__ ((__may_alias__));
                                ^
[root@hydra openmpi-1.4.3]# module unload compilers/intel/2011.5.220

[root@hydra openmpi-1.4.3]# module load compilers/intel/2011.6.233
[root@hydra openmpi-1.4.3]# icc -c may_alias_test.c

I modified ./configure to force

ompi_cv___attribute__may_alias=0

Then I compiled and tested the library.  Unfortunately, the results
were exactly the same:

make  check-TESTS
make[3]: Entering directory `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
/bin/sh: line 4: 26326 Segmentation fault      ${dir}$tst
FAIL: checksum
/bin/sh: line 4: 26359 Segmentation fault      ${dir}$tst
FAIL: position
2 of 2 tests failed
Please report to http://www.open-mpi.org/community/help/

I could not find any use of the may_alias attribute, other than in a
#define in opal/include/opal_config_bottom.h.  Is
OMPI_HAVE_ATTRIBUTE_MAY_ALIAS just cruft that can be removed?

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 7 Oct 2011, at 11:08 AM, Larry Baker wrote:

I ran into a problem this past week trying to upgrade our OpenMPI 1.4.3
for the latest Intel 2011 compiler, 2011.6.233.

make check fails with Segmentation Fault errors:

[root@hydra openmpi-1.4.3]# tail -20 ../openmpi-1.4.3-check-intel.6.233.log
/bin/sh ../../libtool --tag=CC   --mode=link icc  -DNDEBUG -g -O3 -finline-functions -fno-strict-aliasing -restrict -pthread -fvisibility=hidden -shared-intel -export-dynamic -shared-intel  -o ddt_pack ddt_pack.o ../../ompi/libmpi.la -lnsl -lutil
libtool: link: icc -DNDEBUG -g -O3 -finline-functions -fno-strict-aliasing -restrict -pthread -fvisibility=hidden -shared-intel -shared-intel -o .libs/ddt_pack ddt_pack.o -Wl,--export-dynamic  ../../ompi/.libs/libmpi.so /usr/local/src/openmpi-1.4.3/orte/.libs/libopen-rte.so /usr/local/src/openmpi-1.4.3/opal/.libs/libopen-pal.so -ldl -lnsl -lutil -pthread -Wl,-rpath -Wl,/usr/local/lib
make[3]: Leaving directory `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
make  check-T

Re: [OMPI devel] make check fails for Intel 2011.6.233 (OpenMPI 1.4.3)

2011-10-19 Thread Larry Baker
I posted my findings about the bad version no. macros to the same  
thread that described the Intel V12.1 optimizer bug  
(http://software.intel.com/en-us/forums/showthread.php?t=87132).  
The response I got is:



Posted By: Hubert Haberstock (Intel)
__

The build date is currently the only suitable macro. This allows to  
check for the Intel Compiler and for specific compiler versions.  
Makes sense? Regards, Hubert.

__


That is contrary to what the online V12.1 documentation says.  I'm  
going to find out what the previous versions do, then report this  
through my normal support channels.  If the documentation is wrong,  
they should fix it; if the documentation is right, they should fix the  
compiler.  (However, there will still be an errant V12.1.0 that  
reports itself as , so use of the version no. macros will never be  
reliable without a hack to handle this errant case.)  I'll report here  
what I find about the values of the version no. macros.  It is  
probably better, though, that automake/libtool rely on the output of  
icc -v, since that seems to always result in a value that matches the  
version of the product (as opposed to #define __INTEL_COMPILER   
and #define __ICC  from within the V12.1.0 compiler).
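
A minimal sketch of a build-date test along the lines discussed above, assuming only what this thread reports: that the 2011.6.233 release defines __INTEL_COMPILER_BUILD_DATE as 20110811.  The DEMO_* macros and demo_* functions are hypothetical names for illustration, not part of OpenMPI or the Intel compiler, and the sketch does not show the workaround itself.

```c
/* Build date reported in this thread for the 2011.6.233 release. */
#define DEMO_ICC_2011_6_233_BUILD_DATE 20110811L

/* Compile-time flag: 1 only when this file is built by that release.
 * With any other compiler the macro is undefined and the flag is 0. */
#if defined(__INTEL_COMPILER_BUILD_DATE) && \
    (__INTEL_COMPILER_BUILD_DATE == 20110811)
#define DEMO_AFFECTED_ICC_BUILD 1
#else
#define DEMO_AFFECTED_ICC_BUILD 0
#endif

/* Run-time form of the same comparison, convenient for testing. */
int demo_is_affected_build(long build_date)
{
    return build_date == DEMO_ICC_2011_6_233_BUILD_DATE;
}

/* Report whether this translation unit was built by the affected release. */
int demo_compiled_by_affected_build(void)
{
    return DEMO_AFFECTED_ICC_BUILD;
}
```

A guard such as "#if DEMO_AFFECTED_ICC_BUILD" could then surround whatever compiler-specific workaround is needed, leaving every other compiler and release untouched.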


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 19 Oct 2011, at 10:47 AM, Jeff Squyres wrote:


Did this get reported to the Intel compiler support people?


On Oct 19, 2011, at 8:24 AM, George Bosilca wrote:


Thanks Larry,

Will forward this info upstream.

 george.

On Oct 18, 2011, at 21:56 , Larry Baker wrote:


George,

Thanks for the update.  FYI, here's all the version numbers  
reported by the compiler releases I have installed:



[baker@hydra ~]$ module load compilers/intel/11.1.080
[baker@hydra ~]$ icc -v
Version 11.1
[baker@hydra ~]$ module unload compilers/intel/11.1.080



[baker@hydra ~]$ module load compilers/intel/2011.3.174
[baker@hydra ~]$ icc -v
Version 12.0.3
[baker@hydra ~]$ module unload compilers/intel/2011.3.174



[baker@hydra ~]$ module load compilers/intel/2011.4.191
[baker@hydra ~]$ icc -v
Version 12.0.4
[baker@hydra ~]$ module unload compilers/intel/2011.4.191



[baker@hydra ~]$ module load compilers/intel/2011.5.220
[baker@hydra ~]$ icc -v
Version 12.0.5
[baker@hydra ~]$ module unload compilers/intel/2011.5.220



[baker@hydra ~]$ module load compilers/intel/2011.6.233
[baker@hydra ~]$ icc -v
icc version 12.1.0 (gcc version 4.1.2 compatibility)
[baker@hydra ~]$ module unload compilers/intel/2011.6.233


Another problem I found with the Intel 12.1.0 compiler: I started  
to look at adding a test for the Intel compiler version around the  
#pragma that disables optimization for OpenMPI and I found the  
__ICC and __INTEL_COMPILER predefined macros (compiler version  
no.) are not properly defined:


$ icc -E -dD hello.c | grep __INTEL_COMPILER
#define __INTEL_COMPILER 
#define __INTEL_COMPILER_BUILD_DATE 20110811

$ icc -E -dD hello.c | grep __ICC
#define __ICC 

$ icc -v
icc version 12.1.0 (gcc version 4.1.2 compatibility)

I do not know if there is code in OpenMPI that looks at __ICC and  
__INTEL_COMPILER, but that could cause problems.  (Pass this on  
upstream to the libtool people?)


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 17 Oct 2011, at 8:18 PM, George Bosilca wrote:


Larry,

Sorry for not updating this thread. The issue was identified and  
fixed by Rainer in r25290  
(https://svn.open-mpi.org/trac/ompi/changeset/25290).  
Please read the comments and the linked thread on the Intel  
forum for more info about.


I couldn't find a trace of this being fixed in the 1.4 series, so  
I would wait upgrading until this issue gets resolved.


 Thanks,
   george.

On Oct 17, 2011, at 23:00 , Larry Baker wrote:


George,

I have not had time to look over the 1.4.3 make check failure  
for Intel 2011.6.233 compilers.  Have you?


I had planned to get 1.4.3 compiled on all six of our compilers  
using the latest compiler releases.  I was putting off upgrading  
to 1.4.4 or 1.5.x until after that to minimize the number of  
things that could go wrong.  Do you recommend otherwise?


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 7 Oct 2011, at 6:46 PM, George Bosilca wrote:

The may_alias attribute was part of a forward-looking attribute  
checking, at a time where few compiler supported them. This  
explains why they are not widely used in the library itself.  
Moreover, as they do not affect the compilation itself (as your  
test highlights this is not the issue with the icc 2011.6.233  
compiler), there is no urge to remove the may_alias support.


I just got that particular version of the compiler installed on  
one of our machines. I'll give it a try over the weekend.


 george.

On Oct 7, 2011, at 20:21 , Larry Baker wrote:

The test for the __may_alias__ attribute uses the following  
short code

Re: [OMPI devel] make check fails for Intel 2011.6.233 (OpenMPI 1.4.3)

2011-10-18 Thread Larry Baker

George,

Thanks for the update.  FYI, here's all the version numbers reported  
by the compiler releases I have installed:



[baker@hydra ~]$ module load compilers/intel/11.1.080
[baker@hydra ~]$ icc -v
Version 11.1
[baker@hydra ~]$ module unload compilers/intel/11.1.080



[baker@hydra ~]$ module load compilers/intel/2011.3.174
[baker@hydra ~]$ icc -v
Version 12.0.3
[baker@hydra ~]$ module unload compilers/intel/2011.3.174



[baker@hydra ~]$ module load compilers/intel/2011.4.191
[baker@hydra ~]$ icc -v
Version 12.0.4
[baker@hydra ~]$ module unload compilers/intel/2011.4.191



[baker@hydra ~]$ module load compilers/intel/2011.5.220
[baker@hydra ~]$ icc -v
Version 12.0.5
[baker@hydra ~]$ module unload compilers/intel/2011.5.220



[baker@hydra ~]$ module load compilers/intel/2011.6.233
[baker@hydra ~]$ icc -v
icc version 12.1.0 (gcc version 4.1.2 compatibility)
[baker@hydra ~]$ module unload compilers/intel/2011.6.233



Another problem I found with the Intel 12.1.0 compiler: I started to  
look at adding a test for the Intel compiler version around the  
#pragma that disables optimization for OpenMPI and I found the __ICC  
and __INTEL_COMPILER predefined macros (compiler version no.) are not  
properly defined:


$ icc -E -dD hello.c | grep __INTEL_COMPILER
#define __INTEL_COMPILER 
#define __INTEL_COMPILER_BUILD_DATE 20110811

$ icc -E -dD hello.c | grep __ICC
#define __ICC 

$ icc -v
icc version 12.1.0 (gcc version 4.1.2 compatibility)

I do not know if there is code in OpenMPI that looks at __ICC and  
__INTEL_COMPILER, but that could cause problems.  (Pass this on  
upstream to the libtool people?)


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 17 Oct 2011, at 8:18 PM, George Bosilca wrote:


Larry,

Sorry for not updating this thread. The issue was identified and  
fixed by Rainer in r25290  
(https://svn.open-mpi.org/trac/ompi/changeset/25290).  
Please read the comments and the linked thread on the Intel forum  
for more info about.


I couldn't find a trace of this being fixed in the 1.4 series, so I  
would wait upgrading until this issue gets resolved.


  Thanks,
george.

On Oct 17, 2011, at 23:00 , Larry Baker wrote:


George,

I have not had time to look over the 1.4.3 make check failure for  
Intel 2011.6.233 compilers.  Have you?


I had planned to get 1.4.3 compiled on all six of our compilers  
using the latest compiler releases.  I was putting off upgrading to  
1.4.4 or 1.5.x until after that to minimize the number of things  
that could go wrong.  Do you recommend otherwise?


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 7 Oct 2011, at 6:46 PM, George Bosilca wrote:

The may_alias attribute was part of a forward-looking attribute  
checking, at a time where few compiler supported them. This  
explains why they are not widely used in the library itself.  
Moreover, as they do not affect the compilation itself (as your  
test highlights this is not the issue with the icc 2011.6.233  
compiler), there is no urge to remove the may_alias support.


I just got that particular version of the compiler installed on  
one of our machines. I'll give it a try over the weekend.


  george.

On Oct 7, 2011, at 20:21 , Larry Baker wrote:

The test for the __may_alias__ attribute uses the following short  
code snippet:



int * p_value __attribute__ ((__may_alias__));
int
main ()
{

  ;
  return 0;
}


Indeed, for Intel 2011 compilers prior to 2011.6.233, this  
results in a warning:



[root@hydra openmpi-1.4.3]# module load compilers/intel/2011.5.220
[root@hydra openmpi-1.4.3]# icc -c may_alias_test.c
may_alias_test.c(123): warning #1292: attribute "__may_alias__" ignored
  int * p_value __attribute__ ((__may_alias__));
                                ^

[root@hydra openmpi-1.4.3]# module unload compilers/intel/2011.5.220



[root@hydra openmpi-1.4.3]# module load compilers/intel/2011.6.233
[root@hydra openmpi-1.4.3]# icc -c may_alias_test.c



I modified ./configure to force


ompi_cv___attribute__may_alias=0



Then I compiled and tested the library.  Unfortunately, the  
results were exactly the same:



make  check-TESTS
make[3]: Entering directory `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
/bin/sh: line 4: 26326 Segmentation fault  ${dir}$tst
FAIL: checksum
/bin/sh: line 4: 26359 Segmentation fault  ${dir}$tst
FAIL: position

2 of 2 tests failed
Please report to http://www.open-mpi.org/community/help/




I could not find any use of the may_alias attribute, other than  
in a #define in opal/include/opal_config_bottom.h.  Is  
OMPI_HAVE_ATTRIBUTE_MAY_ALIAS just cruft that can be removed?


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 7 Oct 2011, at 11:08 AM, Larry Baker wrote:

I ran into a problem this past week trying to upgrade our  
OpenMPI 1.4.3 for the latest 

Re: [OMPI devel] make check fails for Intel 2011.6.233 (OpenMPI 1.4.3)

2011-10-07 Thread Larry Baker
The test for the __may_alias__ attribute uses the following short code  
snippet:



int * p_value __attribute__ ((__may_alias__));
int
main ()
{

  ;
  return 0;
}


Indeed, for Intel 2011 compilers prior to 2011.6.233, this results in  
a warning:



[root@hydra openmpi-1.4.3]# module load compilers/intel/2011.5.220
[root@hydra openmpi-1.4.3]# icc -c may_alias_test.c
may_alias_test.c(123): warning #1292: attribute "__may_alias__" ignored
  int * p_value __attribute__ ((__may_alias__));
                                ^

[root@hydra openmpi-1.4.3]# module unload compilers/intel/2011.5.220



[root@hydra openmpi-1.4.3]# module load compilers/intel/2011.6.233
[root@hydra openmpi-1.4.3]# icc -c may_alias_test.c



I modified ./configure to force


ompi_cv___attribute__may_alias=0



Then I compiled and tested the library.  Unfortunately, the results  
were exactly the same:



make  check-TESTS
make[3]: Entering directory `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
/bin/sh: line 4: 26326 Segmentation fault  ${dir}$tst
FAIL: checksum
/bin/sh: line 4: 26359 Segmentation fault  ${dir}$tst
FAIL: position

2 of 2 tests failed
Please report to http://www.open-mpi.org/community/help/




I could not find any use of the may_alias attribute, other than in a  
#define in opal/include/opal_config_bottom.h.  Is  
OMPI_HAVE_ATTRIBUTE_MAY_ALIAS just cruft that can be removed?
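
For context on what such a feature macro would normally guard, here is a small sketch of the may_alias attribute in use.  DEMO_HAVE_ATTRIBUTE_MAY_ALIAS, demo_aliased_u32 and demo_float_bits are hypothetical names standing in for a configure result and its consumer; nothing here is taken from the OpenMPI sources.  The sketch assumes a 32-bit IEEE 754 float.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a configure-time result such as
 * OMPI_HAVE_ATTRIBUTE_MAY_ALIAS; set here from __GNUC__ for illustration. */
#if defined(__GNUC__)
#define DEMO_HAVE_ATTRIBUTE_MAY_ALIAS 1
#else
#define DEMO_HAVE_ATTRIBUTE_MAY_ALIAS 0
#endif

#if DEMO_HAVE_ATTRIBUTE_MAY_ALIAS
/* Accesses through this type are exempt from type-based alias analysis. */
typedef uint32_t __attribute__((__may_alias__)) demo_aliased_u32;
#endif

/* Return the bit pattern of a float. */
uint32_t demo_float_bits(float value)
{
#if DEMO_HAVE_ATTRIBUTE_MAY_ALIAS
    return *(const demo_aliased_u32 *)&value;
#else
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);   /* portable fallback */
    return bits;
#endif
}
```

If no code in the library takes the first branch, the macro only records that the compiler accepts the attribute, which is consistent with the observation above that forcing the configure result to 0 changed nothing.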


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 7 Oct 2011, at 11:08 AM, Larry Baker wrote:

I ran into a problem this past week trying to upgrade our OpenMPI  
1.4.3 for the latest Intel 2011 compiler, 2011.6.233.


make check fails with Segmentation Fault errors:

[root@hydra openmpi-1.4.3]# tail -20 ../openmpi-1.4.3-check-intel.6.233.log
/bin/sh ../../libtool --tag=CC   --mode=link icc  -DNDEBUG -g -O3 -finline-functions -fno-strict-aliasing -restrict -pthread -fvisibility=hidden -shared-intel -export-dynamic -shared-intel  -o ddt_pack ddt_pack.o ../../ompi/libmpi.la -lnsl -lutil
libtool: link: icc -DNDEBUG -g -O3 -finline-functions -fno-strict-aliasing -restrict -pthread -fvisibility=hidden -shared-intel -shared-intel -o .libs/ddt_pack ddt_pack.o -Wl,--export-dynamic  ../../ompi/.libs/libmpi.so /usr/local/src/openmpi-1.4.3/orte/.libs/libopen-rte.so /usr/local/src/openmpi-1.4.3/opal/.libs/libopen-pal.so -ldl -lnsl -lutil -pthread -Wl,-rpath -Wl,/usr/local/lib
make[3]: Leaving directory `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
make  check-TESTS
make[3]: Entering directory `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
/bin/sh: line 4:  6322 Segmentation fault  ${dir}$tst
FAIL: checksum
/bin/sh: line 4:  6355 Segmentation fault  ${dir}$tst
FAIL: position

2 of 2 tests failed
Please report to http://www.open-mpi.org/community/help/

make[3]: *** [check-TESTS] Error 1
make[3]: Leaving directory `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `/state/partition1/root/src/openmpi-1.4.3/test/datatype'
make[1]: *** [check-recursive] Error 1
make[1]: Leaving directory `/state/partition1/root/src/openmpi-1.4.3/test'
make: *** [check-recursive] Error 1



Before trying to track down the problem, I thought I'd describe what  
I see here in case someone recognizes what might be happening.


We have been using OpenMPI 1.4.3 compiled with the Intel 2011.3.174  
compiler.  I've updated the Intel 2011 compilers as they have come  
out with new versions: 2011.4.191, 2011.5.220, and now 2011.6.233.   
However, I've not recompiled OpenMPI 1.4.3 until now.


Since the original compilation of OpenMPI 1.4.3 with the Intel  
2011.3.174 compilers, I have installed libnuma and libnuma-devel  
RPMs on our cluster front end.  I noticed that changed the OpenMPI  
1.4.3 ./configure output.  To test that this was not the cause of  
the problem, I recompiled OpenMPI 1.4.3 using both the CentOS/Rocks  
GNU compilers and the Intel 2011.3.174 compilers.  They both passed  
all the make check tests.


To find out when this problem first occurs, I systematically  
configured, compiled, and checked OpenMPI 1.4.3 with all four  
versions of the Intel 2011 compilers we have.  We use the modules  
package to load the compiler environment:



[root@hydra openmpi-1.4.3]# env | grep /opt/intel
LD_LIBRARY_PATH=/opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64:/opt/intel/composer_xe_2011_sp1.6.233/mkl/lib/intel64
PATH=/opt/intel/composer_xe_2011_sp1.6.233/bin/intel64:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/java/latest/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/eclipse:/opt/ganglia/bin:/opt/ganglia/sbin:/opt/maui/bin:/opt/torque/bin:/opt/t

[OMPI devel] make check fails for Intel 2011.6.233 (OpenMPI 1.4.3)

2011-10-07 Thread Larry Baker
iously where I will start looking for the source of the  
problem.


Maybe someone reading this list knows what the purpose of that test  
is, whether the Intel 2011 compilers are expected to have this feature  
enabled, and whether the code this enables can cause this problem if  
the Intel 2011.6.233 compilers do not fully support whatever this test  
is intended to discern.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



Re: [OMPI devel] Building Error

2011-08-16 Thread Larry Baker

Matthew,

The best I can come up with is that somehow the declaration of  
external orte_odls in orte/mca/odls/odls.h


ORTE_DECLSPEC extern orte_odls_base_module_t orte_odls;  /* holds  
selected module's function pointers */


does not exactly match the definition of orte_odls in  
orte/mca/odls/base/odls_base_open.c



orte_odls_base_module_t orte_odls;



ORTE_DECLSPEC might include some decorations having to do with the  
visibility attribute.  Try adding --disable-visibility to your  
configure.
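
As a sketch of the declaration/definition pairing described above, with a visibility decoration on the declaration only: DEMO_DECLSPEC, demo_module_t, demo_module and demo_module_is_selected are hypothetical names, and the macro body is an assumption about what such a decoration might expand to under a GNU-compatible compiler, not the actual definition of ORTE_DECLSPEC.

```c
#include <stddef.h>

/* Hypothetical decoration macro; empty when the compiler is not GNU-compatible. */
#if defined(__GNUC__)
#define DEMO_DECLSPEC __attribute__((visibility("default")))
#else
#define DEMO_DECLSPEC
#endif

/* A module is a table of function pointers. */
typedef struct demo_module {
    int (*launch)(int nprocs);
} demo_module_t;

/* Header side: decorated declaration seen by every user of the module. */
DEMO_DECLSPEC extern demo_module_t demo_module;

/* Source side: the single definition; file-scope objects start zeroed. */
demo_module_t demo_module;

/* A module counts as selected once its function pointers are filled in. */
int demo_module_is_selected(void)
{
    return demo_module.launch != NULL;
}
```

If the declaration and the definition end up with different linkage or visibility in the eyes of a particular compiler, the reference and the definition may not resolve to the same external symbol, which is the kind of mismatch suspected above.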


Otherwise, I see in orte/mca/odls/base/odls_base_open.c that orte_odls  
is not defined if ORTE_DISABLE_FULL_SUPPORT == 1.  I tried to compile  
with --without-rte-support to force #define ORTE_DISABLE_FULL_SUPPORT  
1, but the make failed before it reached the link that failed for  
you.  When --without-rte-support is requested in 1.4.3, there are  
declarations that depend on typedefs that are skipped, causing the  
make to fail.  You may be encountering something subtle like that when  
configure deduces some behavior for pgcc and the code doesn't quite  
have the conditional compilation tests in the right place.


You might try a newer version of OpenMPI, which might have fixed  
problems like --without-rte-support failing.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 16 Aug 2011, at 11:53 AM, Matthew Russell wrote:


Hi Larry,

Thank you for your interest.

I believe your solution is the right one, however I think there's  
some other issues causing some problems too.


When I add the search_paths_first flag to my configure, the command  
that breaks in the Makefile is,


libtool: link: /opt/pgi/osx86-64/10.9/bin/pgcc -DNDEBUG -O2 -Msignextend -V -search_paths_first -o orte-clean orte-clean.o  ../../../orte/.libs/libopen-rte.a /Users/matt/software/openmpi/openmpi-1.4.3/opal/.libs/libopen-pal.a -lutil

pgcc-Error-Unknown switch: -search_paths_first

pgcc 10.9-0 64-bit target on Apple OS/X -tp nehalem-64
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2010, STMicroelectronics, Inc.  All Rights Reserved.
make: *** [orte-clean] Error 1

The problem there is that libtool isn't passing the "-Wl,"  
along with the search_paths_first flag, so it isn't getting to the  
linker.  If I try to manually build it, I still have missing symbols:


matt@pontus:orte-clean$ pgcc -DNDEBUG -O2 -Msignextend -V -Wl,-search_paths_first -o orte-clean orte-clean.o  ../../../orte/.libs/libopen-rte.a /Users/matt/software/openmpi/openmpi-1.4.3/opal/.libs/libopen-pal.a -lutil


pgcc 10.9-0 64-bit target on Apple OS/X -tp nehalem-64
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2010, STMicroelectronics, Inc.  All Rights Reserved.
Undefined symbols for architecture x86_64:
  "_orte_odls", referenced from:
      _orte_errmgr_base_error_abort in libopen-rte.a(errmgr_base_fns.o)

ld: symbol(s) not found for architecture x86_64



On Tue, Aug 16, 2011 at 2:46 PM, Larry Baker <ba...@usgs.gov> wrote:
Matthew,

What configure options did you use?

I can try to replicate your findings, as best I can, using the Intel  
compiler on my desktop Mac (Leopard).  One thing I want to  
investigate is which libutil is supposed to be linked.  There is no  
-L in the failing link step.  Is that possibly the error?


I have PGI and about five other compilers on our cluster.  I'll get  
to OpenMPI 1.4.3 with all those as soon as I fetch the latest  
versions and reinstall my cluster software (Rocks just came out with  
5.4.3).


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 16 Aug 2011, at 9:44 AM, Matthew Russell wrote:

Hmm, I tried the recommendation above, adding  
-Wl,-search_paths_first, and I still ran into the same issue.  I  
suspect it is an issue with PGI.


Meanwhile, I've been able to get my applications (CMAQ) working  
with MPICH2, so for now at least I am going to continue with that.


Thanks for the responses!

On Mon, Aug 15, 2011 at 8:43 PM, Ralph Castain <r...@open-mpi.org>  
wrote:
FWIW: I build OMPI on Mac OS-X (Snow Leopard) every day, without  
adding any extra flags, without problem. The citation below relates  
to something from a long time ago, I believe - haven't seen that  
problem in quite some time.


I do not, however, use PGI. We regularly have problems with PGI on  
a variety of systems, and I suspect you are hitting one here - but  
can't confirm it as we don't have PGI licenses to use for testing.


The Xgrid support is broken, but has nothing to do with the problem  
you describe. Just means you can't launch via Xgrid.




On Aug 15, 2011, at 2:53 PM, Larry Baker wrote:


Matthew,

I have the same type of error on a completely different software  
package on Mac OS X.  The error occurs because of the way that Mac  
OS X searches for -lutil.  If the libutil.a ORTE needs is theirs,  
i.e., not the system libutil.dylib, then you have exactly the same  
pr

Re: [OMPI devel] Building Error

2011-08-16 Thread Larry Baker

Matthew,

orte_odls is a global variable defined in odls_base_open.c.

I used your configure options, but did not override the compiler or  
compiler flags options.  configure used gcc.  odls_base_open.c gets  
compiled and then the object gets inserted into libmca_odls.a.  Later,  
it looks like it also gets inserted into libopen-rte.0.dylib.  The  
link step to create orte-clean references libopen-rte.dylib:



gcc:

/bin/sh ../../../libtool --tag=CC   --mode=link gcc  -O3 -DNDEBUG -finline-functions -fno-strict-aliasing  -fvisibility=hidden  -export-dynamic   -o orte-clean orte-clean.o ../../../orte/libopen-rte.la -lutil


libtool: link: gcc -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -fvisibility=hidden -o .libs/orte-clean orte-clean.o  ../../../orte/.libs/libopen-rte.dylib /Users/baker/Desktop/Software/OpenMPI/1.4.3/openmpi-1.4.3/opal/.libs/libopen-pal.dylib -lutil



Your link step does not; it references a static version of libopen-rte:


pgcc:

/bin/sh ../../../libtool --tag=CC   --mode=link /opt/pgi/osx86-64/10.9/bin/pgcc  -DNDEBUG -O2 -Msignextend -V   -export-dynamic   -o orte-clean orte-clean.o ../../../orte/libopen-rte.la -lutil


libtool: link: /opt/pgi/osx86-64/10.9/bin/pgcc -DNDEBUG -O2 -Msignextend -V -o orte-clean orte-clean.o  ../../../orte/.libs/libopen-rte.a /Users/matt/software/openmpi/openmpi-1.4.3/opal/.libs/libopen-pal.a -lutil


Undefined symbols for architecture x86_64:
  "_orte_odls", referenced from:
      _orte_errmgr_base_error_abort in libopen-rte.a(errmgr_base_fns.o)

ld: symbol(s) not found for architecture x86_64



I will try to configure my setup to use static libraries and see what  
changes.


I think the experiment with -search_paths_first was a red herring.  I  
think odls_base_open.o is not in libopen-rte.a for some reason.  Or,  
the external name that gets defined in odls_base_open.c is not the  
same as the external name being referenced in errmgr_base_fns.c.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 16 Aug 2011, at 11:53 AM, Matthew Russell wrote:


Hi Larry,

Thank you for your interest.

I believe your solution is the right one, however I think there's  
some other issues causing some problems too.


When I add the search_paths_first flag to my configure, the command  
that breaks in the Makefile is,


libtool: link: /opt/pgi/osx86-64/10.9/bin/pgcc -DNDEBUG -O2 -Msignextend -V -search_paths_first -o orte-clean orte-clean.o  ../../../orte/.libs/libopen-rte.a /Users/matt/software/openmpi/openmpi-1.4.3/opal/.libs/libopen-pal.a -lutil

pgcc-Error-Unknown switch: -search_paths_first

pgcc 10.9-0 64-bit target on Apple OS/X -tp nehalem-64
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2010, STMicroelectronics, Inc.  All Rights Reserved.
make: *** [orte-clean] Error 1

The problem there is that libtool isn't passing the "-Wl,"  
along with the search_paths_first flag, so it isn't getting to the  
linker.  If I try to manually build it, I still have missing symbols:


matt@pontus:orte-clean$ pgcc -DNDEBUG -O2 -Msignextend -V -Wl,-search_paths_first -o orte-clean orte-clean.o  ../../../orte/.libs/libopen-rte.a /Users/matt/software/openmpi/openmpi-1.4.3/opal/.libs/libopen-pal.a -lutil


pgcc 10.9-0 64-bit target on Apple OS/X -tp nehalem-64
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2010, STMicroelectronics, Inc.  All Rights Reserved.
Undefined symbols for architecture x86_64:
  "_orte_odls", referenced from:
      _orte_errmgr_base_error_abort in libopen-rte.a(errmgr_base_fns.o)

ld: symbol(s) not found for architecture x86_64



On Tue, Aug 16, 2011 at 2:46 PM, Larry Baker <ba...@usgs.gov> wrote:
Matthew,

What configure options did you use?

I can try to replicate your findings, as best I can, using the Intel  
compiler on my desktop Mac (Leopard).  One thing I want to  
investigate is which libutil is supposed to be linked.  There is no  
-L in the failing link step.  Is that possibly the error?


I have PGI and about five other compilers on our cluster.  I'll get  
to OpenMPI 1.4.3 with all those as soon as I fetch the latest  
versions and reinstall my cluster software (Rocks just came out with  
5.4.3).


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 16 Aug 2011, at 9:44 AM, Matthew Russell wrote:

Hmm, I tried the recommendation above, adding  
-Wl,-search_paths_first, and I still ran into the same issue.  I  
suspect it is an issue with PGI.


Meanwhile, I've been able to get my applications (CMAQ) working  
with MPICH2, so for now at least I am going to continue with that.


Thanks for the responses!

On Mon, Aug 15, 2011 at 8:43 PM, Ralph Castain <r...@open-mpi.org>  
wrote:
FWIW: I build OMPI on Mac OS-X (Snow Leopard) every day, without  
adding any extra flags, without problem. The citation below relates  
to something from a long time ag

Re: [OMPI devel] Building Error

2011-08-16 Thread Larry Baker

Matthew,

What configure options did you use?

I can try to replicate your findings, as best I can, using the Intel  
compiler on my desktop Mac (Leopard).  One thing I want to investigate  
is which libutil is supposed to be linked.  There is no -L in the  
failing link step.  Is that possibly the error?


I have PGI and about five other compilers on our cluster.  I'll get to  
OpenMPI 1.4.3 with all those as soon as I fetch the latest versions  
and reinstall my cluster software (Rocks just came out with 5.4.3).


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 16 Aug 2011, at 9:44 AM, Matthew Russell wrote:

Hmm, I tried the recommendation above, adding  
-Wl,-search_paths_first, and I still ran into the same issue.  I  
suspect it is an issue with PGI.


Meanwhile, I've been able to get my applications (CMAQ) working with  
MPICH2, so for now at least I am going to continue with that.


Thanks for the responses!

On Mon, Aug 15, 2011 at 8:43 PM, Ralph Castain <r...@open-mpi.org>  
wrote:
FWIW: I build OMPI on Mac OS-X (Snow Leopard) every day, without  
adding any extra flags, without problem. The citation below relates  
to something from a long time ago, I believe - haven't seen that  
problem in quite some time.


I do not, however, use PGI. We regularly have problems with PGI on a  
variety of systems, and I suspect you are hitting one here - but  
can't confirm it as we don't have PGI licenses to use for testing.


The Xgrid support is broken, but has nothing to do with the problem  
you describe. Just means you can't launch via Xgrid.




On Aug 15, 2011, at 2:53 PM, Larry Baker wrote:


Matthew,

I have the same type of error on a completely different software  
package on Mac OS X.  The error occurs because of the way that Mac  
OS X searches for -lutil.  If the libutil.a ORTE needs is theirs,  
i.e., not the system libutil.dylib, then you have exactly the same  
problem I did.


Here are my notes for the fix using gcc.  You will have to find out  
the equivalent method to pass the -search_paths_first linker option  
using pgcc.


# Mac OS X searches for shared libraries before static libraries.
# Thus, -L -lutil finds the system libutil.dylib before our libutil.a,
# which causes undefined references in the link step because it is
# using the wrong library.  The ld -search_paths_first option forces ld
# to search each directory first for a matching library, instead of all
# directories first for a shared library.
# Note: this is the form to pass -search_paths_first to ld when $(CC)
# is the linker command in makefile.ux

export LDFLAGS=-Wl,-search_paths_first


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 15 Aug 2011, at 1:01 PM, Matthew Russell wrote:




I hope this problem merits being posted here.

On OS X (Snow Leopard, and Lion), I cannot seem to build Open MPI.

After a lot of building, I get the error:

/bin/sh ../../../libtool --tag=CC   --mode=link /opt/pgi/osx86-64/10.9/bin/pgcc  -DNDEBUG -O2 -Msignextend -V   -export-dynamic   -o orte-clean orte-clean.o ../../../orte/libopen-rte.la -lutil
libtool: link: /opt/pgi/osx86-64/10.9/bin/pgcc -DNDEBUG -O2 -Msignextend -V -o orte-clean orte-clean.o  ../../../orte/.libs/libopen-rte.a /Users/matt/software/openmpi/openmpi-1.4.3/opal/.libs/libopen-pal.a -lutil

Undefined symbols for architecture x86_64:
  "_orte_odls", referenced from:
      _orte_errmgr_base_error_abort in libopen-rte.a(errmgr_base_fns.o)

ld: symbol(s) not found for architecture x86_64

This is with the PGI 10.9 compiler, OpenMPI 1.4.3, platform is x86_64.

The README does not list PGI as a compiler that OpenMPI was tested  
with, and there are notes about its support for XGrid being  
broken (I'm not sure if this is related.)


I seem to get the error regardless of which configure flags I'm  
using, just for completeness though, here are the flags I am using:
./configure --prefix=/usr/local/openmpi_pg --enable-mpi-f77 -- 
enable-mpi-f90 --with-memory-manager=none


Has anyone else got or fixed this error?

I looked at other postings in this list, such as http://www.open-mpi.org/community/lists/devel/2007/05/1590.php 
 , but they didn't help much.


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel






Re: [OMPI devel] Building Error

2011-08-16 Thread Larry Baker
The problem I have on our own package is definitely a current problem  
that requires the ld -search_paths_first option on Mac OS X.  From man  
ld on Mac OS X Snow Leopard:



-search_paths_first
        By default the -lx and -weak-lx options first search for a
        file of the form `libx.dylib' in each directory in the
        library search path, then a file of the form `libx.a' is
        searched for in the library search paths.  This option
        changes it so that in each path `libx.dylib' is searched for
        then `libx.a' before the next path in the library search path
        is searched.


Without -Wl,-search_paths_first:

/usr/bin/gcc -m32 -g -O -Wuninitialized -D_MACOSX -D_INTEL -D_GFORTRAN \
    -D_USE_PTHREADS -D_USE_SCHED \
    -I/Users/baker/Desktop/Earthworm/merged/include -I./ \
    -o /Users/baker/Desktop/Earthworm/merged/bin/reftek2ew \
    main.o hbeat.o init.o notify.o params.o scn.o send.o terminate.o \
    samprate.o \
    /Users/baker/Desktop/Earthworm/merged/lib/transport.o \
    /Users/baker/Desktop/Earthworm/merged/lib/getutil.o \
    /Users/baker/Desktop/Earthworm/merged/lib/kom.o \
    /Users/baker/Desktop/Earthworm/merged/lib/logit.o \
    /Users/baker/Desktop/Earthworm/merged/lib/sema_ew.o \
    /Users/baker/Desktop/Earthworm/merged/lib/sleep_ew.o \
    /Users/baker/Desktop/Earthworm/merged/lib/time_ew.o \
    /Users/baker/Desktop/Earthworm/merged/lib/threads_ew.o \
    -L./lib -lrtp -lreftek -lutil -lm -lpthread

Undefined symbols:
 "_util_sswap", referenced from:
 _reftek_dt in libreftek.a(dt.o)



With -Wl,-search_paths_first:

/usr/bin/gcc -m32 -g -O -Wuninitialized -D_MACOSX -D_INTEL -D_GFORTRAN \
    -D_USE_PTHREADS -D_USE_SCHED \
    -I/Users/baker/Desktop/Earthworm/merged/include -I./ \
    -Wl,-search_paths_first \
    -o /Users/baker/Desktop/Earthworm/merged/bin/reftek2ew \
    main.o hbeat.o init.o notify.o params.o scn.o send.o terminate.o \
    samprate.o \
    /Users/baker/Desktop/Earthworm/merged/lib/transport.o \
    /Users/baker/Desktop/Earthworm/merged/lib/getutil.o \
    /Users/baker/Desktop/Earthworm/merged/lib/kom.o \
    /Users/baker/Desktop/Earthworm/merged/lib/logit.o \
    /Users/baker/Desktop/Earthworm/merged/lib/sema_ew.o \
    /Users/baker/Desktop/Earthworm/merged/lib/sleep_ew.o \
    /Users/baker/Desktop/Earthworm/merged/lib/time_ew.o \
    /Users/baker/Desktop/Earthworm/merged/lib/threads_ew.o \
    -L./lib -lrtp -lreftek -lutil -lm -lpthread


While this may not be what Matthew is encountering, it is definitely
something to keep in your bag of tricks.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 15 Aug 2011, at 5:43 PM, Ralph Castain wrote:

FWIW: I build OMPI on Mac OS-X (Snow Leopard) every day, without  
adding any extra flags, without problem. The citation below relates  
to something from a long time ago, I believe - haven't seen that  
problem in quite some time.


I do not, however, use PGI. We regularly have problems with PGI on a  
variety of systems, and I suspect you are hitting one here - but  
can't confirm it as we don't have PGI licenses to use for testing.


The Xgrid support is broken, but has nothing to do with the problem  
you describe. Just means you can't launch via Xgrid.



On Aug 15, 2011, at 2:53 PM, Larry Baker wrote:


Matthew,

I have the same type of error on a completely different software  
package on Mac OS X.  The error occurs because of the way that Mac  
OS X searches for -lutil.  If the libutil.a ORTE needs is theirs,  
i.e., not the system libutil.dylib, then you have exactly the same  
problem I did.


Here are my notes for the fix using gcc.  You will have to find out  
the equivalent method to pass the -search_paths_first linker option  
using pgcc.


# Mac OS X searches for shared libraries before static libraries.
# Thus, -L -lutil finds the system libutil.dylib before our libutil.a,
# which causes undefined references in the link step because it is using
# the wrong library.  The ld -search_paths_first option forces ld to
# search each directory first for a matching library, instead of all
# directories first for a shared library.
# Note: this is the form to pass -search_paths_first to ld when $(CC) is
# the linker command in makefile.ux

export LDFLAGS=-Wl,-search_paths_first


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 15 Aug 2011, at 1:01 PM, Matthew Russell wrote:




I hope this problem merits being posted here.

On OS X (Snow Leopard, and Lion), I cannot seem to build Open MPI.

After a lot of building, I get the error:

/bin/sh ../../../libtool --tag=CC   --mode=link /opt/pgi/osx86-64/10.9/bin/pgcc  -DNDEBUG -O2 -Msignextend -V   -export-dynamic   -o orte-clean orte-clean.o ../../../orte/libopen-rte.la -lutil
libtool: link: /opt/pgi/osx86-64/10.9/bin/pgcc -DNDEBUG -O2 -Msignextend -V -o orte-clean orte-clean.o  ../../../orte/.libs/libopen-rte.a /Users/matt/software/openmpi/openmpi-1.4.3/opal/.libs/libopen-pal.a -lutil

Undefined symbols for architecture x86_64:
  &

Re: [OMPI devel] Building Error

2011-08-15 Thread Larry Baker

Matthew,

I have the same type of error on a completely different software  
package on Mac OS X.  The error occurs because of the way that Mac OS  
X searches for -lutil.  If the libutil.a ORTE needs is theirs, i.e.,  
not the system libutil.dylib, then you have exactly the same problem I  
did.


Here are my notes for the fix using gcc.  You will have to find out  
the equivalent method to pass the -search_paths_first linker option  
using pgcc.


# Mac OS X searches for shared libraries before static libraries.
# Thus, -L -lutil finds the system libutil.dylib before our libutil.a,
# which causes undefined references in the link step because it is using
# the wrong library.  The ld -search_paths_first option forces ld to
# search each directory first for a matching library, instead of all
# directories first for a shared library.
# Note: this is the form to pass -search_paths_first to ld when $(CC) is
# the linker command in makefile.ux

export LDFLAGS=-Wl,-search_paths_first


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 15 Aug 2011, at 1:01 PM, Matthew Russell wrote:




I hope this problem merits being posted here.

On OS X (Snow Leopard, and Lion), I cannot seem to build Open MPI.

After a lot of building, I get the error:

/bin/sh ../../../libtool --tag=CC   --mode=link /opt/pgi/osx86-64/10.9/bin/pgcc  -DNDEBUG -O2 -Msignextend -V   -export-dynamic   -o orte-clean orte-clean.o ../../../orte/libopen-rte.la -lutil
libtool: link: /opt/pgi/osx86-64/10.9/bin/pgcc -DNDEBUG -O2 -Msignextend -V -o orte-clean orte-clean.o  ../../../orte/.libs/libopen-rte.a /Users/matt/software/openmpi/openmpi-1.4.3/opal/.libs/libopen-pal.a -lutil

Undefined symbols for architecture x86_64:
  "_orte_odls", referenced from:
      _orte_errmgr_base_error_abort in libopen-rte.a(errmgr_base_fns.o)

ld: symbol(s) not found for architecture x86_64

This is with the PGI 10.9 compiler, OpenMPI 1.4.3, platform is x86_64

The README does not list PGI as a compiler that OpenMPI was tested
with, and there are notes about its support for XGrid being broken
(I'm not sure if this is related.)


I seem to get the error regardless of which configure flags I use.
For completeness, here are the flags I am using:
./configure --prefix=/usr/local/openmpi_pg --enable-mpi-f77 --enable-mpi-f90 --with-memory-manager=none


Has anyone else got or fixed this error?

I looked at other postings in this list, such as http://www.open-mpi.org/community/lists/devel/2007/05/1590.php 
 , but they didn't help much.


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] 1.4.4rc2 is up

2011-05-24 Thread Larry Baker
I see in ompi/mca/btl/openib/btl_openib/connect/btl_openib_connect_oob.c
and .../btl_openib_connect_xoob.c what looks like assignments of the form

   path_mtu = MIN( device_mtu, remote_mtu );

In ompi/mca/btl/openib/btl_openib/connect/btl_openib_connect_oob.c(289)
it looks correct:

    enum ibv_mtu mtu = (openib_btl->device->mtu < endpoint->rem_info.rem_mtu) ?
        openib_btl->device->mtu : endpoint->rem_info.rem_mtu;

However, in
ompi/mca/btl/openib/btl_openib/connect/btl_openib_connect_xoob.c(462 and
563), it looks suspicious (different remote_mtu's):

    attr.path_mtu = (openib_btl->device->mtu < endpoint->rem_info.rem_mtu) ?
        openib_btl->device->mtu : rem_info->rem_mtu;

Can someone verify that the code in
ompi/mca/btl/openib/btl_openib/connect/btl_openib_connect_xoob.c is correct?


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 5 May 2011, at 7:15 AM, Jeff Squyres wrote:

Fixed the ROMIO attribute problem properly this time -- it's in the  
usual place:


   http://www.open-mpi.org/software/ompi/v1.4/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] 1.4.4rc2 is up

2011-05-23 Thread Larry Baker
e is a better way, since C does  
not guarantee the order of bitfields.  Anyway, since C permits enums  
to be used wherever ints can be used, the right hand side of



*mask = IBV_QP_ALT_PATH|IBV_QP_PATH_MIG_STATE;



is equivalent to


*mask = (int) IBV_QP_ALT_PATH | (int) IBV_QP_PATH_MIG_STATE;



, which results in an int.  (And, of course, = 0 is an int.)

The simplest fix would be to cast the result into an enum  
ibv_qp_attr_mask, with comments added that enum ibv_qp_attr_mask *mask  
is really the union of all the bitfields in enum ibv_qp_attr_mask, and  
that the value of *mask may not be a valid enum ibv_qp_attr_mask.
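
A minimal sketch of the cast suggested above.  The enum and both helper
names are hypothetical stand-ins invented for illustration; they are not
the definitions from infiniband/verbs.h or anything in Open MPI.

```c
/* Hypothetical stand-in for a bit-flag enum such as enum ibv_qp_attr_mask. */
enum demo_qp_attr_mask {
    DEMO_QP_STATE          = 1 << 0,
    DEMO_QP_ALT_PATH       = 1 << 1,
    DEMO_QP_PATH_MIG_STATE = 1 << 2
};

/*
 * *mask receives the union of several flag bits.  The OR of two
 * enumerators has type int, so the explicit cast documents the conversion
 * back to the enum type.  The stored value may not equal any single named
 * enumerator.
 */
static void demo_set_mask(enum demo_qp_attr_mask *mask)
{
    *mask = (enum demo_qp_attr_mask)
            (DEMO_QP_ALT_PATH | DEMO_QP_PATH_MIG_STATE);
}

/* Test one flag bit in a mask that holds a union of flags. */
static int demo_has_flag(enum demo_qp_attr_mask mask,
                         enum demo_qp_attr_mask flag)
{
    return ((int) mask & (int) flag) != 0;
}
```

Whether a cast like this silences warning #188 on a given compiler is
something I would check rather than assume.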


3) ompi/mca/btl/openib/btl_openib/connect/btl_openib_connect_oob.c(289):
warning #188: enumerated type mixed with another type,

    enum ibv_mtu mtu = (openib_btl->device->mtu < endpoint->rem_info.rem_mtu) ?
        openib_btl->device->mtu : endpoint->rem_info.rem_mtu;

ompi/mca/btl/openib/btl_openib/connect/btl_openib_connect_xoob.c(462):
warning #188: enumerated type mixed with another type, and

    attr.path_mtu = (openib_btl->device->mtu < endpoint->rem_info.rem_mtu) ?
        openib_btl->device->mtu : rem_info->rem_mtu;

ompi/mca/btl/openib/btl_openib/connect/btl_openib_connect_xoob.c(563):
warning #188: enumerated type mixed with another type:

    attr.path_mtu = (openib_btl->device->mtu < endpoint->rem_info.rem_mtu) ?
        openib_btl->device->mtu : rem_info->rem_mtu;



The left hand sides are encoded MTUs (enum ibv_mtu, from
/usr/include/infiniband/verbs.h):

enum ibv_mtu {
    IBV_MTU_256  = 1,
    IBV_MTU_512  = 2,
    IBV_MTU_1024 = 3,
    IBV_MTU_2048 = 4,
    IBV_MTU_4096 = 5
};



, while the openib_btl->device->mtu and rem_info->rem_mtu on the right  
hand sides are uint32_t's (encoded?).
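
One way the conversion could be made explicit, as a sketch only.  It
assumes the uint32_t fields hold the same encoded values (1..5) as enum
ibv_mtu, which is exactly what I am unsure about.  The enum is a local
copy and the helper names are hypothetical, so the sketch does not need
verbs.h.

```c
#include <stdint.h>

/* Local copy of the encoding quoted above, so the sketch needs no verbs.h. */
enum demo_ibv_mtu {
    DEMO_IBV_MTU_256  = 1,
    DEMO_IBV_MTU_512  = 2,
    DEMO_IBV_MTU_1024 = 3,
    DEMO_IBV_MTU_2048 = 4,
    DEMO_IBV_MTU_4096 = 5
};

/*
 * Pick the smaller of two MTU codes and convert it to the enum type in one
 * place.  Assumes both arguments are encoded values (1..5), not byte counts.
 */
static enum demo_ibv_mtu demo_min_mtu(uint32_t device_mtu, uint32_t remote_mtu)
{
    uint32_t m = (device_mtu < remote_mtu) ? device_mtu : remote_mtu;
    return (enum demo_ibv_mtu) m;
}

/* Decode an MTU code into bytes: code 1 is 256 bytes, each step doubles. */
static uint32_t demo_mtu_bytes(enum demo_ibv_mtu code)
{
    return (uint32_t) 128 << (unsigned) code;
}
```

Because the helper takes the remote value once as a parameter, the
comparison and the result cannot refer to two different remote_mtu's.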


By the way, lines 563-564 in
ompi/mca/btl/openib/btl_openib/connect/btl_openib_connect_xoob.c look
suspicious to me:

ompi/mca/btl/openib/btl_openib/connect/btl_openib_connect_xoob.c(563):
warning #188: enumerated type mixed with another type:

    attr.path_mtu = (openib_btl->device->mtu < endpoint->rem_info.rem_mtu) ?
        openib_btl->device->mtu : rem_info->rem_mtu;

The right hand side of the test in the conditional is
endpoint->rem_info.rem_mtu, while the "false" expression is
rem_info->rem_mtu.  I suspect one of them is not correct.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 5 May 2011, at 7:15 AM, Jeff Squyres wrote:

Fixed the ROMIO attribute problem properly this time -- it's in the  
usual place:


   http://www.open-mpi.org/software/ompi/v1.4/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] 1.4.4rc2 is up

2011-05-23 Thread Larry Baker

Jeff,

I get the following warning from "make" using the Intel 2011.3.174  
compilers on OpenMPI 1.4.3:


libtool: compile:  icc -DHAVE_CONFIG_H -I. -I../../opal/include -I../../orte/include -I../../ompi/include -I../../opal/mca/paffinity/linux/plpa/src/libplpa -I../.. -DNDEBUG -g -O3 -finline-functions -fno-strict-aliasing -restrict -pthread -fvisibility=hidden -MT dt_module.lo -MD -MP -MF .deps/dt_module.Tpo -c dt_module.c  -fPIC -DPIC -o .libs/dt_module.o
dt_module.c(709): warning #1224: #warning directive: "No proper C type found for COMPLEX32"
  #   warning "No proper C type found for COMPLEX32"
  ^


The code in ompi/datatype/dt_module.c (lines 705-713; exactly the same  
in 1.4.4rc2) is:



#if OMPI_HAVE_FORTRAN_COMPLEX32
#if OMPI_REAL16_MATCHES_C && (OMPI_SIZEOF_FORTRAN_COMPLEX32 == 2*SIZEOF_LONG_DOUBLE)
    DECLARE_MPI_SYNONYM_DDT( &ompi_mpi_complex32.dt, "MPI_COMPLEX32", &ompi_mpi_ldblcplex.dt );
#else
#   warning "No proper C type found for COMPLEX32"
    DECLARE_MPI_SYNONYM_DDT( &ompi_mpi_complex32.dt, "MPI_COMPLEX32", &ompi_mpi_unavailable.dt );
#endif
    ompi_mpi_complex32.dt.flags |= DT_FLAG_DATA_FORTRAN | DT_FLAG_DATA_COMPLEX;
#endif /* OMPI_HAVE_FORTRAN_COMPLEX32 */



I see from configure, that the Intel compilers support REAL*16 and  
COMPLEX*32, but the representations are different between the C and  
Fortran compilers:



checking if REAL*16 matches bit representation of long double... no
configure: WARNING: MPI_REAL16 and MPI_COMPLEX32 support have been disabled


I have two observations:

1) Despite the message saying so, configure does not in fact disable
MPI_REAL16 and MPI_COMPLEX32 support; the code in
ompi/datatype/dt_module.c happens to catch the error.  From
opal/include/opal_config.h:



#define OMPI_ALIGNMENT_FORTRAN_COMPLEX32 1



#define OMPI_ALIGNMENT_FORTRAN_REAL16 1
#define OMPI_HAVE_F90_COMPLEX32 1
#define OMPI_HAVE_F90_REAL16 1
#define OMPI_HAVE_FORTRAN_COMPLEX32 1
#define OMPI_HAVE_FORTRAN_REAL16 1
#define OMPI_REAL16_MATCHES_C 0
#define OMPI_SIZEOF_FORTRAN_COMPLEX32 32
#define OMPI_SIZEOF_FORTRAN_REAL16 16



2) ompi/datatype/dt_module.c does not catch the same error for the  
incompatible REAL*16 datatype (lines 609-617):



#if OMPI_HAVE_FORTRAN_REAL16
#if (OMPI_SIZEOF_FORTRAN_REAL16 == SIZEOF_LONG_DOUBLE)   <-- should be #if OMPI_REAL16_MATCHES_C && (OMPI_SIZEOF_FORTRAN_REAL16 == SIZEOF_LONG_DOUBLE)
    DECLARE_MPI_SYNONYM_DDT( &ompi_mpi_real16.dt, "MPI_REAL16", &ompi_mpi_long_double.dt );
#else
#   warning "No proper C type found for REAL16"
    DECLARE_MPI_SYNONYM_DDT( &ompi_mpi_real16.dt, "MPI_REAL16", &ompi_mpi_unavailable.dt );
#endif
    ompi_mpi_real16.dt.flags |= DT_FLAG_DATA_FORTRAN | DT_FLAG_DATA_FLOAT;
#endif /* OMPI_HAVE_FORTRAN_REAL16 */



I do not like make warnings.  configure determines that the REAL*16  
and COMPLEX*32 datatypes are incompatible, but then does not actually  
disable them, despite saying it did.  I like defensive code.  The  
COMPLEX*32 datatype protection needs to be applied to the REAL*16  
datatype as well in ompi/datatype/dt_module.c.
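
To illustrate the kind of guard I mean, here is a self-contained sketch.
The DEMO_ macros are hypothetical stand-ins for the configure results
quoted above, and demo_real16_mapping() is invented for the example; this
is not the code in dt_module.c.

```c
/* Hypothetical stand-ins for the configure results quoted above. */
#define DEMO_HAVE_FORTRAN_REAL16    1
#define DEMO_REAL16_MATCHES_C       0
#define DEMO_SIZEOF_FORTRAN_REAL16  16
#define DEMO_SIZEOF_LONG_DOUBLE     16

/*
 * Report which C type the Fortran REAL*16 type would be mapped to.
 * Equal sizes alone are not enough; the bit representations must match too.
 */
static const char *demo_real16_mapping(void)
{
#if DEMO_HAVE_FORTRAN_REAL16
#  if DEMO_REAL16_MATCHES_C && \
      (DEMO_SIZEOF_FORTRAN_REAL16 == DEMO_SIZEOF_LONG_DOUBLE)
    return "long double";
#  else
    return "unavailable";
#  endif
#else
    return "not configured";
#endif
}
```

With the values above, the sizes agree but the representations do not, so
the guarded branch is skipped and the type is reported as unavailable.  A
size-only test would have picked long double.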


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 5 May 2011, at 7:15 AM, Jeff Squyres wrote:

Fixed the ROMIO attribute problem properly this time -- it's in the  
usual place:


   http://www.open-mpi.org/software/ompi/v1.4/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] 1.4.4rc2 is up

2011-05-20 Thread Larry Baker

Jeff,

I get warnings from all my compilers running "make check" when
compiling test/class/ompi_rb_tree.c.  For example, using
gcc/g++/gfortran:



ompi_rb_tree.c: In function 'test2':
ompi_rb_tree.c:347: warning: cast to pointer from integer of different size
ompi_rb_tree.c:365: warning: cast from pointer to integer of different size
ompi_rb_tree.c:373: warning: cast from pointer to integer of different size


This is due, I am sure, to the mixing of 64-bit pointers and 32-bit  
integers.  Do you have a "safe" method to do these conversions so  
these warnings go away?  Maybe a macro you use in the library?
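
For what it is worth, the usual C99 approach is to convert through
intptr_t from <stdint.h>.  The sketch below assumes intptr_t is available
(it is optional in the standard, though common), and the two helper names
are hypothetical; I do not know whether the library already has an
equivalent macro.

```c
#include <stdint.h>

/* Store a small integer in a pointer-sized slot without a size mismatch. */
static void *demo_int_to_ptr(int value)
{
    return (void *) (intptr_t) value;
}

/* Recover the integer; widen through intptr_t first, then narrow to int. */
static int demo_ptr_to_int(void *ptr)
{
    return (int) (intptr_t) ptr;
}
```

In test2() this would apply to the (void *) i store and the (int) cast of
registered_mpools[0] on the lookup side.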


While looking at the source of the warnings, I saw that the code in  
test/class/ompi_rb_tree.c, lines 361-368 are duplicated in lines  
369-376 (quoted, below).  Is this intentional?


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

function test2() in test/class/ompi_rb_tree.c:

void test2(void)
{
    ompi_free_list_t key_list;
    ompi_free_list_item_t * new_value;
    ompi_rb_tree_t tree;
    int rc, i, size;
    void * result, * lookup;
    void * mem[NUM_ALLOCATIONS];
    ompi_free_list_item_t * key_array[NUM_ALLOCATIONS];
    struct timeval start, end;

    OBJ_CONSTRUCT(&key_list, ompi_free_list_t);
    ompi_free_list_init_new(&key_list, sizeof(ompi_test_rb_value_t),
            CACHE_LINE_SIZE,
            OBJ_CLASS(ompi_test_rb_value_t),
            0,CACHE_LINE_SIZE,
            0, -1 , 128, NULL);

    OBJ_CONSTRUCT(&tree, ompi_rb_tree_t);
    rc = ompi_rb_tree_init(&tree, mem_node_compare);
    if(!test_verify_int(OMPI_SUCCESS, rc)) {
        test_failure("failed to properly initialize the tree");
    }

    size = 1;
    for(i = 0; i < NUM_ALLOCATIONS; i++)
    {
        mem[i] = malloc(size);
        if(NULL == mem[i])
        {
            test_failure("system out of memory");
            return;
        }
        OMPI_FREE_LIST_GET(&key_list, new_value, rc);
        if(OMPI_SUCCESS != rc)
        {
            test_failure("failed to get memory from free list");
        }
        key_array[i] = new_value;
        ((ompi_test_rb_value_t *) new_value)->key.bottom = mem[i];
        ((ompi_test_rb_value_t *) new_value)->key.top =
                (void *) ((size_t) mem[i] + size - 1);
        ((ompi_test_rb_value_t *) new_value)->registered_mpools[0] = (void *) i;
        rc = ompi_rb_tree_insert(&tree, &((ompi_test_rb_value_t *)new_value)->key,
                new_value);
        if(OMPI_SUCCESS != rc)
        {
            test_failure("failed to properly insert a new node");
        }
        size += 1;
    }

    gettimeofday(&start, NULL);
    for(i = 0; i < NUM_ALLOCATIONS; i++)
    {
        lookup = (void *) ((size_t) mem[i] + i);
        result = ompi_rb_tree_find(&tree, &lookup);
        if(NULL == result)
        {
            test_failure("lookup returned null!");
        } else if(i != ((int) ((ompi_test_rb_value_t *) result)->registered_mpools[0]))
        {
            test_failure("lookup returned wrong node!");
        }

        result = ompi_rb_tree_find(&tree, &lookup);
        if(NULL == result)
        {
            test_failure("lookup returned null!");
        } else if(i != ((int) ((ompi_test_rb_value_t *) result)->registered_mpools[0]))
        {
            test_failure("lookup returned wrong node!");
        }
    }

    gettimeofday(&end, NULL);

#if 0
    i = (end.tv_sec - start.tv_sec) * 100 + (end.tv_usec - start.tv_usec);

    printf("In a %d node tree, %d lookups took %f microseonds each\n",
           NUM_ALLOCATIONS, NUM_ALLOCATIONS * 2,
           (float) i / (float) (NUM_ALLOCATIONS * 2));
#endif

    for(i = 0; i < NUM_ALLOCATIONS; i++)
    {
        if(NULL != mem[i])
        {
            free(mem[i]);
        }
        OMPI_FREE_LIST_RETURN(&(key_list), key_array[i]);
    }

    OBJ_DESTRUCT(&tree);
    OBJ_DESTRUCT(&key_list);
}

On 5 May 2011, at 7:15 AM, Jeff Squyres wrote:

Fixed the ROMIO attribute problem properly this time -- it's in the  
usual place:


   http://www.open-mpi.org/software/ompi/v1.4/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] 1.4.4rc2 is up

2011-05-19 Thread Larry Baker

Jeff,

I ran into some kind of link error, I think, with PGI 10.3 and OpenMPI  
1.4.2 last year.  I am building a new cluster and we have PGI 11.4  
now.  I am consulting my notes and patches from 1.4.2 to inspect 1.4.3  
to see if the problems I had have been fixed.  I found the .m4 files I  
patched in 1.4.2 were identical in 1.4.3, so I fixed them right off  
the bat.  I found the same was true for the detection of inline  
assembly with C++.  Other problems I had with PGI 10.3 have been fixed  
with PGI 11.4, but I patched them anyway so OpenMPI 1.4.3 will still  
compile cleanly on PGI 10.x.  (I haven't sent you all of those for  
1.4.3; I sent them last year for 1.4.2.)  Finally, I patch the shell  
scripts that generate the Fortran 90 interface routines to remove the  
spurious declarations (without implementations, of course) of  
Character and Logical MPI_SIZEOF() generics, convert dummy arrays to  
assumed-shape arrays, and substantially clean them up/shrink them.


I have compiled and tested (make check) my patched OpenMPI 1.4.3 with  
Rocks 5.4 (CentOS 5.5) gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)  
and PGI pgcc 11.4-0 64-bit target on x86-64 Linux -tp nehalem.  I have  
not been so successful yet with Intel icc Version 12.0.3.174 Build  
20110309.  I have yet to try AMD x86 Open64 GNU gcc version 4.2.0  
(Open64 4.2.5 driver) or whatever I get from PathScale when I transfer  
the license from our old cluster to the new one.


After I get through OpenMPI 1.4.3, I should have time to test 1.4.4.   
Will there be another 1.4.4 release candidate?  Do I have to hurry to  
give you my feedback?


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 19 May 2011, at 6:58 PM, Jeff Squyres wrote:


With all the outputs from Paul and Sam, I think we'll be good.

...hmmm.  Wait.  I see that our 1.4.x configure *is* patched to have  
the extra ".".  Here's the lines from configure in 1.4.3 and 1.4.4rc2:


   # Portland Group C++ compiler
   case `$CC -V` in
   *pgCC\ [1-5].* | *pgcpp\ [1-5].*)

It's not in the .m4 file because we patch configure *after* the m4  
file is used to generate configure (Don't ask -- it's a long,  
twisted story).


Can you say what the original problem was that eventually led you to  
this patch?




On May 18, 2011, at 2:08 PM, Larry Baker wrote:


Jeff,

Is this guaranteed to work for all versions of the PGI compiler?
I.e., does "pgCC -V" always return something in the form of (digit)+\. ?


I don't know, but I think so.  See your Nov 2009 discussion of this  
bug and Ralf Wildenhues' libtool.m4 patches at http://www.open-mpi.org/community/lists/users/2009/11/11277.php 
.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 18 May 2011, at 5:50 AM, Jeff Squyres wrote:


(adding libtool-patc...@gnu.org)

Is this guaranteed to work for all versions of the PGI compiler?
I.e., does "pgCC -V" always return something in the form of (digit)+\. ?



On May 17, 2011, at 8:52 PM, Larry Baker wrote:


This bug applies to OpenMPI 1.4.x and 1.5.x.

The libtool.m4 in config and opal/libltdl/m4 do not properly  
determine the version of the PGI compiler, which then set the  
wrong compile/link options.  They interpret V11.4 (version no.  
begins with a 1), for example, as being a V1 to V5 compiler.   
There is a missing period in the pattern, so that only text like  
1.x through 5.x matches.
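
The effect of the missing period can be reproduced with the POSIX
fnmatch() glob matcher, as a rough sketch.  Shell case patterns follow
similar but not necessarily identical rules, demo_matches() is a
hypothetical helper, and a string such as "pgCC 11.4-0 64-bit target" is
only my assumption of what a "pgCC -V" banner looks like.

```c
#include <fnmatch.h>

/*
 * Return 1 if a version banner matches a shell-style pattern, else 0.
 * fnmatch() is the POSIX glob matcher; its rules are close to, but not
 * guaranteed identical to, the shell's case-pattern matching.
 */
static int demo_matches(const char *pattern, const char *banner)
{
    return fnmatch(pattern, banner, 0) == 0;
}

/* Patterns as the shell sees them after m4 turns [[1-5]] into [1-5]. */
static const char demo_old_pattern[] = "*pgCC [1-5]*";
static const char demo_new_pattern[] = "*pgCC [1-5].*";
```

With the old pattern, a banner for version 11.4 matches because its first
digit is 1.  With the new pattern it does not, since the character after
that digit is another digit and not a period.  A 5.x banner matches both.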


Here's the diff -u from OpenMPI 1.4.3 (same code, same bug):


[root@hydra openmpi-1.4.3]# diff -u config/libtool.m4{.original,}
--- config/libtool.m4.original  2010-10-05 15:45:44.0 -0700
+++ config/libtool.m4   2011-05-17 15:32:31.0 -0700
@@ -5896,7 +5896,7 @@
pgCC* | pgcpp*)
  # Portland Group C++ compiler
case `$CC -V` in
-   *pgCC\ [[1-5]]* | *pgcpp\ [[1-5]]*)
+   *pgCC\ [[1-5]].* | *pgcpp\ [[1-5]].*)
  _LT_TAGVAR(prelink_cmds, $1)='tpldir=Template.dir~
rm -rf $tpldir~
		$CC --prelink_objects --instantiation_dir $tpldir $objs  
$libobjs $compile_deplibs~


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 5 May 2011, at 7:15 AM, Jeff Squyres wrote:

Fixed the ROMIO attribute problem properly this time -- it's in  
the usual place:


http://www.open-mpi.org/software/ompi/v1.4/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] SSE instructions?

2011-05-19 Thread Larry Baker

Jeff,

Thanks for your reply.

I inquired the same of PGI.  Dave Borer, PGI Customer Service Manager,  
responded:


 I believe -fast and -fastsse are identical for 64-bit compilers, but
there are some differences with 32-bit compilers.  I don't think TCP/IP
based MPI routines have better performance from optimizations, unless the
processes are all running on the same machine.  I will ask engineering
how messages are passed when all the processes are running on the same
hardware.



I am running on a 64-bit machine; I used -fast.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 19 May 2011, at 6:21 PM, Jeff Squyres wrote:


On May 18, 2011, at 8:32 PM, Larry Baker wrote:

The PGI compilers have a -fast and a -fastsse option.  Does OpenMPI  
make effective/safe use of SSE instructions (block moves maybe?)?


Not really.  The biggest thing that we do that can take advantage of
vector instructions is memcpy, *mostly* in the shared memory
transport, but also if your MPI application has some funky
non-contiguous MPI datatypes, too.


On their web site, PGI uses -fast in their examples for OpenMPI  
rather than -fastsse.  I don't know why.


Maybe for more portability...?

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] 1.4.4rc2 is up

2011-05-19 Thread Larry Baker
The help text for --with-valgrind in configure appears twice (fixed in  
1.5.3)



  --with-libnuma-libdir=DIR
                          Directory where the libnuma software is installed
  --with-valgrind(=DIR)   Directory where the valgrind software is installed
  --with-memory-manager=TYPE
                          Use TYPE for intercepting memory management calls to
                          control memory pinning.
  --with-plpa-symbol-prefix=STRING
                          STRING can be any valid C symbol name. It will be
                          prefixed to all public PLPA symbols. Default:
                          "plpa_"
  --with-valgrind(=DIR)   Directory where the valgrind software is installed
  --with-timer=TYPE       Build high resolution timer component TYPE



Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 5 May 2011, at 7:15 AM, Jeff Squyres wrote:

Fixed the ROMIO attribute problem properly this time -- it's in the  
usual place:


   http://www.open-mpi.org/software/ompi/v1.4/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] 1.4.4rc2 is up

2011-05-19 Thread Larry Baker

Consider adding the improved description for --with-tm from 1.5.3

  --with-tm(=DIR)         Build TM (Torque, PBSPro, and compatible) support,
                          optionally adding DIR/include, DIR/lib, and
                          DIR/lib64 to the search path for headers and
                          libraries


to replace the text in 1.4.x


  --with-tm(=DIR) Directory where the tm software is installed


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 5 May 2011, at 7:15 AM, Jeff Squyres wrote:

Fixed the ROMIO attribute problem properly this time -- it's in the  
usual place:


   http://www.open-mpi.org/software/ompi/v1.4/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




[OMPI devel] SSE instructions?

2011-05-18 Thread Larry Baker
The PGI compilers have a -fast and a -fastsse option.  Does OpenMPI  
make effective/safe use of SSE instructions (block moves maybe?)?  On  
their web site, PGI uses -fast in their examples for OpenMPI rather  
than -fastsse.  I don't know why.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov



Re: [OMPI devel] 1.4.4rc2 is up

2011-05-18 Thread Larry Baker
Jeff,

Hmm.  This sounds right, but I'm a little curious as to why this never
came up before.

I reported this, as well as several others, in August 2010, "Fixes to
OpenMPI-1.4.2 for PGI compilers".  (Attached are my patches for OpenMPI
1.4.2.)  At that time I was using the PGI 10.x compilers.

What was the specific problem that caused you to add this patch?

These warning messages are from PGI C++ 11.4 for the assembly language
macros in OpenMPI 1.4.3 opal/include/opal/sys/amd64/atomic.h:

libtool: compile:  pgcpp -m64 -DHAVE_CONFIG_H -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -I../../../opal/mca/paffinity/linux/plpa/src/libplpa -DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 -DOMPI_SKIP_MPICXX=1 -I../../.. -D_REENTRANT -DNDEBUG -g -O3 -DNO_PGI_OFFSET -c mpicxx.cc  -fpic -DPIC -o .libs/mpicxx.o
"../../../opal/include/opal/sys/amd64/atomic.h", line 91: warning: "cc"
          clobber ignored
                         : "memory", "cc");
                                     ^

"../../../opal/include/opal/sys/amd64/atomic.h", line 83: warning: parameter
          "oldval" was set but never used
                                          int32_t oldval, int32_t newval)
                                                  ^

"../../../opal/include/opal/sys/amd64/atomic.h", line 112: warning: "cc"
          clobber ignored
                         : "memory", "cc"
                                     ^

"../../../opal/include/opal/sys/amd64/atomic.h", line 104: warning: parameter
          "oldval" was set but never used
                                           int64_t oldval, int64_t newval)
                                                   ^

configure defines OMPI_CXX_GCC_INLINE_ASSEMBLY as 1 in
opal/include/opal_config.h (unlike OMPI_C_GCC_INLINE_ASSEMBLY, which is
defined as 0), which causes the assembly language macros to be used:

# find . -name \*.h -exec grep \#define.\*OMPI_.\*_INLINE_ASSEMBLY {} ';' -print
#define OMPI_CXX_DEC_INLINE_ASSEMBLY 0
#define OMPI_CXX_GCC_INLINE_ASSEMBLY 1
#define OMPI_CXX_XLC_INLINE_ASSEMBLY 0
#define OMPI_C_DEC_INLINE_ASSEMBLY 0
#define OMPI_C_GCC_INLINE_ASSEMBLY 0
#define OMPI_C_XLC_INLINE_ASSEMBLY 0
./opal/include/opal_config.h

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 18 May 2011, at 6:17 AM, Jeff Squyres wrote:

Hmm.  This sounds right, but I'm a little curious as to why this never
came up before.  What was the specific problem that caused you to add
this patch?

On May 17, 2011, at 9:41 PM, Larry Baker wrote:

This bug applies to OpenMPI 1.4.x and 1.5.x.

Inline assembly does not work for PGI compilers.  configure disables
inline assembly for PGI C, but neglects to do the same for PGI C++.  The
code that disables inline assembly for PGI C needs to be copied to the
section that handles inline assembly for C++.

Here's the diff -u from OpenMPI 1.4.3 (same code, same bug):

[root@hydra openmpi-1.4.3]# diff -u configure{.original,}
--- configure.original	2010-10-05 15:48:18.0 -0700
+++ configure	2011-05-17 18:35:04.0 -0700
@@ -34690,6 +34690,11 @@
 { $as_echo "$as_me:$LINENO: checking if $CXX supports GCC inline assembly" >&5
 $as_echo_n "checking if $CXX supports GCC inline assembly... " >&6; }
+    if test "$ompi_cv_cxx_compiler_vendor" = "portland group" ; then
+    # PGI seems to have some issues with our inline assembly.
+    # Disable for now.
+    asm_result="no (Portland Group)"
+    else
 case $host in
 *-aix*)
 # the AIX compilers and linkers really don't do gcc
@@ -34813,6 +34818,7 @@
 rm -f core conftest.err conftest.$ac_objext conftest_ipa8_conftest.oo \
   conftest$ac_exeext conftest.$ac_ext
 fi
+    fi
 { $as_echo "$as_me:$LINENO: result: $asm_result" >&5
 $as_echo "$asm_result" >&6; }

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 5 May 2011, at 7:15 AM, Jeff Squyres wrote:

Fixed the ROMIO attribute problem properly this time -- it's in the
usual place:

   http://www.open-mpi.org/software/ompi/v1.4/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

patch-openmpi-1.4.2.sh
Description: Binary data


Re: [OMPI devel] 1.4.4rc2 is up

2011-05-18 Thread Larry Baker

Jeff,

Is this guaranteed to work for all versions of the PGI compiler?   
I.e., does "pgCC -V" always return something in the form of (digit)+ 
\. ?


I don't know, but I think so.  See your Nov 2009 discussion of this  
bug and Ralf Wildenhues' libtool.m4 patches at http://www.open-mpi.org/community/lists/users/2009/11/11277.php 
. 


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 18 May 2011, at 5:50 AM, Jeff Squyres wrote:


(adding libtool-patc...@gnu.org)

Is this guaranteed to work for all versions of the PGI compiler?   
I.e., does "pgCC -V" always return something in the form of (digit)+ 
\. ?



On May 17, 2011, at 8:52 PM, Larry Baker wrote:


This bug applies to OpenMPI 1.4.x and 1.5.x.

The libtool.m4 in config and opal/libltdl/m4 do not properly  
determine the version of the PGI compiler, which then set the wrong  
compile/link options.  They interpret V11.4 (version no. begins  
with a 1), for example, as being a V1 to V5 compiler.  There is a  
missing period in the pattern, so that only text like 1.x through  
5.x matches.


Here's the diff -u from OpenMPI 1.4.3 (same code, same bug):


[root@hydra openmpi-1.4.3]# diff -u config/libtool.m4{.original,}
--- config/libtool.m4.original  2010-10-05 15:45:44.0 -0700
+++ config/libtool.m4   2011-05-17 15:32:31.0 -0700
@@ -5896,7 +5896,7 @@
  pgCC* | pgcpp*)
# Portland Group C++ compiler
case `$CC -V` in
-   *pgCC\ [[1-5]]* | *pgcpp\ [[1-5]]*)
+   *pgCC\ [[1-5]].* | *pgcpp\ [[1-5]].*)
  _LT_TAGVAR(prelink_cmds, $1)='tpldir=Template.dir~
rm -rf $tpldir~
		$CC --prelink_objects --instantiation_dir $tpldir $objs $libobjs $compile_deplibs~


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 5 May 2011, at 7:15 AM, Jeff Squyres wrote:

Fixed the ROMIO attribute problem properly this time -- it's in  
the usual place:


  http://www.open-mpi.org/software/ompi/v1.4/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel






Re: [OMPI devel] 1.4.4rc2 is up

2011-05-17 Thread Larry Baker

This bug applies to OpenMPI 1.4.x and 1.5.x.

Inline assembly does not work for PGI compilers.  configure disables  
inline assembly for PGI C, but neglects to do the same for PGI C++.   
The code that disables inline assembly for PGI C needs to be copied to  
the section that handles inline assembly for C++.


Here's the diff -u from OpenMPI 1.4.3 (same code, same bug):


[root@hydra openmpi-1.4.3]# diff -u configure{.original,}
--- configure.original  2010-10-05 15:48:18.0 -0700
+++ configure   2011-05-17 18:35:04.0 -0700
@@ -34690,6 +34690,11 @@
 { $as_echo "$as_me:$LINENO: checking if $CXX supports GCC inline assembly" >&5
 $as_echo_n "checking if $CXX supports GCC inline assembly... " >&6; }
+    if test "$ompi_cv_cxx_compiler_vendor" = "portland group" ; then
+    # PGI seems to have some issues with our inline assembly.
+    # Disable for now.
+    asm_result="no (Portland Group)"
+    else
 case $host in
 *-aix*)
 # the AIX compilers and linkers really don't do gcc
@@ -34813,6 +34818,7 @@
 rm -f core conftest.err conftest.$ac_objext conftest_ipa8_conftest.oo \
   conftest$ac_exeext conftest.$ac_ext
 fi
+    fi
 { $as_echo "$as_me:$LINENO: result: $asm_result" >&5
 $as_echo "$asm_result" >&6; }


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 5 May 2011, at 7:15 AM, Jeff Squyres wrote:

Fixed the ROMIO attribute problem properly this time -- it's in the  
usual place:


   http://www.open-mpi.org/software/ompi/v1.4/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] 1.4.4rc2 is up

2011-05-17 Thread Larry Baker

This bug applies to OpenMPI 1.4.x and 1.5.x.

The libtool.m4 files in config and opal/libltdl/m4 do not properly
determine the version of the PGI compiler, and so set the wrong
compile/link options.  They interpret V11.4 (version no. begins with
a 1), for example, as being a V1 to V5 compiler.  The pattern is
missing a period; with the period added, only text like 1.x through
5.x matches.


Here's the diff -u from OpenMPI 1.4.3 (same code, same bug):


[root@hydra openmpi-1.4.3]# diff -u config/libtool.m4{.original,}
--- config/libtool.m4.original  2010-10-05 15:45:44.0 -0700
+++ config/libtool.m4   2011-05-17 15:32:31.0 -0700
@@ -5896,7 +5896,7 @@
   pgCC* | pgcpp*)
 # Portland Group C++ compiler
case `$CC -V` in
-   *pgCC\ [[1-5]]* | *pgcpp\ [[1-5]]*)
+   *pgCC\ [[1-5]].* | *pgcpp\ [[1-5]].*)
  _LT_TAGVAR(prelink_cmds, $1)='tpldir=Template.dir~
rm -rf $tpldir~
 		$CC --prelink_objects --instantiation_dir $tpldir $objs $libobjs $compile_deplibs~
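
The effect of the missing period can be reproduced with a plain shell
case statement.  This is only a minimal sketch, not the libtool code
itself: the function names are made up for illustration, the sample
strings are assumed to resemble "pgCC -V" output, and the m4 double
brackets are written as the single brackets the shell ends up seeing.

```shell
# Hypothetical helpers; the names are illustrative and not part of libtool.

# Pattern as shipped: any version whose first digit is 1-5 matches,
# so 11.4 is mistaken for an old compiler.
matches_old_pattern() {
    case $1 in
        *pgCC\ [1-5]* | *pgcpp\ [1-5]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Pattern with the period added: only 1.x through 5.x match.
matches_new_pattern() {
    case $1 in
        *pgCC\ [1-5].* | *pgcpp\ [1-5].*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumed sample strings, loosely modeled on compiler version banners.
for banner in "pgCC 5.2-4 64-bit target" "pgCC 11.4-0 64-bit target"; do
    if matches_old_pattern "$banner"; then old=yes; else old=no; fi
    if matches_new_pattern "$banner"; then new=yes; else new=no; fi
    echo "$banner: old pattern=$old, new pattern=$new"
done
```

In a shell glob the period is a literal character, so requiring it
right after the first digit is what separates 1.x from 11.x.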



Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 5 May 2011, at 7:15 AM, Jeff Squyres wrote:

Fixed the ROMIO attribute problem properly this time -- it's in the  
usual place:


   http://www.open-mpi.org/software/ompi/v1.4/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r24665

2011-05-02 Thread Larry Baker

Ralph,

What about creating a lookup table of static const values with  
comments for readability, and use Tim's code, except for the last  
line, which would lookup the value instead of computing it?


I don't know how often this code path is traversed.  Seeing only this  
snippet of code, I prefer Tim's code because it is clear what the  
valid values are for the input argument (no need to scan all the  
"case"s, find there is a "default", and deduce what the valid input  
values are), it is more efficient in space and time, and, to my eyes,  
more readable (I don't have to know what parse_dots() returns).  I  
suppose a case could also be made that Tim's code is more  
maintainable, given the discovery already of a misplaced (though  
benign) break and the possibility of a typo in all those calls to  
parse_dots().
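

For anyone who wants to check the computed masks against the dotted
values in the switch statement, here is a minimal sketch in shell
arithmetic.  It is not the C code from the changeset: the helper names
are hypothetical, and it assumes the constant being shifted is the
all-ones 32-bit value.

```shell
# Hypothetical helper: print the IPv4 netmask for a prefix length of
# 1..30 as eight hex digits; fail for anything else (the same range
# check as the C snippet under discussion).
prefix_to_mask() {
    pval=$1
    case $pval in
        ''|0*|*[!0-9]*) return 1 ;;   # empty, leading zero, or non-numeric
    esac
    if [ "$pval" -lt 1 ] || [ "$pval" -gt 30 ]; then
        return 1
    fi
    # Shell arithmetic is wider than 32 bits, so mask off the overflow.
    printf '%08X\n' $(( (0xFFFFFFFF << (32 - pval)) & 0xFFFFFFFF ))
}

# Hypothetical helper: render an eight-hex-digit mask as a dotted quad.
mask_to_dots() {
    m=$(( 0x$1 ))
    printf '%d.%d.%d.%d\n' $(( (m >> 24) & 255 )) $(( (m >> 16) & 255 )) \
        $(( (m >> 8) & 255 )) $(( m & 255 ))
}

for p in 8 16 24 30; do
    echo "/$p -> $(mask_to_dots "$(prefix_to_mask "$p")")"
done
```

A lookup table would hold exactly these values, one per prefix length,
with the dotted form kept as a comment for readability.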


Just my $.02

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On May 1, 2011, at 7:44 AM, Ralph Castain wrote:

Mostly because I thought it of some value to make the resulting mask  
readable and apparent to someone looking at the code.



On Apr 30, 2011, at 8:14 PM, Tim Mattox wrote:


Why not do this instead of a big switch statement?

pval = strtol(msk, NULL, 10);
if ((pval > 30) || (pval < 1)) {
    opal_output(0, "opal_iftupletoaddr: unknown mask");
    free(addr);
    return OPAL_ERROR;
}
*mask = 0xFFFFFFFF << (32 - pval);


On Fri, Apr 29, 2011 at 1:56 PM,  <r...@osl.iu.edu> wrote:

Author: rhc
Date: 2011-04-29 13:56:15 EDT (Fri, 29 Apr 2011)
New Revision: 24665
URL: https://svn.open-mpi.org/trac/ompi/changeset/24665

Log:
Cover all the netmask values

Text files modified:
 trunk/opal/util/if.c |   103 +--
 1 files changed, 96 insertions(+), 7 deletions(-)

Modified: trunk/opal/util/if.c
===================================================================
--- trunk/opal/util/if.c	(original)
+++ trunk/opal/util/if.c	2011-04-29 13:56:15 EDT (Fri, 29 Apr 2011)
@@ -534,13 +534,102 @@
* much of the addr to use: e.g., /16
*/
   pval = strtol(msk, NULL, 10);
-if (24 == pval) {
-*mask = 0xFFFFFF00;
-} else if (16 == pval) {
-*mask = 0xFFFF0000;
-} else if (8 == pval) {
-*mask = 0xFF000000;
-} else {
+switch(pval) {
+case 30:
+*mask = parse_dots("255.255.255.252");
+break;
+case 29:
+*mask = parse_dots("255.255.255.248");
+break;
+case 28:
+*mask = parse_dots("255.255.255.240");
+break;
+case 27:
+*mask = parse_dots("255.255.255.224");
+break;
+case 26:
+*mask = parse_dots("255.255.255.192");
+break;
+case 25:
+*mask = parse_dots("255.255.255.128");
+break;
+case 24:
+break;
+*mask = parse_dots("255.255.255.0");
+break;
+case 23:
+*mask = parse_dots("255.255.254.0");
+break;
+case 22:
+*mask = parse_dots("255.255.252.0");
+break;
+case 21:
+*mask = parse_dots("255.255.248.0");
+break;
+case 20:
+*mask = parse_dots("255.255.240.0");
+break;
+case 19:
+*mask = parse_dots("255.255.224.0");
+break;
+case 18:
+*mask = parse_dots("255.255.192.0");
+break;
+case 17:
+*mask = parse_dots("255.255.128.0");
+break;
+case 16:
+*mask = parse_dots("255.255.0.0");
+break;
+case 15:
+*mask = parse_dots("255.254.0.0");
+break;
+case 14:
+*mask = parse_dots("255.252.0.0");
+break;
+case 13:
+*mask = parse_dots("255.248.0.0");
+break;
+case 12:
+*mask = parse_dots("255.240.0.0");
+break;
+case 11:
+*mask = parse_dots("255.224.0.0");
+break;
+case 10:
+

Re: [OMPI devel] ompi_mpi_init: orte_init failed

2011-01-20 Thread Larry Baker

Francis,

I cannot address your situation specifically, but I can tell you from  
experience that you must pay close attention to the version of Mac OS  
X for 32-bit/64-bit compiling.  gcc/gfortran defaults to 32-bit on OS  
X 10.5.  I am told they default to 64-bit on OS X 10.6.  Thus, to  
compile and link with the proper 64-bit libraries, you may have to  
specify -m64 at every step: when creating the OpenMPI library, when  
compiling your application, and when linking your application  
(presumably, the last two steps are done by an OpenMPI wrapper script  
for you).


On my (desktop) OS X 10.5 Mac, I used the following commands to patch  
(using my own patch script) and make OpenMPI 1.4.2:



# Patch OpenMPI 1.4.2
tar -xjf openmpi-1.4.2.tar.bz2
source patch-openmpi-1.4.2.sh

# Configure OpenMPI 1.4.2 for GNU compilers
cd openmpi-1.4.2
./configure >configure.log 2>&1 \
   --prefix=$HOME/Desktop/Software/OpenMPI/gnu \
   CC="gcc -m64" \
   CFLAGS="-g -fast" \
   CXX="g++ -m64" \
   CXXFLAGS="-g -fast" \
   FC="gfortran -m64" \
   FCFLAGS="-g -fast" \
   F77="gfortran -m64" \
   FFLAGS="-g -fast"

# Make the library
make >make.log 2>&1

# Validate the library
make check >check.log 2>&1


Then, I used the following commands to install it:


# Install OpenMPI 1.4.2
cd openmpi-1.4.2
make install >install.log 2>&1


Maybe the first thing you could try is to run make check on your  
laptop.  I don't have a laptop to try to replicate your failure.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On Jan 20, 2011, at 2:00 PM, Francis Poulin wrote:


Hello,

I'm trying to build OpenMPI with fortran on my Mac OS machines using  
gfortran.  I'm using the 64-bit option and trying to install 1.4.2.   
It seems to build ok and when I compile and run simple programs it  
works.  When I try a more complicated case it works on my desktop  
but not my laptop.  The error that I get is shown below.   The fact  
that it works on my desktop shows me there's a problem with my build  
on my laptop.


Any ideas where I can look to fix it?

Thanks,
Francis

[[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 125
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

 orte_ess_base_select failed
 --> Returned value Not found (-13) instead of ORTE_SUCCESS
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

 ompi_mpi_init: orte_init failed
 --> Returned "Not found" (-13) instead of "Success" (0)
--
*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] 1.5rc5 has been posted

2010-09-03 Thread Larry Baker
Using MPI-2 (Gropp et al.) says MPI_SIZEOF() only supports numeric  
intrinsic data types.  So, I patched OpenMPI 1.4.2 to remove the  
declarations of the undefined Logical and Character specific  
procedures in ompi/mpi/f90/scripts/mpi-f90-interfaces.h.sh:



  output_197 MPI_Sizeof ${rank} CH "character${dim}"
  output_197 MPI_Sizeof ${rank} L "logical${dim}"


I also changed all the dummy array declarations in the INTERFACE  
declarations to use assumed-shape arrays, which is the correct Fortran  
90 method to declare the rank and extent of any actual array arguments.


I simplified both ompi/mpi/f90/scripts/mpi-f90-interfaces.h.sh and  
ompi/mpi/f90/scripts/mpi_sizeof.f90.sh.  In mpi-f90-interfaces.h.sh, I  
defined an array, array_dims, with the DIMENSION declarations, then  
replaced all the copies of dim= throughout the code with a reference  
to array_dims by rank:



array_dims[0]=''
array_dims[1]=', dimension(:)'
array_dims[2]=', dimension(:,:)'
array_dims[3]=', dimension(:,:,:)'
array_dims[4]=', dimension(:,:,:,:)'
array_dims[5]=', dimension(:,:,:,:,:)'
array_dims[6]=', dimension(:,:,:,:,:,:)'
array_dims[7]=', dimension(:,:,:,:,:,:,:)'

for rank in $allranks
do
  dim=${array_dims[${rank}]}


In mpi_sizeof.f90.sh, I copied the method to enumerate rank 0 with all  
the other ranks from the code in mpi-f90-interfaces.h.sh:



allranks="0 $ranks"

for rank in $allranks
do
  case "$rank" in  0)  dim=''  ;  esac
  case "$rank" in  1)  dim=', dimension(:)'  ;  esac
  case "$rank" in  2)  dim=', dimension(:,:)'  ;  esac
  case "$rank" in  3)  dim=', dimension(:,:,:)'  ;  esac
  case "$rank" in  4)  dim=', dimension(:,:,:,:)'  ;  esac
  case "$rank" in  5)  dim=', dimension(:,:,:,:,:)'  ;  esac
  case "$rank" in  6)  dim=', dimension(:,:,:,:,:,:)'  ;  esac
  case "$rank" in  7)  dim=', dimension(:,:,:,:,:,:,:)'  ;  esac
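

The per-rank dimension strings in both listings follow one rule: a
colon per rank, separated by commas.  As a minimal sketch (the helper
name is hypothetical and this is not what either script does today),
the string could also be built in a loop:

```shell
# Hypothetical helper: print the assumed-shape dimension attribute for a
# given rank, e.g. rank 2 -> ", dimension(:,:)".  Rank 0 (a scalar)
# prints an empty line.
dim_for_rank() {
    n=$1
    if [ "$n" -eq 0 ]; then
        echo ''
        return 0
    fi
    colons=':'
    i=1
    while [ "$i" -lt "$n" ]; do
        colons="$colons,:"
        i=$((i + 1))
    done
    echo ", dimension($colons)"
}

for rank in 0 1 2 3 4 5 6 7; do
    echo "rank $rank:$(dim_for_rank "$rank")"
done
```

The explicit table and case lists above are easier to read at a
glance; the loop only shows that the two scripts are enumerating the
same strings.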


Here's the patch I used for OpenMPI 1.4.2:

# Remove declarations of Logical and Character specific procedures from
# Generic Subroutine MPI_SIZEOF and fix dummy arrays to be assumed-shape
mv openmpi-1.4.2/ompi/mpi/f90/scripts/mpi-f90-interfaces.h.sh{,.original}
sed -e $'34{p;
s/^.*$/array_dims[0]=\'\'/;p;
s/^.*$/array_dims[1]=\', dimension(:)\'/;p;
s/^.*$/array_dims[2]=\', dimension(:,:)\'/;p;
s/^.*$/array_dims[3]=\', dimension(:,:,:)\'/;p;
s/^.*$/array_dims[4]=\', dimension(:,:,:,:)\'/;p;
s/^.*$/array_dims[5]=\', dimension(:,:,:,:,:)\'/;p;
s/^.*$/array_dims[6]=\', dimension(:,:,:,:,:,:)\'/;p;
s/^.*$/array_dims[7]=\', dimension(:,:,:,:,:,:,:)\'/;p;
s/^.*$//;}' \
-e '/case "$rank" in  [0-6])  dim=/d' \
-e '/case "$rank" in  7)  dim=.*$/s//dim=${array_dims[${rank}]}/' \
-e '7129,7130d' \
openmpi-1.4.2/ompi/mpi/f90/scripts/mpi-f90-interfaces.h.sh.original \
>openmpi-1.4.2/ompi/mpi/f90/scripts/mpi-f90-interfaces.h.sh
chmod +x openmpi-1.4.2/ompi/mpi/f90/scripts/mpi-f90-interfaces.h.sh
mv openmpi-1.4.2/ompi/mpi/f90/scripts/mpi_sizeof.f90.sh{,.original}
sed -e '25,84d' \
-e '85s/^.*$/allranks="0 $ranks"/' \
-e '87s/\$ranks/$allranks/' \
-e $'88{p;s/^.*$/  case "$rank" in  0)  dim=\'\'  ;  esac/;}' \
-e $'89,95{s/dim=\'/dim=\', dimension(/;s/1,/:,/g;s/\*\'/:)\'/;}' \
-e '97,110d' \
-e '118s/, dimension(\${dim})/${dim}/' \
-e '133s/, dimension(\${dim})/${dim}/' \
-e '148s/, dimension(\${dim})/${dim}/' \
openmpi-1.4.2/ompi/mpi/f90/scripts/mpi_sizeof.f90.sh.original \
>openmpi-1.4.2/ompi/mpi/f90/scripts/mpi_sizeof.f90.sh
chmod +x openmpi-1.4.2/ompi/mpi/f90/scripts/mpi_sizeof.f90.sh


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On Sep 1, 2010, at 5:09 PM, Larry Baker wrote:

OpenMPI 1.4.x and 1.5.x fail to link a program that calls Subroutine
MPI_SIZEOF using the PGI 10.3 compilers:



$ cat junk.f90
 Use MPI
 Implicit None
 Integer var, size, err
 Call MPI_SIZEOF( var, size, err )
 Write (*,*) 'Size of Integer var is ', size, ' bytes.'
 Stop
 End

$ /opt/pgi/linux86-64/current/openmpi/bin/mpif90 -o junk junk.f90
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof1dl_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof4dch_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof3dl_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof4dl_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof2dch_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof2dl_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof3dch_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: 

Re: [OMPI devel] 1.5rc5 has been posted

2010-09-01 Thread Larry Baker
OpenMPI 1.4.x and 1.5.x fail to link a program that calls Subroutine
MPI_SIZEOF using the PGI 10.3 compilers:



$ cat junk.f90
  Use MPI
  Implicit None
  Integer var, size, err
  Call MPI_SIZEOF( var, size, err )
  Write (*,*) 'Size of Integer var is ', size, ' bytes.'
  Stop
  End

$ /opt/pgi/linux86-64/current/openmpi/bin/mpif90 -o junk junk.f90
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof1dl_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof4dch_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof3dl_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof4dl_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof2dch_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof2dl_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof3dch_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof1dch_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof0dl_'
/opt/pgi/linux86-64/10.3/openmpi/lib/libmpi_f90.so: undefined reference to `mpi_sizeof0dch_'

make: *** [junk] Error 2


This is because the specific procedures declared in
ompi/mpi/f90/mpi-f90-interfaces.h are not all implemented in
ompi/mpi/f90/mpi_sizeof.f90.  It does not seem to matter to gfortran,
but PGI Fortran 90 cares.


The root of the problem is the inconsistency between the methods used
to enumerate the specific procedures in
ompi/mpi/f90/scripts/mpi-f90-interfaces.h.sh (the declarations) and
ompi/mpi/f90/scripts/mpi_sizeof.f90.sh (the implementations).  The
Character and Logical implementations are missing.  mpi_sizeof.f90.sh
generates one specific Logical procedure implementation for each kind
listed in lkinds.  However, since lkinds is not defined in
ompi/mpi/f90/fortran_kinds.sh, there are none.  Even if lkinds were
defined, mpi-f90-interfaces.h.sh declares a single (nameless) kind of
Logical procedure, while mpi_sizeof.f90.sh would (if there were any)
decorate the name of each Logical procedure implementation with the
kind.  And, mpi_sizeof.f90.sh completely leaves out the Character
procedure implementations.
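

One way to spot such a mismatch without waiting for the linker is to
compare a list of declared names with a list of implemented names.
The following is only a sketch under assumptions: the helper name is
made up, the two input files (one name per line) have to be produced
by some other means, and mktemp is assumed to be available.

```shell
# Hypothetical helper: given two files with one procedure name per line,
# print the names that are declared but have no implementation.
missing_implementations() {
    declared=$1
    implemented=$2
    tmp_d=$(mktemp) || return 1
    tmp_i=$(mktemp) || { rm -f "$tmp_d"; return 1; }
    sort -u "$declared" >"$tmp_d"
    sort -u "$implemented" >"$tmp_i"
    # comm -23 keeps only the lines unique to the first sorted input.
    comm -23 "$tmp_d" "$tmp_i"
    rm -f "$tmp_d" "$tmp_i"
}
```

Fed with the symbols from the link errors above as "declared" and an
empty list of Logical and Character implementations, it would print
the same names the PGI linker complains about.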


I will work out a fix for this in the next few days, unless the author  
wants to.  Is that okay?


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On Aug 17, 2010, at 2:18 PM, Jeff Squyres wrote:


We still have one known possible regression:

https://svn.open-mpi.org/trac/ompi/ticket/2530

But we posted rc5 anyway (there's a bunch of stuff that has been  
pending for a while that is now in).  Please test!


http://www.open-mpi.org/software/ompi/v1.5/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] 1.5rc5 has been posted

2010-09-01 Thread Larry Baker
I managed to compile OpenMPI 1.5rc5 on Linux x86_64 using the PGI 10.3  
compilers.  All validation tests passed.  I have attached the  
procedure I followed and the patches I applied to 1.5rc5.  I did not  
spend the time to find out how to fix configure to include -pthread in  
the LIBS Makefile variable definition; I made a brute-force change to  
all the Makefiles after configure ran.  (FYI: make recreates all the  
Makefiles -- I don't know why that is.)  Also, my patch to otfprofile  
will require fixes to configure/libtool to determine the proper  
selection of the -mp option for pre-10.x PGI compilers.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov


Development environment:


[baker@hydra ~]$ cat /etc/redhat-release
CentOS release 4.5 (Final)

[baker@hydra ~]$ uname -a
Linux hydra.wr.usgs.gov 2.6.9-55.0.2.ELsmp #1 SMP Tue Jun 26  
14:14:47 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux


[baker@hydra ~]$ rpm -q -a | grep gcc
gcc-3.4.6-8
gcc4-4.1.1-53.EL4
compat-libgcc-296-2.96-132.7.2
libgcc-3.4.6-8
globus_scheduler_event_generator_sge_gcc64_rtl-1.1-0
gcc-c++-3.4.6-8
libgcc-3.4.6-8
globus_scheduler_event_generator_sge_gcc64_dev-1.1-0
gcc-g77-3.4.6-8
gcc4-gfortran-4.1.1-53.EL4


Login as root, then type:


[root@hydra ~]# cd /usr/local/src
[root@hydra src]# sh make-openmpi-1.5rc5-pgi.sh


make-openmpi-1.5rc5-pgi.sh:


# Patch OpenMPI 1.5rc5
cd /usr/local/src
tar -xjf openmpi-1.5rc5.tar.bz2
source patch-openmpi-1.5rc5.sh

# Configure OpenMPI 1.5rc5 for PGI 10.3 compilers
cd openmpi-1.5rc5
export PGI_DIR=/opt/pgi/linux86-64/10.3
export PATH=$PGI_DIR/bin:$PATH
./configure >configure.log 2>&1 \
   --prefix=$PGI_DIR/openmpi --with-sge \
   --with-wrapper-cflags="-DNO_PGI_OFFSET" \
   --with-wrapper-cxxflags="-DNO_PGI_OFFSET" \
   CC="pgcc -m64" \
   CFLAGS="-g -O3 -tp amd64 -DNO_PGI_OFFSET" \
   CXX="pgcpp -m64" \
   CXXFLAGS="-g -O3 -tp amd64 -DNO_PGI_OFFSET" \
   FC="pgf90 -m64" \
   FCFLAGS="-g -O3 -tp amd64" \
   F77="pgf90 -m64" \
   FFLAGS="-g -O3 -tp amd64"

# Would like to do Makefile fixups here, but make recreates them

# Do the make until the link fails
make >make_part1.log 2>&1

# Apply Makefile fixups to add -lpthread to the LIBS variable
find . -name Makefile -exec cp {} {}.original ';' \
-exec sh -c "sed -e '/^LIBS = -lnsl  -lutil$/s/$/  -lpthread/' {}.original >{}" ';'


# Finish the make
make >make_part2.log 2>&1

# Validate the library
make check >check.log 2>&1



patch-openmpi-1.5rc5.sh:


# Fixes to correctly identify PGI compiler versions 1 through 5
mv openmpi-1.5rc5/config/libtool.m4{,.original}
sed -e '5899s/\[\[1-5\]\]\*/\[\[1-5\]\].\*/g' \
openmpi-1.5rc5/config/libtool.m4.original \
>openmpi-1.5rc5/config/libtool.m4
mv openmpi-1.5rc5/opal/libltdl/m4/libtool.m4{,.original}
sed -e '5899s/\[\[1-5\]\]\*/\[\[1-5\]\].\*/g' \
openmpi-1.5rc5/opal/libltdl/m4/libtool.m4.original \
>openmpi-1.5rc5/opal/libltdl/m4/libtool.m4

# Disable inline assembly for PGI C++, as is done for PGI C (26246), and
# Fix PGI C compiler warning (11146, 19215): Pragma ignored - string
# expected after #pragma ident
mv openmpi-1.5rc5/configure{,.original}
sed -e '26246{x;s/^.*$/if test "$ompi_cv_cxx_compiler_vendor" = "portland group" ; then/;p;
s/^.*$/# PGI seems to have some issues with our inline assembly./;p;
s/^.*$/# Disable for now./;p;
s/^.*$/asm_result="no (Portland Group)"/;p;
s/^.*$/else/;G;}' \
-e '26369{x;s/^.*$/fi/;G;}' \
-e '11146{s/#pragma ident/#define IDENT/;p;
  s/^.*$/#pragma ident \$IDENT/;}' \
-e '19215{s/#pragma ident/#define IDENT/;p;
  s/^.*$/#pragma ident \$IDENT/;}' \
openmpi-1.5rc5/configure.original \
>openmpi-1.5rc5/configure
chmod +x openmpi-1.5rc5/configure

# Fix PGI compiler warning: Array name used in logical expression
mv openmpi-1.5rc5/opal/libltdl/ltdl.h{,.original}
sed -e '44s/((s) && (s)\[0\])/(s!=NULL)/' \
openmpi-1.5rc5/opal/libltdl/ltdl.h.original \
>openmpi-1.5rc5/opal/libltdl/ltdl.h

# Fix PGI compiler warning: Redefinition of symbol assert (364-370) and
# Pointer value created from a nonlong integral type (444, 459, 3446, 3664, 3789)
mv openmpi-1.5rc5/opal/mca/memory/ptmalloc2/hooks.c{,.original}
sed -e '444s/: 0;/: NULL;/' \
-e '459s/: 0;/: NULL;/' \
openmpi-1.5rc5/opal/mca/memory/ptmalloc2/hooks.c.original \
>openmpi-1.5rc5/opal/mca/memory/ptmalloc2/hooks.c
mv openmpi-1.5rc5/opal/mca/memory/ptmalloc2/malloc.c{,.original}
sed -e '364,369d' \
-e '370{s/^.*$/#if MALLOC_DEBUG \&\& defined( NDEBUG )/;p;
s/^.*$/#error -DMALLOC_DEBUG is inconsistent with -DNDEBUG/;p;
s/^.*$/#endif/;p;
s/^.*$//;p;
s/

[OMPI devel] Fwd: Fwd: 1.5rc5 has been posted

2010-09-01 Thread Larry Baker

I found the bug in otfprofile.

When ompi/contrib/vt/vt/extlib/otf/tools/otfprofile/otfprofile.cpp is  
compiled with the PGI C++ compiler, two "expected an identifier"  
errors occur:



"/opt/pgi/linux86-64/10.3/include/omp.h", line 41: error: expected an
 identifier
 extern int omp_get_thread_num(void);
^

"/opt/pgi/linux86-64/10.3/include/omp.h", line 43: error: expected an
 identifier
 extern int omp_get_num_threads(void);



I saved the preprocessor output for otfprofile.cpp.  The code in
/opt/pgi/linux86-64/10.3/include/omp.h:



#ifdef __cplusplus
extern "C" {
#endif

extern void omp_set_num_threads(int n);
extern int omp_get_thread_num(void);
extern int omp_get_num_procs(void);
extern int omp_get_num_threads(void);
extern int omp_get_max_threads(void);


expands to:



extern "C" {


extern void omp_set_num_threads(int n);
extern int 0;
extern int omp_get_num_procs(void);
extern int 1;
extern int omp_get_max_threads(void);


It is easy to see why the compiler issued the error.  The root of the
problem is the definition of the OpenMP function proxies when the PGI
compiler is used:


/* Disable OpenMP if the PGI compiler is used to work around the following errors:

   compiler version  compiler error
   < 9.0-3           PGCC-S--Internal compiler error. calc_dw_tag:no tag
                     (see Technical Problem Report 4337 at http://www.pgroup.com/support/release_tprs_90.htm)
   10.1 - 10.6       this kind of pragma may not be used here
                     #pargma omp barrier
*/
#if defined(_OPENMP) && defined(__PGI)
#   undef _OPENMP
#endif

#ifdef _OPENMP
#   include <omp.h>
#else
#   define omp_get_thread_num() 0
#   define omp_get_num_threads() 1
#endif


Later in otfprofile.cpp, the #include "Summary.h" eventually causes
/opt/pgi/linux86-64/10.3/include/omp.h to be included.  By then the two
macros are already defined, so the preprocessor rewrites the function
names in the declarations, which leads to the syntax error.


This is not the way to enable/disable OpenMP.  _OPENMP is
informational only.  In fact, the PGI C++ compiler does not use
_OPENMP internally to control whether <omp.h> is included, which is
why #undef _OPENMP is ineffective.  The proper way to deal with this
is using configure/libtool.  I changed the code to ignore the __PGI
macro:


/* Disable OpenMP if the PGI compiler is used to work around the following errors:

   compiler version  compiler error
   < 9.0-3           PGCC-S--Internal compiler error. calc_dw_tag:no tag
                     (see Technical Problem Report 4337 at http://www.pgroup.com/support/release_tprs_90.htm)
*/

#ifdef _OPENMP
#   include <omp.h>
#endif


The code compiles fine for PGI C++ 10.3.  I believe the comment about  
10.1-10.6 not working is possibly due to the (previously reported)  
mistaken identification of the 10.x compilers by configure/libtool,  
which I fixed.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

Begin forwarded message:


From: Larry Baker <ba...@usgs.gov>
Date: August 31, 2010 5:21:13 PM PDT
To: Open MPI Developers <de...@open-mpi.org>
Subject: [OMPI devel] Fwd:  1.5rc5 has been posted
Reply-To: Open MPI Developers <de...@open-mpi.org>

My head hurts from working on this!  I just realized <omp.h> is for
OpenMP, not OpenMPI.  So, of course the PGI <omp.h> is used.  I
still don't know why otfprofile is failing, but at least that
explains why OpenMPI-1.5rc5 has no <omp.h>.


Sorry for the noise.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

Begin forwarded message:


From: Larry Baker <ba...@usgs.gov>
Date: August 31, 2010 10:04:35 AM PDT
To: Open MPI Developers <de...@open-mpi.org>
Subject: Re: [OMPI devel] 1.5rc5 has been posted
Reply-To: Open MPI Developers <de...@open-mpi.org>

The make of OpenMPI 1.5rc5 fails for PGI 10.3 in otfprofile:


Making all in otfprofile
make[9]: Entering directory `/usr/local/src/openmpi-1.5rc5/ompi/contrib/vt/vt/extlib/otf/tools/otfprofile'
  CXX    otfprofile-otfprofile.o
"/opt/pgi/linux86-64/10.3/include/omp.h", line 41: error: expected an
 identifier
 extern int omp_get_thread_num(void);
^

"/opt/pgi/linux86-64/10.3/include/omp.h", line 43: error: expected an
 identifier
 extern int omp_get_num_threads(void);
^

2 errors detected in the compilation of "otfprofile.cpp".


The errors are coming from an <omp.h> file that comes with the PGI
compiler.  I would think OpenMPI would use its own.  The problem
is, there isn't one (yet?):


[root@hydra otfprofile]# find /usr/local/src/openmpi-1.5rc5 -name omp.h


The C++ file that is using the PGI <omp.h> file is
ompi/contrib/vt/vt/extlib/otf/tools/otfprofile/otfprofile.cpp:


[root@hydra otfprofile]# cd ompi/contrib/vt/vt/extlib/otf/tools/otfprofile
[root@hydra otfprofile]# grep omp.h *.cpp
otfprofile.cpp:#include <omp.h>


I ran the compile from make -n to verify that:

[root@hydra otfprofile]# pgcpp -m64 -DHAVE_CONFIG_H -I. -I../.. - 

[OMPI devel] Fwd: 1.5rc5 has been posted

2010-08-31 Thread Larry Baker
My head hurts from working on this!  I just realized <omp.h> is for
OpenMP, not OpenMPI.  So, of course the PGI <omp.h> is used.  I still
don't know why otfprofile is failing, but at least that explains why
OpenMPI-1.5rc5 has no <omp.h>.


Sorry for the noise.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

Begin forwarded message:


From: Larry Baker <ba...@usgs.gov>
Date: August 31, 2010 10:04:35 AM PDT
To: Open MPI Developers <de...@open-mpi.org>
Subject: Re: [OMPI devel] 1.5rc5 has been posted
Reply-To: Open MPI Developers <de...@open-mpi.org>

The make of OpenMPI 1.5rc5 fails for PGI 10.3 in otfprofile:


Making all in otfprofile
make[9]: Entering directory `/usr/local/src/openmpi-1.5rc5/ompi/contrib/vt/vt/extlib/otf/tools/otfprofile'
  CXX    otfprofile-otfprofile.o
"/opt/pgi/linux86-64/10.3/include/omp.h", line 41: error: expected an
 identifier
 extern int omp_get_thread_num(void);
^

"/opt/pgi/linux86-64/10.3/include/omp.h", line 43: error: expected an
 identifier
 extern int omp_get_num_threads(void);
^

2 errors detected in the compilation of "otfprofile.cpp".


The errors are coming from an <omp.h> file that comes with the PGI
compiler.  I would think OpenMPI would use its own.  The problem is,
there isn't one (yet?):


[root@hydra otfprofile]# find /usr/local/src/openmpi-1.5rc5 -name omp.h


The C++ file that is using the PGI <omp.h> file is
ompi/contrib/vt/vt/extlib/otf/tools/otfprofile/otfprofile.cpp:


[root@hydra otfprofile]# cd ompi/contrib/vt/vt/extlib/otf/tools/otfprofile

[root@hydra otfprofile]# grep omp.h *.cpp
otfprofile.cpp:#include <omp.h>


I ran the compile from make -n to verify that:

[root@hydra otfprofile]# pgcpp -m64 -DHAVE_CONFIG_H -I. -I../..
-I../../otflib -I../../otflib  -DINSIDE_OPENMPI  -D_REENTRANT -mp -g
-O3 -tp amd64 -DNO_PGI_OFFSET -c -o otfprofile-otfprofile.o
`test -f 'otfprofile.cpp' || echo './'`otfprofile.cpp

"/opt/pgi/linux86-64/10.3/include/omp.h", line 41: error: expected an
 identifier
 extern int omp_get_thread_num(void);
^

"/opt/pgi/linux86-64/10.3/include/omp.h", line 43: error: expected an
 identifier
 extern int omp_get_num_threads(void);
^

2 errors detected in the compilation of "otfprofile.cpp".


I don't know how to fix this.  Where is otfprofile.cpp expecting to
get <omp.h>?  Why isn't it there?  I'm beginning to think this
contrib/vt stuff should not be enabled by default.  I don't know
that it is needed in general.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On Aug 17, 2010, at 2:18 PM, Jeff Squyres wrote:


We still have one known possible regression:

   https://svn.open-mpi.org/trac/ompi/ticket/2530

But we posted rc5 anyway (there's a bunch of stuff that has been  
pending for a while that is now in).  Please test!


   http://www.open-mpi.org/software/ompi/v1.5/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel






Re: [OMPI devel] 1.5rc5 has been posted

2010-08-31 Thread Larry Baker

The make of OpenMPI 1.5rc5 fails for PGI 10.3 in otfprofile:


Making all in otfprofile
make[9]: Entering directory `/usr/local/src/openmpi-1.5rc5/ompi/contrib/vt/vt/extlib/otf/tools/otfprofile'

  CXX    otfprofile-otfprofile.o
"/opt/pgi/linux86-64/10.3/include/omp.h", line 41: error: expected an
  identifier
  extern int omp_get_thread_num(void);
 ^

"/opt/pgi/linux86-64/10.3/include/omp.h", line 43: error: expected an
  identifier
  extern int omp_get_num_threads(void);
 ^

2 errors detected in the compilation of "otfprofile.cpp".


The errors are coming from an <omp.h> file that comes with the PGI
compiler.  I would think OpenMPI would use its own.  The problem is,
there isn't one (yet?):


[root@hydra otfprofile]# find /usr/local/src/openmpi-1.5rc5 -name omp.h


The C++ file that is using the PGI <omp.h> file is
ompi/contrib/vt/vt/extlib/otf/tools/otfprofile/otfprofile.cpp:


[root@hydra otfprofile]# cd ompi/contrib/vt/vt/extlib/otf/tools/otfprofile

[root@hydra otfprofile]# grep omp.h *.cpp
otfprofile.cpp:#include <omp.h>


I ran the compile from make -n to verify that:

[root@hydra otfprofile]# pgcpp -m64 -DHAVE_CONFIG_H -I. -I../..
-I../../otflib -I../../otflib  -DINSIDE_OPENMPI  -D_REENTRANT -mp -g
-O3 -tp amd64 -DNO_PGI_OFFSET -c -o otfprofile-otfprofile.o
`test -f 'otfprofile.cpp' || echo './'`otfprofile.cpp

"/opt/pgi/linux86-64/10.3/include/omp.h", line 41: error: expected an
  identifier
  extern int omp_get_thread_num(void);
 ^

"/opt/pgi/linux86-64/10.3/include/omp.h", line 43: error: expected an
  identifier
  extern int omp_get_num_threads(void);
 ^

2 errors detected in the compilation of "otfprofile.cpp".


I don't know how to fix this.  Where is otfprofile.cpp expecting to
get <omp.h>?  Why isn't it there?  I'm beginning to think this
contrib/vt stuff should not be enabled by default.  I don't know that
it is needed in general.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On Aug 17, 2010, at 2:18 PM, Jeff Squyres wrote:


We still have one known possible regression:

https://svn.open-mpi.org/trac/ompi/ticket/2530

But we posted rc5 anyway (there's a bunch of stuff that has been  
pending for a while that is now in).  Please test!


http://www.open-mpi.org/software/ompi/v1.5/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/






[OMPI devel] Fwd: 1.5rc5 has been posted

2010-08-30 Thread Larry Baker
The same problem (LIBS = is missing -lpthread) occurs in
orte/tools/{orte-clean,orte-iof,orte-ps,orted,orterun,orte-top}/Makefile.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

Begin forwarded message:


From: Larry Baker <ba...@usgs.gov>
Date: August 30, 2010 4:48:01 PM PDT
To: Open MPI Developers <de...@open-mpi.org>
Subject: Re: [OMPI devel] 1.5rc5 has been posted

To follow up on http://www.open-mpi.org/community/lists/devel/2010/08/8417.php:
OpenMPI 1.5rc5 fails in opal/tools/wrappers for PGI 10.3.


The problem appears to be a missing -lpthread in the definition of
most of the *LIBS variables in OpenMPI 1.5rc5
opal/tools/wrappers/Makefile:


[root@hydra src]# diff openmpi-{1.4.2,1.5rc5}/opal/tools/wrappers/Makefile | grep lutil

< LIBS = -lnsl -lutil  -lpthread
> LIBS = -lnsl  -lutil
< OMPI_WRAPPER_EXTRA_LIBS =   -ldl   -Wl,--export-dynamic -lnsl -lutil -lpthread -ldl
> OMPI_WRAPPER_EXTRA_LIBS =   -ldl   -Wl,--export-dynamic -lnsl -lutil -ldl
< OPAL_WRAPPER_EXTRA_LIBS = -ldl   -Wl,--export-dynamic -lnsl -lutil -lpthread -ldl
> OPAL_WRAPPER_EXTRA_LIBS = -ldl   -Wl,--export-dynamic -lnsl -lutil -ldl
< ORTE_WRAPPER_EXTRA_LIBS =  -ldl   -Wl,--export-dynamic -lnsl -lutil -lpthread -ldl
> ORTE_WRAPPER_EXTRA_LIBS =  -ldl   -Wl,--export-dynamic -lnsl -lutil -ldl
< WRAPPER_EXTRA_LIBS =   -ldl   -Wl,--export-dynamic -lnsl -lutil -lpthread -ldl
> WRAPPER_EXTRA_LIBS = -ldl   -Wl,--export-dynamic -lnsl -lutil -ldl
< crs_blcr_LIBS = -lnsl -lutil  -lpthread
> crs_blcr_LIBS = -lnsl  -lutil -lpthread


[root@hydra src]# diff openmpi-{1.4.2,1.5rc5}/opal/tools/wrappers/Makefile | grep LINK

< LINK = $(LIBTOOL) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
> LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) \
< 	$(LINK) $(opal_wrapper_OBJECTS) $(opal_wrapper_LDADD) $(LIBS)
> 	$(AM_V_CCLD)$(LINK) $(opal_wrapper_OBJECTS) $(opal_wrapper_LDADD) $(LIBS)


I don't know anything about automake, so I don't know what code to  
look at that changed between 1.4.2 and 1.5rc5 that defines the *LIBS  
Makefile variables.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On Aug 17, 2010, at 2:18 PM, Jeff Squyres wrote:


We still have one known possible regression:

   https://svn.open-mpi.org/trac/ompi/ticket/2530

But we posted rc5 anyway (there's a bunch of stuff that has been  
pending for a while that is now in).  Please test!


   http://www.open-mpi.org/software/ompi/v1.5/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/








Re: [OMPI devel] 1.5rc5 has been posted

2010-08-30 Thread Larry Baker
The fix I posted in http://www.open-mpi.org/community/lists/devel/2010/08/8311.php
for the Redefinition of symbol assert causes a link failure of
opal_wrapper.  This is because there are assert() calls in
opal/mca/memory/ptmalloc2/arena.c, which is included in
opal/mca/memory/ptmalloc2/malloc.c before the conditional on
MALLOC_DEBUG, which is where I moved #include <assert.h>.  arena.c
does not contain its own #include <assert.h>.  I changed the patch to
opal/mca/memory/ptmalloc2/malloc.c to define assert where it was
before, but in such a way that it always uses the system <assert.h>
header file to define the assert macro.


In opal/mca/memory/ptmalloc2/malloc.c, change lines 364-369 from:


#if MALLOC_DEBUG
#include <assert.h>
#else
#undef assert
#define assert(x) ((void)0)
#endif


to:


#if MALLOC_DEBUG && defined( NDEBUG )
#error -DMALLOC_DEBUG is inconsistent with -DNDEBUG
#endif

#include <assert.h>


The reason the conditional uses the value of MALLOC_DEBUG, but
defined( NDEBUG ), is that the code that depends on MALLOC_DEBUG uses
#if MALLOC_DEBUG conditionals, while <assert.h> uses #ifdef NDEBUG to
define the assert macro.  I used the same semantics to detect the
inconsistency between MALLOC_DEBUG and NDEBUG.
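
The difference between testing a macro's value and testing whether it
is defined can be sketched as follows.  The settings and the helper
names (MALLOC_DEBUG_ACTIVE, ASSERT_DISABLED, malloc_debug_active,
assert_disabled) are hypothetical and chosen only for illustration;
they are not part of malloc.c.

```c
/* Hypothetical build settings, chosen to show the two kinds of test. */
#define MALLOC_DEBUG 0   /* defined, but with a false value */
#define NDEBUG 0         /* defined; the value is irrelevant to #ifdef */

/* Value test: true only if MALLOC_DEBUG expands to a nonzero value.
   An undefined MALLOC_DEBUG would also evaluate as 0 here. */
#if MALLOC_DEBUG
#define MALLOC_DEBUG_ACTIVE 1
#else
#define MALLOC_DEBUG_ACTIVE 0
#endif

/* Definedness test: true whenever NDEBUG is defined, even as 0.
   This is the kind of test <assert.h> applies. */
#ifdef NDEBUG
#define ASSERT_DISABLED 1
#else
#define ASSERT_DISABLED 0
#endif

/* The consistency check, mixing both semantics on purpose. */
#if MALLOC_DEBUG && defined( NDEBUG )
#error -DMALLOC_DEBUG is inconsistent with -DNDEBUG
#endif

/* Drop the illustrative NDEBUG so later code keeps working asserts. */
#undef NDEBUG

int malloc_debug_active(void) { return MALLOC_DEBUG_ACTIVE; }
int assert_disabled(void)     { return ASSERT_DISABLED; }
```

With these settings the #error is not triggered: MALLOC_DEBUG is
defined but false, while NDEBUG counts as set even though its value is 0.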


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On Aug 23, 2010, at 5:29 PM, Larry Baker wrote:

The PGI C compiler complains (issues a warning) for the redefinition  
of the assert macro in opal/mca/memory/ptmalloc2/malloc.c:


Making all in mca/memory/ptmalloc2
make[2]: Entering directory `/home/baker/openmpi-1.5rc5/opal/mca/memory/ptmalloc2'

 CC opal_ptmalloc2_component.lo
 CC opal_ptmalloc2_munmap.lo
 CC malloc.lo
PGC-W-0221-Redefinition of symbol assert (/usr/include/assert.h: 51)
PGC-W-0258-Argument 1 in macro assert is not identical to previous definition (/usr/include/assert.h: 51)


FYI.  assert.h is an unusual include file -- it does not use an
ifdef guard macro in the usual way.  Instead, it undefines assert if
the guard macro is defined (NOT if assert is defined, which is the root
cause of this warning), defines the guard macro, then (re)defines
assert() based on the current value of NDEBUG.
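
A minimal sketch of that re-inclusion behavior, relying only on the
standard C rule that assert is redefined according to the current state
of NDEBUG each time <assert.h> is included.  The function names are
hypothetical.

```c
/* First inclusion with NDEBUG defined: assert(x) expands to a no-op
   and its argument is never evaluated. */
#define NDEBUG
#include <assert.h>

int count_with_ndebug(void)
{
    int evaluated = 0;
    assert(++evaluated);   /* compiled out: evaluated stays 0 */
    return evaluated;
}

/* Second inclusion with NDEBUG undefined: <assert.h> redefines assert,
   so the macro is active from here on. */
#undef NDEBUG
#include <assert.h>

int count_without_ndebug(void)
{
    int evaluated = 0;
    assert(++evaluated);   /* active: evaluated becomes 1, check passes */
    return evaluated;
}
```

The side effect inside assert() is there only to make the difference
observable; real code should not put side effects in assertions.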


opal/mca/memory/ptmalloc2/malloc.c did not change from OpenMPI
1.4.2.  malloc.c includes opal/mca/memory/ptmalloc2/hooks.c, which
did change in OpenMPI 1.5rc5.  hooks.c indirectly includes
<assert.h> through opal/mca/base/mca_base_param.h.  This is where
the warning occurs.


malloc.c defines its own assert macro in lines 364-369:

364 #if MALLOC_DEBUG
365 #include <assert.h>
366 #else
367 #undef assert
368 #define assert(x) ((void)0)
369 #endif

The warning occurs because the definition of assert in line 368 is
not the same as the definition in <assert.h>:


# define assert(expr)   (__ASSERT_VOID_CAST (0))

However, there is no reason to define assert here -- the only code
in malloc.c that needs assert is already inside an #if ! MALLOC_DEBUG
conditional at line 2450.


The fix is to delete lines 364-369 in
opal/mca/memory/ptmalloc2/malloc.c and move the #include <assert.h>
to be inside the conditional between lines 2459 and 2461:


2459 #else

#include <assert.h>

2461 #define check_chunk(A,P)  do_check_chunk(A,P)


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On Aug 17, 2010, at 2:18 PM, Jeff Squyres wrote:


We still have one known possible regression:

   https://svn.open-mpi.org/trac/ompi/ticket/2530

But we posted rc5 anyway (there's a bunch of stuff that has been  
pending for a while that is now in).  Please test!


   http://www.open-mpi.org/software/ompi/v1.5/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/






Re: [OMPI devel] 1.5rc5 has been posted

2010-08-30 Thread Larry Baker
To follow up on http://www.open-mpi.org/community/lists/devel/2010/08/8417.php:
OpenMPI 1.5rc5 fails in opal/tools/wrappers for PGI 10.3.


The problem appears to be a missing -lpthread in the definition of
most of the *LIBS variables in OpenMPI 1.5rc5
opal/tools/wrappers/Makefile:


[root@hydra src]# diff openmpi-{1.4.2,1.5rc5}/opal/tools/wrappers/Makefile | grep lutil

< LIBS = -lnsl -lutil  -lpthread
> LIBS = -lnsl  -lutil
< OMPI_WRAPPER_EXTRA_LIBS =   -ldl   -Wl,--export-dynamic -lnsl -lutil -lpthread -ldl
> OMPI_WRAPPER_EXTRA_LIBS =   -ldl   -Wl,--export-dynamic -lnsl -lutil -ldl
< OPAL_WRAPPER_EXTRA_LIBS = -ldl   -Wl,--export-dynamic -lnsl -lutil -lpthread -ldl
> OPAL_WRAPPER_EXTRA_LIBS = -ldl   -Wl,--export-dynamic -lnsl -lutil -ldl
< ORTE_WRAPPER_EXTRA_LIBS =  -ldl   -Wl,--export-dynamic -lnsl -lutil -lpthread -ldl
> ORTE_WRAPPER_EXTRA_LIBS =  -ldl   -Wl,--export-dynamic -lnsl -lutil -ldl
< WRAPPER_EXTRA_LIBS =   -ldl   -Wl,--export-dynamic -lnsl -lutil -lpthread -ldl
> WRAPPER_EXTRA_LIBS = -ldl   -Wl,--export-dynamic -lnsl -lutil -ldl
< crs_blcr_LIBS = -lnsl -lutil  -lpthread
> crs_blcr_LIBS = -lnsl  -lutil -lpthread


[root@hydra src]# diff openmpi-{1.4.2,1.5rc5}/opal/tools/wrappers/Makefile | grep LINK

< LINK = $(LIBTOOL) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) \
> LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) \
< 	$(LINK) $(opal_wrapper_OBJECTS) $(opal_wrapper_LDADD) $(LIBS)
> 	$(AM_V_CCLD)$(LINK) $(opal_wrapper_OBJECTS) $(opal_wrapper_LDADD) $(LIBS)


I don't know anything about automake, so I don't know what code to  
look at that changed between 1.4.2 and 1.5rc5 that defines the *LIBS  
Makefile variables.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On Aug 17, 2010, at 2:18 PM, Jeff Squyres wrote:


We still have one known possible regression:

https://svn.open-mpi.org/trac/ompi/ticket/2530

But we posted rc5 anyway (there's a bunch of stuff that has been  
pending for a while that is now in).  Please test!


http://www.open-mpi.org/software/ompi/v1.5/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/






Re: [OMPI devel] 1.5rc5 has been posted

2010-08-30 Thread Larry Baker
/../opal/libopen-pal.la -lnsl  -lutil -lpthread


It looks like the changes in the opal/tools/wrappers/Makefile  
(configure/automake?) from 1.4.2 to 1.5rc5 are not supplying the  
pthreads library correctly to the link step.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On Aug 17, 2010, at 2:18 PM, Jeff Squyres wrote:


We still have one known possible regression:

https://svn.open-mpi.org/trac/ompi/ticket/2530

But we posted rc5 anyway (there's a bunch of stuff that has been  
pending for a while that is now in).  Please test!


http://www.open-mpi.org/software/ompi/v1.5/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/






[OMPI devel] OpenMPI 1.4.2 and PathScale 3.2 C++

2010-08-27 Thread Larry Baker
The PathScale 3.2 C++ compiler crashes with a segmentation fault at
optimization levels higher than -O1 (-O2 is the default).  This is with
OpenMPI 1.4.2, my first attempt to compile using the PathScale
compilers.  I could not find any -WOPT options that eliminate the
error.  I don't understand the current state of the PathScale
compilers.  I think the company has changed hands since we bought the
product, and the last owner went bankrupt.  I think the name has been
resurrected by the people who wrote the compiler.  They use the same
name, but I get the impression that our license is with a company that
is gone now, not with them, so they want us to buy a new license.
Anyway, there may be a newer compiler than 3.2 that does not have this
problem.


[root@hydra vtfilter]# pathCC -v
PathScale(TM) Compiler Suite: Version 3.2
Built on: 2008-06-16 16:45:36 -0700
Thread model: posix
GNU gcc version 3.3.1 (PathScale 3.2 driver)

[root@hydra vtfilter]# ./configure >configure.log 2>&1 \
   --prefix=$PATHSCALE_DIR/openmpi --with-sge \
   CC="pathcc -m64" \
   CFLAGS="-g -O3 -march=anyx86" \
   CXX="pathCC -m64" \
   CXXFLAGS="-g -O1 -march=anyx86" \
   FC="pathf90 -m64" \
   FCFLAGS="-g -O3 -march=anyx86 -fno-second-underscore" \
   F77="pathf90 -m64" \
   FFLAGS="-g -O3 -march=anyx86 -fno-second-underscore"

[root@hydra vtfilter]# make



Making all in vtfilter
make[6]: Entering directory `/usr/local/src/openmpi-1.4.2/ompi/contrib/vt/vt/tools/vtfilter'

pathCC -m64 -DHAVE_CONFIG_H -I. -I../.. -I../../extlib/otf/otflib
-I../../extlib/otf/otflib -I../../vtlib/ -I../../vtlib
-DINSIDE_OPENMPI   -D_GNU_SOURCE -mp -DVT_OMP -g -O3 -march=anyx86 -MT
vtfilter-vt_tracefilter.o -MD -MP -MF .deps/vtfilter-vt_tracefilter.Tpo
-c -o vtfilter-vt_tracefilter.o `test -f 'vt_tracefilter.cc' || echo './'`vt_tracefilter.cc
Signal: Segmentation fault in Global Optimization -- Dead Store Elimination phase.
Error: Signal Segmentation fault in phase Global Optimization -- Dead Store Elimination -- processing aborted

*** Internal stack backtrace:
pathCC INTERNAL ERROR: /opt/pathscale/lib/3.2/be died due to signal 4
Please report this problem to <supp...@pathscale.com>.
Problem report saved as /root/.ekopath-bugs/pathCC_error_weMqHF.ii
Please review the above file and, if possible, attach it to your problem report.

make[6]: *** [vtfilter-vt_tracefilter.o] Error 1



Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov




Re: [OMPI devel] 1.5rc5 has been posted

2010-08-19 Thread Larry Baker
The PGI C compiler complains (issues a warning) for the #pragma ident  
in opal/runtime/opal_init.c:


PGC-W-0281-Pragma ignored - string expected after #pragma ident  (../opal/util/sys_limits.h: 58)


This is because the PGI C compiler does not (whereas, curiously, the
PGI C++ compiler does) support macro substitution in a #pragma ident,
which is the form of all the #pragma ident statements in OpenMPI:


[baker@hydra openmpi-1.5rc5]$ find . -name \*.c -exec grep -w "#pragma ident" {} ';' -print

#pragma ident OMPI_IDENT_STRING
./ompi/runtime/ompi_mpi_init.c
#pragma ident OMPI_IDENT_STRING
./ompi/mpi/f77/init_f.c
#pragma ident OPAL_IDENT_STRING
./opal/runtime/opal_init.c
#pragma ident ORTE_IDENT_STRING
./orte/runtime/orte_init.c

The test for support of the #pragma ident feature in configure (which  
appears twice: once for the C compiler, and once for the C++ compiler)  
tests only the case when the pragma has a literal string argument:


ompi_ident="string_not_coincidentally_inserted_by_the_compiler"
cat > conftest.c <
[here-document body missing from the archived message]


However, that is not the form used by OpenMPI.  The code in configure
should be changed to test the actual form of #pragma ident used by
OpenMPI:


ompi_ident="string_not_coincidentally_inserted_by_the_compiler"
cat > conftest.c <
[here-document body missing from the archived message]


When this code is used instead (twice), the PGI C compiler fails the
configure test for #pragma ident support, as it should.


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On Aug 17, 2010, at 2:18 PM, Jeff Squyres wrote:


We still have one known possible regression:

https://svn.open-mpi.org/trac/ompi/ticket/2530

But we posted rc5 anyway (there's a bunch of stuff that has been  
pending for a while that is now in).  Please test!


http://www.open-mpi.org/software/ompi/v1.5/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/






Re: [OMPI devel] 1.5rc5 has been posted

2010-08-19 Thread Larry Baker
The PGI compiler version number parsing in the libtool.m4 files is  
incorrect.  It should be like the parser in configure, i.e., there  
should be a period between "]]" and "*"


[baker@hydra openmpi-1.5rc5]$ grep '1-5' configure
*pgCC\ [1-5].* | *pgcpp\ [1-5].*)

[baker@hydra openmpi-1.5rc5]$ find . -name libtool.m4 -exec grep '1-5' {} ';' -print

*pgCC\ [[1-5]]* | *pgcpp\ [[1-5]]*)
./config/libtool.m4
*pgCC\ [[1-5]]* | *pgcpp\ [[1-5]]*)
./opal/libltdl/m4/libtool.m4
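
A sketch of why the period matters, assuming the intent is to match
only single-digit major versions 1 through 5.  Shell case patterns use
the same matching rules as POSIX fnmatch(), so the two forms can be
compared in C.  Once m4 processing reduces [[1-5]] to [1-5], the
pattern without the period also matches a two-digit version such as
10.3, because [1-5] matches the leading "1" and * matches the rest.
The version strings mentioned below are made-up examples, not real
compiler output, and version_matches and the two pattern constants are
hypothetical names.

```c
#include <fnmatch.h>

/* Pattern as it appears in libtool.m4 after m4 quoting is removed. */
const char *const PATTERN_WITHOUT_PERIOD = "*pgCC [1-5]*";

/* Pattern as it appears in configure. */
const char *const PATTERN_WITH_PERIOD = "*pgCC [1-5].*";

/* Returns 1 if the version string matches the pattern, else 0. */
int version_matches(const char *pattern, const char *version)
{
    return fnmatch(pattern, version, 0) == 0;
}
```

For example, "pgCC 10.3-0" matches the first pattern but not the
second, while "pgCC 5.2-4" matches both.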

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On Aug 17, 2010, at 2:18 PM, Jeff Squyres wrote:


We still have one known possible regression:

https://svn.open-mpi.org/trac/ompi/ticket/2530

But we posted rc5 anyway (there's a bunch of stuff that has been  
pending for a while that is now in).  Please test!


http://www.open-mpi.org/software/ompi/v1.5/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/






[OMPI devel] Fortran mpi module missing MPI_CART_SHIFT

2009-10-15 Thread Larry Baker
OpenMPI 1.3.3 (ompi_info output below), Intel 11.1 compilers, Mac OS X  
10.5.8.


With Fortran 90 "use mpi", mpif90 compile error:

../src/run_nnlsqs_mpi.f(39): error #6285: There is no matching specific subroutine for this generic subroutine call.   [MPI_CART_SHIFT]

Call MPI_CART_SHIFT( mpi_comm_grid, direction, amount,
-^

With Fortran 90 "Include 'mpif.h'", no errors.

Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

$ /usr/local/openmpi/bin/ompi_info
 Package: Open MPI r...@savaii.wr.usgs.gov Distribution
Open MPI: 1.3.3
   Open MPI SVN revision: r21666
   Open MPI release date: Jul 14, 2009
Open RTE: 1.3.3
   Open RTE SVN revision: r21666
   Open RTE release date: Jul 14, 2009
OPAL: 1.3.3
   OPAL SVN revision: r21666
   OPAL release date: Jul 14, 2009
Ident string: 1.3.3
  Prefix: /usr/local/openmpi
 Configured architecture: i386-apple-darwin9.8.0
  Configure host: savaii.wr.usgs.gov
   Configured by: root
   Configured on: Thu Oct 15 16:51:18 PDT 2009
  Configure host: savaii.wr.usgs.gov
Built by: baker
Built on: Thu Oct 15 17:14:29 PDT 2009
  Built host: savaii.wr.usgs.gov
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (single underscore)
  Fortran90 bindings: yes
 Fortran90 bindings size: small
  C compiler: icc
 C compiler absolute: /opt/intel/Compiler/11.1/067/bin/intel64/icc
C++ compiler: icpc
   C++ compiler absolute: /opt/intel/Compiler/11.1/067/bin/intel64/icpc
  Fortran77 compiler: ifort
  Fortran77 compiler abs: /opt/intel/Compiler/11.1/067/bin/intel64/ifort
  Fortran90 compiler: ifort
  Fortran90 compiler abs: /opt/intel/Compiler/11.1/067/bin/intel64/ifort
 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: yes
 Fortran90 profiling: yes
  C++ exceptions: no
  Thread support: posix (mpi: no, progress: no)
   Sparse Groups: no
  Internal debug support: no
 MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
 libltdl support: yes
   Heterogeneous support: no
 mpirun default --prefix: no
 MPI I/O support: yes
   MPI_WTIME support: gettimeofday
Symbol visibility support: yes
   FT Checkpoint support: no  (checkpoint thread: no)
   MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3.3)
   MCA paffinity: darwin (MCA v2.0, API v2.0, Component v1.3.3)
   MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3.3)
   MCA carto: file (MCA v2.0, API v2.0, Component v1.3.3)
   MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3.3)
   MCA timer: darwin (MCA v2.0, API v2.0, Component v1.3.3)
 MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3.3)
 MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3.3)
 MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3.3)
  MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3.3)
   MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3.3)
   MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3.3)
MCA coll: basic (MCA v2.0, API v2.0, Component v1.3.3)
MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3.3)
MCA coll: inter (MCA v2.0, API v2.0, Component v1.3.3)
MCA coll: self (MCA v2.0, API v2.0, Component v1.3.3)
MCA coll: sm (MCA v2.0, API v2.0, Component v1.3.3)
MCA coll: sync (MCA v2.0, API v2.0, Component v1.3.3)
MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3.3)
  MCA io: romio (MCA v2.0, API v2.0, Component v1.3.3)
   MCA mpool: fake (MCA v2.0, API v2.0, Component v1.3.3)
   MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3.3)
   MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3.3)
 MCA pml: cm (MCA v2.0, API v2.0, Component v1.3.3)
 MCA pml: csum (MCA v2.0, API v2.0, Component v1.3.3)
 MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3.3)
 MCA pml: v (MCA v2.0, API v2.0, Component v1.3.3)
 MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3.3)
  MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3.3)
 MCA btl: self (MCA v2.0, API v2.0, Component v1.3.3)
 MCA btl: sm (MCA v2.0, API v2.0, Component v1.3.3)
 MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3.3)
MCA topo: unity (MCA v2.0, API v2.0, Component v1.3.3)
 MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3.3)
 MCA osc: rdma (MCA v2.0, API v2