Re: [OMPI users] [External] Re: OMPI returns error 63 on AMD 7742 when utilizing 100+ processors per node

2020-01-27 Thread Ray Sheppard via users

Hi All,
  Just my two cents: I think error code 63 is saying it is running out 
of streams to use.  I think you have only 64 cores, so at 100 ranks you 
are oversubscribing most of them.  It feels like you are running out of 
resources trying to swap ranks in and out on physical cores.

   Ray
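Ray's hypothesis is that more ranks are placed on a node than it has cores. A minimal Python sketch for checking that on a compute node, assuming Linux (`os.sched_getaffinity`) with a fallback to `os.cpu_count()`. Both report logical CPUs (hardware threads), not physical cores, so with SMT enabled the number can be twice the physical core count. The helper names are hypothetical, and nothing here queries Open MPI itself.

```python
import os


def logical_cpus() -> int:
    """Logical CPUs this process may run on (hardware threads, not physical cores)."""
    try:
        # Linux: respects cpusets and affinity masks set by the batch system
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity (e.g. macOS)
        return os.cpu_count() or 1


def oversubscription(nranks: int, ncpus: int) -> float:
    """Ranks per available CPU; above 1.0 means some CPUs must host more than one rank."""
    if ncpus <= 0:
        raise ValueError("ncpus must be positive")
    return nranks / ncpus


if __name__ == "__main__":
    ncpus = logical_cpus()
    for nranks in (64, 100, 128):  # rank counts discussed in this thread
        ratio = oversubscription(nranks, ncpus)
        status = "oversubscribed" if ratio > 1.0 else "fits"
        print(f"{nranks:4d} ranks on {ncpus} CPUs: {ratio:.2f} ranks/CPU ({status})")
```

With 128 ranks on a node where this reports 64 CPUs the ratio is 2.0. A ratio above 1.0 only confirms oversubscription; it does not by itself show that error 63 is caused by it.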

On 1/27/2020 11:29 AM, Collin Strassburger via users wrote:

Hello Howard,

To remove potential interactions, I have found that the issue persists 
without ucx and hcoll support.


Run command: mpirun -np 128 bin/xhpcg

Output:

--
mpirun was unable to start the specified application as it encountered an
error:

Error code: 63
Error name: (null)
Node: Gen2Node4

when attempting to start process rank 0.
--
128 total processes failed to start

It returns this error for any job I launch with more than 100 processes 
per node.  I get the same error message for multiple different codes, 
so the error is MPI-related rather than program-specific.


Collin

*From:* Howard Pritchard 
*Sent:* Monday, January 27, 2020 11:20 AM
*To:* Open MPI Users 
*Cc:* Collin Strassburger 
*Subject:* Re: [OMPI users] OMPI returns error 63 on AMD 7742 when 
utilizing 100+ processors per node


Hello Collin,

Could you provide more information about the error.  Is there any 
output from either Open MPI or, maybe, UCX, that could provide more 
information about the problem you are hitting?


Howard

On Mon, 27 Jan 2020 at 08:38, Collin Strassburger via users 
<users@lists.open-mpi.org> wrote:


Hello,

I am having difficulty with OpenMPI versions 4.0.2 and 3.1.5. 
Both of these versions cause the same error (error code 63) when
utilizing more than 100 cores on a single node.  The processors I
am utilizing are AMD Epyc “Rome” 7742s.  The OS is CentOS 8.1.  I
have tried compiling with both the default gcc 8 and locally
compiled gcc 9.  I have already tried modifying the maximum name
field values with no success.

My compile options are:

./configure \
    --prefix=${HPCX_HOME}/ompi \
    --with-platform=contrib/platform/mellanox/optimized

Any assistance would be appreciated,

Collin

Collin Strassburger

Bihrle Applied Research Inc.





Re: [OMPI users] know which CPU has the maximum value

2018-08-10 Thread Ray Sheppard

Hi Diego,
  if they are float/reals, the error (overflow) bits will likely make 
them unique.  If you are looking at integers, I would use isends and 
just capture the first one.  You could make a little round robin and 
poll everyone, saving the ranks that match, but if you are using 
hundreds/thousands of ranks, that could slow everything down a little.

   Ray
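The tie question also has a built-in answer: MPI_MAXLOC, which Reuti suggests further down, reduces (value, index) pairs, and the MPI standard specifies that when two values are equal the pair with the smaller index wins. If each rank supplies its own rank as the index, every process learns both the maximum and the lowest rank that holds it. In Fortran that means a two-element DOUBLE PRECISION buffer holding (eff, dble(rank)), reduced with MPI_2DOUBLE_PRECISION and MPI_MAXLOC. The Python sketch below only simulates that combining rule on a plain list indexed by rank; no MPI is involved and the function names are illustrative.

```python
from functools import reduce


def maxloc_op(a, b):
    """Combine two (value, rank) pairs using the MPI_MAXLOC rule."""
    (va, ra), (vb, rb) = a, b
    if va > vb:
        return a
    if vb > va:
        return b
    return (va, min(ra, rb))  # tie: keep the lower rank


def maxloc(values):
    """Reduce per-rank values (list index = rank) to (max value, owning rank)."""
    pairs = [(v, rank) for rank, v in enumerate(values)]
    return reduce(maxloc_op, pairs)


print(maxloc([3.0, 7.5, 1.2, 7.5]))  # (7.5, 1): ranks 1 and 3 tie, rank 1 wins
```

Because this rule is commutative and associative, the result does not depend on the order in which MPI combines partial results, so ties are resolved deterministically.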

On 8/10/2018 11:19 AM, Diego Avesani wrote:

Dear all,
I do not understand how MPI_MINLOC works. It seems to locate the maximum 
in a vector and not the CPU to which the value belongs.


@Ray: and what if two have the same value?

thanks


Diego


On 10 August 2018 at 17:03, Ray Sheppard <rshep...@iu.edu> wrote:


As a dumb scientist, I would just bcast the value I get back to
the group and ask whoever owns it to kindly reply back with its rank.
 Ray


On 8/10/2018 10:49 AM, Reuti wrote:

Hi,

On 10.08.2018 at 16:39, Diego Avesani
<diego.aves...@gmail.com> wrote:

Dear all,

I have a problem:
In my parallel program each CPU computes a value, let's say
eff.

First of all, I would like to know the maximum value. This
for me is quite simple,
I apply the following:

CALL MPI_ALLREDUCE(eff, effmaxWorld, 1,
MPI_DOUBLE_PRECISION, MPI_MAX, MPI_MASTER_COMM, MPIworld%iErr)

Would MPI_MAXLOC be sufficient?

-- Reuti


However, I would like also to know to which CPU that value
belongs. Is it possible?

I have set up a strange procedure, but it works only when
all the CPUs have different values and fails when two of
them have the same eff value.

Is there any intrinsic MPI procedure?
Alternatively, do you have any ideas?

really, really thanks.
Diego


Diego

___
users mailing list
users@lists.open-mpi.org <mailto:users@lists.open-mpi.org>
https://lists.open-mpi.org/mailman/listinfo/users
<https://lists.open-mpi.org/mailman/listinfo/users>


Re: [OMPI users] know which CPU has the maximum value

2018-08-10 Thread Ray Sheppard
As a dumb scientist, I would just bcast the value I get back to the 
group and ask whoever owns it to kindly reply back with its rank.

 Ray
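Ray's broadcast-and-reply idea can also be done with two collectives instead of point-to-point replies; this variant is not spelled out in the thread. First reduce eff with MPI_MAX, then have every rank contribute its own rank if it holds that maximum and a sentinel (the communicator size) otherwise, and reduce those with MPI_MIN. A Python sketch that simulates both steps on a list indexed by rank (illustrative names, no MPI calls):

```python
def owner_rank(values):
    """Return (global max, lowest rank holding it) from per-rank values."""
    nranks = len(values)
    global_max = max(values)  # step 1: like MPI_ALLREDUCE with MPI_MAX
    # step 2: each rank offers its rank if it owns the max, else a sentinel
    candidates = [rank if v == global_max else nranks
                  for rank, v in enumerate(values)]
    return global_max, min(candidates)  # like MPI_ALLREDUCE with MPI_MIN


def all_owners(values):
    """Every rank whose value equals the maximum, for when all tied ranks matter."""
    global_max = max(values)
    return [rank for rank, v in enumerate(values) if v == global_max]


print(owner_rank([3.0, 7.5, 1.2, 7.5]))  # (7.5, 1)
print(all_owners([3.0, 7.5, 1.2, 7.5]))  # [1, 3]
```

The exact equality test is reasonable here because each rank compares against the value the max reduction returned, which is one of the original inputs rather than a recomputed result.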


Re: [OMPI users] latest Intel CPU bug

2018-01-05 Thread Ray Sheppard

Hello All,
  Please, people, just drop it.  I appreciated the initial post in 
response to the valid question of how these bugs might impact OMPI 
and message passing in general.  At this point, y'all are beating the 
proverbial dead horse.  If you wish to debate, please mail each other 
directly.  Thank you.

   Ray

On 1/5/2018 9:09 AM, John Chludzinski wrote:

I believe this snippet sums it up pretty well:

"Now you have a bit more context about why Intel’s response was, well, 
a non-response. They blamed others, correctly, for having the same 
problem but their blanket statement avoided the obvious issue of the 
others aren’t crippled by the effects of the patches like Intel. Intel 
screwed up, badly, and are facing a 30% performance hit going forward 
for it. AMD did right and are probably breaking out the champagne at 
HQ about now."


On Fri, Jan 5, 2018 at 5:38 AM, Matthieu Brucher 
<matthieu.bruc...@gmail.com> wrote:


Hi,

I think, on the contrary, that he did notice the AMD/ARM issue. I
suppose you haven't read the text (and I like the fact that there
are different opinions on this issue).

Matthieu

2018-01-05 8:23 GMT+01:00 Gilles Gouaillardet <gil...@rist.or.jp>:

John,


The technical assessment, so to speak, is linked in the article
and is available at
https://googleprojectzero.blogspot.jp/2018/01/reading-privileged-memory-with-side.html

The long rant against Intel PR blinded you, and you did not
notice that AMD and ARM (and, though not mentioned here, Power and
Sparc too) are vulnerable to some of the bugs.


Full disclosure, I have no affiliation with Intel, but I am
getting pissed with the hysteria around this issue.

Gilles


On 1/5/2018 3:54 PM, John Chludzinski wrote:

That article gives the best technical assessment I've
seen of Intel's architecture bug. I noted the discussion's
subject and thought I'd add some clarity. Nothing more.

For the TL;DR crowd: get an AMD chip in your computer.

On Thursday, January 4, 2018, r...@open-mpi.org <r...@open-mpi.org> wrote:

    Yes, please - that was totally inappropriate for this mailing list.
    Ralph


    On Jan 4, 2018, at 4:33 PM, Jeff Hammond <jeff.scie...@gmail.com> wrote:

    Can we restrain ourselves to talk about Open-MPI or at least
    technical aspects of HPC communication on this list and leave the
    stock market tips for Hacker News and Twitter?

    Thanks,

    Jeff

    On Thu, Jan 4, 2018 at 3:53 PM, John Chludzinski
    <john.chludzin...@gmail.com> wrote:

        From https://semiaccurate.com/2018/01/04/kaiser-security-holes-will-devastate-intels-marketshare/

          Kaiser security holes will devastate Intel’s marketshare

              Analysis: This one tips the balance toward AMD in a big way

              Jan 4, 2018 by Charlie Demerjian

        This latest decade-long critical security hole in Intel CPUs
        is going to cost the company significant market share.
        SemiAccurate thinks it is not only consequential but will
        shift the balance of power away from Intel CPUs for at least
        the next several years.

        Today’s latest crop of gaping security flaws have three sets
        of holes across Intel, AMD, and ARM processors along with a
        slew of official statements and de

Re: [OMPI users] Basic build trouble on RHEL7

2017-04-27 Thread Ray Sheppard

The -dev packages were missing. It works now. Thanks again.
Ray
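The thread does not say why the -devel packages fixed it. A plausible explanation, offered as an assumption: the linker resolves -lhwloc and -lxml2 to the unversioned names libhwloc.so and libxml2.so (or the .a archives), and on RHEL those unversioned symlinks ship in hwloc-devel and libxml2-devel, while the runtime packages only install versioned files such as libhwloc.so.N. That would explain why the linker could not use the "copies of both in /usr/lib64". A small Python check of a library directory (the helper name `link_status` is hypothetical):

```python
from pathlib import Path


def link_status(libdir, name):
    """Classify how the linker flag -l<name> would fare in libdir.

    'ok'           : unversioned lib<name>.so or lib<name>.a exists
    'runtime-only' : only versioned lib<name>.so.N files exist (devel package likely missing)
    'missing'      : nothing matching lib<name> at all
    """
    d = Path(libdir)
    if (d / f"lib{name}.so").exists() or (d / f"lib{name}.a").exists():
        return "ok"
    if any(d.glob(f"lib{name}.so.*")):
        return "runtime-only"
    return "missing"


if __name__ == "__main__":
    for lib in ("hwloc", "xml2"):
        print(f"-l{lib}:", link_status("/usr/lib64", lib))
```

A "runtime-only" result for hwloc or xml2 would match the symptom in this thread.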


Re: [OMPI users] Basic build trouble on RHEL7

2017-04-27 Thread Ray Sheppard
Ha ha, most likely not.  These racks have only been out of single-user 
mode about 24 hours.  I thought something simple might be missing.  Thanks.

   Ray

On 4/27/2017 5:02 PM, John Hearns via users wrote:
Ray, probably a stupid question but do you have the hwloc-devel 
package installed?

And also the libxml2-devel package?




[OMPI users] Basic build trouble on RHEL7

2017-04-27 Thread Ray Sheppard

Hi All,
  I have searched the mail archives because I think this issue was 
addressed earlier, but I cannot find anything useful.
  We are standing up a few racks of RHEL-7 on Intel to slowly migrate 
the cluster from RHEL6.   I downloaded 2.1.0 to install. All goes well 
until about "CCLD libopen-rte.la."  Then it cannot find -lhwloc or 
-lxml2.  There are copies of both in /usr/lib64.  I tried many 
variations of fixes.  The most extreme is:


#!/bin/bash
export LT_SYS_LIBRARY_PATH=/usr/lib64
export CC="gcc -L/usr/lib64 "
export CXX="g++ -L/usr/lib64 "
export FC="gfortran -L/usr/lib64 "
./configure CC="gcc -L/usr/lib64 " CXX="g++ -L/usr/lib64 " \
    FC="gfortran -L/usr/lib64 " --enable-static \
    --with-hwloc-libdir=/usr/lib64 --with-threads=posix --disable-vt \
    --prefix=/N/soft/rhel7/openmpi/gnu/2.1.0
#

Nothing worked.  I thought maybe the older 1.X might not use HWLOC, and 
I see you still support it at 1.10.6.  I downloaded that and gave it a 
try.  The -lhwloc message was gone but -lxml2 was still there.  For fun, 
I tried the build on the rhel6 side.  With only a "regular" configure 
(./configure CC=gcc CXX=g++ FC=gfortran --enable-static 
--with-threads=posix --disable-vt 
--prefix=/N/soft/rhel6/openmpi/gnu/2.1.0) it worked just fine.  I would 
appreciate knowing what I am missing.  Thanks.

Ray



Re: [OMPI users] mpi_f08 Question: set comm on declaration error, and other questions

2016-08-21 Thread Ray Sheppard

Hi All,
  My two cents: in the old days of SGI, their compiler used to 
initialize declared variables, usually to zero.  When users moved to a 
new machine, like a Cray or SP, all hell broke loose because other 
compilers don't.  As a result, almost everyone now gives you a switch to 
have the compiler do it for you.  For example, with Intel (ifort) it is 
-init=zero.  They are pretty good about dealing with default 
initializations too.  -fzero-initialized-in-bss will put zero-initialized 
variables back in the data section.  With gfortran you pick what you 
want: -finit-[integer/real/etc.]=[many choices].  So, the bottom 
line is that if you want a particular behavior, you just need to tell 
the compiler.

  Ray

On 8/21/2016 11:21 PM, Jeff Hammond wrote:



On Sunday, August 21, 2016, Ben Menadue wrote:


Hi,

In Fortran, using uninitialised variables is undefined behaviour.
In this case, it’s being initialised to zero (either by the
compiler or by virtue of being in untouched memory), and so
equivalent to MPI_COMM_WORLD in OpenMPI. Other MPI libraries don’t
have MPI_COMM_WORLD .eq. 0 and so the same program would fail.
Similarly, if the same memory has previously been stored to (and
so non-zero) and the compiler doesn’t zero-initialise the variable
(most won’t unless you explicitly ask for it), it will fail with
OpenMPI.


Such false successes are a (the?) reason why MPI libraries should not 
define valid handles to default initializers, whether they 
be standardized or implementation-specific...


Jeff

Just keep in mind that the same is true for *all* variables in
Fortran; even integers and reals have undefined value until
they’re first stored to. This can be quite annoying as specifying
the initial value when declaring them also gives them the SAVE
attribute…

Cheers,

Ben

*From:* users [mailto:users-boun...@lists.open-mpi.org]
*On Behalf Of *Matt Thompson
*Sent:* Sunday, 21 August 2016 3:07 AM
*To:* Open MPI Users
*Subject:* Re: [OMPI users] mpi_f08 Question: set comm on
declaration error, and other questions

On Fri, Aug 19, 2016 at 8:54 PM, Jeff Squyres (jsquyres) wrote:

On Aug 19, 2016, at 6:32 PM, Matt Thompson wrote:

> > that the comm == MPI_COMM_WORLD evaluates to .TRUE.? I
> > discovered that once when I was printing some stuff.
>
> That might well be a coincidence. type(MPI_Comm) is not a
> boolean type, so I'm not sure how you compared it to .true.
>
> Well, I made a program like:
>
> (208) $ cat test2.F90
> program whoami
>    use mpi_f08
>    implicit none
>    type(MPI_Comm) :: comm
>    if (comm == MPI_COMM_WORLD) write (*,*) "I am MPI_COMM_WORLD"
>    if (comm == MPI_COMM_NULL) write (*,*) "I am MPI_COMM_NULL"
> end program whoami
> (209) $ mpifort test2.F90
> (210) $ mpirun -np 4 ./a.out
>  I am MPI_COMM_WORLD
>  I am MPI_COMM_WORLD
>  I am MPI_COMM_WORLD
>  I am MPI_COMM_WORLD
>
> I think if you print comm, you get 0 and MPI_COMM_WORLD=0
> and MPI_COMM_NULL=2 so...I guess I'm surprised. I'd have
> thought MPI_Comm would have been undefined until defined.

I don't know the rules here for what happens in Fortran when
comparing an uninitialized derived type.  The results could be
undefined...?


> Instead you can write a program like this:
>
> (226) $ cat helloWorld.mpi3.F90
> program hello_world
>
>    use mpi_f08
>
>    implicit none
>
>    type(MPI_Comm) :: comm
>    integer :: myid, npes, ierror
>    integer :: name_length
>
>    character(len=MPI_MAX_PROCESSOR_NAME) :: processor_name
>
>    call mpi_init(ierror)
>
>    call MPI_Comm_Rank(comm,myid,ierror)
>    write (*,*) 'ierror: ', ierror
>    call MPI_Comm_Size(comm,npes,ierror)
>    call MPI_Get_Processor_Name(processor_name,name_length,ierror)
>
>    write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)') "Process", myid, &
>          "of", npes, "is on", trim(processor_name)
>
>    call MPI_Finalize(ierror)
>
> end program hello_world
> (227) $ mpifort helloWorld.mpi3.F90
> (228) $ mpirun -np 4 ./a.out
>  ierror: 0
>  ierror: 0
>  ierror: 0
>  ierror: 0
> Process    2 of    4 is on compy
> Process    1 of    4 is on compy
> Process    3 of    4 is on compy
> Process    0 of    4 is on copy

That does seem to be odd output.  What is the hostname on your
machine?

Oh well, I (badly) munged the hostname on th

Re: [OMPI users] Odd pipe error

2015-11-12 Thread Ray Sheppard

Hi All,
  I thought I would follow up with a solution.  It turns out there is a 
bug in glibc that is now exposed in old versions (pre 1.8) of OpenMPI on 
Cray XE systems.  Thanks to Justin Davis of Cray for figuring it out.  
The simple solution is to remount /dev/pts with the gid=5 option turned 
on.  Then all works well again.

 Ray
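The post does not show the exact remount command. As a hedged sketch, assuming a Linux node where /proc/mounts is readable, the Python function below reports which gid= option the devpts mount on /dev/pts currently carries, so you can tell whether a node still needs the gid=5 remount Ray describes (GID 5 is conventionally the tty group on Linux). The function name `devpts_gid` is hypothetical.

```python
def devpts_gid(mounts_text):
    """Return the gid= option of the devpts mount on /dev/pts, or None if unset or absent."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == "/dev/pts" and fields[2] == "devpts":
            for opt in fields[3].split(","):
                if opt.startswith("gid="):
                    return int(opt[len("gid="):])
            return None
    return None


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            gid = devpts_gid(f.read())
    except OSError:
        print("/proc/mounts is not available on this system")
    else:
        print("devpts gid option:", gid)
```

A result of None on an affected node would be consistent with the missing gid=5 option; any other value simply shows what the mount currently uses.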





Re: [OMPI users] Odd pipe error

2015-10-17 Thread Ray Sheppard

Thanks Ralph,
   I was afraid that there wasn't an "oh, we started..."  For what it is 
worth, everything up to and including 1.7.x shows the same sort of 
failure.  The trouble with Jaguar is that they actually embed their own 
version of everything inside their code.  I once took on the task of 
swapping out the embedded version with a newer one. That turned out to 
be much more difficult than one would think.  So, now I support a 
version for the general public (currently 1.8.4) and one for them.

   Ray

On 10/17/2015 12:20 PM, Ralph Castain wrote:

I’m not sure there is a way to do it - that’s a pretty old version, and the RTE 
in it is completely different. So entirely possible that something in the 
update exposed a problem that no longer works.

Out of curiosity: I’m unaware of any changes in the MPI definitions (there were 
extensions, but no breakage). So why can’t you just build the old packages 
against 1.8.4?






[OMPI users] Odd pipe error

2015-10-17 Thread Ray Sheppard

Hi All,
  We run a Cray XE/XT-7.  For normal (ESM) use, Cray supplies 
integrated MPI libraries.  However, for cluster compatibility mode, we 
build OpenMPI to use.  Generally we use 1.8.4 but some old packages, 
like Jaguar, are tied to an old version (1.4.5).  At the last maint, 
they all started breaking so I rebuilt them.  Version 1.8.4 rebuilt fine 
and runs fine.  However, even a simple application, recompiled by the 
new package, fails in 1.4.5 with the error below.  I have tried a number 
of different configure options.   The current one follows this note.  I 
am hoping someone could tell what needs to be done to 1.4.5 to build the 
way 1.8.4 did (i.e. without the pipe error).  Thanks in advance for any 
insights.

   Ray

./configure CXX=g++ CC=gcc FC=gfortran CFLAGS="-O2" F77=gfortran \
    FCFLAGS="-O2" --enable-shared --enable-static --with-tm=no \
    --with-threads=posix --without-openib --enable-mca-no-build=btl-openib \
    --with-gnu-ld --prefix=/N/soft/cle5/openmpi/gnu/1.4.5



:~/testdir> !mpirun
mpirun -np 8 -machinefile test_machine hellompi
--
mpirun was unable to launch the specified application as it encountered 
an error:


Error: pipe function call failed when setting up I/O forwarding subsystem
Node: nid00819

while attempting to start process rank 0.
--



Re: [OMPI users] Undefined ompi_mpi_info_null issue

2015-06-12 Thread Ray Sheppard
Just a follow-up.  RPATH was the trouble.  All is well now in the land 
of the climatologists again.  Thanks again for the help.

Ray
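Gilles's suggestion below, checking whether _errors.so actually depends on libmpi, can be scripted. A minimal Python sketch that parses the text printed by ldd; the function names are hypothetical. Fed the ldd output Ray posted, it reports no libmpi dependency, which is consistent with the undefined ompi_mpi_info_null symbol, since Gilles notes that the symbol is defined in libmpi.so.

```python
import os


def ldd_names(ldd_output):
    """Library names from ldd output: the token before '=>' or the bare path."""
    names = []
    for line in ldd_output.splitlines():
        fields = line.split()
        if fields:
            names.append(os.path.basename(fields[0]))
    return names


def depends_on(ldd_output, prefix="libmpi"):
    """True if any listed library name starts with prefix (e.g. libmpi.so.1)."""
    return any(name.startswith(prefix) for name in ldd_names(ldd_output))


# ldd output for _errors.so as posted in this thread
posted = """\
linux-vdso.so.1 =>  (0x7fff39db7000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x7facfe887000)
libc.so.6 => /lib64/libc.so.6 (0x7facfe4f3000)
/lib64/ld-linux-x86-64.so.2 (0x7facff049000)
"""
print(depends_on(posted))  # False: no libmpi among the dependencies
```

Note that a line such as `libmpi.so.1 => not found` still counts as a dependency here, so a True result should be read together with the resolved path ldd prints.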


On 6/12/2015 8:00 AM, Ray Sheppard wrote:

Thanks again Gilles,
  You might be on to something.  Dynamic libraries sound like 
something a Python developer might love (no offense intended to the 
stereotype). It would also explain why the build went smoothly but the 
test run crashed.  I am going to try putting an RPATH variable in the 
environment and rebuilding.

  Ray

On 6/12/2015 7:15 AM, Gilles Gouaillardet wrote:

Ray,

One possibility is that one of the loaded libraries was built with -rpath 
and this causes the mess.


Another option is that you have to link _errors.so with libmpi.so.

Cheers,

Gilles

On Friday, June 12, 2015, Ray Sheppard <rshep...@iu.edu> wrote:


Hi Gilles,
  Thanks for the reply. I completely forgot that it lived in the
main library.  ldd doesn't show that it read my LD_LIBRARY_PATH
(I also push out an LPATH variable just for fun).  I force
modules to be echoed when users initialize them.  You can see
OpenMPI was visible to H5py.  Now I wonder why it didn't pick it
up...  Thanks again.
  Ray
GMP arithmetic library version 5.1.1 loaded.
MPFR version 3.1.1 loaded.
Mpc version 1.0.1 loaded.
gcc version 4.9.2 loaded.
Moab Workload Manager scheduling and management system version 7.1.1 loaded.
Python programming language version 2.7.3 loaded.
Perl programming language version 5.16.2 loaded.
Intel compiler suite version 15.0.1 loaded.
OpenMPI libraries (Intel) version 1.8.4 loaded.
TotalView version 8.15.0-15 loaded.
FFTW (Intel, Double precision) version 3.3.3 loaded.
hdf4 version 4.2.10 loaded.
Curl version 7.28.1 loaded.
HDF5 (MPI) version 1.8.14 loaded.
netcdf-c version 4.3.3 loaded.
netcdf-fortran version 4.4.1 loaded.
Gnuplot graphing utility version 4.6.1 loaded.
[rsheppar@h2 ~]$ ldd /N/dc2/projects/ray/quarry/h5py/h5py-2.5.0/build/lib.linux-x86_64-2.7/h5py/_errors.so
linux-vdso.so.1 =>  (0x7fff39db7000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x7facfe887000)
libc.so.6 => /lib64/libc.so.6 (0x7facfe4f3000)
/lib64/ld-linux-x86-64.so.2 (0x7facff049000)


On 6/11/2015 8:09 PM, Gilles Gouaillardet wrote:

Ray,

this symbol is defined in libmpi.so.

can you run
ldd

//N/dc2/projects/ray/quarry/h5py/h5py-2.5.0/build/lib.linux-x86_64-2.7/h5py//_errors.so
and make sure this is linked with openmpi 1.8.4 ?

Cheers,

Gilles

    On 6/12/2015 1:29 AM, Ray Sheppard wrote:

Hi List,
  I know I saw this issue years ago but have forgotten the
details. I looked through old posts but only found about half a
dozen pertaining to WinDoze.  I am trying to build a Python
(2.7.3) extension (h5py) that calls HDF5 (1.8.14).  I built
both the OpenMPI (1.8.4) and the HDF5 modules so I know they
are consistent.  All goes well until I try to run the tests.
Then I get:

ImportError: /N/dc2/projects/ray/quarry/h5py/h5py-2.5.0/build/lib.linux-x86_64-2.7/h5py/_errors.so:
undefined symbol: ompi_mpi_info_null

I am not sure I completely trust the h5py package but I don't
have a real good reason for believing that way.  I would
appreciate it if someone could explain where ompi_mpi_info_null
is defined and possibly a way to tell Python about it.  Thanks!
Ray







--
 Respectfully,
   Ray Sheppard
   rshep...@iu.edu
   http://rt.uits.iu.edu/systems/SciAPT
   317-274-0016

   Principal Analyst
   Senior Technical Lead
   Scientific Applications and Performance Tuning
   Research Technologies
   University Information Technological Services
   IUPUI campus
   Indiana University

   My "pithy" saying:  Science is the art of translating the world
   int

Re: [OMPI users] Undefined ompi_mpi_info_null issue

2015-06-12 Thread Ray Sheppard

Thanks again Gilles,
  You might be on to something.  Dynamic libraries sound like something 
a Python developer might love (no offense intended to the stereotype). 
It would also explain why the build went smoothly but the test run 
crashed.  I am going to try putting an RPATH variable in the environment 
and rebuilding.

  Ray

On 6/12/2015 7:15 AM, Gilles Gouaillardet wrote:

Ray,

one possibility is one of the loaded library was built with -rpath and 
this causes the mess


an other option is you have to link _error.so with libmpi.so

Cheers,

Gilles

On Friday, June 12, 2015, Ray Sheppard <mailto:rshep...@iu.edu>> wrote:


Hi Gilles,
  Thanks for the reply. I completely forgot that lived in the main
library.  ldd doesn't show that it read my LD_LIBRARY_PATH (I also
push out an LPATH variable just for fun).  I force modules to
echoed when users initialize them.  You can see OpenMPI was
visible to H5py.  Now I wonder why it didn't pick it up...  Thanks
again.
  Ray
GMP arithmetic library version 5.1.1 loaded.
MPFR version 3.1.1 loaded.
Mpc version 1.0.1 loaded.
gcc version 4.9.2 loaded.
Moab Workload Manager scheduling and management system version 7.1.1 loaded.
Python programming language version 2.7.3 loaded.
Perl programming language version 5.16.2 loaded.
Intel compiler suite version 15.0.1 loaded.
OpenMPI libraries (Intel) version 1.8.4 loaded.
TotalView version 8.15.0-15 loaded.
FFTW (Intel, Double precision) version 3.3.3 loaded.
hdf4 version 4.2.10 loaded.
Curl version 7.28.1 loaded.
HDF5 (MPI) version 1.8.14 loaded.
netcdf-c version 4.3.3 loaded.
netcdf-fortran version 4.4.1 loaded.
Gnuplot graphing utility version 4.6.1 loaded.
[rsheppar@h2 ~]$ ldd /N/dc2/projects/ray/quarry/h5py/h5py-2.5.0/build/lib.linux-x86_64-2.7/h5py/_errors.so
linux-vdso.so.1 =>  (0x7fff39db7000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x7facfe887000)
libc.so.6 => /lib64/libc.so.6 (0x7facfe4f3000)
/lib64/ld-linux-x86-64.so.2 (0x7facff049000)
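Gilles's point can be checked mechanically: the ldd listing above names only linux-vdso, libpthread, libc and the dynamic loader, with no libmpi entry, so nothing loaded alongside _errors.so supplies ompi_mpi_info_null. A minimal C sketch of that check follows; `ldd_lists_libmpi` is a hypothetical helper that only scans captured ldd text for a `libmpi.so` entry and does not load libraries or resolve symbols itself.

```c
#include <string.h>

/* Hypothetical helper: return 1 if captured `ldd` output mentions a
 * libmpi.so dependency (e.g. "libmpi.so.1 => ..."), 0 otherwise.
 * It only inspects text; it does not load the library or look up symbols. */
int ldd_lists_libmpi(const char *ldd_output)
{
    if (ldd_output == NULL)
        return 0;
    return strstr(ldd_output, "libmpi.so") != NULL;
}
```

Fed the four-line listing above, it returns 0, which is consistent with the undefined-symbol error. After relinking _errors.so against libmpi.so (Gilles's second suggestion), the same check would be expected to return 1.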


On 6/11/2015 8:09 PM, Gilles Gouaillardet wrote:

Ray,

This symbol is defined in libmpi.so.

Can you run
ldd /N/dc2/projects/ray/quarry/h5py/h5py-2.5.0/build/lib.linux-x86_64-2.7/h5py/_errors.so
and make sure this is linked with openmpi 1.8.4?

Cheers,

Gilles

On 6/12/2015 1:29 AM, Ray Sheppard wrote:

Hi List,
  I know I saw this issue years ago but have forgotten the
details. I looked through old posts but only found about half a
dozen pertaining to WinDoze.  I am trying to build a Python
(2.7.3) extension (h5py) that calls HDF5 (1.8.14).  I built both
the OpenMPI (1.8.4) and the HDF5 modules so I know they are
consistent.  All goes well until I try to run the tests. Then I
get:

ImportError: /N/dc2/projects/ray/quarry/h5py/h5py-2.5.0/build/lib.linux-x86_64-2.7/h5py/_errors.so: undefined symbol: ompi_mpi_info_null

I am not sure I completely trust the h5py package but I don't
have a real good reason for believing that way.  I would
appreciate it if someone could explain where ompi_mpi_info_null
is defined and possibly a way to tell Python about it.  Thanks!
Ray









Re: [OMPI users] Undefined ompi_mpi_info_null issue

2015-06-12 Thread Ray Sheppard

Hi Gilles,
  Thanks for the reply. I completely forgot that it lived in the main 
library.  ldd doesn't show that it read my LD_LIBRARY_PATH (I also push 
out an LPATH variable just for fun).  I force modules to echo when 
users initialize them.  You can see OpenMPI was visible to H5py.  Now I 
wonder why it didn't pick it up...  Thanks again.

  Ray
GMP arithmetic library version 5.1.1 loaded.
MPFR version 3.1.1 loaded.
Mpc version 1.0.1 loaded.
gcc version 4.9.2 loaded.
Moab Workload Manager scheduling and management system version 7.1.1 loaded.
Python programming language version 2.7.3 loaded.
Perl programming language version 5.16.2 loaded.
Intel compiler suite version 15.0.1 loaded.
OpenMPI libraries (Intel) version 1.8.4 loaded.
TotalView version 8.15.0-15 loaded.
FFTW (Intel, Double precision) version 3.3.3 loaded.
hdf4 version 4.2.10 loaded.
Curl version 7.28.1 loaded.
HDF5 (MPI) version 1.8.14 loaded.
netcdf-c version 4.3.3 loaded.
netcdf-fortran version 4.4.1 loaded.
Gnuplot graphing utility version 4.6.1 loaded.
[rsheppar@h2 ~]$ ldd /N/dc2/projects/ray/quarry/h5py/h5py-2.5.0/build/lib.linux-x86_64-2.7/h5py/_errors.so
linux-vdso.so.1 =>  (0x7fff39db7000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x7facfe887000)
libc.so.6 => /lib64/libc.so.6 (0x7facfe4f3000)
/lib64/ld-linux-x86-64.so.2 (0x7facff049000)


On 6/11/2015 8:09 PM, Gilles Gouaillardet wrote:

Ray,

This symbol is defined in libmpi.so.

Can you run
ldd /N/dc2/projects/ray/quarry/h5py/h5py-2.5.0/build/lib.linux-x86_64-2.7/h5py/_errors.so
and make sure this is linked with openmpi 1.8.4?

Cheers,

Gilles

On 6/12/2015 1:29 AM, Ray Sheppard wrote:

Hi List,
  I know I saw this issue years ago but have forgotten the details. I 
looked through old posts but only found about half a dozen pertaining 
to WinDoze.  I am trying to build a Python (2.7.3) extension (h5py) 
that calls HDF5 (1.8.14).  I built both the OpenMPI (1.8.4) and the 
HDF5 modules so I know they are consistent.  All goes well until I 
try to run the tests. Then I get:


ImportError: /N/dc2/projects/ray/quarry/h5py/h5py-2.5.0/build/lib.linux-x86_64-2.7/h5py/_errors.so: undefined symbol: ompi_mpi_info_null


I am not sure I completely trust the h5py package but I don't have a 
real good reason for believing that way.  I would appreciate it if 
someone could explain where ompi_mpi_info_null is defined and 
possibly a way to tell Python about it.  Thanks!

Ray









[OMPI users] Undefined ompi_mpi_info_null issue

2015-06-11 Thread Ray Sheppard

Hi List,
  I know I saw this issue years ago but have forgotten the details. I 
looked through old posts but only found about half a dozen pertaining to 
WinDoze.  I am trying to build a Python (2.7.3) extension (h5py) that 
calls HDF5 (1.8.14).  I built both the OpenMPI (1.8.4) and the HDF5 
modules so I know they are consistent.  All goes well until I try to run 
the tests. Then I get:


ImportError: /N/dc2/projects/ray/quarry/h5py/h5py-2.5.0/build/lib.linux-x86_64-2.7/h5py/_errors.so: undefined symbol: ompi_mpi_info_null


I am not sure I completely trust the h5py package but I don't have a 
real good reason for believing that way.  I would appreciate it if 
someone could explain where ompi_mpi_info_null is defined and possibly a 
way to tell Python about it.  Thanks!

Ray




Re: [OMPI users] Simple openmpi-mca-params.conf question

2015-04-06 Thread Ray Sheppard

Thanks again!
Ray

On 4/6/2015 8:58 PM, Ralph Castain wrote:

Yep - it will automatically pick it up. The file should be in the /etc 
directory.


On Apr 6, 2015, at 5:49 PM, Ray Sheppard  wrote:

Thanks Ralph,
  The FAQ had me putting in prefixes to that line and I just never figured it 
out.  I have just dumbly added these things to my mpirun line.  I have one 
other question. When I write into the system conf file, will the mpirun know to 
look there (which seems what the file says) or should I explicitly add the 
.../etc directory to a variable like CPATH?  Thanks again,
Ray

On 4/6/2015 8:14 PM, Ralph Castain wrote:

btl_tcp_if_exclude=eth2

should work
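Ralph's answer reflects the file format: one `name = value` pair per line, with the leading `-mca` and the space-separated form of the command-line switch dropped; lines starting with `#` are comments. As a hedged illustration only (this is not Open MPI's own keyval parser, and `parse_mca_line` is a hypothetical name), the C sketch below splits such a line and rejects a pasted `-mca name value` switch, one plausible cause of the keyval parser error in Ray's question.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: split one "name = value" line into its parts.
 * Returns 1 on success; 0 for blank or '#' comment lines, lines without
 * '=' (such as a pasted "-mca name value" switch), an empty name, or
 * parts too long for the caller's buffers. */
int parse_mca_line(const char *line, char *name, size_t name_sz,
                   char *value, size_t value_sz)
{
    const char *eq = strchr(line, '=');
    const char *start = line;
    while (isspace((unsigned char)*start))
        start++;
    if (*start == '\0' || *start == '#' || eq == NULL)
        return 0;

    /* Trim trailing whitespace from the name (text before '='). */
    const char *name_end = eq;
    while (name_end > start && isspace((unsigned char)name_end[-1]))
        name_end--;

    /* Trim whitespace (including a trailing newline) around the value. */
    const char *val = eq + 1;
    while (isspace((unsigned char)*val))
        val++;
    const char *val_end = val + strlen(val);
    while (val_end > val && isspace((unsigned char)val_end[-1]))
        val_end--;

    size_t nlen = (size_t)(name_end - start);
    size_t vlen = (size_t)(val_end - val);
    if (nlen == 0 || nlen >= name_sz || vlen >= value_sz)
        return 0;
    memcpy(name, start, nlen);
    name[nlen] = '\0';
    memcpy(value, val, vlen);
    value[vlen] = '\0';
    return 1;
}
```

For example, `btl_tcp_if_exclude = eth2` yields the name `btl_tcp_if_exclude` and the value `eth2`, while `-mca btl_tcp_if_exclude eth2` is rejected. The same line format applies to the per-user file ($HOME/.openmpi/mca-params.conf, the path in the error message) and to the system-wide openmpi-mca-params.conf in the installation's etc directory.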


On Apr 6, 2015, at 5:09 PM, Ray Sheppard  wrote:

Hello list,
  I have been given permission to impose my usual defaults on the system.  I have been 
reading documentation for the openmpi-mca-params.conf file. "ompi_info --param all all" 
did not help.  All the FAQs seemed to do was confuse me. I cannot seem to 
understand how to instantiate a simple switch like:

-mca btl_tcp_if_exclude eth2

I have tried various ways but always seem to get:
keyval parser: error 2 reading file /N/u/rsheppar/Karst/.openmpi/mca-params.conf at line 1:

I would really appreciate a simple example of a proper entry. Thanks.
  Ray





Re: [OMPI users] Simple openmpi-mca-params.conf question

2015-04-06 Thread Ray Sheppard

Thanks Ralph,
  The FAQ had me putting in prefixes to that line and I just never 
figured it out.  I have just dumbly added these things to my mpirun 
line.  I have one other question. When I write into the system conf 
file, will the mpirun know to look there (which seems what the file 
says) or should I explicitly add the .../etc directory to a variable 
like CPATH?  Thanks again,

Ray

On 4/6/2015 8:14 PM, Ralph Castain wrote:

btl_tcp_if_exclude=eth2

should work


On Apr 6, 2015, at 5:09 PM, Ray Sheppard  wrote:

Hello list,
  I have been given permission to impose my usual defaults on the system.  I have been 
reading documentation for the openmpi-mca-params.conf file. "ompi_info --param all all" 
did not help.  All the FAQs seemed to do was confuse me. I cannot seem to 
understand how to instantiate a simple switch like:

-mca btl_tcp_if_exclude eth2

I have tried various ways but always seem to get:
keyval parser: error 2 reading file /N/u/rsheppar/Karst/.openmpi/mca-params.conf at line 1:

I would really appreciate a simple example of a proper entry. Thanks.
  Ray





[OMPI users] Simple openmpi-mca-params.conf question

2015-04-06 Thread Ray Sheppard

Hello list,
  I have been given permission to impose my usual defaults on the 
system.  I have been reading documentation for the 
openmpi-mca-params.conf file. "ompi_info --param all all" did not help.  
All the FAQs seemed to do was confuse me. I cannot seem to understand 
how to instantiate a simple switch like:


 -mca btl_tcp_if_exclude eth2

I have tried various ways but always seem to get:
 keyval parser: error 2 reading file /N/u/rsheppar/Karst/.openmpi/mca-params.conf at line 1:


I would really appreciate a simple example of a proper entry. Thanks.
  Ray



Re: [OMPI users] Problems compiling OpenMPI 1.8.4 with GCC 4.9.2

2015-01-14 Thread Ray Sheppard

Gilles,
  The issue you pointed Ryan to was with GCC 4.8.2, not 4.9.2.  I just 
built version 1.8.4 on a RHEL6 machine yesterday without special 
switches but with GCC 4.9.2.

Ray

On 1/14/2015 11:13 AM, Novosielski, Ryan wrote:
Thank you. I did a search, but somehow did not turn that up. I guess I 
had looked for GCC 4.9.


 *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
  || \\UTGERS  |-*O*-
  ||_// Biomedical | Ryan Novosielski - Senior Technologist
  || \\ and Health | novos...@rutgers.edu - 973/972.0922 (2x0922)
  ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
   `'

On Jan 14, 2015, at 03:20, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:



Ryan,

This issue has already been reported.

Please refer to 
http://www.open-mpi.org/community/lists/users/2015/01/26134.php for a 
workaround.


Cheers,

Gilles

On 2015/01/14 16:35, Novosielski, Ryan wrote:

OpenMPI 1.8.4 does not appear to be buildable with GCC 4.9.2. The output, as 
requested by the Getting Help page, is attached.

I believe I tried GCC 4.9.0 too and it didn't work.

I did successfully build it with Intel's compiler suite v15.0.1, so I do appear 
to know what I'm doing.

Thanks in advance for your help.






Re: [OMPI users] 1.8.4

2014-11-12 Thread Ray Sheppard
Thanks, and sorry to blast my little note out to the list.  I guess your 
mail address is now aliased to the mailing list in my mail client.

Ray


On 11/12/2014 9:41 AM, Jeff Squyres (jsquyres) wrote:

We have 2 critical issues left that need fixing (a THREAD_MULTIPLE/locking 
issue and a shmem issue).  There's active work progressing on both.

I think we'd love to say it would be ready by SC, but I know that a lot of us 
-- myself included -- are fighting to meet our own SC deadlines.

Ralph Castain is the release manager of the v1.8 series -- Ralph, can you 
comment?




On Nov 12, 2014, at 9:38 AM, Ray Sheppard  wrote:


Sorry to bother you directly, but do you know when y'all will release the 
stable version of 1.8.4?  I have users asking for it and really would like to 
build it for them before I leave for SC.  But, either way, it would be great to 
be able to help manage their expectations.  Thanks.
   Ray





[OMPI users] 1.8.4

2014-11-12 Thread Ray Sheppard

Hi Jeff,
 Sorry to bother you directly, but do you know when y'all will release 
the stable version of 1.8.4?  I have users asking for it and really 
would like to build it for them before I leave for SC.  But, either way, 
it would be great to be able to help manage their expectations.  Thanks.

   Ray




Re: [OMPI users] compilation problem with ifort

2014-09-03 Thread Ray Sheppard

Hi Elio and everyone,
  I went to the EPW website and their instructions seem lacking with 
respect to the Quantum Espresso 4.0.3 requirement.  The EPW folks want 
to leverage Quantum Espresso intermediate object files.  By knowing how 
it builds and telling you where to put their package, they can then 
navigate relative to their make to link the files they want.
  Unfortunately, their instructions end with ./configure.  I think if 
you look, you will see the Espresso object files were never built.  
Instead, you should look up the complete installation instructions from 
the Quantum Espresso folks. It might be as simple as "make all", but I 
can guarantee there is more to be done.  Once you check that it 
actually works, you can finish with the EPW-specific instructions.  Of 
course, these are just my two cents :)

Ray

On 9/3/2014 7:10 PM, Jonathan Dursi (SN) wrote:




  Original Message

*From: *Elio Physics
*Sent: *Wednesday, September 3, 2014 6:48 PM
*To: *Open MPI Users
*Reply To: *Open MPI Users
*Subject: *Re: [OMPI users] compilation problem with ifort


I have already done all of the steps you mentioned. I have installed 
the older version of Quantum Espresso, configured it and followed all 
the steps on the EPW website when I got that error in the last step. 
In fact, I do have the latest version of Quantum Espresso, but since I 
work with electron-phonon and EPW seemed really promising and less 
time-consuming, I decided to give it a try.


The reason I asked my question in this forum is that once I had a 
similar "compiler" issue (not the same as this one), and when I asked 
on the Quantum Espresso (QE) website, one of the answers was that this is 
not the right place, since this is a compiler problem and not a QE issue, 
so I was really trying to avoid such answers.


Anyhow, I guess you are absolutely right. I will try to e-mail the EPW 
people and explain the situation; after all they should be able to 
help. Thanks for your advice and time.


ELIO MOUJAESS
University of Rondonia
Brasil

> Date: Wed, 3 Sep 2014 18:19:25 -0400
> From: g...@ldeo.columbia.edu
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] compilation problem with ifort
>
> It is hard to tell why, but the object files (yes a2f.o, etc)
> seem not to have been compiled from the corresponding source files
> (a2f.f90 or similar).
>
> In general the executable (your epw.x) is compiled only after all
> the pre-requisite object files (the .o) and modules (the .mod)
> have been compiled already.
> In many cases there is not only one Makefile, but a chain/tree of
> them, to compile the code in the source directory tree (under src).
>
> Also, it is a bit awkward that you don't seem to
> have configured anything,
> i.e., telling where MPI was installed, etc,
> but that may just not be in your email.
>
> Phonons is not my league, just trying to help, but afraid I may
> not take you in the right direction.
>
> Did you do the installation as per the EPW web site? (I just found it):
> http://epw.org.uk/Main/DownloadAndInstall
> It seems to require QuantumExpresso.
> Did you get it, configure it, etc?
>
> Do they have a mailing list or bulletin board where you could get
> specific help for their software?
> (Either on EPW or on QuantumExpresso (which seems to be required):
> http://www.quantum-espresso.org/)
> That would probably be the right forum to ask your questions.
>
> My two cents,
> Gus Correa
>
>
> On 09/03/2014 05:51 PM, Elio Physics wrote:
> > This was the first error, yes. What do you mean other files are missing?
> > Do you mean the atom.o, basic_algebra_routines.o...? Well the f90
> > files present in the src subdirectory start from a2f.90
> > , allocate_epwq.o,...and so on... I am not also sure why there is that
> > slash "\" just before the "a2f.o" Is there a way to know what is
> > going on? I mean what are the first steps?
> >
> > Thank you
> >
> > ELIO MOUJAES
> > University of Rondonia
> > Brazil
> >
> > > Date: Wed, 3 Sep 2014 17:43:44 -0400
> > > From: g...@ldeo.columbia.edu
> > > To: us...@open-mpi.org
> > > Subject: Re: [OMPI users] compilation problem with ifort
> > >
> > > Was the error that you listed the *first* error?
> > >
> > > Apparently various object files are missing from the
> > > ../../Modules/ directory, and were not compiled,
> > > suggesting something is amiss even before the
> > > compilation of the executable (epw.x).
> > >
> > > On 09/03/2014 05:20 PM, Elio Physics wrote:
> > > > Dear all,
> > > >
> > > > I am really a beginner in Fortran and Linux. I was trying to compile a
> > > > software (EPW). Everything was going fine (or maybe this is what I
> > > > think):
> > > >
> > > > mpif90 -o epw.x ../../Modules/atom.o
> > > > ../../Modules/basic_algebra_routines.o ../../Modules/cell_base.o
> > > > ../../Modules/check_stop.o ../../Modules/clocks.o
> > > > ../../Modules/constraints_module.o ../../Modules/control_flags.o
> > > > ../../Modules/descriptors.o ../../Modules/dspev_drv.o
> >

Re: [OMPI users] OpenMPI with Gemini Interconnect

2014-04-16 Thread Ray Sheppard

Hello,
  Big Red 2 provides its own MPICH based MPI.  The only case where the 
provided OpenMPI module becomes relevant is when you create a CCMLogin 
instance in Cluster Compatibility Mode (CCM).  For most practical uses, 
those sorts of needs are better addressed on the Quarry or Mason machines.
  When in CCM, the Gemini network is not directly accessible.  The 
proposed idea is for middleware to present an interface resembling an IB 
interface and use it to create a subset of the Gemini network for the 
CCM virtual cluster.  Unfortunately, due to a Cray bug, case 80503, that 
has not yet worked.

Ray

On 4/16/2014 4:44 PM, Saliya Ekanayake wrote:

Hi,

We have a Cray XE6/XK7 supercomputer (BigRed II) and I was trying to 
get OpenMPI Java binding working on it. I couldn't find a way to 
utilize its Gemini interconnect, instead was running on TCP, which is 
inefficient.


I see some work has been done along these lines in [1] and wonder if 
you could give some suggestions on how to build OpenMPI with Gemini 
support.


[1] 
https://www.open-mpi.org/papers/cug-2012/cug_2012_open_mpi_for_cray_xe_xk.pdf


Thank you,
Saliya

--
Saliya Ekanayake esal...@gmail.com 
http://saliya.org






Re: [OMPI users] btl_tcp_if_include setting

2013-10-14 Thread Ray Sheppard

Hi Jeff,
  On the Cray there are two modes. ESM is the preferred method, but some 
packages require CCM (Cluster Compatibility Mode).  In ESM, MPI is 
transparent and works great. In CCM, an external MPI is needed.  There 
is supposed to be a translator to make the Gemini switch look like IB, but 
it does not exactly work.  So, I am stuck with the TCP interface for 
message passing.

  Ray


On 10/14/2013 11:28 AM, Jeff Squyres (jsquyres) wrote:

Just curious -- why are you using the TCP transport on a Cray?


On Oct 14, 2013, at 11:00 AM, Ray Sheppard  wrote:


Thanks Ralph, Thanks Jeff,
  I should have written sooner. I spent the weekend trying to set it as a 
configure option.
   Ray





Re: [OMPI users] btl_tcp_if_include setting

2013-10-14 Thread Ray Sheppard

Thanks Ralph, Thanks Jeff,
  I should have written sooner. I spent the weekend trying to set it as 
a configure option.

   Ray


On 10/14/2013 10:42 AM, Jeff Squyres (jsquyres) wrote:

More info on Ralph's comment is available here:

 http://www.open-mpi.org/faq/?category=tuning#setting-mca-params


On Oct 14, 2013, at 10:36 AM, Ralph Castain  wrote:


It won't be a configure switch, but you can put the following in your default 
system mca param file:

btl_tcp_if_include=ipogif0

You'll find that file in /etc - it is called openmpi-mca-params.conf. 
Users won't have to enter it after that since we pick that file up by default.


On Oct 14, 2013, at 7:27 AM, Ray Sheppard  wrote:


Hi,
  I am setting up version 1.7.2 for a Cray XE-6.  The build nodes have 
different interfaces than the compute nodes. I have been able to set it up, but 
users need to embed the following into their mpirun command:
--mca btl_tcp_if_include ipogif0





[OMPI users] btl_tcp_if_include setting

2013-10-14 Thread Ray Sheppard

Hi,
  I am setting up version 1.7.2 for a Cray XE-6.  The build nodes have 
different interfaces than the compute nodes. I have been able to set it 
up, but users need to embed the following into their mpirun command:

--mca btl_tcp_if_include ipogif0

Currently, ompi_info shows the output below.  I have been trying to figure 
out the correct configure switch to add to the build to remove the need 
for a permanent mpirun switch, but I don't seem to be getting it right.  Any 
suggestions are welcome. Thanks,

 Ray


 ompi_info --param btl all
 MCA btl: parameter "btl_tcp_if_include" (current value: "",
          data source: default, level: 1 user/basic, type:
          string)
          Comma-delimited list of devices and/or CIDR
          notation of networks to use for MPI communication
          (e.g., "eth0,192.168.0.0/16").  Mutually exclusive
          with btl_tcp_if_exclude.
 MCA btl: parameter "btl_tcp_if_exclude" (current value:
          "127.0.0.1/8,sppp", data source: default, level: 1
          user/basic, type: string)
          Comma-delimited list of devices and/or CIDR
          notation of networks to NOT use for MPI
          communication -- all devices not matching these
          specifications will be used (e.g.,
          "eth0,192.168.0.0/16").  If set to a non-default
          value, it is mutually exclusive with
          btl_tcp_if_include.
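The help text above says both parameters accept device names and/or networks in CIDR notation. As a hedged illustration of what an entry such as `192.168.0.0/16` or the default `127.0.0.1/8` covers, here is a small C sketch that tests an IPv4 address against one CIDR string. `ipv4_in_cidr` is a hypothetical helper, not Open MPI's interface-selection code, and it deliberately rejects plain names like `ipogif0`, which the help text treats as device names rather than networks.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: return 1 if the dotted-quad IPv4 address `addr`
 * lies inside the network `cidr` (e.g. "192.168.0.0/16"), 0 if it does
 * not, and -1 if either string is malformed. Device names such as
 * "eth0" or "ipogif0" are not CIDR strings and yield -1 here. */
int ipv4_in_cidr(const char *addr, const char *cidr)
{
    char net[INET_ADDRSTRLEN];
    const char *slash = strchr(cidr, '/');
    if (slash == NULL || (size_t)(slash - cidr) >= sizeof net)
        return -1;
    memcpy(net, cidr, (size_t)(slash - cidr));
    net[slash - cidr] = '\0';

    /* Parse the prefix length that follows the '/'. */
    char *end;
    long prefix = strtol(slash + 1, &end, 10);
    if (end == slash + 1 || *end != '\0' || prefix < 0 || prefix > 32)
        return -1;

    struct in_addr a, n;
    if (inet_pton(AF_INET, addr, &a) != 1 || inet_pton(AF_INET, net, &n) != 1)
        return -1;

    /* Netmask in host byte order; /0 is special-cased because shifting a
     * 32-bit value by 32 bits is undefined behaviour in C. */
    uint32_t mask = (prefix == 0) ? 0u : (uint32_t)(0xFFFFFFFFu << (32 - prefix));
    return (ntohl(a.s_addr) & mask) == (ntohl(n.s_addr) & mask);
}
```

For example, `ipv4_in_cidr("192.168.3.4", "192.168.0.0/16")` returns 1 and `ipv4_in_cidr("10.0.0.1", "192.168.0.0/16")` returns 0.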




Re: [OMPI users] Trouble configuring 1.7.2 for Cuda 5.0.35

2013-08-14 Thread Ray Sheppard

Thank you for the quick reply, Rolf,
  I personally don't know the CUDA libraries. I was hoping there had 
been a name change.  I am on a Cray XK7.

Here is my configure command:

./configure CC=gcc FC=gfortran CFLAGS="-O2" F77=gfortran FCFLAGS="-O2" 
--enable-static --disable-shared  --disable-vt --with-threads=posix 
--with-gnu-ld --with-alps --with-cuda=/opt/nvidia/cudatoolkit/5.0.35 
--with-cuda-libdir=/opt/nvidia/cudatoolkit/5.0.35/lib64 
--prefix=/N/soft/cle4/openmpi/gnu/1.7.2/cuda


Ray

On 8/14/2013 2:50 PM, Rolf vandeVaart wrote:

It is looking for the libcuda.so file, not the libcudart.so file.  So, maybe 
--with-cuda-libdir=/usr/lib64
You need to be on a machine with the CUDA driver installed.  What was your 
configure command?

http://www.open-mpi.org/faq/?category=building#build-cuda

Rolf
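Rolf's diagnosis can be confirmed against the directory listing in the quoted message below: configure wants a file matching `libcuda.*`, and `libcudart.so`, `libcudadevrt.a` and the rest do not match, because the glob requires a dot immediately after `libcuda`. The C sketch below applies that glob with POSIX `fnmatch()`; `find_libcuda` is a hypothetical helper, not part of Open MPI's configure logic.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper: return the index of the first name matching the
 * shell glob "libcuda.*", or -1 if none does. "libcudart.so" does NOT
 * match, because the glob needs a '.' right after "libcuda". */
int find_libcuda(const char *const names[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (fnmatch("libcuda.*", names[i], 0) == 0)
            return (int)i;
    }
    return -1;
}
```

Run over the toolkit's lib64 file names, it finds nothing, which matches the configure failure; the driver's libcuda.so is installed elsewhere (Rolf suggests /usr/lib64) on machines that have the CUDA driver.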


-Original Message-
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ray
Sheppard
Sent: Wednesday, August 14, 2013 2:49 PM
To: Open MPI Users
Subject: [OMPI users] Trouble configuring 1.7.2 for Cuda 5.0.35

Hello,
   When I try to run my configure script, it dies with the following.
Below it are the actual libraries in the directory. Could the solution be as
simple as adding "rt" somewhere in the configure script?  Thanks.
  Ray

checking if --with-cuda-libdir is set... not found
configure: WARNING: Expected file /opt/nvidia/cudatoolkit/5.0.35/lib64/libcuda.* not found
configure: error: Cannot continue
rsheppar@login1:/N/dc/projects/ray/br2/openmpi-1.7.2> ls -l /opt/nvidia/cudatoolkit/5.0.35/lib64/
total 356284
lrwxrwxrwx 1 root root        16 Mar 18 14:35 libcublas.so -> libcublas.so.5.0
lrwxrwxrwx 1 root root        19 Mar 18 14:35 libcublas.so.5.0 -> libcublas.so.5.0.35
-rwxr-xr-x 1 root root  58852880 Sep 26  2012 libcublas.so.5.0.35
-rw-r--r-- 1 root root  21255400 Sep 26  2012 libcublas_device.a
-rw-r--r-- 1 root root    456070 Sep 26  2012 libcudadevrt.a
lrwxrwxrwx 1 root root        16 Mar 18 14:35 libcudart.so -> libcudart.so.5.0
lrwxrwxrwx 1 root root        19 Mar 18 14:35 libcudart.so.5.0 -> libcudart.so.5.0.35
-rwxr-xr-x 1 root root    375752 Sep 26  2012 libcudart.so.5.0.35
lrwxrwxrwx 1 root root        15 Mar 18 14:35 libcufft.so -> libcufft.so.5.0
lrwxrwxrwx 1 root root        18 Mar 18 14:35 libcufft.so.5.0 -> libcufft.so.5.0.35
-rwxr-xr-x 1 root root  30787712 Sep 26  2012 libcufft.so.5.0.35
lrwxrwxrwx 1 root root        17 Mar 18 14:35 libcuinj64.so -> libcuinj64.so.5.0
lrwxrwxrwx 1 root root        20 Mar 18 14:35 libcuinj64.so.5.0 -> libcuinj64.so.5.0.35
-rwxr-xr-x 1 root root   1306496 Sep 26  2012 libcuinj64.so.5.0.35
lrwxrwxrwx 1 root root        16 Mar 18 14:35 libcurand.so -> libcurand.so.5.0
lrwxrwxrwx 1 root root        19 Mar 18 14:35 libcurand.so.5.0 -> libcurand.so.5.0.35
-rwxr-xr-x 1 root root  25281224 Sep 26  2012 libcurand.so.5.0.35
lrwxrwxrwx 1 root root        18 Mar 18 14:35 libcusparse.so -> libcusparse.so.5.0
lrwxrwxrwx 1 root root        21 Mar 18 14:35 libcusparse.so.5.0 -> libcusparse.so.5.0.35
-rwxr-xr-x 1 root root 132455240 Sep 26  2012 libcusparse.so.5.0.35
lrwxrwxrwx 1 root root        13 Mar 18 14:35 libnpp.so -> libnpp.so.5.0
lrwxrwxrwx 1 root root        16 Mar 18 14:35 libnpp.so.5.0 -> libnpp.so.5.0.35
-rwxr-xr-x 1 root root  93602912 Sep 26  2012 libnpp.so.5.0.35
lrwxrwxrwx 1 root root        20 Mar 18 14:35 libnvToolsExt.so -> libnvToolsExt.so.5.0
lrwxrwxrwx 1 root root        23 Mar 18 14:35 libnvToolsExt.so.5.0 -> libnvToolsExt.so.5.0.35
-rwxr-xr-x 1 root root     31280 Sep 26  2012 libnvToolsExt.so.5.0.35



--
  Respectfully,
Ray Sheppard
rshep...@iu.edu
http://pti.iu.edu/sciapt
317-274-0016

Principal Analyst
Senior Technical Lead
Scientific Applications and Performance Tuning
Research Technologies
University Information Technological Services
IUPUI campus
Indiana University

My "pithy" saying:  Science is the art of translating the world
into language. Unfortunately, that language is mathematics.
Bumper sticker wisdom: Make it idiot-proof and they will make a
better idiot.


[OMPI users] Trouble configuring 1.7.2 for Cuda 5.0.35

2013-08-14 Thread Ray Sheppard

Hello,
  When I try to run my configure script, it dies with the following.  
Below it are the actual libraries in the directory. Could the solution 
be as simple as adding "rt" somewhere in the configure script?  Thanks.

 Ray

checking if --with-cuda-libdir is set... not found
configure: WARNING: Expected file /opt/nvidia/cudatoolkit/5.0.35/lib64/libcuda.* not found
configure: error: Cannot continue
rsheppar@login1:/N/dc/projects/ray/br2/openmpi-1.7.2> ls -l /opt/nvidia/cudatoolkit/5.0.35/lib64/
total 356284
lrwxrwxrwx 1 root root        16 Mar 18 14:35 libcublas.so -> libcublas.so.5.0
lrwxrwxrwx 1 root root        19 Mar 18 14:35 libcublas.so.5.0 -> libcublas.so.5.0.35
-rwxr-xr-x 1 root root  58852880 Sep 26  2012 libcublas.so.5.0.35
-rw-r--r-- 1 root root  21255400 Sep 26  2012 libcublas_device.a
-rw-r--r-- 1 root root    456070 Sep 26  2012 libcudadevrt.a
lrwxrwxrwx 1 root root        16 Mar 18 14:35 libcudart.so -> libcudart.so.5.0
lrwxrwxrwx 1 root root        19 Mar 18 14:35 libcudart.so.5.0 -> libcudart.so.5.0.35
-rwxr-xr-x 1 root root    375752 Sep 26  2012 libcudart.so.5.0.35
lrwxrwxrwx 1 root root        15 Mar 18 14:35 libcufft.so -> libcufft.so.5.0
lrwxrwxrwx 1 root root        18 Mar 18 14:35 libcufft.so.5.0 -> libcufft.so.5.0.35
-rwxr-xr-x 1 root root  30787712 Sep 26  2012 libcufft.so.5.0.35
lrwxrwxrwx 1 root root        17 Mar 18 14:35 libcuinj64.so -> libcuinj64.so.5.0
lrwxrwxrwx 1 root root        20 Mar 18 14:35 libcuinj64.so.5.0 -> libcuinj64.so.5.0.35
-rwxr-xr-x 1 root root   1306496 Sep 26  2012 libcuinj64.so.5.0.35
lrwxrwxrwx 1 root root        16 Mar 18 14:35 libcurand.so -> libcurand.so.5.0
lrwxrwxrwx 1 root root        19 Mar 18 14:35 libcurand.so.5.0 -> libcurand.so.5.0.35
-rwxr-xr-x 1 root root  25281224 Sep 26  2012 libcurand.so.5.0.35
lrwxrwxrwx 1 root root        18 Mar 18 14:35 libcusparse.so -> libcusparse.so.5.0
lrwxrwxrwx 1 root root        21 Mar 18 14:35 libcusparse.so.5.0 -> libcusparse.so.5.0.35
-rwxr-xr-x 1 root root 132455240 Sep 26  2012 libcusparse.so.5.0.35
lrwxrwxrwx 1 root root        13 Mar 18 14:35 libnpp.so -> libnpp.so.5.0
lrwxrwxrwx 1 root root        16 Mar 18 14:35 libnpp.so.5.0 -> libnpp.so.5.0.35
-rwxr-xr-x 1 root root  93602912 Sep 26  2012 libnpp.so.5.0.35
lrwxrwxrwx 1 root root        20 Mar 18 14:35 libnvToolsExt.so -> libnvToolsExt.so.5.0
lrwxrwxrwx 1 root root        23 Mar 18 14:35 libnvToolsExt.so.5.0 -> libnvToolsExt.so.5.0.35
-rwxr-xr-x 1 root root     31280 Sep 26  2012 libnvToolsExt.so.5.0.35
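
The warning is a file-pattern check: configure wants something matching libcuda.* in that directory, and nothing in the listing above qualifies, because each name continues past "libcuda" with more letters (libcudart, libcudadevrt) instead of a dot. libcuda is the CUDA driver library, which as far as I know is installed with the NVIDIA driver rather than in the toolkit's lib64, so pointing configure at wherever libcuda.so actually lives (the log shows it consulted --with-cuda-libdir) looks more promising than adding "rt". Below is a minimal C sketch that mimics only the pattern test, so candidate directories can be checked by hand; lib_name_matches and count_lib_matches are hypothetical helpers, not Open MPI code, and the real configure logic may differ in detail.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fnmatch.h>
#include <stdio.h>

/* Return 1 if `filename` matches the glob "<libname>.*", the shape of the
 * pattern in the configure warning (e.g. "libcuda.*"), else 0. */
int lib_name_matches(const char *filename, const char *libname)
{
    char pattern[256];
    assert(filename != NULL && libname != NULL);

    int n = snprintf(pattern, sizeof pattern, "%s.*", libname);
    if (n < 0 || (size_t)n >= sizeof pattern)
        return 0; /* pattern would be truncated; treat as no match */
    return fnmatch(pattern, filename, 0) == 0;
}

/* Count entries in `dir` whose names match "<libname>.*".
 * Returns -1 if the directory cannot be opened. */
int count_lib_matches(const char *dir, const char *libname)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    const struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        if (lib_name_matches(ent->d_name, libname))
            count++;
    }
    closedir(d);
    return count;
}
```

For the toolkit directory listed above, count_lib_matches("/opt/nvidia/cudatoolkit/5.0.35/lib64", "libcuda") should return 0, while a directory that holds libcuda.so should give a positive count.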



--
     Respectfully,
   Ray Sheppard
   rshep...@iu.edu
   http://pti.iu.edu/sciapt
   317-274-0016

   Principal Analyst
   Senior Technical Lead
   Scientific Applications and Performance Tuning
   Research Technologies
   University Information Technological Services
   IUPUI campus
   Indiana University

   My "pithy" saying:  Science is the art of translating the world
   into language. Unfortunately, that language is mathematics.
   Bumper sticker wisdom: Make it idiot-proof and they will make a
   better idiot.



Re: [OMPI users] LDBL_MANT_DIG declaration trouble

2013-04-12 Thread Ray Sheppard

Thanks!
  That explains why I could not find it in the package :)  Yes, it is in 
float.h.  This Cray is screwy.  Y'all put the include at the top of the 
file, but it seems to have forgotten it a few hundred lines later.  Thanks again.

  Ray

  On 4/12/2013 10:20 AM, Ralph Castain wrote:

It should have been defined in <float.h>. Is that include file not found? You 
might check to ensure it was defined there.
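
Since the include sits at the top of the file but the macro seems to vanish later, a small compile-time guard placed just before the failing code can tell whether LDBL_MANT_DIG from <float.h> is actually visible at that point. A minimal sketch; ldbl_mantissa_digits is a hypothetical name, and the static_assert line assumes a C11 compiler.

```c
#include <assert.h>
#include <float.h>

/* <float.h> is required to define LDBL_MANT_DIG.  If this #error fires,
 * the macro is not visible here (header not included, or #undef'd later). */
#ifndef LDBL_MANT_DIG
#error "LDBL_MANT_DIG is not defined here; check that <float.h> is included"
#endif

/* Sanity check: long double has at least as many significand digits as double. */
static_assert(LDBL_MANT_DIG >= DBL_MANT_DIG, "unexpected long double precision");

/* Hypothetical helper: number of base-FLT_RADIX digits in a long double significand. */
int ldbl_mantissa_digits(void)
{
    return LDBL_MANT_DIG;
}
```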


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users






[OMPI users] LDBL_MANT_DIG declaration trouble

2013-04-12 Thread Ray Sheppard

Hi,
  I am sorry to bother everyone.  I have had no trouble building 1.6.3 
with the Intel compiler.  Now I have to repeat the exercise for 
GNU.  In opal/util/arch.h (around line 260) is the function below.  I am 
getting an error that LDBL_MANT_DIG is not declared, and I cannot seem to 
find where it is declared.  Any hints would be appreciated.  Thanks.

 Ray


static inline int32_t opal_arch_ldisintel( void )
{
    long double ld = 2.0;
    int i, j;
    uint32_t* pui = (uint32_t*)(void*)&ld;

    j = LDBL_MANT_DIG / 32;
    i = (LDBL_MANT_DIG % 32) - 1;
    if( opal_arch_isbigendian() ) { /* big endian */
        j = (sizeof(long double) / sizeof(unsigned int)) - j;
        if( i < 0 ) {
            i = 31;
            j = j+1;
        }
    } else {
        if( i < 0 ) {
            i = 31;
            j = j-1;
        }
    }
    return (pui[j] & (1 << i) ? 1 : 0);
}


The function is described as follows:
/* we must find which representation of long double is used
 * intel or sparc. Both of them represent the long doubles using a close to
 * IEEE representation (seee..emmm...m) where the mantissa looks like
 * 1.. For the intel representation the 1 is explicit, and for the sparc
 * the first one is implicit. If we take the number 2.0 the exponent is 1
 * and the mantissa is 1.0 (the sign of course should be 0). So if we check
 * for the first one in the binary representation of the number, we will
 * find the bit from the exponent, so the next one should be the beginning
 * of the mantissa. If it's 1 then we have an intel representation, if not
 * we have a sparc one. QED
 */
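
The trick in that comment can be written without the 32-bit word arithmetic. For the value 2.0, the bit at position LDBL_MANT_DIG - 1 (counted from the least significant bit) is the explicit integer bit on x87-style extended precision, which is set, and the lowest exponent bit on formats with an implicit leading 1, which is clear because the biased exponent of 2.0 is bias + 1, an even number. The sketch below is a hypothetical byte-wise variant, not the Open MPI implementation; it assumes radix 2 and that the significand occupies the low-order bits of the object, which holds for little-endian x87 and IEEE formats but not for padded big-endian layouts or IBM double-double. Note that LDBL_MANT_DIG comes from <float.h>.

```c
#include <assert.h>
#include <float.h>   /* LDBL_MANT_DIG, DBL_MANT_DIG */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Runtime byte-order probe: returns 1 on little-endian hosts. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Given the bytes of the value 2.0 in a binary floating-point type with
 * `mant_dig` significand digits, return bit (mant_dig - 1) counted from the
 * least significant bit: 1 => explicit integer bit, 0 => implicit leading 1.
 * Assumes no padding bytes when the host is big-endian. */
int two_has_explicit_int_bit(const void *two, size_t size, int mant_dig)
{
    unsigned char bytes[sizeof(long double)];
    assert(two != NULL && mant_dig > 0 && size <= sizeof bytes);
    memcpy(bytes, two, size);

    size_t bit = (size_t)mant_dig - 1;
    size_t byte = bit / 8;
    assert(byte < size);
    if (!host_is_little_endian())
        byte = size - 1 - byte;   /* most significant byte is stored first */
    return (bytes[byte] >> (bit % 8)) & 1;
}

/* long double version, mirroring what opal_arch_ldisintel() checks. */
int ldbl_has_explicit_int_bit(void)
{
    long double ld = 2.0L;
    return two_has_explicit_int_bit(&ld, sizeof ld, LDBL_MANT_DIG);
}
```

On x86-64 Linux with GCC, ldbl_has_explicit_int_bit() should return 1; where long double is IEEE quad (for example AArch64 Linux) or just a double, it should return 0.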



Re: [OMPI users] Can not turn off C++ build.

2012-11-29 Thread Ray Sheppard

Thanks Jeff,
  Of course you were right.  I had thought the missing function was 
something internal to y'all's build.  It is pretty scary that they have 
been building and porting for weeks (while I was running around SC and 
the holidays) and it takes an old Fortran guy to notice they don't have 
a working C++ compiler.  Well, truth be told, you did the noticing. 
Thanks again.

   Ray

On 11/28/2012 5:09 PM, Jeff Squyres wrote:

According to config.log, your icpc is broken -- it won't compile a trivial C++ 
program.  Try it yourself -- try compiling

-
#include <iostream>
using namespace std;
int main(int argc, char* argv[]) {
 cout << "Hello, world" << endl;
 return 0;
}
-

Do you need to set some environment variables before you invoke the Intel 
compilers?




Re: [OMPI users] Can not turn off C++ build.

2012-11-28 Thread Ray Sheppard

Hi Jeff,
  Thanks.  I am just running the Intel 13.0.1 compiler on the Quarry 
cluster at IU. It would be very odd to have a serious issue without 
users complaining.  I tried running it again with C++ turned on with:


./configure CC=icc CFLAGS="-xT -O2" F77=ifort FFLAGS="-xT -O2" \
    FC=ifort FCFLAGS="-xT -O2" CXX=icpc --enable-static --disable-shared \
    --with-threads=posix --prefix=/N/soft/rhel6/openmpi/intel/openmpi-1.6.3


The block ends very similarly to how it acted with g++:

*** C++ compiler and preprocessor
checking whether we are using the GNU C++ compiler... yes
checking whether icpc accepts -g... yes
checking dependency style of icpc... gcc3
checking how to run the C++ preprocessor... icpc -E
checking for the C++ compiler vendor... intel
checking if icpc supports -finline-functions... yes
configure: WARNING:  -finline-functions has been added to CXXFLAGS
checking if C and C++ are link compatible... yes
checking for C++ optimization flags... -O3 -DNDEBUG -finline-functions
checking size of bool... 0
checking alignment of bool... configure: WARNING: *** Problem running configure test!
configure: WARNING: *** See config.log for details.
configure: error: *** Cannot continue.


Here is config.log, picking it up around configure:16462:

configure:16462: checking for the C++ compiler vendor
configure:16491: icpc -c -DNDEBUG   conftest.cpp >&5
configure:16491: $? = 0
configure:17030: result: intel
configure:17283: checking if icpc supports -finline-functions
configure:17299: icc -c -DNDEBUG -xT -O2 -finline-functions -fno-strict-aliasing -restrict  conftest.c >&5
icc: command line remark #10279: option '-xT' is deprecated and will be removed in a future release. See '-help deprecated'
configure:17299: $? = 0
configure:17306: result: yes
configure:17393: WARNING:  -finline-functions has been added to CXXFLAGS
configure:17404: checking if C and C++ are link compatible
configure:17430: icc -c -DNDEBUG -xT -O2 -finline-functions -fno-strict-aliasing -restrict  conftest_c.c
icc: command line remark #10279: option '-xT' is deprecated and will be removed in a future release. See '-help deprecated'
configure:17437: $? = 0
configure:17468: icpc -o conftest -DNDEBUG -finline-functions conftest.cpp conftest_c.o  >&5
configure:17468: $? = 0
configure:17494: result: yes
configure:17589: checking for C++ optimization flags
configure:17591: result: -O3 -DNDEBUG -finline-functions
configure:17606: checking size of bool
configure:17611: icpc -o conftest -O3 -DNDEBUG -finline-functions   conftest.cpp  >&5
/usr/include/bits/stdio.h(118): error: identifier "__getdelim" is undefined
return __getdelim (__lineptr, __n, '\n', __stream);
   ^

compilation aborted for conftest.cpp (code 2)
configure:17611: $? = 2
configure: program exited with status 2
configure: failed program was:
| /* confdefs.h */
|


So, I am lost.  Thanks again
Ray







On 11/28/2012 4:17 PM, Jeff Squyres wrote:

I'll bet we're not disabling the C++ test properly when you disable the C++ 
bindings.  Bummer.  I'll file a bug, but I don't know when that will be fixed.

However, this kind of error typically only occurs when your C++ compiler fails 
altogether (e.g., it's broken).  Check the config.log file and see what it says 
happened for this specific test -- it may well be that your C++ compiler is 
faulty and needs to be fixed anyway.
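
That also explains the odd "checking size of bool... 0". As I understand autoconf's AC_CHECK_SIZEOF, configure compiles and runs a tiny program that writes sizeof (type) to a file and then reads the number back; if the compiler cannot even build a program that uses the type, no number is produced and the size is reported as 0. The C sketch below is only a toy model of that write/read handshake, with hypothetical helper names; it is not autoconf's generated conftest code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Probe side: record a size as a bare decimal integer, roughly what the
 * conftest program does with sizeof (type).  Returns 0 on success. */
int write_size_value(FILE *out, long value)
{
    assert(out != NULL);
    if (fprintf(out, "%ld\n", value) < 0)
        return 1;
    return fflush(out) != 0;
}

/* configure side: read the recorded value back.  When nothing usable was
 * written (for example, the probe never compiled), fall back to 0,
 * mirroring the 0 that configure prints in that situation. */
long read_size_value(FILE *in)
{
    long value;
    if (in == NULL || fscanf(in, "%ld", &value) != 1)
        return 0;
    return value;
}
```

With a working compiler the round trip returns sizeof(bool); with an empty or missing result file it returns 0, just like the failed icpc and g++ runs above.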



[OMPI users] Can not turn off C++ build.

2012-11-28 Thread Ray Sheppard



Hello,
  I am trying to build Open MPI 1.6.3 on an IBM/Intel RHEL-6 cluster.
  I tried building with variations (enable-...=no, disable-...,
changing switch order, etc.) of this:

./configure CC=icc CFLAGS="-xT -O2" F77=ifort FFLAGS="-xT -O2" \
    FC=ifort FCFLAGS="-xT -O2" --enable-mpi-cxx=no --disable-mpi-cxx-seek \
    --enable-static --disable-shared --with-threads=posix \
    --prefix=/N/soft/rhel6/openmpi/intel/openmpi-1.6.3

I first tried using icpc as the CXX compiler, but configure dies shortly after
checking the alignment of bool.  C++ bindings are not that popular, so I
decided to just turn them off.  Now configure just picks up g++ and tries
building the C++ bindings anyway:

** C++ compiler and preprocessor
checking for g++... g++
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking dependency style of g++... gcc3
checking how to run the C++ preprocessor... g++ -E
checking for the C++ compiler vendor... gnu
checking if g++ supports -finline-functions... yes
configure: WARNING:  -finline-functions has been added to CXXFLAGS
checking if C and C++ are link compatible... yes
checking for C++ optimization flags... -O3 -DNDEBUG -finline-functions
checking size of bool... 0
checking alignment of bool... configure: WARNING: *** Problem running configure test!
configure: WARNING: *** See config.log for details.
configure: error: *** Cannot continue.


It still fails.  I am happy to just drop C++, but configure won't let me.
What is wrong?  Thanks.
  Ray
