Re: [OMPI users] ScaLapack and BLACS on Leopard

2008-03-06 Thread Gregory John Orris

All,
I really didn't want to start a new thread discussing the virtues and
vices of every compiler, since this is hardly my forte and the
opportunity to offend someone is fairly high, whilst making myself
look clownish. What I should have said was that "for my organization
one cannot justify the cost of buying, for example, Intel's compiler."
Aside from some political/economic/legal reasons for this being true,
we have a lot of very specific code that has been optimized over the
years by hand. While there are clearly speed advantages for Intel's
compilers for many, many problems, in our view, for our problems and
our "in house" software, the speedup has not been sufficient to
warrant the cost of a compiler for several hundred to several thousand
computers (yes, there are that many developers). Besides, timing
claims and speedups are also dependent upon the version of
gcc/gfortran one uses, and what memory allocation routine, etc., etc.
There have been significant improvements in gcc from 4.1 through 4.3
that have been well documented. And there have also been accusations
that Intel "chooses" problems to accentuate their claims of supremacy.
But doesn't every vendor do this? I admit that, given an infinite
amount of money, I would go with the Intel compiler for all of our
development work. Since we are more of a "proof of concept"
organization and ship out production runs to real centers (with Intel
compilers), even a factor of 2 (which is our typical experience) is
not significant enough. If it were to become, as you state, roughly 8
times faster, it might be a different story. As it is, only legacy
code is using fortran, and if one moves to C/C++ the differences we've
seen have been minuscule.


Bottom line: it works for my configuration right now, and both the
other users and I are happy.


Thanks, all, for the help, advice, and a provocative discussion.

Regards,
Greg

On Mar 6, 2008, at 5:11 PM, Michael wrote:



On Mar 6, 2008, at 12:49 PM, Doug Reeder wrote:

Greg,

I would disagree with your statement that the available fortran
options can't pass a cost-benefit analysis. I have found that for
scientific programming (e.g., Livermore Fortran Kernels and actual
PDE solvers), code produced by the Intel compiler runs 25 to 55%
faster than code from gfortran or g95. Looking at the cost of adding
processors with g95/gfortran to get the same throughput as with
ifort, you recover the $549 compiler cost real quickly.

Doug Reeder



I'm a big fan of g95, but actually I'm seeing even greater
differences in a small code I'm using for some lengthy calculations.

With 14 MB of data being read into memory and processed:

Intel ifort is 7.7x faster than Linux g95 on MacPro 3.0 GHz
Intel ifort is 2.9x faster than Linux g95 on Dual Opteron 1.4 GHz
Intel ifort is 1.8x faster than Linux g95 on SGI Altix 350 dual
Itanium2 1.4 GHz
OS X g95 is 2.7x faster than Linux g95 on a MacPro 2.66 GHz (same
hardware exactly)

The complete data set is very large, 56 GB, but that is 42 individual
frequencies, whereas the 14 MB is a single frequency with data
averaged over areas, so I get a flavor of the answer but not exactly
the right answer.  I played around with compiler options and specified
the exact processor type within the limits of gcc, and I gained only
fractions of a percent.

A co-worker saw factor-of-2 differences between Intel's compiler and g95
with a very complicated code.

Michael






Re: [OMPI users] ScaLapack and BLACS on Leopard

2008-03-06 Thread Terry Frankcombe

> Intel ifort is 7.7x faster than Linux g95 on MacPro 3.0 GHz
> Intel ifort is 2.9x faster than Linux g95 on Dual Opteron 1.4 GHz
> Intel ifort is 1.8x faster than Linux g95 on SGI Altix 350 dual
> Itanium2 1.4 GHz
> OS X g95 is 2.7x faster than Linux g95 on a MacPro 2.66 GHz (same
> hardware exactly)

That ordering makes little sense to me.  The Intel compilers should be
the most effective on Itanium, where a lot of functionality has been
moved out of hardware and foisted onto the compiler!

Have you done these tests with a recent gfortran?  (Certainly the gcc
people would want to know about it as these must be missed
optimisations.  But only for the gfortran case.)



Re: [OMPI users] ScaLapack and BLACS on Leopard

2008-03-06 Thread Michael


On Mar 6, 2008, at 12:49 PM, Doug Reeder wrote:

Greg,

I would disagree with your statement that the available fortran  
options can't pass a cost-benefit analysis. I have found that for
scientific programming (e.g., Livermore Fortran Kernels and actual
PDE solvers), code produced by the Intel compiler runs 25 to 55%
faster than code from gfortran or g95. Looking at the cost of adding
processors with g95/gfortran to get the same throughput as with
ifort, you recover the $549 compiler cost real quickly.


Doug Reeder



I'm a big fan of g95, but actually I'm seeing even greater
differences in a small code I'm using for some lengthy calculations.


With 14 MB of data being read into memory and processed:

Intel ifort is 7.7x faster than Linux g95 on MacPro 3.0 GHz
Intel ifort is 2.9x faster than Linux g95 on Dual Opteron 1.4 GHz
Intel ifort is 1.8x faster than Linux g95 on SGI Altix 350 dual
Itanium2 1.4 GHz
OS X g95 is 2.7x faster than Linux g95 on a MacPro 2.66 GHz (same
hardware exactly)


The complete data set is very large, 56 GB, but that is 42 individual
frequencies, whereas the 14 MB is a single frequency with data
averaged over areas, so I get a flavor of the answer but not exactly
the right answer.  I played around with compiler options and specified
the exact processor type within the limits of gcc, and I gained only
fractions of a percent.


A co-worker saw factor-of-2 differences between Intel's compiler and g95
with a very complicated code.


Michael



[OMPI users] FW: slurm and all-srun orterun

2008-03-06 Thread Sacerdoti, Federico
Ralph, here is Moe's response. The srun options he mentions look
promising: they can signal an otherwise happy orted daemon (sitting on a
waitpid) that something is amiss elsewhere in the job. Do orteds change
their session ID?

Thanks Moe,
Federico

-Original Message-
From: jet...@llnl.gov [mailto:jet...@llnl.gov] 
Sent: Wednesday, March 05, 2008 2:21 PM
To: Sacerdoti, Federico; Open MPI Users
Subject: RE: [OMPI users] slurm and all-srun orterun

Slurm and its APIs are available under the GPL license.
Since Open MPI is not available under the GPL license, it
cannot link with the Slurm APIs; however, virtually all
of that API functionality is available through existing
Slurm commands. The commands are clearly not as simple to
use as the APIs, but if you encounter any problems using
the commands we can certainly make changes to facilitate
their use. For example, Slurm communicates with the Maui
and Moab schedulers using an interface that loosely
resembles XML. We are also prepared to provide additional
functionality as needed by Open MPI.

Regarding premature termination of processes that Slurm
spawns, the srun command has a couple of options that may
prove useful:

-K, --kill-on-bad-exit
  Terminate a job if any task exits with a non-zero exit code.

-W, --wait=seconds
  Specify how long to wait after the first task terminates before
  terminating all remaining tasks. A value of 0 indicates an
  unlimited wait (a warning will be issued after 60 seconds). The
  default value is set by the WaitTime parameter in the slurm
  configuration file (see slurm.conf(5)). This option can be
  useful to insure that a job is terminated in a timely fashion in
  the event that one or more tasks terminate prematurely.

Any tasks launched outside of Slurm's control (e.g. rsh) are not
purged on job termination. Slurm locates spawned tasks and any of
their children using the configured ProcTrack plugin, of which
several are available. If you use the SID (session ID) plugin
and spawned tasks change their SID, Slurm will no longer track
them. Several reliable process tracking mechanisms are available,
but some do require kernel changes. See "man slurm.conf" for more
information.
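
As a minimal C sketch of that mechanism (illustrative only, not Slurm
or ORTE code): a spawned task that calls setsid() moves into a new
session, so a SID-based ProcTrack plugin can no longer associate it
with the job that launched it.

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t child = fork();
        if (child == 0) {
            /* The child inherits the parent's session ID... */
            printf("child before setsid: session %d\n", (int)getsid(0));
            /* ...until it detaches into a new session of its own. */
            if (setsid() == (pid_t)-1)
                perror("setsid");
            printf("child after setsid:  session %d\n", (int)getsid(0));
            _exit(0);
        }
        waitpid(child, NULL, 0);
        return 0;
    }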

Moe



At 11:16 AM -0500 3/5/08, Sacerdoti, Federico wrote:
>Thanks Ralph,
>
>First, we would be happy to test the slurm direct launch capability.
>Regarding the failure case, I realize that the IB errors do not
>directly affect the orted daemons. This is what we observed:
>
>1. Parallel job started
>2. IB errors caused some processes to fail (but not all)
>3. slurm tears down entire job, attempting to kill all orted and their
>children
>
>We want this behavior: if any process of a parallel job dies, all
>processes should be stopped. The orted daemons in charge of processes
>that did not fail are the problem, as slurm was not able to kill them.
>Sounds like this is a known issue in openmpi 1.2.x.
>
>In any case, the new direct launching methods sound promising. I am
>surprised there are licensing issues with Slurm, is this a GPL-and-BSD
>issue? I am CC'ing slurm author Moe; he may be able to help.
>
>Thanks again and I look forward to testing the direct launch,
>Federico
>
>
>-Original Message-
>From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
>Behalf Of Ralph Castain
>Sent: Monday, March 03, 2008 8:19 PM
>To: Open MPI Users 
>Cc: Ralph Castain
>Subject: Re: [OMPI users] slurm and all-srun orterun
>
>Hello
>
>I don't monitor the user list any more, but a friendly elf sent this
>along to me.
>
>I'm not entirely sure what problem might be causing the behavior you
>are seeing. Neither mpirun nor any orted should be impacted by IB
>problems as they aren't MPI processes and thus never interact with IB.
>Only application procs touch the IB subsystem - if an application proc
>fails, the orted should see that and correctly order the shutdown of
>the job. So if you are having IB problems, that wouldn't explain
>daemons failing.
>
>If a daemon is aborting, that will cause problems in 1.2.x. We have
>noted that SLURM (even though the daemons are launched via srun)
>doesn't always tell us when this happens, leaving Open MPI vulnerable
>to "hangs" as it attempts to clean up and finds it can't do it. I'm
>not sure why you would see a daemon die, though - the fact that an
>application process failed shouldn't cause that to happen. Likewise,
>it would seem strange that the application process would fail and the
>daemon not notice - this has nothing to do with slurm, but is just a
>standard Linux "waitpid" method.
>
>The most likely reason for the behavior you describe is that an
>application process encounters an IB problem which blocks
>communication - but the process doesn't actually abort or terminate,
>it just hangs there. In this case, the orted doesn't see the process
>exit, so the system doesn't know it should take any action.
>
>That said, we know 

Re: [OMPI users] ScaLapack and BLACS on Leopard

2008-03-06 Thread Doug Reeder

Greg,

I would disagree with your statement that the available fortran  
options can't pass a cost-benefit analysis. I have found that for
scientific programming (e.g., Livermore Fortran Kernels and actual
PDE solvers), code produced by the Intel compiler runs 25 to 55%
faster than code from gfortran or g95. Looking at the cost of adding
processors with g95/gfortran to get the same throughput as with ifort,
you recover the $549 compiler cost real quickly.


Doug Reeder
On Mar 6, 2008, at 9:20 AM, Gregory John Orris wrote:


Sorry for the long delay in response.

Let's get back to the beginning:
My original compiler configuration was gcc from the standard  
Leopard Developer Tools supplied off the installation DVD. This  
version was 4.0.1. However, it has been significantly modified by  
Apple to work with Leopard. If you haven't used Apple's Developer  
Environment, you're missing out on something. It's pretty sweet.  
But the price you pay for it is no fortran support (not usually a  
problem for me but it is relevant here) and usually a somewhat time- 
lagged compiler. I'm not as plugged into Apple as perhaps I should  
be, but I can only imagine that their philosophy is to really
over-test their compiler. Gratis, Apple throws into its "frameworks" a
shared library called vecLib that includes machine-optimized BLAS
and CLAPACK routines. Also, with Leopard, Apple has integrated
open-mpi (yea!). But they have once again not included fortran support
(boo!).


Now, to get fortran on a Mac you have several options (most of  
which cannot really survive the cost-benefit analysis of a  
competent manager), but a perfectly fine freeware option is to get  
it off of hpc.sourceforge.net. This version is based on gcc 4.3.0.  
There are a few legitimate reasons to stick with Apple's older gcc,
as it's not really a good idea to try and mix libraries from one
compiler version with another. That is especially true here, because (without
knowing precisely what Apple has done) there is a tremendous  
difference in execution speed of code written with gcc 4.0 and 4.1  
as opposed to 4.2 and later. (This has been well documented on many  
systems.) Also, out of a bit of laziness, I really didn't want to  
go to the trouble of re-writing (or finding) all of the compiler  
scripts in the Developer Environment to use the new gcc.


So, I compiled open-mpi-1.2.5 with gcc, g++ 4.0.1, and gfortran  
4.3. Then, I compiled BLACS and ScaLAPACK using the configuration  
from the open-mpi FAQ page. Everything compiles perfectly ok,  
independent of whether you choose 32 or 64 bit addressing. First  
problem was that I was still calling mpicc from the Apple supplied  
openmpi and mpif77 from the newly installed distribution. Once  
again, I've not a clue what Apple has done, but while the two would  
compile items together, they DO NOT COMMUNICATE properly in 64-bit  
mode. MPI_COMM_WORLD even in the test routines of openMPI would  
fail! This is the point at which I originated the message asking if  
anyone had gotten a 64-bit version to actually work. The errors  
were in libSystem and were not what I'd expect from a simple  
openmpi error. I believe this problem is caused by a difference in  
how pointers were/are treated within gcc from version to version.  
Thus mixing versions essentially caused failure within the Apple  
supplied openmpi distribution and the new one I installed.


How to get over this hurdle? Install the complete gcc 4.3.0 from  
the hpc.sourceforge.net site and recompile EVERYTHING!


You might think you were done here, but there is one (or actually  
four) additional problem(s). Now NONE of the complex routines  
worked. All of the test routines returned failure. And I tracked it
down to the fact that pzdotc, pzdotu, pcdotc, and pcdotu inside of
the PBLAS routines were failing. Potentially this was a much more
difficult problem, since rewriting these codes is really not what
I'm paid to do. Tracing down these errors further, I found that the
actual problem is with the zdotc, zdotu, cdotc, and cdotu BLAS
routines inside of Apple's vecLib. So the problem seemed to be that
a faulty, manufacturer-supplied, optimized library was not
functioning properly. Well, as it turns out, there is a peculiar
difference (again) between versions of the gcc suite in how it
regards returned values from complex fortran functions (I'm only
assuming this since the workaround was successful). This problem
has been known for some time now (perhaps 4 years or more). See
http://developer.apple.com/hardware/ve/errata.html#fortran_conventions


How to get over this hurdle? Install ATLAS, CLAPACK, and CBLAS off  
the netlib.org web site, and compile them with the gcc 4.3.0 suite.


So, where am I now? BLACS, ScaLAPACK, and PBLAS work in 64-bit
mode with CLAPACK-3.1.1, ATLAS 3.8.1, Open-MPI-1.2.5, and GCC 4.3.0,
and they link with ATLAS and CLAPACK and NOT vecLib!


Long way of saying that 

Re: [OMPI users] ScaLapack and BLACS on Leopard

2008-03-06 Thread Gregory John Orris

Sorry for the long delay in response.

Let's get back to the beginning:
My original compiler configuration was gcc from the standard Leopard  
Developer Tools supplied off the installation DVD. This version was  
4.0.1. However, it has been significantly modified by Apple to work  
with Leopard. If you haven't used Apple's Developer Environment,  
you're missing out on something. It's pretty sweet. But the price you  
pay for it is no fortran support (not usually a problem for me but it  
is relevant here) and usually a somewhat time-lagged compiler. I'm not  
as plugged into Apple as perhaps I should be, but I can only imagine  
that their philosophy is to really over-test their compiler. Gratis,
Apple throws into its "frameworks" a shared library called vecLib
that includes machine-optimized BLAS and CLAPACK routines. Also, with
Leopard, Apple has integrated open-mpi (yea!). But they have once  
again not included fortran support (boo!).


Now, to get fortran on a Mac you have several options (most of which  
cannot really survive the cost-benefit analysis of a competent  
manager), but a perfectly fine freeware option is to get it off of  
hpc.sourceforge.net. This version is based on gcc 4.3.0. There are a  
few legitimate reasons to stick with Apple's older gcc, as it's not
really a good idea to try and mix libraries from one compiler version
with another. That is especially true here, because (without knowing precisely what
Apple has done) there is a tremendous difference in execution speed of  
code written with gcc 4.0 and 4.1 as opposed to 4.2 and later. (This  
has been well documented on many systems.) Also, out of a bit of  
laziness, I really didn't want to go to the trouble of re-writing (or  
finding) all of the compiler scripts in the Developer Environment to  
use the new gcc.


So, I compiled open-mpi-1.2.5 with gcc, g++ 4.0.1, and gfortran 4.3.  
Then, I compiled BLACS and ScaLAPACK using the configuration from the  
open-mpi FAQ page. Everything compiles perfectly ok, independent of  
whether you choose 32 or 64 bit addressing. First problem was that I  
was still calling mpicc from the Apple supplied openmpi and mpif77  
from the newly installed distribution. Once again, I've not a clue  
what Apple has done, but while the two would compile items together,  
they DO NOT COMMUNICATE properly in 64-bit mode. MPI_COMM_WORLD even  
in the test routines of openMPI would fail! This is the point at which  
I originated the message asking if anyone had gotten a 64-bit version  
to actually work. The errors were in libSystem and were not what I'd  
expect from a simple openmpi error. I believe this problem is caused  
by a difference in how pointers were/are treated within gcc from  
version to version. Thus mixing versions essentially caused failure  
within the Apple supplied openmpi distribution and the new one I  
installed.


How to get over this hurdle? Install the complete gcc 4.3.0 from the  
hpc.sourceforge.net site and recompile EVERYTHING!


You might think you were done here, but there is one (or actually  
four) additional problem(s). Now NONE of the complex routines worked.  
All of the test routines returned failure. And I tracked it down to
the fact that pzdotc, pzdotu, pcdotc, and pcdotu inside of the PBLAS
routines were failing. Potentially this was a much more difficult
problem, since rewriting these codes is really not what I'm paid to
do. Tracing down these errors further, I found that the actual problem
is with the zdotc, zdotu, cdotc, and cdotu BLAS routines inside of
Apple's vecLib. So the problem seemed to be that a faulty,
manufacturer-supplied, optimized library was not functioning properly.
Well, as it turns out, there is a peculiar difference (again) between
versions of the gcc suite in how it regards returned values from
complex fortran functions (I'm only assuming this since the workaround
was successful). This problem has been known for some time now
(perhaps 4 years or more). See
http://developer.apple.com/hardware/ve/errata.html#fortran_conventions
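
As a minimal C sketch of the suspected clash (illustrative only; the
routines below are schematic stand-ins, not vecLib's or gfortran's
actual code), the two conventions for a complex dot product look like
this:

    #include <complex.h>
    #include <stdio.h>

    /* Convention A: the COMPLEX*16 result is returned by value, which
     * is what a call like z = zdotc(n, zx, incx, zy, incy) compiled by
     * a recent gfortran expects. */
    static double complex zdotc_by_value(int n, const double complex *zx,
                                         const double complex *zy)
    {
        double complex sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += conj(zx[i]) * zy[i];
        return sum;
    }

    /* Convention B (g77/f2c style): the result is written through a
     * hidden pointer passed as an extra first argument. */
    static void zdotc_by_pointer(double complex *result, int n,
                                 const double complex *zx,
                                 const double complex *zy)
    {
        *result = 0.0;
        for (int i = 0; i < n; i++)
            *result += conj(zx[i]) * zy[i];
    }

    int main(void)
    {
        double complex x[2] = { 1.0 + 2.0*I, 3.0 - 1.0*I };
        double complex y[2] = { 0.5 + 0.0*I, 2.0 + 2.0*I };

        double complex a = zdotc_by_value(2, x, y);
        double complex b;
        zdotc_by_pointer(&b, 2, x, y);

        /* Same math, different ABI shape: a caller compiled for one
         * shape but linked against a library built for the other
         * misreads the result. */
        printf("by value:   %g %+gi\n", creal(a), cimag(a));
        printf("by pointer: %g %+gi\n", creal(b), cimag(b));
        return 0;
    }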


How to get over this hurdle? Install ATLAS, CLAPACK, and CBLAS off the  
netlib.org web site, and compile them with the gcc 4.3.0 suite.


So, where am I now? BLACS, ScaLAPACK, and PBLAS work in 64-bit mode
with CLAPACK-3.1.1, ATLAS 3.8.1, Open-MPI-1.2.5, and GCC 4.3.0, and
they link with ATLAS and CLAPACK and NOT vecLib!


Long way of saying that the problem appears to be solved, but not well  
documented (until now)!


Regards,
Greg

On Mar 6, 2008, at 8:25 AM, Terry Dontje wrote:


Ok, I think I found the cause of the SPARC segv when trying to use a
64-bit compiled Open MPI library.  If one does not set the WHATMPI
variable in the Bmake.inc it defaults to UseF77Mpi which assumes all
handles are ints.  This is a correct assumption if you are using the F77
interfaces, but the way BLACS seems to compile for Open MPI, it uses the C
versions.  So the handles are stored as 32 bits in 

Re: [OMPI users] ScaLapack and BLACS on Leopard

2008-03-06 Thread Terry Dontje
Ok, I think I found the cause of the SPARC segv when trying to use a 
64-bit compiled Open MPI library.  If one does not set the WHATMPI 
variable in the Bmake.inc it defaults to UseF77Mpi which assumes all 
handles are ints.  This is a correct assumption if you are using the F77
interfaces, but the way BLACS seems to compile for Open MPI, it uses the C
versions.  So the handles are stored as 32 bits in BLACS and passed to
the C Open MPI interfaces, which expect 64 bits.  In cases where your
addresses need more than 32 bits, this will cause MPI to segv when passed
an invalid address due to this coercion.


So by setting "WHATMPI= -DUseCMpi" I've gotten the SPARC version of 
BLACS compiled for 64 bits to pass its tests without segv'ing.  I do 
believe this issue actually exists for other platforms (i.e., AMD64 and
IA64) with other OSes and compilers.  It's just that we've been lucky
that MPI_COMM_WORLD is allocated such that it has an address that fits
in 32 bits.  I am still amazed that we haven't seen this fail in user
codes.  Note, I have not confirmed this failure with a test case, but
the code stack in dbx looks the same on X64 platforms as on SPARC,
except that the address is smaller on the former.
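
A minimal C sketch of that coercion (the address and types are
illustrative, not Open MPI internals; it assumes only that the C-level
handle is pointer-sized on an LP64 system):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Pretend this is a C-level MPI_Comm handle, i.e. a 64-bit pointer. */
        void *c_handle = (void *)(uintptr_t)0x00007f3a12345678ULL;

        /* What a UseF77Mpi-style build effectively does: store the handle
         * in a plain 32-bit int, dropping the upper half of the address. */
        int f77_style = (int)(uintptr_t)c_handle;

        /* What the C Open MPI interface later gets handed back. */
        void *recovered = (void *)(uintptr_t)(unsigned int)f77_style;

        /* The two differ whenever the address needs more than 32 bits,
         * which is exactly when MPI segv's on the bogus "handle". */
        printf("original : %p\nrecovered: %p\n", c_handle, recovered);
        return 0;
    }

Building BLACS with "WHATMPI= -DUseCMpi", as above, keeps the handle at
its full C width instead.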


Greg, I would be interested in knowing if you are still seeing the 
problem on Leopard and whether the above setting helps any.


--td

Subject: Re: [OMPI users] ScaLapack and BLACS on Leopard
From: Terry Dontje (Terry.Dontje_at_[hidden])
Date: 2008-03-03 07:34:17


What kind of system lib errors are you seeing and do you have a stack
trace? Note, I was trying something similar with Solaris and 64-bit on
a SPARC machine and was seeing segv's inside the MPI Library due to a
pointer being passed through an integer (thus dropping the upper 32
bits). Funny thing is it all works under Solaris on AMD64 or IA-64
platforms.

--td

> Date: Thu, 28 Feb 2008 17:50:28 -0500
> From: Gregory John Orris 
> Subject: [OMPI users] ScaLapack and BLACS on Leopard
> To: Open MPI Users 
> Message-ID: <528FD4C0-6157-49CB-80E6-1C62684E4545_at_[hidden]>
> Content-Type: text/plain; charset="us-ascii"
>
> Hey Folks,
>
> Anyone got ScaLapack and BLACS working and not just compiled under
> OSX10.5 in 64-bit mode?
> The FAQ site directions were followed and everything compiles just
> fine. But ALL of the single precision routines and many of the double
> precision routines in the TESTING directory fail with system lib
> errors.
>
> I've gotten some interesting errors and am wondering what the magic
> touch is.
>
> Regards,
> Greg
> 


Re: [OMPI users] General Design Question

2008-03-06 Thread Jeff Squyres

On Mar 5, 2008, at 1:07 PM, Samir Faci wrote:

the search seems easy enough to parallelize, but I would need to
split the image analysis among processors.  Would there be any
problems with having MPI initialized and finalized within a class?



The only restriction is that you can only initialize and finalize MPI  
*once* within a process.
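
A minimal C sketch of how a class or library can respect that
restriction, using the standard MPI_Initialized/MPI_Finalized queries
(the wrapper names here are made up for illustration):

    #include <mpi.h>

    /* Call from a constructor or setup routine: initializes MPI only
     * if nothing else in the process has done so yet. */
    void ensure_mpi_started(int *argc, char ***argv)
    {
        int already_up = 0;
        MPI_Initialized(&already_up);
        if (!already_up)
            MPI_Init(argc, argv);
    }

    /* Call from a destructor or teardown routine: finalizes MPI at
     * most once per process. */
    void ensure_mpi_stopped(void)
    {
        int already_down = 0;
        MPI_Finalized(&already_down);
        if (!already_down)
            MPI_Finalize();
    }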


--
Jeff Squyres
Cisco Systems