Re: [OMPI users] Problem with Openmpi 1.1

2006-07-28 Thread Jeff Squyres
Trolling through some really old mails that never got replies... :-(

I'm afraid that the guy who did the GM code in Open MPI is currently on
vacation, but we have made a small number of changes since 1.1 that may have
fixed your issue.

Could you try one of the 1.1.1 release candidate tarballs and see if you
still have the problem?

http://www.open-mpi.org/software/ompi/v1.1/
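A typical way to test a release candidate without disturbing an existing install is to build it into its own prefix; a minimal sketch (the exact tarball name `1.1.1rc1` and the GM location `/opt/gm` are placeholders, not from this thread — check the download page for the current rc):

```shell
# Hypothetical recipe for trying a 1.1.1 release candidate;
# VERSION and the --with-gm path below are assumptions.
VERSION=1.1.1rc1
PREFIX="$HOME/openmpi-$VERSION"

# Uncomment to actually fetch and build:
# wget "http://www.open-mpi.org/software/ompi/v1.1/downloads/openmpi-$VERSION.tar.gz"
# tar xzf "openmpi-$VERSION.tar.gz" && cd "openmpi-$VERSION"
# ./configure --prefix="$PREFIX" --with-gm=/opt/gm
# make -j4 && make install

echo "would install to $PREFIX"
```

Keeping each build in its own prefix makes it easy to switch between 1.0.2, 1.1, and the rc when comparing results.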


On 7/3/06 12:58 PM, "Borenstein, Bernard S"
 wrote:

> I've built and successfully run the NASA Overflow 2.0aa program with
> Openmpi 1.0.2.  I'm running on an Opteron Linux cluster running SLES 9
> and GM 2.0.24.  I built Openmpi 1.1 with the Intel 9 compilers, and when I
> try to run Overflow 2.0aa with Myrinet, I get what looks like a data
> corruption error and the program dies quickly.
> There are no MPI errors at all.  If I run using GigE (--mca btl self,tcp),
> the program runs to completion correctly.  Here is my ompi_info output:
> 
> bsb3227@mahler:~/openmpi_1.1/bin> ./ompi_info
> Open MPI: 1.1
>Open MPI SVN revision: r10477
> Open RTE: 1.1
>Open RTE SVN revision: r10477
> OPAL: 1.1
>OPAL SVN revision: r10477
>   Prefix: /home/bsb3227/openmpi_1.1
>  Configured architecture: x86_64-unknown-linux-gnu
>Configured by: bsb3227
>Configured on: Fri Jun 30 07:08:54 PDT 2006
>   Configure host: mahler
> Built by: bsb3227
> Built on: Fri Jun 30 07:54:46 PDT 2006
>   Built host: mahler
>   C bindings: yes
> C++ bindings: yes
>   Fortran77 bindings: yes (all)
>   Fortran90 bindings: yes
>  Fortran90 bindings size: small
>   C compiler: icc
>  C compiler absolute: /opt/intel/cce/9.0.25/bin/icc
> C++ compiler: icpc
>C++ compiler absolute: /opt/intel/cce/9.0.25/bin/icpc
>   Fortran77 compiler: ifort
>   Fortran77 compiler abs: /opt/intel/fce/9.0.25/bin/ifort
>   Fortran90 compiler: /opt/intel/fce/9.0.25/bin/ifort
>   Fortran90 compiler abs: /opt/intel/fce/9.0.25/bin/ifort
>  C profiling: yes
>C++ profiling: yes
>  Fortran77 profiling: yes
>  Fortran90 profiling: yes
>   C++ exceptions: no
>   Thread support: posix (mpi: no, progress: no)
>   Internal debug support: no
>  MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>  libltdl support: yes
>   MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1)
>MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1)
>MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1)
>MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.1)
>MCA timer: linux (MCA v1.0, API v1.0, Component v1.1)
>MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)
> MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.1)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1)
>   MCA io: romio (MCA v1.0, API v1.0, Component v1.1)
>MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)
>MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1)
>  MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
>  MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1)
>   MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1)
>  MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
>  MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
>  MCA btl: gm (MCA v1.0, API v1.0, Component v1.1)
>  MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)
>  MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
>  MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)
>  MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)
>  MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)
>  MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)
>  MCA iof: svc (MCA v1.0, API v1.0, Component v1.1)
>   MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1)
>   MCA ns: replica (MCA v1.0, API v1.0, Component v1.1)
>  MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>  MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1)
>  MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1)
>  MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1)
>  

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-11 Thread Galen M. Shipman

Hey Justin,

Any further details?

Thanks,

Galen


On Jul 8, 2006, at 9:10 AM, Justin Bronder wrote:

1.)  Compiling without XL will take a little while, but I have the setup
for the other questions ready now.  I figured I'd answer them right away.

2.)  TCP works fine, and is quite quick compared to mpich-1.2.7p1 by the way.
I just reverified this.
WR11C2R4   5000   160   1   2   10.10   8.253e+00
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0412956 .. PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0272613 .. PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0053214 .. PASSED



3.)  Exactly the same setup, using mpichgm-1.2.6..14b
WR11C2R4   5000   160   1   2   10.43   7.994e+00
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0353693 .. PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0233491 .. PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0045577 .. PASSED


It also worked with mpichgm-1.2.6..15 (I believe this is the version, I
don't have a node up with it at the moment).

Obviously mpich-1.2.7p1 works as well over ethernet.


Anyways, I'll begin the build with the standard gcc compilers that are
included
with OS X.  This is powerpc-apple-darwin8-gcc-4.0.1.

Thanks,

Justin.

Jeff Squyres (jsquyres) wrote:

Justin --

Can we eliminate some variables so that we can figure out where the
error is originating?

- Can you try compiling without the XL compilers?
- Can you try running with just TCP (and not Myrinet)?
- With the same support library installation (such as BLAS, etc.,
assumedly also compiled with XL), can you try another MPI (e.g., LAM,
MPICH-gm, whatever)?

Let us know what you find.  Thanks!


 

*From:* users-boun...@open-mpi.org
[mailto:users-boun...@open-mpi.org] *On Behalf Of *Justin Bronder
*Sent:* Thursday, July 06, 2006 3:16 PM
*To:* Open MPI Users
*Subject:* Re: [OMPI users] Problem with Openmpi 1.1

With 1.0.3a1r10670 the same problem is occurring.  Again the same
configure arguments
as before.  For clarity, the Myrinet driver we are using is 2.0.21.

node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$ gm_board_info
GM build ID is "2.0.21_MacOSX_rc20050429075134PDT
r...@node96.meldrew.clusters.umaine.edu:/usr/src/gm-2.0.21_MacOSX
Fri Jun 16 14:39:45 EDT 2006."

node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$
/usr/local/ompi-xl-1.0.3/bin/mpirun -np 2 xhpl
This succeeds.
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.1196787 .. PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0283195 .. PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0063300 .. PASSED

node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$
/usr/local/ompi-xl-1.0.3/bin/mpirun -mca btl gm -np 2 xhpl
This fails.
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 717370209518881444284334080.000 .. FAILED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 226686309135.4274597 .. FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 2386641249.6518722 .. FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 2037398812542965504.00
||A||_oo . . . . . . . . . . . . . . . . . . . = 2561.554752
||A||_1  . . . . . . . . . . . . . . . . . . . = 2558.129237
||x||_oo . . . . . . . . . . . . . . . . . . . = 300175355203841216.00
||x||_1  . . . . . . . . . . . . . . . . . . . = 31645943341479366656.00

Does anyone have a working system with OS X and Myrinet (GM)?  If
so, I'd love to hear
the configure arguments and various versions you are using.  Bonus
points if you are using the IBM XL compilers.

Thanks,
Justin.


On 7/6/06, *Justin Bronder* <jsbron...@gmail.com
<mailto:jsbron...@gmail.com>> wrote:

Yes, that output was actually cut and pasted from an OS X
run.  I'm about to test
against 1.0.3a1r10670.

Justin.

On 7/6/06, *Galen M. Shipman* < gship...@lanl.gov
<mailto:gship...@lanl.gov>> wrote:

Justin,

Is the OS X run showing the same residual failure?

- Galen

On Jul 6, 2006, at 10:49 AM, Justin Bronder wrote:

Disregard the failure on Linux, a rebuild from scratch of
HPL and OpenMPI
seems to have resolved the issue.  At least I'm not
getting the errors during
the residual checks.

However, this is persisting under OS X.

Thanks,
Justin.

On 7/6/06, *Justin Bronder* < jsbron...@gmail.com
  

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-08 Thread Justin Bronder
1.)  Compiling without XL will take a little while, but I have the setup
for the
other questions ready now.  I figured I'd answer them right away.

2.)  TCP works fine, and is quite quick compared to mpich-1.2.7p1 by the
way.
I just reverified this.
WR11C2R4   5000   160   1   2   10.10   8.253e+00
||Ax-b||_oo / ( eps * ||A||_1  * N) =0.0412956 .. PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =0.0272613 .. PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =0.0053214 .. PASSED
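When comparing many runs like the ones in this thread, the PASSED/FAILED residual lines can be tallied quickly from a saved log; a small sketch (`hpl.log` and the sample lines are stand-ins — pipe your own xhpl output into the file instead):

```shell
# Quick triage: count PASSED vs FAILED residual checks in an HPL log.
# hpl.log is a hypothetical name; the three lines are sample data.
cat > hpl.log <<'EOF'
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.0412956 .. PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0272613 .. PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 717370209518881444284334080.000 .. FAILED
EOF

# Scan for the ".. PASSED" / ".. FAILED" markers and summarize.
awk '/\.\. PASSED/ {p++} /\.\. FAILED/ {f++} \
     END {printf "passed=%d failed=%d\n", p, f}' hpl.log
```

Any nonzero failed count is a quick signal of the data-corruption symptom reported here, without reading every residual by hand.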


3.)  Exactly the same setup, using mpichgm-1.2.6..14b
WR11C2R4   5000   160   1   2   10.43   7.994e+00

||Ax-b||_oo / ( eps * ||A||_1  * N) =0.0353693 .. PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =0.0233491 .. PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =0.0045577 .. PASSED

It also worked with mpichgm-1.2.6..15  (I believe this is the version, I
don't have
a node up with it at the moment).

Obviously mpich-1.2.7p1 works as well over ethernet.


Anyways, I'll begin the build with the standard gcc compilers that are
included
with OS X.  This is powerpc-apple-darwin8-gcc-4.0.1.

Thanks,

Justin.

Jeff Squyres (jsquyres) wrote:
> Justin --
>  
> Can we eliminate some variables so that we can figure out where the
> error is originating?
>  
> - Can you try compiling without the XL compilers?
> - Can you try running with just TCP (and not Myrinet)?
> - With the same support library installation (such as BLAS, etc.,
> assumedly also compiled with XL), can you try another MPI (e.g., LAM,
> MPICH-gm, whatever)?
>
> Let us know what you find.  Thanks!
>  
>
> 
> *From:* users-boun...@open-mpi.org
> [mailto:users-boun...@open-mpi.org] *On Behalf Of *Justin Bronder
> *Sent:* Thursday, July 06, 2006 3:16 PM
>     *To:* Open MPI Users
> *Subject:* Re: [OMPI users] Problem with Openmpi 1.1
>
> With 1.0.3a1r10670 the same problem is occurring.  Again the same
> configure arguments
> as before.  For clarity, the Myrinet driver we are using is 2.0.21.
>
> node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$ gm_board_info
> GM build ID is "2.0.21_MacOSX_rc20050429075134PDT
> r...@node96.meldrew.clusters.umaine.edu:/usr/src/gm-2.0.21_MacOSX
> Fri Jun 16 14:39:45 EDT 2006."
>
> node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$
> /usr/local/ompi-xl-1.0.3/bin/mpirun -np 2 xhpl
> This succeeds.
> ||Ax-b||_oo / ( eps * ||A||_1  * N) =0.1196787
> .. PASSED
> ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =0.0283195
> .. PASSED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =0.0063300
> .. PASSED
>
> node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$
> /usr/local/ompi-xl-1.0.3/bin/mpirun -mca btl gm -np 2 xhpl
> This fails.
> ||Ax-b||_oo / ( eps * ||A||_1  * N) =
> 717370209518881444284334080.000 .. FAILED
> ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 226686309135.4274597
> .. FAILED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 2386641249.6518722
> .. FAILED
> ||Ax-b||_oo  . . . . . . . . . . . . . . . . . =
> 2037398812542965504.00
> ||A||_oo . . . . . . . . . . . . . . . . . . . =2561.554752
> ||A||_1  . . . . . . . . . . . . . . . . . . . =2558.129237
> ||x||_oo . . . . . . . . . . . . . . . . . . . =
> 300175355203841216.00
> ||x||_1  . . . . . . . . . . . . . . . . . . . =
> 31645943341479366656.00
>
> Does anyone have a working system with OS X and Myrinet (GM)?  If
> so, I'd love to hear
> the configure arguments and various versions you are using.  Bonus
> points if you are
> using the IBM XL compilers.
>
> Thanks,
> Justin.
>
>
> On 7/6/06, *Justin Bronder* <jsbron...@gmail.com
> <mailto:jsbron...@gmail.com>> wrote:
>
> Yes, that output was actually cut and pasted from an OS X
> run.  I'm about to test
> against 1.0.3a1r10670.
>
> Justin.
>
> On 7/6/06, *Galen M. Shipman* < gship...@lanl.gov
> <mailto:gship...@lanl.gov>> wrote:
>
> Justin, 
>
> Is the OS X run showing the same residual failure?
>
> - Galen 
>
> On Jul 6, 2006, at 10:49 AM, Justin Bronder wrote:
>
> Disregard the failure on Linux, a rebuild from scratch of
> HPL and OpenMPI
>

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-08 Thread Jeff Squyres (jsquyres)
Justin --
 
Can we eliminate some variables so that we can figure out where the
error is originating?
 
- Can you try compiling without the XL compilers?
- Can you try running with just TCP (and not Myrinet)?
- With the same support library installation (such as BLAS, etc.,
assumedly also compiled with XL), can you try another MPI (e.g., LAM,
MPICH-gm, whatever)?

Let us know what you find.  Thanks!
 



From: users-boun...@open-mpi.org
[mailto:users-boun...@open-mpi.org] On Behalf Of Justin Bronder
Sent: Thursday, July 06, 2006 3:16 PM
To: Open MPI Users
Subject: Re: [OMPI users] Problem with Openmpi 1.1


With 1.0.3a1r10670 the same problem is occurring.  Again the same
configure arguments
as before.  For clarity, the Myrinet driver we are using is 2.0.21.

node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$ gm_board_info 
GM build ID is "2.0.21_MacOSX_rc20050429075134PDT
r...@node96.meldrew.clusters.umaine.edu:/usr/src/gm-2.0.21_MacOSX Fri
Jun 16 14:39:45 EDT 2006."

node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$
/usr/local/ompi-xl-1.0.3/bin/mpirun -np 2 xhpl
This succeeds.
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.1196787 .. PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0283195 .. PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0063300 .. PASSED

node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$
/usr/local/ompi-xl-1.0.3/bin/mpirun -mca btl gm -np 2 xhpl 
This fails.
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 717370209518881444284334080.000 .. FAILED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 226686309135.4274597 .. FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 2386641249.6518722 .. FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 2037398812542965504.00
||A||_oo . . . . . . . . . . . . . . . . . . . = 2561.554752
||A||_1  . . . . . . . . . . . . . . . . . . . = 2558.129237
||x||_oo . . . . . . . . . . . . . . . . . . . = 300175355203841216.00
||x||_1  . . . . . . . . . . . . . . . . . . . = 31645943341479366656.00

Does anyone have a working system with OS X and Myrinet (GM)?
If so, I'd love to hear
the configure arguments and various versions you are using.
Bonus points if you are
using the IBM XL compilers.

Thanks,
Justin.



On 7/6/06, Justin Bronder <jsbron...@gmail.com> wrote: 

Yes, that output was actually cut and pasted from an OS
X run.  I'm about to test
against 1.0.3a1r10670.

Justin.


On 7/6/06, Galen M. Shipman < gship...@lanl.gov
<mailto:gship...@lanl.gov> > wrote:


Justin,  

Is the OS X run showing the same residual
failure?


- Galen 



On Jul 6, 2006, at 10:49 AM, Justin Bronder
wrote:


Disregard the failure on Linux, a rebuild from
scratch of HPL and OpenMPI
seems to have resolved the issue.  At least I'm
not getting the errors during 
the residual checks.

However, this is persisting under OS X.

Thanks,
Justin.


On 7/6/06, Justin Bronder < jsbron...@gmail.com
<mailto:jsbron...@gmail.com> > wrote:

For OS X:
/usr/local/ompi-xl/bin/mpirun -mca btl
gm -np 4 ./xhpl 

For Linux:
ARCH=ompi-gnu-1.1.1a
/usr/local/$ARCH/bin/mpiexec -mca btl gm
-np 2 -path /usr/local/$ARCH/bin ./xhpl

Thanks for the speedy response,
Justin.


On 7/6/06, Galen M. Shipman <
gship...@lanl.gov <mailto:gship...@lanl.gov> > wrote:



Hey Justin,  

Please provide us your mca parameters
(if any), these could be in a config file, environment variables or on
the command line. 

Thanks, 
   

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Justin Bronder

With 1.0.3a1r10670 the same problem is occurring.  Again the same configure
arguments
as before.  For clarity, the Myrinet driver we are using is 2.0.21.

node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$ gm_board_info
GM build ID is "2.0.21_MacOSX_rc20050429075134PDT
r...@node96.meldrew.clusters.umaine.edu:/usr/src/gm-2.0.21_MacOSX Fri Jun 16
14:39:45 EDT 2006."

node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$
/usr/local/ompi-xl-1.0.3/bin/mpirun
-np 2 xhpl
This succeeds.
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 0.1196787 .. PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 0.0283195 .. PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0063300 .. PASSED

node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$
/usr/local/ompi-xl-1.0.3/bin/mpirun
-mca btl gm -np 2 xhpl
This fails.
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 717370209518881444284334080.000 .. FAILED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 226686309135.4274597 .. FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 2386641249.6518722 .. FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 2037398812542965504.00
||A||_oo . . . . . . . . . . . . . . . . . . . = 2561.554752
||A||_1  . . . . . . . . . . . . . . . . . . . = 2558.129237
||x||_oo . . . . . . . . . . . . . . . . . . . = 300175355203841216.00
||x||_1  . . . . . . . . . . . . . . . . . . . = 31645943341479366656.00

Does anyone have a working system with OS X and Myrinet (GM)?  If so, I'd
love to hear
the configure arguments and various versions you are using.  Bonus points if
you are
using the IBM XL compilers.

Thanks,
Justin.


On 7/6/06, Justin Bronder  wrote:


Yes, that output was actually cut and pasted from an OS X run.  I'm about
to test
against 1.0.3a1r10670.

Justin.

On 7/6/06, Galen M. Shipman  wrote:

> Justin,
> Is the OS X run showing the same residual failure?
>
> - Galen
>
> On Jul 6, 2006, at 10:49 AM, Justin Bronder wrote:
>
> Disregard the failure on Linux, a rebuild from scratch of HPL and
> OpenMPI
> seems to have resolved the issue.  At least I'm not getting the errors
> during
> the residual checks.
>
> However, this is persisting under OS X.
>
> Thanks,
> Justin.
>
> On 7/6/06, Justin Bronder < jsbron...@gmail.com> wrote:
>
> > For OS X:
> > /usr/local/ompi-xl/bin/mpirun -mca btl gm -np 4 ./xhpl
> >
> > For Linux:
> > ARCH=ompi-gnu-1.1.1a
> > /usr/local/$ARCH/bin/mpiexec -mca btl gm -np 2 -path
> > /usr/local/$ARCH/bin ./xhpl
> >
> > Thanks for the speedy response,
> > Justin.
> >
> > On 7/6/06, Galen M. Shipman < gship...@lanl.gov> wrote:
> >
> > > Hey Justin,
> > Please provide us your mca parameters (if any), these could be in a
> > config file, environment variables or on the command line.
> >
> > Thanks,
> >
> > Galen
> >
> >  On Jul 6, 2006, at 9:22 AM, Justin Bronder wrote:
> >
> > As far as the nightly builds go, I'm still seeing what I believe to be
> >
> > this problem in both r10670 and r10652.  This is happening with
> > both Linux and OS X.  Below are the systems and ompi_info for the
> > newest revision 10670.
> >
> > As an example of the error, when running HPL with Myrinet I get the
> > following error.  Using tcp everything is fine and I see the results
> > I'd
> > expect.
> >
> > 
> > ||Ax-b||_oo / ( eps * ||A||_1  * N) =
> > 42820214496954887558164928727596662784.000 .. FAILED
> > ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 156556068835.2711182 .. FAILED
> > ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1156439380.5172558 .. FAILED
> > ||Ax-b||_oo  . . . . . . . . . . . . . . . . . =
> > 272683853978565028754868928512.00
> > ||A||_oo . . . . . . . . . . . . . . . . . . . =3822.884181
> > ||A||_1  . . . . . . . . . . . . . . . . . . . =3823.922627
> > ||x||_oo . . . . . . . . . . . . . . . . . . . =
> > 37037692483529688659798261760.00
> > ||x||_1  . . . . . . . . . . . . . . . . . . . =
> > 4102704048669982798475494948864.00
> > ===
> >
> > Finished  1 tests with the following results:
> >   0 tests completed and passed residual checks,
> >   1 tests completed and failed residual checks,
> >   0 tests skipped because of illegal input values.
> >
> > 
> >
> > Linux node41 2.6.16.19 #1 SMP Wed Jun 21 17:22:01 EDT 2006 ppc64
> > PPC970FX, altivec supported GNU/Linux
> > jbronder@node41 ~ $ /usr/local/ompi-gnu-1.1.1a/bin/ompi_info
> > Open MPI: 1.1.1a1r10670
> >Open MPI SVN revision: r10670
> > Open RTE: 1.1.1a1r10670
> >Open RTE SVN revision: r10670
> > OPAL: 1.1.1a1r10670
> >OPAL SVN revision: r10670
> >   Prefix: /usr/local/ompi-gnu-1.1.1a
> >  Configured 

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Galen M. Shipman

Justin,

Is the OS X run showing the same residual failure?

- Galen

On Jul 6, 2006, at 10:49 AM, Justin Bronder wrote:

Disregard the failure on Linux, a rebuild from scratch of HPL and OpenMPI
seems to have resolved the issue.  At least I'm not getting the errors during
the residual checks.

However, this is persisting under OS X.

Thanks,
Justin.

On 7/6/06, Justin Bronder  wrote:
For OS X:
/usr/local/ompi-xl/bin/mpirun -mca btl gm -np 4 ./xhpl

For Linux:
ARCH=ompi-gnu-1.1.1a
/usr/local/$ARCH/bin/mpiexec -mca btl gm -np 2 -path /usr/local/$ARCH/bin ./xhpl


Thanks for the speedy response,
Justin.

On 7/6/06, Galen M. Shipman < gship...@lanl.gov> wrote:
Hey Justin,

Please provide us your mca parameters (if any), these could be in a  
config file, environment variables or on the command line.


Thanks,

Galen

On Jul 6, 2006, at 9:22 AM, Justin Bronder wrote:

As far as the nightly builds go, I'm still seeing what I believe to be
this problem in both r10670 and r10652.  This is happening with
both Linux and OS X.  Below are the systems and ompi_info for the
newest revision 10670.

As an example of the error, when running HPL with Myrinet I get the
following error.  Using tcp everything is fine and I see the results I'd
expect.
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 42820214496954887558164928727596662784.000 .. FAILED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 156556068835.2711182 .. FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1156439380.5172558 .. FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 272683853978565028754868928512.00
||A||_oo . . . . . . . . . . . . . . . . . . . = 3822.884181
||A||_1  . . . . . . . . . . . . . . . . . . . = 3823.922627
||x||_oo . . . . . . . . . . . . . . . . . . . = 37037692483529688659798261760.00
||x||_1  . . . . . . . . . . . . . . . . . . . = 4102704048669982798475494948864.00

===

Finished  1 tests with the following results:
  0 tests completed and passed residual checks,
  1 tests completed and failed residual checks,
  0 tests skipped because of illegal input values.


Linux node41 2.6.16.19 #1 SMP Wed Jun 21 17:22:01 EDT 2006 ppc64 PPC970FX, altivec supported GNU/Linux

jbronder@node41 ~ $ /usr/local/ompi-gnu-1.1.1a/bin/ompi_info
Open MPI: 1.1.1a1r10670
   Open MPI SVN revision: r10670
Open RTE: 1.1.1a1r10670
   Open RTE SVN revision: r10670
OPAL: 1.1.1a1r10670
   OPAL SVN revision: r10670
  Prefix: /usr/local/ompi-gnu-1.1.1a
 Configured architecture: powerpc64-unknown-linux-gnu
   Configured by: root
   Configured on: Thu Jul  6 10:15:37 EDT 2006
  Configure host: node41
Built by: root
Built on: Thu Jul  6 10:28:14 EDT 2006
  Built host: node41
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: yes
 Fortran90 bindings size: small
  C compiler: gcc
 C compiler absolute: /usr/bin/gcc
C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
  Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
      Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: yes
 Fortran90 profiling: yes
  C++ exceptions: no
  Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
 MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
 libltdl support: yes
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1.1)
           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.1)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1.1)
               MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.1)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.1)
                MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.1)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.1.1)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.1)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.1.1)
               MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1.1)
               MCA mpool: sm (MCA v1.0, 

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Justin Bronder

Disregard the failure on Linux, a rebuild from scratch of HPL and OpenMPI
seems to have resolved the issue.  At least I'm not getting the errors
during
the residual checks.

However, this is persisting under OS X.

Thanks,
Justin.

On 7/6/06, Justin Bronder  wrote:


For OS X:
/usr/local/ompi-xl/bin/mpirun -mca btl gm -np 4 ./xhpl

For Linux:
ARCH=ompi-gnu-1.1.1a
/usr/local/$ARCH/bin/mpiexec -mca btl gm -np 2 -path /usr/local/$ARCH/bin
./xhpl

Thanks for the speedy response,
Justin.

On 7/6/06, Galen M. Shipman  wrote:

> Hey Justin,
> Please provide us your mca parameters (if any), these could be in a
> config file, environment variables or on the command line.
>
> Thanks,
>
> Galen
>
> On Jul 6, 2006, at 9:22 AM, Justin Bronder wrote:
>
> As far as the nightly builds go, I'm still seeing what I believe to be
> this problem in both r10670 and r10652.  This is happening with
> both Linux and OS X.  Below are the systems and ompi_info for the
> newest revision 10670.
>
> As an example of the error, when running HPL with Myrinet I get the
> following error.  Using tcp everything is fine and I see the results I'd
>
> expect.
>
> 
> ||Ax-b||_oo / ( eps * ||A||_1  * N) =
> 42820214496954887558164928727596662784.000 .. FAILED
> ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 156556068835.2711182 .. FAILED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1156439380.5172558 .. FAILED
> ||Ax-b||_oo  . . . . . . . . . . . . . . . . . =
> 272683853978565028754868928512.00
> ||A||_oo . . . . . . . . . . . . . . . . . . . =3822.884181
> ||A||_1  . . . . . . . . . . . . . . . . . . . =3823.922627
> ||x||_oo . . . . . . . . . . . . . . . . . . . =
> 37037692483529688659798261760.00
> ||x||_1  . . . . . . . . . . . . . . . . . . . =
> 4102704048669982798475494948864.00
> ===
>
> Finished  1 tests with the following results:
>   0 tests completed and passed residual checks,
>   1 tests completed and failed residual checks,
>   0 tests skipped because of illegal input values.
>
> 
>
> Linux node41 2.6.16.19 #1 SMP Wed Jun 21 17:22:01 EDT 2006 ppc64
> PPC970FX, altivec supported GNU/Linux
> jbronder@node41 ~ $ /usr/local/ompi-gnu-1.1.1a/bin/ompi_info
> Open MPI: 1.1.1a1r10670
>Open MPI SVN revision: r10670
> Open RTE: 1.1.1a1r10670
>Open RTE SVN revision: r10670
> OPAL: 1.1.1a1r10670
>OPAL SVN revision: r10670
>   Prefix: /usr/local/ompi-gnu-1.1.1a
>  Configured architecture: powerpc64-unknown-linux-gnu
>Configured by: root
>Configured on: Thu Jul  6 10:15:37 EDT 2006
>   Configure host: node41
> Built by: root
> Built on: Thu Jul  6 10:28:14 EDT 2006
>   Built host: node41
>   C bindings: yes
> C++ bindings: yes
>   Fortran77 bindings: yes (all)
>   Fortran90 bindings: yes
>  Fortran90 bindings size: small
>   C compiler: gcc
>  C compiler absolute: /usr/bin/gcc
> C++ compiler: g++
>C++ compiler absolute: /usr/bin/g++
>   Fortran77 compiler: gfortran
>   Fortran77 compiler abs:
> /usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
>   Fortran90 compiler: gfortran
>   Fortran90 compiler abs:
> /usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
>  C profiling: yes
>C++ profiling: yes
>  Fortran77 profiling: yes
>  Fortran90 profiling: yes
>   C++ exceptions: no
>   Thread support: posix (mpi: no, progress: no)
>   Internal debug support: no
>  MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>  libltdl support: yes
>   MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component
> v1.1.1)
>MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.1)
>MCA maffinity: first_use (MCA v1.0, API v1.0, Component
> v1.1.1)
>MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.1)
>MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.1)
>MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.1)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.1.1)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.1)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.1)
>   MCA io: romio (MCA v1.0, API v1.0, Component v1.1.1)
>MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1.1)
>   

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Galen M. Shipman

Hey Justin,

Please provide us your mca parameters (if any), these could be in a  
config file, environment variables or on the command line.
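For anyone following along, MCA parameters can indeed live in any of three places; a quick sketch (the `btl = gm,self` value is only an example, not a recommendation from this thread):

```shell
# 1. On the command line:
#      mpirun -mca btl gm,self -np 2 xhpl
#
# 2. As an environment variable (prefix the parameter name with OMPI_MCA_):
#      export OMPI_MCA_btl=gm,self
#
# 3. In a per-user config file that Open MPI reads at startup:
mkdir -p "$HOME/.openmpi"
cat > "$HOME/.openmpi/mca-params.conf" <<'EOF'
btl = gm,self
EOF
echo "wrote $HOME/.openmpi/mca-params.conf"
```

Command-line values override environment variables, which override the config file, so a forgotten mca-params.conf is a common source of "mystery" parameters when debugging runs like these.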


Thanks,

Galen

On Jul 6, 2006, at 9:22 AM, Justin Bronder wrote:


As far as the nightly builds go, I'm still seeing what I believe to be
this problem in both r10670 and r10652.  This is happening with
both Linux and OS X.  Below are the systems and ompi_info for the
newest revision 10670.

As an example of the error, when running HPL with Myrinet I get the
following error.  Using tcp everything is fine and I see the results I'd
expect.
||Ax-b||_oo / ( eps * ||A||_1  * N        ) = 42820214496954887558164928727596662784.000 .. FAILED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 156556068835.2711182 .. FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1156439380.5172558 .. FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 272683853978565028754868928512.00
||A||_oo . . . . . . . . . . . . . . . . . . . = 3822.884181
||A||_1  . . . . . . . . . . . . . . . . . . . = 3823.922627
||x||_oo . . . . . . . . . . . . . . . . . . . = 37037692483529688659798261760.00
||x||_1  . . . . . . . . . . . . . . . . . . . = 4102704048669982798475494948864.00

===

Finished  1 tests with the following results:
  0 tests completed and passed residual checks,
  1 tests completed and failed residual checks,
  0 tests skipped because of illegal input values.


Linux node41 2.6.16.19 #1 SMP Wed Jun 21 17:22:01 EDT 2006 ppc64 PPC970FX, altivec supported GNU/Linux

jbronder@node41 ~ $ /usr/local/ompi-gnu-1.1.1a/bin/ompi_info
Open MPI: 1.1.1a1r10670
   Open MPI SVN revision: r10670
Open RTE: 1.1.1a1r10670
   Open RTE SVN revision: r10670
OPAL: 1.1.1a1r10670
   OPAL SVN revision: r10670
  Prefix: /usr/local/ompi-gnu-1.1.1a
 Configured architecture: powerpc64-unknown-linux-gnu
   Configured by: root
   Configured on: Thu Jul  6 10:15:37 EDT 2006
  Configure host: node41
Built by: root
Built on: Thu Jul  6 10:28:14 EDT 2006
  Built host: node41
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: yes
 Fortran90 bindings size: small
  C compiler: gcc
 C compiler absolute: /usr/bin/gcc
C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
  Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
      Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: yes
 Fortran90 profiling: yes
  C++ exceptions: no
  Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
 MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
 libltdl support: yes
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1.1)
           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.1)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1.1)
               MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.1)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.1)
   MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.1)
MCA coll: self (MCA v1.0, API v1.0, Component v1.1.1)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.1)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.1)
  MCA io: romio (MCA v1.0, API v1.0, Component v1.1.1)
   MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1.1)
   MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.1)
 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.1)
 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.1)
  MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.1)
 MCA btl: gm (MCA v1.0, API v1.0, Component v1.1.1)
 MCA btl: self (MCA v1.0, API v1.0, Component v1.1.1)
 MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.1)
 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.1)
 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
 MCA gpr: null (MCA v1.0, API 

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Justin Bronder

As far as the nightly builds go, I'm still seeing what I believe to be
this problem in both r10670 and r10652.  This is happening with
both Linux and OS X.  Below are the systems and ompi_info for the
newest revision 10670.

As an example, when running HPL over Myrinet I get the following
error.  Using tcp everything is fine and I see the results I'd
expect.

||Ax-b||_oo / ( eps * ||A||_1  * N) = 42820214496954887558164928727596662784.000 .. FAILED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 156556068835.2711182 .. FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1156439380.5172558 .. FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 272683853978565028754868928512.00
||A||_oo . . . . . . . . . . . . . . . . . . . = 3822.884181
||A||_1  . . . . . . . . . . . . . . . . . . . = 3823.922627
||x||_oo . . . . . . . . . . . . . . . . . . . = 37037692483529688659798261760.00
||x||_1  . . . . . . . . . . . . . . . . . . . = 4102704048669982798475494948864.00
===

Finished  1 tests with the following results:
 0 tests completed and passed residual checks,
 1 tests completed and failed residual checks,
 0 tests skipped because of illegal input values.
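For anyone trying to reproduce the comparison, the interconnect can be
forced per run with the `--mca btl` parameter, as mentioned earlier in
the thread.  A sketch (the hostfile name, process count, and xhpl path
are assumptions, not taken from the original report):

```shell
# Force the Myrinet (GM) transport; this is the run that failed the
# residual checks.
mpirun --mca btl gm,self -np 4 --hostfile ./hosts ./xhpl

# Force TCP (GigE); this run passed the residual checks.
mpirun --mca btl tcp,self -np 4 --hostfile ./hosts ./xhpl
```

The `self` component must be listed in both cases so that a process
can still send messages to itself.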


Linux node41 2.6.16.19 #1 SMP Wed Jun 21 17:22:01 EDT 2006 ppc64 PPC970FX,
altivec supported GNU/Linux
jbronder@node41 ~ $ /usr/local/ompi-gnu-1.1.1a/bin/ompi_info
   Open MPI: 1.1.1a1r10670
  Open MPI SVN revision: r10670
   Open RTE: 1.1.1a1r10670
  Open RTE SVN revision: r10670
   OPAL: 1.1.1a1r10670
  OPAL SVN revision: r10670
 Prefix: /usr/local/ompi-gnu-1.1.1a
Configured architecture: powerpc64-unknown-linux-gnu
  Configured by: root
  Configured on: Thu Jul  6 10:15:37 EDT 2006
 Configure host: node41
   Built by: root
   Built on: Thu Jul  6 10:28:14 EDT 2006
 Built host: node41
 C bindings: yes
   C++ bindings: yes
 Fortran77 bindings: yes (all)
 Fortran90 bindings: yes
Fortran90 bindings size: small
 C compiler: gcc
C compiler absolute: /usr/bin/gcc
   C++ compiler: g++
  C++ compiler absolute: /usr/bin/g++
 Fortran77 compiler: gfortran
 Fortran77 compiler abs:
/usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
 Fortran90 compiler: gfortran
 Fortran90 compiler abs:
/usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
C profiling: yes
  C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
 C++ exceptions: no
 Thread support: posix (mpi: no, progress: no)
 Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
 MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1.1)
  MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.1)
  MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1.1)
  MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.1)
  MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
  MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
   MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.1)
  MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.1)
   MCA coll: self (MCA v1.0, API v1.0, Component v1.1.1)
   MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.1)
   MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.1)
 MCA io: romio (MCA v1.0, API v1.0, Component v1.1.1)
  MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1.1)
  MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.1)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.1)
MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.1)
 MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: gm (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: self (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
   MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.1)
MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: null (MCA v1.0, API v1.0, Component v1.1.1)
MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1.1)
MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1.1)
MCA iof: svc (MCA v1.0, API v1.0, Component v1.1.1)