Re: [OMPI users] flex.exe

2010-01-21 Thread Jeff Squyres
Don't we ship the flex-generated code in the tarball anyway?  If so, why do we 
ship flex.exe?

On Jan 21, 2010, at 12:14 PM, Barrett, Brian W wrote:

> I have to agree with the two requests here. Having either a Windows tarball 
> or a Windows build-tools tarball doesn't seem too burdensome, and could even 
> be done automatically at "make dist" time.
> 
> Brian
> 
> 
> - Original Message -
> From: users-boun...@open-mpi.org 
> To: us...@open-mpi.org 
> Sent: Thu Jan 21 10:05:03 2010
> Subject: Re: [OMPI users] flex.exe
> 
> On Thursday, 21.01.2010, at 11:52 -0500, Michael Di Domenico wrote:
> > openmpi-1.4.1/contrib/platform/win32/bin/flex.exe
> >
> > I understand this file might be required for building on Windows;
> > since I'm not, I can just delete the file without issue.
> >
> > However, for those of us under import restrictions, where binaries are
> > not allowed in, this file causes me to open the tarball and delete the
> > file (not a big deal, I know, I know).
> >
> > But, can I put up a vote for a pure source-only tree?
> 
> I'm very much in favor of that, since we can't ship this binary in
> Debian. We'd have to delete it from the tarball and repack it with every
> release, which is quite cumbersome. If these tools could be shipped in a
> separate tarball, that would be great!
> 
> Best regards
> Manuel
> 


-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI users] ScaLAPACK and OpenMPI > 1.3.1

2010-01-21 Thread Martin Siegert
Hi Nathan,

On Thu, Jan 21, 2010 at 02:48:51PM -0600, Champagne, Nathan J. (JSC-EV)[Jacobs Technology] wrote:
>
>We started having a problem with OpenMPI beginning with version 1.3.2
>where the program output can be correct, junk, or NaNs (result is not
>predictable). The output is the solution of a matrix equation solved by
>ScaLAPACK. We are using the Intel Fortran compiler (version 11.1) and
>the GNU compiler (version 4.1.2) on Gentoo Linux. So far, the problem
>manifests itself for a matrix (N X N) with N ~ 10,000 or more with a
>processor count ~ 64 or more. Note that the problem still occurs using
>OpenMPI 1.4.1.
>
>
>We build the ScaLAPACK and BLACS libraries locally and use the LAPACK
>and BLAS libraries supplied by Intel.
>
>
>We wrote a test program to demonstrate the problem. The matrix is built
>on each processor (no communication). Then, the matrix is factored and
>solved. The solution vector is collected from the processors and
>printed to a file by the master processor. The program and associated
>OpenMPI information (ompi_info --all) are available at:
>
>
>http://www.em-stuff.com/files/files.tar.gz
>
>
>The file "compile" in the "test" directory is used to create the
>executable. Edit it to reflect libraries on your local machine. Data
>created using OpenMPI 1.3.1 and 1.4.1 are in the "output" directory for
>reference.

For what it is worth:
I compiled and ran your code using 64 processors. 

# diff -u output/sol_1.3.1_96.txt test/mkl/solution_vector.txt
--- output/sol_1.3.1_96.txt 2010-01-20 06:46:41.0 -0800
+++ test/mkl/solution_vector.txt    2010-01-21 14:41:59.0 -0800
@@ -4786,7 +4786,7 @@
4785 -0.3914681E+00   0.1178753E-03
4786 -0.3913341E+00   0.7695833E-04
4787 -0.3912001E+00   0.3607245E-04
-   4788 -0.3910662E+00  -0.4782369E-05
+   4788 -0.3910662E+00  -0.4782368E-05
4789 -0.3909323E+00  -0.4560614E-04
4790 -0.3907985E+00  -0.8639889E-04
4791 -0.3906647E+00  -0.1271607E-03

In other words: I do not see a problem.

This is with openmpi-1.3.3, scalapack-1.8.0, mpiblacs-1.1p3,
ifort-11.1.038, mkl-10.2.0.013.

Cheers,
Martin

--
Martin Siegert
Head, Research Computing
WestGrid Site Lead
IT Services                phone: 778 782-4691
Simon Fraser University    fax:   778 782-4242
Burnaby, British Columbia  email: sieg...@sfu.ca
Canada  V5A 1S6


Re: [OMPI users] ScaLAPACK and OpenMPI > 1.3.1

2010-01-21 Thread Champagne, Nathan J. (JSC-EV)[Jacobs Technology]
First off, thanks for your efforts to help.

>In that case I wonder what version of scalapack/blacs you are using?
We are using ScaLAPACK 1.8.0 with BLACS v1.1 (with patch03).




Re: [OMPI users] ScaLAPACK and OpenMPI > 1.3.1

2010-01-21 Thread Åke Sandgren
On Thu, 2010-01-21 at 15:40 -0600, Champagne, Nathan J. (JSC-EV)[Jacobs
Technology] wrote:
> >What is a correct result then?
> 
> The correct results are output by v1.3.1. The filename in the archive is 
> "sol_1.3.1_96.txt".
> 
> >How often do you get junk or NaNs compared to correct results?
> We haven't been able to quantify it. It almost seems random; similar to using 
> a variable that's uninitialized, expecting its initial value to be zero when 
> it may not be.

In that case I wonder what version of scalapack/blacs you are using?

I have run a bunch of tests with openmpi 1.3.3 and 1.4; all yield the
correct result. This was using Intel 10.1 with lapack 3.1.1 built by me
+ gotoblas, and I also tried mkl.

I also tried Pathscale 3.2 with lapack 3.1.1/gotoblas; still OK.

I tried running with 128 cores too, but got the same result (except for
one small round-off difference).

I know that scalapack versions prior to 1.8.0 had a couple of bugs with
uninitialized vars.



Re: [OMPI users] ScaLAPACK and OpenMPI > 1.3.1

2010-01-21 Thread Champagne, Nathan J. (JSC-EV)[Jacobs Technology]
>What is a correct result then?

The correct results are output by v1.3.1. The filename in the archive is 
"sol_1.3.1_96.txt".

>How often do you get junk or NaNs compared to correct results?
We haven't been able to quantify it. It almost seems random; similar to using a 
variable that's uninitialized, expecting its initial value to be zero when it 
may not be.

Nathan

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Åke Sandgren
Sent: Thursday, January 21, 2010 3:23 PM
To: Open MPI Users
Subject: Re: [OMPI users] ScaLAPACK and OpenMPI > 1.3.1

On Thu, 2010-01-21 at 14:48 -0600, Champagne, Nathan J. (JSC-EV)[Jacobs
Technology] wrote:
> We started having a problem with OpenMPI beginning with version 1.3.2
> where the program output can be correct, junk, or NaNs (result is not
> predictable). The output is the solution of a matrix equation solved
> by ScaLAPACK. We are using the Intel Fortran compiler (version 11.1)
> and the GNU compiler (version 4.1.2) on Gentoo Linux. So far, the
> problem manifests itself for a matrix (N X N) with N ~ 10,000 or more
> with a processor count ~ 64 or more. Note that the problem still
> occurs using OpenMPI 1.4.1.
> 
>  
> 
> We build the ScaLAPACK and BLACS libraries locally and use the LAPACK
> and BLAS libraries supplied by Intel.
> 
>  
> 
> We wrote a test program to demonstrate the problem. The matrix is
> built on each processor (no communication). Then, the matrix is
> factored and solved. The solution vector is collected from the
> processors and printed to a file by the master processor. The program
> and associated OpenMPI information (ompi_info --all) are available at:
> 
>  
> 
> http://www.em-stuff.com/files/files.tar.gz
> 
>  
> 
> The file "compile" in the "test" directory is used to create the
> executable. Edit it to reflect libraries on your local machine. Data
> created using OpenMPI 1.3.1 and 1.4.1 are in the "output" directory
> for reference.

What is a correct result then?
Hard to test without knowing.

How often do you get junk or NaNs compared to correct results?

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se




Re: [OMPI users] ScaLAPACK and OpenMPI > 1.3.1

2010-01-21 Thread Åke Sandgren
On Thu, 2010-01-21 at 14:48 -0600, Champagne, Nathan J. (JSC-EV)[Jacobs
Technology] wrote:
> We started having a problem with OpenMPI beginning with version 1.3.2
> where the program output can be correct, junk, or NaNs (result is not
> predictable). The output is the solution of a matrix equation solved
> by ScaLAPACK. We are using the Intel Fortran compiler (version 11.1)
> and the GNU compiler (version 4.1.2) on Gentoo Linux. So far, the
> problem manifests itself for a matrix (N X N) with N ~ 10,000 or more
> with a processor count ~ 64 or more. Note that the problem still
> occurs using OpenMPI 1.4.1.
> 
>  
> 
> We build the ScaLAPACK and BLACS libraries locally and use the LAPACK
> and BLAS libraries supplied by Intel.
> 
>  
> 
> We wrote a test program to demonstrate the problem. The matrix is
> built on each processor (no communication). Then, the matrix is
> factored and solved. The solution vector is collected from the
> processors and printed to a file by the master processor. The program
> and associated OpenMPI information (ompi_info --all) are available at:
> 
>  
> 
> http://www.em-stuff.com/files/files.tar.gz
> 
>  
> 
> The file "compile" in the "test" directory is used to create the
> executable. Edit it to reflect libraries on your local machine. Data
> created using OpenMPI 1.3.1 and 1.4.1 are in the "output" directory
> for reference.

What is a correct result then?
Hard to test without knowing.

How often do you get junk or NaNs compared to correct results?

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90 7866126
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se



[OMPI users] ScaLAPACK and OpenMPI > 1.3.1

2010-01-21 Thread Champagne, Nathan J. (JSC-EV)[Jacobs Technology]
We started having a problem with OpenMPI beginning with version 1.3.2 where the 
program output can be correct, junk, or NaNs (result is not predictable). The 
output is the solution of a matrix equation solved by ScaLAPACK. We are using 
the Intel Fortran compiler (version 11.1) and the GNU compiler (version 4.1.2) 
on Gentoo Linux. So far, the problem manifests itself for a matrix (N X N) with 
N ~ 10,000 or more with a processor count ~ 64 or more. Note that the problem 
still occurs using OpenMPI 1.4.1.

We build the ScaLAPACK and BLACS libraries locally and use the LAPACK and BLAS 
libraries supplied by Intel.

We wrote a test program to demonstrate the problem. The matrix is built on each 
processor (no communication). Then, the matrix is factored and solved. The 
solution vector is collected from the processors and printed to a file by the 
master processor. The program and associated OpenMPI information (ompi_info 
--all) are available at:

http://www.em-stuff.com/files/files.tar.gz

The file "compile" in the "test" directory is used to create the executable. 
Edit it to reflect libraries on your local machine. Data created using OpenMPI 
1.3.1 and 1.4.1 are in the "output" directory for reference.
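
For orientation, the overall structure of the test is roughly the stripped-down 
sketch below. The sizes, block size, and matrix entries here are made up for 
illustration, and this is not the actual program from the archive; the real code 
also gathers the distributed solution to the master rank and writes it to a file, 
as described above.

program scalapack_solve_sketch
   ! Illustrative sketch: every process fills the same global N x N system,
   ! ScaLAPACK factors and solves it, and rank 0 reports the status.
   implicit none
   integer, parameter :: n = 1000, nrhs = 1, nb = 64   ! made-up sizes
   integer            :: ictxt, nprow, npcol, myrow, mycol
   integer            :: iam, nprocs, np, nq, lld, info, i, j
   integer            :: desca(9), descb(9)
   integer,          allocatable :: ipiv(:)
   double precision, allocatable :: a(:,:), b(:,:)
   integer, external  :: numroc

   ! Set up a roughly square BLACS process grid.
   call blacs_pinfo(iam, nprocs)
   nprow = int(sqrt(dble(nprocs)))
   npcol = nprocs / nprow
   call blacs_get(-1, 0, ictxt)
   call blacs_gridinit(ictxt, 'Row-major', nprow, npcol)
   call blacs_gridinfo(ictxt, nprow, npcol, myrow, mycol)

   if (myrow >= 0) then          ! skip processes left out of the grid
      ! Local storage and array descriptors for the block-cyclic layout.
      np  = numroc(n, nb, myrow, 0, nprow)
      nq  = numroc(n, nb, mycol, 0, npcol)
      lld = max(1, np)
      call descinit(desca, n, n,    nb, nb, 0, 0, ictxt, lld, info)
      call descinit(descb, n, nrhs, nb, nb, 0, 0, ictxt, lld, info)
      allocate(a(lld, max(1, nq)), b(lld, 1), ipiv(np + nb))

      ! Build the matrix "on each processor" from global indices;
      ! pdelset only stores an entry on the process that owns it.
      do j = 1, n
         do i = 1, n
            if (i == j) then
               call pdelset(a, i, j, desca, dble(n))            ! diagonally dominant
            else
               call pdelset(a, i, j, desca, 1.0d0 / dble(i + j))
            end if
         end do
         call pdelset(b, j, 1, descb, 1.0d0)
      end do

      ! Factor and solve A x = b; the solution overwrites b.
      call pdgesv(n, nrhs, a, 1, 1, desca, ipiv, b, 1, 1, descb, info)
      if (iam == 0) print *, 'pdgesv info =', info

      ! (The real test then collects the distributed solution on the
      !  master rank and writes it to a file.)
      call blacs_gridexit(ictxt)
   end if
   call blacs_exit(0)
end program scalapack_solve_sketch

Building the matrix redundantly through pdelset keeps the test free of 
application-level communication before the solve, which is the point of the 
"no communication" setup described above.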

We appreciate any help.

Thanks,
Nathan



Re: [OMPI users] flex.exe

2010-01-21 Thread Barrett, Brian W
I have to agree with the two requests here. Having either a Windows tarball or 
a Windows build-tools tarball doesn't seem too burdensome, and could even be 
done automatically at "make dist" time.

Brian


- Original Message -
From: users-boun...@open-mpi.org 
To: us...@open-mpi.org 
Sent: Thu Jan 21 10:05:03 2010
Subject: Re: [OMPI users] flex.exe

On Thursday, 21.01.2010, at 11:52 -0500, Michael Di Domenico wrote:
> openmpi-1.4.1/contrib/platform/win32/bin/flex.exe
> 
> I understand this file might be required for building on Windows;
> since I'm not, I can just delete the file without issue.
> 
> However, for those of us under import restrictions, where binaries are
> not allowed in, this file causes me to open the tarball and delete the
> file (not a big deal, I know, I know).
> 
> But, can I put up a vote for a pure source-only tree?

I'm very much in favor of that, since we can't ship this binary in
Debian. We'd have to delete it from the tarball and repack it with every
release, which is quite cumbersome. If these tools could be shipped in a
separate tarball, that would be great!

Best regards
Manuel





Re: [OMPI users] flex.exe

2010-01-21 Thread Manuel Prinz
On Thursday, 21.01.2010, at 11:52 -0500, Michael Di Domenico wrote:
> openmpi-1.4.1/contrib/platform/win32/bin/flex.exe
> 
> I understand this file might be required for building on Windows;
> since I'm not, I can just delete the file without issue.
> 
> However, for those of us under import restrictions, where binaries are
> not allowed in, this file causes me to open the tarball and delete the
> file (not a big deal, I know, I know).
> 
> But, can I put up a vote for a pure source-only tree?

I'm very much in favor of that, since we can't ship this binary in
Debian. We'd have to delete it from the tarball and repack it with every
release, which is quite cumbersome. If these tools could be shipped in a
separate tarball, that would be great!
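
(Concretely, the workaround for each upstream release is something along the
lines of:

    tar xjf openmpi-1.4.1.tar.bz2
    rm openmpi-1.4.1/contrib/platform/win32/bin/flex.exe
    tar cjf openmpi-1.4.1-repacked.tar.bz2 openmpi-1.4.1

with the version and the repacked tarball name adjusted every time, which is
exactly the kind of busywork a separate tools tarball would avoid.)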

Best regards
Manuel



[OMPI users] flex.exe

2010-01-21 Thread Michael Di Domenico
openmpi-1.4.1/contrib/platform/win32/bin/flex.exe

I understand this file might be required for building on Windows;
since I'm not, I can just delete the file without issue.

However, for those of us under import restrictions, where binaries are
not allowed in, this file causes me to open the tarball and delete the
file (not a big deal, I know, I know).

But, can I put up a vote for a pure source-only tree?

Thanks...


[OMPI users] checkpointing multi node and multi process applications

2010-01-21 Thread Jean Potsam
Hi Josh/all,

I have upgraded Open MPI to v1.4 but still get the same error when I try
executing the application on multiple nodes:

***
 Error: expected_component: PID information unavailable!
 Error: expected_component: Component Name information unavailable!
***

I am running my application from the node 'portal11' as follows:

mpirun -am ft-enable-cr -np 2 --hostfile hosts  myapp.

The file 'hosts' contains two host names: portal10, portal11.

I am triggering the checkpoint using ompi-checkpoint -v 'PID' from portal11.
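
For clarity, the full sequence I am attempting is roughly the following, where
<PID> is the PID of mpirun on portal11 and the restart step is what I would run
next if the checkpoint succeeded (the snapshot handle is whatever
ompi-checkpoint reports):

portal11$ mpirun -am ft-enable-cr -np 2 --hostfile hosts myapp
portal11$ ompi-checkpoint -v <PID>
portal11$ ompi-restart <snapshot handle>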


I configured open mpi as follows:

#

./configure --prefix=/home/jean/openmpi/ --enable-picky --enable-debug 
--enable-mpi-profile --enable-mpi-cxx --enable-pretty-print-stacktrace 
--enable-binaries --enable-trace --enable-static=yes --enable-debug 
--with-devel-headers=1 --with-mpi-param-check=always --with-ft=cr 
--enable-ft-thread --with-blcr=/usr/local/blcr/ 
--with-blcr-libdir=/usr/local/blcr/lib --enable-mpi-threads=yes
#

Question:

What do you think could be wrong? Please advise me on how to resolve this
problem.


Thank you

Jean


 

--- On Mon, 11/1/10, Josh Hursey  wrote:

From: Josh Hursey 
Subject: Re: [OMPI users] checkpointing multi node and multi process 
applications
To: "Open MPI Users" 
Date: Monday, 11 January, 2010, 21:42


On Dec 19, 2009, at 7:42 AM, Jean Potsam wrote:

> Hi Everyone,
> I am trying to checkpoint an MPI application running on multiple nodes.
> However, I get some error messages when I trigger the checkpointing
> process.
> 
> Error: expected_component: PID information unavailable!
> Error: expected_component: Component Name information unavailable!
> 
> I am using Open MPI 1.3 and BLCR 0.8.1.

Can you try the v1.4 release and see if the problem persists?

> 
> I execute my application as follows:
> 
> mpirun -am ft-enable-cr -np 3 --hostfile hosts gol.
> 
> My question:
> 
> Does Open MPI with BLCR support checkpointing of multi-node MPI
> applications? If so, can you provide me with some information on how to
> achieve this?

Open MPI is able to checkpoint a multi-node application (that's what it was 
designed to do). There are some examples at the link below:
  http://www.osl.iu.edu/research/ft/ompi-cr/examples.php

-- Josh

> 
> Cheers,
> 
> Jean.
> 