Re: [OMPI users] debugging with mpirun

2006-07-06 Thread Brian Barrett

On Jul 6, 2006, at 8:27 PM, Manal Helal wrote:


I am trying to debug my MPI program, but printf debugging is not getting
me very far. I need something that can show me variable values and which
line is executing (and where it was called from), something like gdb with
MPI.

Is there anything like that?


There are a couple of options.  The first (works best with ssh, but  
can be made to work with most starting mechanisms) is to start a  
bunch of gdb sessions in xterms.  Something like:


  mpirun -np XX -d xterm -e gdb <your_app>

The '-d' option is necessary so that mpirun doesn't close the ssh  
sessions, severing its X11 forwarding channel.  This has the  
advantage of being free, but has the disadvantage of being a major  
pain.  A better option is to try a real parallel debugger, such as  
TotalView or Portland Group's PGDBG.  This has the advantage of  
working very well (I use TotalView whenever possible), but has the  
disadvantage of generally not being free.
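
If X11 forwarding is not workable, another free (and Open MPI-agnostic)
trick is to have each rank print its host and PID and then spin until a
debugger attaches.  A rough sketch, with purely illustrative names:

    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank;
        char host[256];
        volatile int attached = 0;    /* flip to 1 from the debugger */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        gethostname(host, sizeof(host));
        printf("rank %d waiting on %s, pid %d\n", rank, host, (int) getpid());
        fflush(stdout);
        while (!attached)             /* in gdb: set var attached = 1 */
            sleep(5);

        /* ... rest of the application ... */
        MPI_Finalize();
        return 0;
    }

Attach with 'gdb -p <pid>' on the node in question, set attached to 1,
and continue.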



Hope this helps,

Brian

--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/




[OMPI users] debugging with mpirun

2006-07-06 Thread Manal Helal
hi

I am trying to debug my MPI program, but printf debugging is not getting
me very far. I need something that can show me variable values and which
line is executing (and where it was called from), something like gdb with
MPI.

Is there anything like that?

thank you very much for your help, 

Manal



Re: [OMPI users] MPI_Recv, is it possible to switch on/off aggressive mode during runtime?

2006-07-06 Thread Brian Barrett

On Jul 5, 2006, at 8:54 AM, Marcin Skoczylas wrote:


I saw some posts ago almost the same question as I have, but it didn't
give me a satisfactory answer.
I have a setup like this:

GUI program on some machine (e.g. a laptop)
Head listening on a TCP/IP socket for commands from the GUI.
Workers waiting for commands from the Head / processing the data.

And now it's problematic. For passing the commands from Head I'm using:

while (true)
{
    MPI_Recv(...);

    /* do whatever the head said (process a small portion of the data,
       return the result to the head, wait for another command) */
}

So in the idle time the workers are stuck in MPI_Recv and have 100% CPU
usage, even if they are just waiting for commands from the Head.
Normally I would prefer not to have this situation, as I sometimes have
to share the cluster with others. I would prefer not to stop the whole MPI
program, but just go into an 'idle' mode and thus make it run again soon.

Also, I would like to have this aggressive MPI_Recv approach switched on
when I'm alone on the cluster. So is it possible to somehow switch this
mode on/off during runtime? Thank you in advance!


Currently, there is not a way to do this.  Obviously, there's not
going to be a way that is portable (i.e., one that compiles with MPICH),
but it may be possible to add this in the future.  It likely won't happen
for the v1.1 release series, and I can't really speak for releases
past that at this point.  I'll file an enhancement request in our
internal bug tracker and add you to the list of people to be
notified when the ticket is updated.
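
In the meantime, a portable workaround (nothing Open MPI-specific) is to
poll with MPI_Iprobe and sleep between polls, so idle workers stop
spinning at 100% CPU.  A rough sketch; the single-int command protocol
and the "negative command means shut down" convention are illustrative
only, not part of your program:

    #include <mpi.h>
    #include <unistd.h>

    static void worker_loop(void)
    {
        for (;;) {
            int flag = 0, cmd;
            MPI_Status status;

            /* Non-blocking check for a command from the head (rank 0). */
            MPI_Iprobe(0, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &status);
            if (!flag) {
                usleep(10000);        /* idle politely, don't busy-wait */
                continue;
            }
            MPI_Recv(&cmd, 1, MPI_INT, 0, status.MPI_TAG,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (cmd < 0)
                break;                /* illustrative shutdown command */
            /* ... process a small portion of the data and send the
                   result back to the head ... */
        }
    }

The sleep interval trades a little latency for CPU time, so you could even
make it a runtime knob of your own.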



Brian

--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/




Re: [OMPI users] MPI_Comm_spawn error messages

2006-07-06 Thread s anwar

Ralph:

I am running the application without mpirun, i.e. ./foobar. So, according to
your definition of a singleton above, I am calling comm_spawn from a singleton.

Thanks.
Saadat.


On 7/6/06, Ralph Castain  wrote:


 Thanks Saadat

Could you clarify how you are running this application? We have a known
problem with comm_spawn from a singleton (i.e., if you just did a.out
instead of mpirun -np 1 a.out) - the errors look somewhat like what you are
showing here, hence our curiosity.

Thanks
Ralph




On 7/6/06 3:12 PM, "s anwar"  wrote:

Ralph:

I am using Fedora Core 4 (Linux turkana 2.6.12-1.1390_FC4smp #1 SMP Tue
Jul 5 20:21:11 EDT 2005 i686 athlon i386 GNU/Linux). The machine is a
dual-processor Athlon-based machine. No cluster resource manager, just an
rsh/ssh based setup.

Thanks.
Saadat.

On 7/6/06, Ralph H Castain wrote:

Hi Saadat

Could you tell us something more about the system you are using? What type
of processors, operating system, any resource manager (e.g., SLURM, PBS),
etc?

Thanks
Ralph




On 7/6/06 10:49 AM, "s anwar"  wrote:

Good Day:

I am getting the following error messages every time I run a very simple
program that spawns child processes:
[turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
base/soh_base_get_proc_soh.c at line 80
[turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
base/oob_base_xcast.c at line 108
[turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
base/rmgr_base_stage_gate.c at line 276
[turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
base/soh_base_get_proc_soh.c at line 80
[turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
base/oob_base_xcast.c at line 108
[turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
base/rmgr_base_stage_gate.c at line 276

These errors are being generated by the master process. Does anybody know
what they mean?

Also, if I spawn four child processes, not all of them run to completion,
i.e. to MPI_Finalize.

Saadat.





Re: [OMPI users] MPI_Comm_spawn error messages

2006-07-06 Thread s anwar

Ralph:

I am using Fedora Core 4 (Linux turkana 2.6.12-1.1390_FC4smp #1 SMP Tue
Jul 5 20:21:11 EDT 2005 i686 athlon i386 GNU/Linux). The machine is a
dual-processor Athlon-based machine. No cluster resource manager, just an
rsh/ssh based setup.

Thanks.
Saadat.

On 7/6/06, Ralph H Castain  wrote:


 Hi Saadat

Could you tell us something more about the system you are using? What type
of processors, operating system, any resource manager (e.g., SLURM, PBS),
etc?

Thanks
Ralph




On 7/6/06 10:49 AM, "s anwar"  wrote:

Good Day:

I am getting the following error messages every time I run a very simple
program that spawns child processes:
[turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
base/soh_base_get_proc_soh.c at line 80
[turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
base/oob_base_xcast.c at line 108
[turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
base/rmgr_base_stage_gate.c at line 276
[turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
base/soh_base_get_proc_soh.c at line 80
[turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
base/oob_base_xcast.c at line 108
[turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
base/rmgr_base_stage_gate.c at line 276

These errors are being generated by the master process. Does anybody know
what they mean?

Also, if I spawn four child processes, not all of them run to completion,
i.e. to MPI_Finalize.

Saadat.





Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Justin Bronder

With 1.0.3a1r10670 the same problem is occurring.  Again, the same configure
arguments as before.  For clarity, the Myrinet driver we are using is 2.0.21.

node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$ gm_board_info
GM build ID is "2.0.21_MacOSX_rc20050429075134PDT
r...@node96.meldrew.clusters.umaine.edu:/usr/src/gm-2.0.21_MacOSX Fri Jun 16
14:39:45 EDT 2006."

node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$ /usr/local/ompi-xl-1.0.3/bin/mpirun -np 2 xhpl
This succeeds.
||Ax-b||_oo / ( eps * ||A||_1  * N) =0.1196787 .. PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =0.0283195 .. PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =0.0063300 .. PASSED

node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$ /usr/local/ompi-xl-1.0.3/bin/mpirun -mca btl gm -np 2 xhpl
This fails.
||Ax-b||_oo / ( eps * ||A||_1  * N) = 717370209518881444284334080.000 .. FAILED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 226686309135.4274597 .. FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 2386641249.6518722 .. FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 2037398812542965504.00
||A||_oo . . . . . . . . . . . . . . . . . . . = 2561.554752
||A||_1  . . . . . . . . . . . . . . . . . . . = 2558.129237
||x||_oo . . . . . . . . . . . . . . . . . . . = 300175355203841216.00
||x||_1  . . . . . . . . . . . . . . . . . . . = 31645943341479366656.00

Does anyone have a working system with OS X and Myrinet (GM)?  If so, I'd
love to hear the configure arguments and various versions you are using.
Bonus points if you are using the IBM XL compilers.

Thanks,
Justin.


On 7/6/06, Justin Bronder  wrote:


Yes, that output was actually cut and pasted from an OS X run.  I'm about
to test against 1.0.3a1r10670.

Justin.

On 7/6/06, Galen M. Shipman  wrote:

> Justin,
> Is the OS X run showing the same residual failure?
>
> - Galen
>
> On Jul 6, 2006, at 10:49 AM, Justin Bronder wrote:
>
> Disregard the failure on Linux, a rebuild from scratch of HPL and OpenMPI
> seems to have resolved the issue.  At least I'm not getting the errors
> during the residual checks.
>
> However, this is persisting under OS X.
>
> Thanks,
> Justin.
>
> On 7/6/06, Justin Bronder < jsbron...@gmail.com> wrote:
>
> > For OS X:
> > /usr/local/ompi-xl/bin/mpirun -mca btl gm -np 4 ./xhpl
> >
> > For Linux:
> > ARCH=ompi-gnu-1.1.1a
> > /usr/local/$ARCH/bin/mpiexec -mca btl gm -np 2 -path
> > /usr/local/$ARCH/bin ./xhpl
> >
> > Thanks for the speedy response,
> > Justin.
> >
> > On 7/6/06, Galen M. Shipman < gship...@lanl.gov> wrote:
> >
> > > Hey Justin,
> > Please provide us your mca parameters (if any), these could be in a
> > config file, environment variables or on the command line.
> >
> > Thanks,
> >
> > Galen
> >
> >  On Jul 6, 2006, at 9:22 AM, Justin Bronder wrote:
> >
> > As far as the nightly builds go, I'm still seeing what I believe to be
> >
> > this problem in both r10670 and r10652.  This is happening with
> > both Linux and OS X.  Below are the systems and ompi_info for the
> > newest revision 10670.
> >
> > As an example of the error, when running HPL with Myrinet I get the
> > following error.  Using tcp everything is fine and I see the results
> > I'd
> > expect.
> >
> > 
> > ||Ax-b||_oo / ( eps * ||A||_1  * N) = 42820214496954887558164928727596662784.000 .. FAILED
> > ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 156556068835.2711182 .. FAILED
> > ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1156439380.5172558 .. FAILED
> > ||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 272683853978565028754868928512.00
> > ||A||_oo . . . . . . . . . . . . . . . . . . . = 3822.884181
> > ||A||_1  . . . . . . . . . . . . . . . . . . . = 3823.922627
> > ||x||_oo . . . . . . . . . . . . . . . . . . . = 37037692483529688659798261760.00
> > ||x||_1  . . . . . . . . . . . . . . . . . . . = 4102704048669982798475494948864.00
> > ===
> >
> > Finished  1 tests with the following results:
> >   0 tests completed and passed residual checks,
> >   1 tests completed and failed residual checks,
> >   0 tests skipped because of illegal input values.
> >
> > 
> >
> > Linux node41 2.6.16.19 #1 SMP Wed Jun 21 17:22:01 EDT 2006 ppc64
> > PPC970FX, altivec supported GNU/Linux
> > jbronder@node41 ~ $ /usr/local/ompi- gnu-1.1.1a/bin/ompi_info
> > Open MPI: 1.1.1a1r10670
> >Open MPI SVN revision: r10670
> > Open RTE: 1.1.1a1r10670
> >Open RTE SVN revision: r10670
> > OPAL: 1.1.1a1r10670
> >OPAL SVN revision: r10670
> >   Prefix: /usr/local/ompi-gnu-1.1.1a
> >  Configured 

Re: [OMPI users] MPI_Comm_spawn error messages

2006-07-06 Thread Ralph H Castain
Hi Saadat

Could you tell us something more about the system you are using? What type
of processors, operating system, any resource manager (e.g., SLURM, PBS),
etc?

Thanks
Ralph



On 7/6/06 10:49 AM, "s anwar"  wrote:

> Good Day:
> 
> I am getting the following error messages every time I run a very simple
> program that spawns child processes:
> [turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
> base/soh_base_get_proc_soh.c at line 80
> [turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
> base/oob_base_xcast.c at line 108
> [turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
> base/rmgr_base_stage_gate.c at line 276
> [turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
> base/soh_base_get_proc_soh.c at line 80
> [turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
> base/oob_base_xcast.c at line 108
> [turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
> base/rmgr_base_stage_gate.c at line 276
> 
> These errors are being generated by the master process. Does anybody know
> what they mean?
> 
> Also, if I spawn four child processes, not all of them run to completion,
> i.e. to MPI_Finalize.
> 
> Saadat.
> 
> 




Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Galen M. Shipman

Justin,

Is the OS X run showing the same residual failure?

- Galen

On Jul 6, 2006, at 10:49 AM, Justin Bronder wrote:

Disregard the failure on Linux, a rebuild from scratch of HPL and OpenMPI
seems to have resolved the issue.  At least I'm not getting the errors
during the residual checks.

However, this is persisting under OS X.

Thanks,
Justin.

On 7/6/06, Justin Bronder  wrote:
For OS X:
/usr/local/ompi-xl/bin/mpirun -mca btl gm -np 4 ./xhpl

For Linux:
ARCH=ompi-gnu-1.1.1a
/usr/local/$ARCH/bin/mpiexec -mca btl gm -np 2 -path /usr/local/ 
$ARCH/bin ./xhpl


Thanks for the speedy response,
Justin.

On 7/6/06, Galen M. Shipman < gship...@lanl.gov> wrote:
Hey Justin,

Please provide us your mca parameters (if any), these could be in a  
config file, environment variables or on the command line.


Thanks,

Galen

On Jul 6, 2006, at 9:22 AM, Justin Bronder wrote:

As far as the nightly builds go, I'm still seeing what I believe to be
this problem in both r10670 and r10652.  This is happening with
both Linux and OS X.  Below are the systems and ompi_info for the
newest revision 10670.

As an example of the error, when running HPL with Myrinet I get the
following error.  Using tcp everything is fine and I see the results I'd
expect.
-- 
--
||Ax-b||_oo / ( eps * ||A||_1  * N) = 42820214496954887558164928727596662784.000 .. FAILED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 156556068835.2711182 .. FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1156439380.5172558 .. FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 272683853978565028754868928512.00
||A||_oo . . . . . . . . . . . . . . . . . . . = 3822.884181
||A||_1  . . . . . . . . . . . . . . . . . . . = 3823.922627
||x||_oo . . . . . . . . . . . . . . . . . . . = 37037692483529688659798261760.00
||x||_1  . . . . . . . . . . . . . . . . . . . = 4102704048669982798475494948864.00

===

Finished  1 tests with the following results:
  0 tests completed and passed residual checks,
  1 tests completed and failed residual checks,
  0 tests skipped because of illegal input values.
-- 
--


Linux node41 2.6.16.19 #1 SMP Wed Jun 21 17:22:01 EDT 2006 ppc64  
PPC970FX, altivec supported GNU/Linux

jbronder@node41 ~ $ /usr/local/ompi- gnu-1.1.1a/bin/ompi_info
Open MPI: 1.1.1a1r10670
   Open MPI SVN revision: r10670
Open RTE: 1.1.1a1r10670
   Open RTE SVN revision: r10670
OPAL: 1.1.1a1r10670
   OPAL SVN revision: r10670
  Prefix: /usr/local/ompi-gnu-1.1.1a
 Configured architecture: powerpc64-unknown-linux-gnu
   Configured by: root
   Configured on: Thu Jul  6 10:15:37 EDT 2006
  Configure host: node41
Built by: root
Built on: Thu Jul  6 10:28:14 EDT 2006
  Built host: node41
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: yes
 Fortran90 bindings size: small
  C compiler: gcc
 C compiler absolute: /usr/bin/gcc
C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
  Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/powerpc64-unknown-linux-gnu/gcc-bin/ 
4.1.0/gfortran

  Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/powerpc64-unknown-linux-gnu/gcc-bin/ 
4.1.0/gfortran

 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: yes
 Fortran90 profiling: yes
  C++ exceptions: no
  Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
 MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
 libltdl support: yes
  MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component  
v1.1.1)

   MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.1)
   MCA maffinity: first_use (MCA v1.0, API v1.0, Component  
v1.1.1)

   MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.1)
   MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
   MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.1)

   MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.1)
MCA coll: self (MCA v1.0, API v1.0, Component v1.1.1)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.1)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.1)
  MCA io: romio (MCA v1.0, API v1.0, Component v1.1.1)
   MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1.1)
   MCA mpool: sm (MCA v1.0, 

[OMPI users] MPI_Comm_spawn error messages

2006-07-06 Thread s anwar

Good Day:

I am getting the following error messages every time I run a very simple
program that spawns child processes:
[turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
base/soh_base_get_proc_soh.c at line 80
[turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
base/oob_base_xcast.c at line 108
[turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
base/rmgr_base_stage_gate.c at line 276
[turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
base/soh_base_get_proc_soh.c at line 80
[turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
base/oob_base_xcast.c at line 108
[turkana:27949] [0,0,0] ORTE_ERROR_LOG: Not found in file
base/rmgr_base_stage_gate.c at line 276

These errors are being generated by the master process. Does anybody know
what they mean?

Also, if I spawn four child processes, not all of them run to completion,
i.e. to MPI_Finalize.
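
The program boils down to roughly the sketch below (the child executable
name is illustrative; the count of four matches what I described above):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm children;
        int errcodes[4];

        MPI_Init(&argc, &argv);

        /* Spawn four copies of an (illustrative) child executable. */
        MPI_Comm_spawn("./child", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &children, errcodes);

        /* ... communicate with the children over the intercommunicator ... */

        MPI_Comm_disconnect(&children);
        MPI_Finalize();
        return 0;
    }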

Saadat.


Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Justin Bronder

Disregard the failure on Linux, a rebuild from scratch of HPL and OpenMPI
seems to have resolved the issue.  At least I'm not getting the errors
during the residual checks.

However, this is persisting under OS X.

Thanks,
Justin.

On 7/6/06, Justin Bronder  wrote:


For OS X:
/usr/local/ompi-xl/bin/mpirun -mca btl gm -np 4 ./xhpl

For Linux:
ARCH=ompi-gnu-1.1.1a
/usr/local/$ARCH/bin/mpiexec -mca btl gm -np 2 -path /usr/local/$ARCH/bin
./xhpl

Thanks for the speedy response,
Justin.

On 7/6/06, Galen M. Shipman  wrote:

> Hey Justin,
> Please provide us your mca parameters (if any), these could be in a
> config file, environment variables or on the command line.
>
> Thanks,
>
> Galen
>
> On Jul 6, 2006, at 9:22 AM, Justin Bronder wrote:
>
> As far as the nightly builds go, I'm still seeing what I believe to be
> this problem in both r10670 and r10652.  This is happening with
> both Linux and OS X.  Below are the systems and ompi_info for the
> newest revision 10670.
>
> As an example of the error, when running HPL with Myrinet I get the
> following error.  Using tcp everything is fine and I see the results I'd
> expect.
>
> 
> ||Ax-b||_oo / ( eps * ||A||_1  * N) = 42820214496954887558164928727596662784.000 .. FAILED
> ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 156556068835.2711182 .. FAILED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1156439380.5172558 .. FAILED
> ||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 272683853978565028754868928512.00
> ||A||_oo . . . . . . . . . . . . . . . . . . . = 3822.884181
> ||A||_1  . . . . . . . . . . . . . . . . . . . = 3823.922627
> ||x||_oo . . . . . . . . . . . . . . . . . . . = 37037692483529688659798261760.00
> ||x||_1  . . . . . . . . . . . . . . . . . . . = 4102704048669982798475494948864.00
> ===
>
> Finished  1 tests with the following results:
>   0 tests completed and passed residual checks,
>   1 tests completed and failed residual checks,
>   0 tests skipped because of illegal input values.
>
> 
>
> Linux node41 2.6.16.19 #1 SMP Wed Jun 21 17:22:01 EDT 2006 ppc64
> PPC970FX, altivec supported GNU/Linux
> jbronder@node41 ~ $ /usr/local/ompi- gnu-1.1.1a/bin/ompi_info
> Open MPI: 1.1.1a1r10670
>Open MPI SVN revision: r10670
> Open RTE: 1.1.1a1r10670
>Open RTE SVN revision: r10670
> OPAL: 1.1.1a1r10670
>OPAL SVN revision: r10670
>   Prefix: /usr/local/ompi-gnu-1.1.1a
>  Configured architecture: powerpc64-unknown-linux-gnu
>Configured by: root
>Configured on: Thu Jul  6 10:15:37 EDT 2006
>   Configure host: node41
> Built by: root
> Built on: Thu Jul  6 10:28:14 EDT 2006
>   Built host: node41
>   C bindings: yes
> C++ bindings: yes
>   Fortran77 bindings: yes (all)
>   Fortran90 bindings: yes
>  Fortran90 bindings size: small
>   C compiler: gcc
>  C compiler absolute: /usr/bin/gcc
> C++ compiler: g++
>C++ compiler absolute: /usr/bin/g++
>   Fortran77 compiler: gfortran
>   Fortran77 compiler abs:
> /usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
>   Fortran90 compiler: gfortran
>   Fortran90 compiler abs:
> /usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
>  C profiling: yes
>C++ profiling: yes
>  Fortran77 profiling: yes
>  Fortran90 profiling: yes
>   C++ exceptions: no
>   Thread support: posix (mpi: no, progress: no)
>   Internal debug support: no
>  MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>  libltdl support: yes
>   MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component
> v1.1.1)
>MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.1)
>MCA maffinity: first_use (MCA v1.0, API v1.0, Component
> v1.1.1)
>MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.1)
>MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.1)
>
>MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.1)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.1.1)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.1)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.1)
>   MCA io: romio (MCA v1.0, API v1.0, Component v1.1.1)
>MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1.1)
>   

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Galen M. Shipman

Hey Justin,

Please provide us your mca parameters (if any), these could be in a  
config file, environment variables or on the command line.


Thanks,

Galen

On Jul 6, 2006, at 9:22 AM, Justin Bronder wrote:


As far as the nightly builds go, I'm still seeing what I believe to be
this problem in both r10670 and r10652.  This is happening with
both Linux and OS X.  Below are the systems and ompi_info for the
newest revision 10670.

As an example of the error, when running HPL with Myrinet I get the
following error.  Using tcp everything is fine and I see the results I'd
expect.
-- 
--
||Ax-b||_oo / ( eps * ||A||_1  * N) = 42820214496954887558164928727596662784.000 .. FAILED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 156556068835.2711182 .. FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1156439380.5172558 .. FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 272683853978565028754868928512.00
||A||_oo . . . . . . . . . . . . . . . . . . . = 3822.884181
||A||_1  . . . . . . . . . . . . . . . . . . . = 3823.922627
||x||_oo . . . . . . . . . . . . . . . . . . . = 37037692483529688659798261760.00
||x||_1  . . . . . . . . . . . . . . . . . . . = 4102704048669982798475494948864.00

===

Finished  1 tests with the following results:
  0 tests completed and passed residual checks,
  1 tests completed and failed residual checks,
  0 tests skipped because of illegal input values.
-- 
--


Linux node41 2.6.16.19 #1 SMP Wed Jun 21 17:22:01 EDT 2006 ppc64  
PPC970FX, altivec supported GNU/Linux

jbronder@node41 ~ $ /usr/local/ompi-gnu-1.1.1a/bin/ompi_info
Open MPI: 1.1.1a1r10670
   Open MPI SVN revision: r10670
Open RTE: 1.1.1a1r10670
   Open RTE SVN revision: r10670
OPAL: 1.1.1a1r10670
   OPAL SVN revision: r10670
  Prefix: /usr/local/ompi-gnu-1.1.1a
 Configured architecture: powerpc64-unknown-linux-gnu
   Configured by: root
   Configured on: Thu Jul  6 10:15:37 EDT 2006
  Configure host: node41
Built by: root
Built on: Thu Jul  6 10:28:14 EDT 2006
  Built host: node41
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: yes
 Fortran90 bindings size: small
  C compiler: gcc
 C compiler absolute: /usr/bin/gcc
C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
  Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/powerpc64-unknown-linux-gnu/gcc-bin/ 
4.1.0/gfortran

  Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/powerpc64-unknown-linux-gnu/gcc-bin/ 
4.1.0/gfortran

 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: yes
 Fortran90 profiling: yes
  C++ exceptions: no
  Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
 MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
 libltdl support: yes
  MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component  
v1.1.1)

   MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.1)
   MCA maffinity: first_use (MCA v1.0, API v1.0, Component  
v1.1.1)

   MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.1)
   MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
   MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.1)
   MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.1)
MCA coll: self (MCA v1.0, API v1.0, Component v1.1.1)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.1)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.1)
  MCA io: romio (MCA v1.0, API v1.0, Component v1.1.1)
   MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1.1)
   MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.1)
 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.1)
 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.1)
  MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.1)
 MCA btl: gm (MCA v1.0, API v1.0, Component v1.1.1)
 MCA btl: self (MCA v1.0, API v1.0, Component v1.1.1)
 MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.1)
 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.1)
 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
 MCA gpr: null (MCA v1.0, API 

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Justin Bronder

As far as the nightly builds go, I'm still seeing what I believe to be
this problem in both r10670 and r10652.  This is happening with
both Linux and OS X.  Below are the systems and ompi_info for the
newest revision 10670.

As an example of the error, when running HPL with Myrinet I get the
following error.  Using tcp everything is fine and I see the results I'd
expect.

||Ax-b||_oo / ( eps * ||A||_1  * N) =
42820214496954887558164928727596662784.000 .. FAILED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 156556068835.2711182 ..
FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1156439380.5172558 ..
FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . =
272683853978565028754868928512.00
||A||_oo . . . . . . . . . . . . . . . . . . . =3822.884181
||A||_1  . . . . . . . . . . . . . . . . . . . =3823.922627
||x||_oo . . . . . . . . . . . . . . . . . . . =
37037692483529688659798261760.00
||x||_1  . . . . . . . . . . . . . . . . . . . =
4102704048669982798475494948864.00
===

Finished  1 tests with the following results:
 0 tests completed and passed residual checks,
 1 tests completed and failed residual checks,
 0 tests skipped because of illegal input values.


Linux node41 2.6.16.19 #1 SMP Wed Jun 21 17:22:01 EDT 2006 ppc64 PPC970FX,
altivec supported GNU/Linux
jbronder@node41 ~ $ /usr/local/ompi-gnu-1.1.1a/bin/ompi_info
   Open MPI: 1.1.1a1r10670
  Open MPI SVN revision: r10670
   Open RTE: 1.1.1a1r10670
  Open RTE SVN revision: r10670
   OPAL: 1.1.1a1r10670
  OPAL SVN revision: r10670
 Prefix: /usr/local/ompi-gnu-1.1.1a
Configured architecture: powerpc64-unknown-linux-gnu
  Configured by: root
  Configured on: Thu Jul  6 10:15:37 EDT 2006
 Configure host: node41
   Built by: root
   Built on: Thu Jul  6 10:28:14 EDT 2006
 Built host: node41
 C bindings: yes
   C++ bindings: yes
 Fortran77 bindings: yes (all)
 Fortran90 bindings: yes
Fortran90 bindings size: small
 C compiler: gcc
C compiler absolute: /usr/bin/gcc
   C++ compiler: g++
  C++ compiler absolute: /usr/bin/g++
 Fortran77 compiler: gfortran
 Fortran77 compiler abs:
/usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
 Fortran90 compiler: gfortran
 Fortran90 compiler abs:
/usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
C profiling: yes
  C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
 C++ exceptions: no
 Thread support: posix (mpi: no, progress: no)
 Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
 MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1.1)
  MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.1)
  MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1.1)
  MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.1)
  MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
  MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
   MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.1)
  MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.1)
   MCA coll: self (MCA v1.0, API v1.0, Component v1.1.1)
   MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.1)
   MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.1)
 MCA io: romio (MCA v1.0, API v1.0, Component v1.1.1)
  MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1.1)
  MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.1)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.1)
MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.1)
 MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: gm (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: self (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
   MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.1)
MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: null (MCA v1.0, API v1.0, Component v1.1.1)
MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1.1)
MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1.1)
MCA iof: svc (MCA v1.0, API v1.0, Component v1.1.1)
 

Re: [OMPI users] (no subject)

2006-07-06 Thread Jonathan Blocksom
Check out "Windows Compute Cluster Server 2003", 
http://www.microsoft.com/windowsserver2003/ccs/default.mspx.


From the FAQ: "Windows Compute Cluster Server 2003 comes with the 
Microsoft Message Passing Interface (MS MPI), an MPI stack based on the 
MPICH2 implementation from Argonne National Labs."


I have no experience with it, just sharing the link.

Jonathan

usha devi regadi wrote:
  
hello

I'll be glad to know if an MPI is available on the Windows platform.

Regards
usha








Re: [OMPI users] OpenMPI, debugging, and Portland Group's pgdbg

2006-07-06 Thread Jeff Squyres (jsquyres)
Thanks for looking into this!

I'm going to file a feature enhancement for OMPI to add this option once
the PGI debugger works with Open MPI (I don't want to add it before
then, because it may be misleading to users).


> -Original Message-
> From: users-boun...@open-mpi.org 
> [mailto:users-boun...@open-mpi.org] On Behalf Of Andrew J Caird
> Sent: Wednesday, July 05, 2006 9:16 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenMPI, debugging, and Portland 
> Group's pgdbg
> 
> 
> This took a long time for me to get to, but once I did, what 
> I found was 
> that the closest thing to working for the PGI compilers with 
> OpenMPI is 
> this command:
>mpirun --debugger "pgdbg @mpirun@ @mpirun_args@" --debug 
> -np 2 ./cpi
> 
> It appears to work, that is, you can select a process with the "proc" 
> command in pgdbg and set break points and all, but pgdbg 
> prints a lot of 
> error messages that are all the same:
> db_set_code_brk : DiBreakpointSet fails
> which is sort of annoying, but didn't impede my debugging of 
> my 100-line 
> MPI test program.
> 
> I posted this to the PGI Debugger Forum:
>http://www.pgroup.com/userforum/viewtopic.php?p=1969
> and got a response saying (hopefully Mat doesn't mind me 
> quoting him)::
> 
> >  Hi Andy,
> >  Actually I'm pleasantly surprised that PGDBG works at all 
> with OpenMPI
> >  since PGDBG currently only supports MPICH. While we're planning on
> >  adding OpenMPI and MPICH-2 support later this year, in the 
> immediate
> >  future, there isn't a work around this problem, other than to use
> >  MPICH.
> >  Thanks,
> >  Mat
> 
> So I guess the short answer is that it might sort of work if you really
> need it, otherwise it's best to wait a little while.
> 
> --andy
> 
> On Fri, 16 Jun 2006, Jeff Squyres (jsquyres) wrote:
> 
> > I'm afraid that I'm not familiar with the PG debugger, so I 
> don't know
> > how it is supposed to be launched.
> >
> > The intent with --debugger / --debug is that you could do a single
> > invocation of some command and it launches both the 
> parallel debugger
> > and tells that debugger to launch your parallel MPI process 
> (assumedly
> > allowing the parallel debugger to attach to your parallel 
> MPI process).
> > This is what fx2 and Totalview allow, for example.
> >
> > As such, the "--debug" option is simply syntactic sugar for invoking
> > another [perhaps non-obvious] command.  We figured it was 
> simpler for
> > users to add "--debug" to the already-familiar mpirun 
> command line than
> > to learn a new syntax for invoking a debugger (although both would
> > certainly work equally well).
> >
> > As such, when OMPI's mpirun sees "--debug", it ends up exec'ing
> > something else -- the parallel debugger command.  In the 
> example that I
> > gave in 
> http://www.open-mpi.org/community/lists/users/2005/11/0370.php,
> > mpirun looked for two things in your path: totalview and fx2.
> >
> > For example, if you did this:
> >
> > mpirun --debug -np 4 a.out
> >
> > If it found totalview, it would end up exec'ing:
> >
> > totalview @mpirun@ -a @mpirun_args@
> > which would get substituted to
> > totalview mpirun -a -np 4 a.out
> >
> > (note the additional "-a") Which is the totalview command 
> line syntax to
> > launch their debugger and tell it to launch your parallel 
> process.  If
> > totalview is not found in your path, it'll look for fx2.  If fx2 is
> > found, it'll invoke:
> >
> > fx2 @mpirun@ -a @mpirun_args@
> > which would get substitued to
> > fx2 mpirun -a -np 4 a.out
> >
> > You can see that fx2's syntax was probably influenced by 
> totalview's.
> >
> > So what you need is the command line that tells pgdbg to do the same
> > thing -- launch your app and attach to it.  You can then 
> substitute that
> > into the "--debugger" option (using the @mpirun@ and @mpirun_args@
> > tokens), or set the MCA parameter 
> "orte_base_user_debugger", and then
> > use --debug.  For example, if the pgdbg syntax is similar to that of
> > totalview and fx2, then you could do the following:
> >
> > mpirun --debugger "pgdbg @mpirun@ -a @mpirun_args@" --debug -np 4 a.out
> > or (assuming tcsh)
> > shell% setenv OMPI_MCA_orte_base_user_debugger "pgdbg @mpirun@
> > -a @mpirun_args@"
> > shell% mpirun --debug -np 4 a.out
> >
> > Make sense?
> >
> > If you find a fixed format for pgdbg, we'd be happy to add it to the 
> > default value of the orte_base_user_debugger MCA parameter.
> >
> > Note that OMPI currently only supports the Totalview API 
> for attaching 
> > to MPI processes -- I don't know if pgdbg requires something else.