Re: [OMPI users] Maximum message size for MPI_Send()/MPI_Recv() functions

2007-08-02 Thread George Bosilca
Allocating memory is one thing. Being able to use it is a completely
different story. Once you allocate the 8GB array, can you fill it with
some random values? This will force the kernel to really give you the
8GB of memory. If this segfaults, then that's the problem. If not ...
the problem comes from Open MPI, I guess.


  Thanks,
george.

On Aug 2, 2007, at 6:59 PM, Juan Carlos Guzman wrote:


Jelena, George,

Thanks for your replies.


it is possible that the problem is not in MPI - I've seen a similar
problem on some of our workstations some time ago.
Juan, are you sure you can allocate more than 2x 4GB of data in a
non-MPI program on your system?

Yes, I wrote a small program that can allocate more than 8 GB of memory
(using malloc()).

Cheers,
  Juan-Carlos.




Re: [OMPI users] values of mca parameters whilst running program

2007-08-02 Thread Brian Barrett

On Aug 2, 2007, at 4:22 PM, Glenn Carver wrote:


Hopefully an easy question to answer... is it possible to get at the
values of mca parameters whilst a program is running? What I had in
mind was either an open-mpi function to call which would print the
current values of mca parameters, or a function to call for specific
mca parameters. I don't want to interrupt the running of the
application.

Bit of background. I have a large F90 application running with
OpenMPI (as Sun Clustertools 7) on Opteron CPUs with an IB network.
We're seeing swap thrashing occurring on some of the nodes at times
and having searched the archives and read the FAQ believe we may be
seeing the problem described in:
http://www.open-mpi.org/community/lists/users/2007/01/2511.php
where the udapl free list is growing to a point where lockable memory
runs out.


Problem is, I have no feel for the kinds of numbers that
"btl_udapl_free_list_max" might safely get up to. Hence the request
to print mca parameter values whilst the program is running, to see if
we can tie in high values of this parameter to when we're seeing swap
thrashing.


Good news: the answer is easy. Bad news: it's not the one you want.
btl_udapl_free_list_max is the *greatest* the list will ever be
allowed to grow to, not its current size. So if you don't specify a
value and use the default of -1, it will return -1 for the life of the
application, regardless of how big those free lists actually get. If
you specify value X, it'll return X for the life of the application as
well.

There is not a good way for a user to find out the current size of a
free list, or the largest it got over the life of an application
(currently those two will always be the same, but that's another
story). Your best bet is to set the parameter to some value (say,
128 or 256) and see if that helps with the swapping.
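
For example, on the mpirun command line (the process count and program
name are illustrative):

    mpirun --mca btl_udapl_free_list_max 128 -np 16 ./your_app

or through the environment, which ompi_info should then also report:

    export OMPI_MCA_btl_udapl_free_list_max=128
    ompi_info --param btl udapl | grep free_list_max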



Brian

--
  Brian W. Barrett
  Networking Team, CCS-1
  Los Alamos National Laboratory




Re: [OMPI users] Maximum message size for MPI_Send()/MPI_Recv() functions

2007-08-02 Thread Juan Carlos Guzman

Jelena, George,

Thanks for your replies.

it is possible that the problem is not in MPI - I've seen a similar
problem on some of our workstations some time ago.
Juan, are you sure you can allocate more than 2x 4GB of data in a
non-MPI program on your system?

Yes, I wrote a small program that can allocate more than 8 GB of memory
(using malloc()).

Cheers,
  Juan-Carlos.



Thanks,
Jelena

On Wed, 1 Aug 2007, George Bosilca wrote:


Juan,

I have to check to see what's wrong there. We build Open MPI with
full support for data transfer up to sizeof(size_t) bytes, so your
case should be covered. However, there are some known problems with
the MPI interface for data larger than sizeof(int). As an example,
the _count field in the MPI_Status structure will be truncated ...

  Thanks,
george.

On Jul 30, 2007, at 1:47 AM, Juan Carlos Guzman wrote:


Hi,

Does anyone know the maximum buffer size I can use in the MPI_Send()
(MPI_Recv()) function? I was doing some testing using two nodes on my
cluster to measure the point-to-point MPI message rate depending on
size. The test program exchanges MPI_FLOAT datatypes between two
nodes. I was able to send up to 4 GB of data (500 Mega MPI_FLOATs)
before the process crashed with a segmentation fault message.

Is the maximum size of the message limited by sizeof(int) *
sizeof(MPI datatype) in the MPI_Send()/MPI_Recv() functions?

My cluster has openmpi 1.2.3 installed. Each node has 2 x dual core
AMD Opteron and 12 GB RAM.

Thanks in advance.
   Juan-Carlos.
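
A common workaround for the int-based count, sketched below under the
assumption that the total element count can exceed INT_MAX (the chunk
size and function name are illustrative, not from the thread), is to
split the transfer into pieces whose per-call counts each fit in an
int:

    #include <stddef.h>
    #include <mpi.h>

    /* Send 'count' floats in chunks small enough that each
       per-call count fits in an int. */
    static void send_in_chunks(const float *buf, size_t count,
                               int dest, int tag, MPI_Comm comm)
    {
        const size_t max_chunk = (size_t)1 << 28; /* 268M floats < INT_MAX */
        size_t sent = 0;
        while (sent < count) {
            size_t n = count - sent;
            if (n > max_chunk)
                n = max_chunk;
            MPI_Send((void *)(buf + sent), (int)n, MPI_FLOAT,
                     dest, tag, comm);
            sent += n;
        }
    }

The receiver mirrors the same loop with matching MPI_Recv calls, so no
single call ever sees a count larger than an int can hold.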



--
Jelena Pjesivac-Grbovic, Pjesa
Graduate Research Assistant
Innovative Computing Laboratory
Computer Science Department, UTK
Claxton Complex 350
(865) 974 - 6722
(865) 974 - 6321
jpjes...@utk.edu

"The only difference between a problem and a solution is that
  people understand the solution."
   -- Charles Kettering



--

Message: 2
Date: Wed, 1 Aug 2007 15:06:56 -0500
From: "Adams, Samuel D Contr AFRL/HEDR" 
Subject: Re: [OMPI users] torque and openmpi
To: "Open MPI Users" 
Message-ID: <8bf06a36e7ad424197195998d9a0b8e1d77...@fbrmlbr01.enterprise.afmc.ds.af.mil>
Content-Type: text/plain; charset="us-ascii"

I reran the configure script with the --with-tm flag this time. Thanks
for the info.  It was working before for clients with ssh properly
configured (i.e. my account only).  But now it is working without
having to use ssh for all accounts (i.e. biologist and physicist
users).

Sam Adams
General Dynamics Information Technology
Phone: 210.536.5945

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-bounces@open-mpi.org] On
Behalf Of Jeff Squyres
Sent: Friday, July 27, 2007 2:58 PM
To: Open MPI Users
Subject: Re: [OMPI users] torque and openmpi

On Jul 27, 2007, at 2:48 PM, Galen Shipman wrote:


I set up ompi before I configured Torque.  Do I need to recompile ompi
with appropriate torque configure options to get better integration?


If libtorque wasn't present on the machine at configure time then yes,
you need to run:

./configure --with-tm=<path to your Torque installation>


You don't *have* to do this, of course.  If you've got it working
with ssh, that's fine.  But the integration with torque can be better:

- you can disable ssh for non-root accounts (assuming no other
services need rsh/ssh)
- users don't have to set up ssh keys to run MPI jobs (a small thing,
but sometimes nice when the users aren't computer scientists)
- torque knows about all processes on all nodes (not just the mother
superior) and can therefore both track and kill them if necessary
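
With tm support built in, a Torque job script needs no machinefile and
no ssh setup; mpirun picks the allocated nodes up from Torque itself.
A minimal sketch (the resource request and program name are
illustrative):

    #!/bin/sh
    #PBS -l nodes=2:ppn=4
    cd $PBS_O_WORKDIR
    mpirun -np 8 ./my_mpi_app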

Just my $0.02...

--
Jeff Squyres
Cisco Systems




--

Message: 3
Date: Wed, 1 Aug 2007 20:58:44 -0400
From: Jeff Squyres 
Subject: Re: [OMPI users] unable to compile open mpi using pgf90 in
AMD opteron system
To: Open MPI Users 
Message-ID: <5453c030-b7c9-48e1-bba7-f04bcc43c...@cisco.com>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed

On Aug 1, 2007, at 11:38 AM, S.Sundar Raman wrote:


Dear Open MPI users,
I am trying to compile Open MPI with the pgf90 compiler on an AMD
Opteron system. I followed the procedure given in the mailing list
archives.


What procedure are you referring to, specifically?


I found the following problem. Please kindly help me in this regard; I
am eagerly waiting for your reply.

make[2]: Entering directory `/usr/local/openmpi-1.2.3/ompi/mpi/f90'

/bin/sh ../../../libtool --mode=link pgf90 
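
For reference, building Open MPI with the PGI compilers is normally
just a matter of pointing configure at them; a minimal sketch (the
install prefix is illustrative):

    ./configure CC=pgcc CXX=pgCC F77=pgf77 FC=pgf90 \
        --prefix=/opt/openmpi-1.2.3-pgi
    make all install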

[OMPI users] values of mca parameters whilst running program

2007-08-02 Thread Glenn Carver
Hopefully an easy question to answer... is it possible to get at the
values of mca parameters whilst a program is running? What I had in
mind was either an open-mpi function to call which would print the
current values of mca parameters, or a function to call for specific
mca parameters. I don't want to interrupt the running of the
application.


Bit of background. I have a large F90 application running with 
OpenMPI (as Sun Clustertools 7) on Opteron CPUs with an IB network. 
We're seeing swap thrashing occurring on some of the nodes at times 
and having searched the archives and read the FAQ believe we may be 
seeing the problem described in:

http://www.open-mpi.org/community/lists/users/2007/01/2511.php
where the udapl free list is growing to a point where lockable memory runs out.

Problem is, I have no feel for the kinds of numbers that
"btl_udapl_free_list_max" might safely get up to. Hence the request
to print mca parameter values whilst the program is running, to see if
we can tie in high values of this parameter to when we're seeing swap
thrashing.


Thanks,
Glenn




$ ompi_info --all
                Open MPI: 1.2.1r14096-ct7b030r1838
   Open MPI SVN revision: 0
                Open RTE: 1.2.1r14096-ct7b030r1838
   Open RTE SVN revision: 0
                    OPAL: 1.2.1r14096-ct7b030r1838
       OPAL SVN revision: 0
           MCA backtrace: printstack (MCA v1.0, API v1.0, Component v1.2.1)
           MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.1)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
               MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.1)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
               MCA mpool: udapl (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
              MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1)
              MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
                 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
                 MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
                 MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.1)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
              MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
              MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
              MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
                  MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.1)
                  MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.1)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.1)
               MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.1)
                MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.1)
                MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA sds: env (MCA v1.0, API v1.0, Component 

Re: [OMPI users] mpi daemon

2007-08-02 Thread Ralph H Castain
The daemon's name is "orted" - one will be launched on each remote node as
the application is started, but they only live for as long as the
application is executing. Then they go away.
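
One way to see this for yourself (hostnames and program name are
illustrative) is to start a job and look for orted on a remote node
while it runs:

    mpirun -np 4 -host node1,node2 ./my_app &
    ssh node2 ps -eo pid,command | grep orted

Once the application exits, the orted processes disappear with it.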


On 8/2/07 12:47 PM, "Reuti"  wrote:

> On 02.08.2007 at 18:32, Francesco Pietra wrote:
> 
>> I compiled successfully the MD suite Amber9 on openmpi-1.2.3,
>> installed on Debian Linux amd64 etch.
>> 
>> Although all tests for parallel amber9 passed successfully, when I run
>> 
>> ps -aux
>> 
>> I don't see any daemon referring to MPI. How is that daemon
>> identified, or how should it be started?
> 
> The output of:
> 
> ps f -eo pid,ppid,pgrp,user,group,command
> 
> might be more informative.
> 
> -- Reuti
> 
> 
>> Thanks
>> 
>> francesco pietra
>> 
>> 




Re: [OMPI users] mpi daemon

2007-08-02 Thread Reuti

On 02.08.2007 at 18:32, Francesco Pietra wrote:

I compiled successfully the MD suite Amber9 on openmpi-1.2.3, installed
on Debian Linux amd64 etch.

Although all tests for parallel amber9 passed successfully, when I run

ps -aux

I don't see any daemon referring to MPI. How is that daemon identified,
or how should it be started?


The output of:

ps f -eo pid,ppid,pgrp,user,group,command

might be more informative.

-- Reuti



Thanks

francesco pietra


   




[OMPI users] mpi daemon

2007-08-02 Thread Francesco Pietra
I compiled successfully the MD suite Amber9 on openmpi-1.2.3, installed
on Debian Linux amd64 etch.

Although all tests for parallel amber9 passed successfully, when I run

ps -aux

I don't see any daemon referring to MPI. How is that daemon identified,
or how should it be started?

Thanks

francesco pietra


  
