Re: [OMPI users] Tmpdir work for first process only

2007-11-15 Thread Aurelien Bouteiller

Hi Clement,

First, if you run 400 processes on 16 CPUs you will end up with roughly 25  
processes on each CPU. Depending on the application's memory footprint, the  
run may fail simply because of memory exhaustion. I am usually able to  
oversubscribe up to 64 NAS class B processes in 2 GB of memory, but fewer  
than 16 class C processes.


About your initial problem: tmpdir is the temporary directory for the  
orte seed only. As you discovered, this parameter is ignored by all  
the other processes. However, you can use the TMPDIR environment  
variable to set the tmpdir on every Open MPI process. Just use mpirun  
-x TMPDIR=/some/where to set it.
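
For example, something along these lines should do it (the scratch path
below is just the one from your own log; adjust it for your job):

  mpirun -x TMPDIR=/jobfs/z07/247752.ac-pbs --tmpdir /jobfs/z07/247752.ac-pbs \
      -mca btl sm -np 400 ./testprogram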


Regards,
Aurelien


On Nov 15, 2007, at 7:04 AM, Clement Kam Man Chu wrote:


Jeff Squyres wrote:

Thanks for your reply.  I am using the PBS job scheduler and I requested 16
CPUs to run 400 processes, but I don't know how many processes are allocated
on each CPU.  Do you think that is a problem?

Clement

Are you running all of these processes on the same machine, or
multiple different machines?

If you're running 400 processes on the same machine, it may well be
that you are simply running out of memory or other OS resources.  In
particular, I've never seen iof fail that way before (iof is our I/O
forwarding subsystem).

Looking at the iof code, the error you're seeing occurs when iof is
trying to create a pipe between our OMPI "helper daemon" and the newly
spawned user executable and fails.  The only reason that I can guess
for why that would happen is if the maximum number of pipes has been
created on a machine and the OS refuses to create any more...?
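
One rough way to check whether you're bumping into descriptor limits (a
pipe consumes file descriptors, so the usual Linux limits apply) is
something like:

  ulimit -n                   # per-process open-file limit in your shell
  cat /proc/sys/fs/file-nr    # file handles currently in use system-wide
  cat /proc/sys/fs/file-max   # system-wide file handle limit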



On Nov 14, 2007, at 9:36 PM, Clement Kam Man Chu wrote:



Hi,

I have figured out why the tmpdir parameter works only for the first
process. I hit another problem when I try to run 400 processes (there is
no problem with fewer than 400 processes). I got an error "ORTE_ERROR_LOG:
Out of resource in file base/iof_base_setup.c at line 106". I attached the
message below:

[ac27:12442] [0,0,0] setting up session dir with
[ac27:12442] tmpdir /jobfs/z07/247752.ac-pbs
[ac27:12442] universe default-universe-12442
[ac27:12442] user kxc565
[ac27:12442] host ac27
[ac27:12442] jobid 0
[ac27:12442] procid 0
[ac27:12442] procdir:
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default-
universe-12442/0/0
[ac27:12442] jobdir:
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default-
universe-12442/0
[ac27:12442] unidir:
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default-
universe-12442
[ac27:12442] top: openmpi-sessions-kxc565@ac27_0
[ac27:12442] tmp: ??
[ac27:12442] [0,0,0] contact_file
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default-
universe-12442/universe-setup.txt
[ac27:12442] [0,0,0] wrote setup file
[ac27:12447] [0,0,1] setting up session dir with
[ac27:12447] universe default-universe-12442
[ac27:12447] user kxc565
[ac27:12447] host ac27
[ac27:12447] jobid 0
[ac27:12447] procid 1
[ac27:12447] procdir:
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default-
universe-12442/0/1
[ac27:12447] jobdir:
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default-
universe-12442/0
[ac27:12447] unidir:
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default-
universe-12442
[ac27:12447] top: openmpi-sessions-kxc565@ac27_0
[ac27:12447] tmp: /jobfs/z07/247752.ac-pbs
[ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file
base/iof_base_setup.c at line 106
[ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file
odls_default_module.c at line 663
[ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file
odls_default_module.c at line 1191
[ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file orted.c
at
line 594
[ac27:12442] spawn: in job_state_callback(jobid = 1, state = 0x80)
mpirun noticed that job rank 0 with PID 0 on node ac27 exited on
signal
15 (Terminated).
[ac27:12447] sess_dir_finalize: job session dir not empty - leaving
[ac27:12447] sess_dir_finalize: proc session dir not empty - leaving
[ac27:12442] sess_dir_finalize: proc session dir not empty - leaving


Thanks,
Clement

Clement Kam Man Chu wrote:


Hi,

I am using openmpi 1.2.3 on an ia64 machine. I typed "mpirun -d --tmpdir
/home/565/kxc565/tmpdir -mca btl sm -np 400 ./testprogram". I found that
only the first process uses my parameter setting to store its tmp files;
the second process used its default setting and stored its tmp files in
the /tmp directory. How can I make all processes store their files in the
directory I specified? I have attached the message from openmpi for more
details.
Thanks for any help.

Cheers,
Clement


[ac27:27928] [0,0,0] setting up session dir with
[ac27:27928] tmpdir /home/565/kxc565/tmpdir
[ac27:27928] universe default-universe-27928
[ac27:27928] user kxc565
[ac27:27928] host ac27
[ac27:27928] jobid 0
[ac27:27928] procid 0
[ac27:27928] procdir:
/home/565/kxc565/tmpdir/openmpi-sessions-kxc565@ac27_0/default-
universe-27928/0/0
[ac27:27928] jobdir:
/home/565/kxc565/tmpdir/openmpi-sessions-kxc565@ac27_0/default-

[OMPI users] MPI daemons suspend running job

2007-11-15 Thread Murat Knecht
Hi,
I am encountering the problem that a working child process freezes in
the middle of its work and continues only when its parent process
(which spawned it earlier on) calls some MPI function.
The issue here is that, in order to accept client socket communication,
the parent process is at the same time performing a select() on a
server socket. It expects the child to finish its work, which will be
collected and processed at a later time. This now cannot happen, as the
child does not continue until the parent process calls Recv(), Probe(),
or sends something itself.
Is it possible that some MPI daemon freezes the child process
if its parent process goes to "sleep" while listening on a socket? If
so, how can I avoid it? These are independent processes and, even though
interlinked, they should not influence each other in this way.
Regards,
Murat


Re: [OMPI users] Suggestions on multi-compiler/multi-mpi build?

2007-11-15 Thread Brock Palen
Really, modules is the only way to go; we use it to maintain no fewer  
than 12 versions of Open MPI compiled with PGI, GCC, NAG, and Intel.


Yes, as the other reply said, just set LD_LIBRARY_PATH.  You can use

ldd executable

to see which libraries the executable is going to use under the  
current environment.  We also have users submit to Torque with

#PBS -V

which copies the current environment with the job; this allows users  
to submit and have multiple jobs running, each with its own  
environment.  If you're using PBS or Torque, look at the qsub man page.
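
Putting those pieces together, a session might look something like this
(the module name and program name are made up; use whatever your site
provides):

  module load openmpi/1.2.4-gcc   # pick a compiler/MPI combination
  ldd ./my_mpi_program            # confirm which MPI libraries it will use
  qsub -V myjob.pbs               # -V carries your current environment along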


Another addition we came up with: after we load a default set of  
modules (pbs, a compiler, an MPI library), we look for a file in


$HOME/privatemodules/default

which just has a list of module commands.  This allows users to  
change their default modules at every login.

For example, I have in my default file

module load git

Because i use git for managing files.

We have done other hacks; our web page of software is built by  
cron from the module files:

http://cac.engin.umich.edu/resources/systems/nyxV2/software.html

So we don't maintain software lists by hand, we just generate them  
dynamically.

Modules is the admin's best friend.

Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985


On Nov 15, 2007, at 8:54 AM, Jim Kusznir wrote:


Hi all:

I'm trying to set up a cluster for a group of users with very
different needs.  So far, it looks like I need gcc, pgi, and intel to
work with openmpi and mpich, with each user able to control what
combination they get.  This is turning out to be much more difficult
than I expected.

Someone has pointed me to environment-modules ("Modules"), which looks
like it will be a critical part of the solution.  I even noticed that
the provided openmpi.spec file has some direct support for modules,
which makes me quite happy!

However, I still have many questions about how to set things up.

First, I get the impression that openmpi will need to be compiled with
each compiler that will use it.  If this is true, I'm not quite sure
how to go about it.  I could install in different directories for the
user commands, but what about the libraries?  I don't think I have a
feasible way of selecting which library to use on the fly on the
entire cluster for each user, so it seems like it would be better to
have all the libraries available.  In addition, I will need RPMs to
deploy efficiently on the cluster.  I suspect I can change the
versioning info and build with each compiler, but at this point, I
don't even know how to reliably select what compiler rpmbuild will use
(I've only succeeded in using gcc).

Finally, using modules, how do I set it up so that if a user changes
compilers, but stays with openmpi, it will load the correct openmpi
paths?  I know I can set up the openmpi module file to load after the
compiler module and based on that select different paths based on the
currently-loaded compiler module.  If the user changes the compiler
module, will that cause the mpi module to also be reloaded so the new
settings will be loaded?  Or do I need this at all?

Thanks for all your help!
--Jim
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users






Re: [MTT users] unable to pull tests

2007-11-15 Thread Karol Mroz
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Jeff Squyres wrote:
> Done.
> 
> FWIW, we don't give out access to the ompi-tests repository to anyone  
> except those on the Open MPI development team.  The repository  
> contains a bunch of MPI test suites that are all publicly available,  
> but we've never bothered to look into redistribution rights.
> 
> 
> On Nov 14, 2007, at 3:34 PM, Karol Mroz wrote:
> 
> Ethan Mallove wrote:
 Do you have access to the ompi-tests repository?
 What happens if you do this command outside of MTT?

  $ svn export https://svn.open-mpi.org/svn/ompi-tests/trunk/onesided

 You could also try using "http", instead of "https" in
 your svn_url.

 -Ethan
> Hi, Ethan.
> 
> So I just recently received access to the Open MPI devel repository (ompi)
> but clearly this did not include access to ompi-tests :) I'll email Jeff
> as he was responsible for granting me access to ompi.
> 
> Thanks for the help!
> 
___
mtt-users mailing list
mtt-us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users

Thanks very much, Jeff.

- --
Karol Mroz
km...@cs.ubc.ca
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHPIEMuoug78g/Mz8RAq9vAJwOf+r9RFNAyjaB8cWNZZZLxgViKgCfTZgG
c4o/GBC3SGKD3w1zzGzhT9Q=
=jZtc
-END PGP SIGNATURE-


Re: [OMPI users] Suggestions on multi-compiler/multi-mpi build?

2007-11-15 Thread Katherine Holcomb
We have almost exactly the situation you describe with our clusters.  I'm 
not the system administrator but I am the one who actually writes the module 
scripts.


It is best to compile Open MPI (and any other MPI) with each compiler 
separately; this is especially necessary for the Fortran and C++ bindings. 
We simply have a directory layout like


/opt/openmpi/intel
/opt/openmpi/pgi
/opt/openmpi/gnu

etc.

To compile OpenMPI with a different compiler, you can use environment 
variables such as (for the Intel compiler)

FC=ifort
F90=ifort
CC=icc
CXX=icpc
You set these within the %build part of the rpm spec file.
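
If you are building by hand rather than via rpmbuild, the same idea is
just to pass those variables to configure, roughly like this (the prefix
follows the layout above; treat it as a sketch, not our exact spec file):

  ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/opt/openmpi/intel
  make all install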

Similarly for MPICH2.  (I would recommend MPICH2 over MPICH1 at this point.)

Users load a module to select a compiler+MPI combination.  Loading the 
appropriate module sets the path for the version of mpicc/cxx/f90 chosen by 
the user.  That in turn knows where its associated libraries live.


Most users link MPI libraries statically.  If dynamic libraries are needed 
then they can be added to LD_LIBRARY_PATH by the module script.
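
In a module script this boils down to something equivalent to (paths taken
from the example layout above):

  export PATH=/opt/openmpi/intel/bin:$PATH
  export LD_LIBRARY_PATH=/opt/openmpi/intel/lib:$LD_LIBRARY_PATH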


I am away from my office until after the Thanksgiving holiday, but if you 
email me personally after that, I can send you some sample module scripts.
I can also send you some sample spec files for the RPMs we use.


- Original Message - 
From: "Jim Kusznir" 

To: 
Sent: Thursday, November 15, 2007 11:54 AM
Subject: [OMPI users] Suggestions on multi-compiler/multi-mpi build?



Hi all:

I'm trying to set up a cluster for a group of users with very
different needs.  So far, it looks like I need gcc, pgi, and intel to
work with openmpi and mpich, with each user able to control what
combination they get.  This is turning out to be much more difficult
than I expected.

Someone has pointed me to environment-modules ("Modules"), which looks
like it will be a critical part of the solution.  I even noticed that
the provided openmpi.spec file has some direct support for modules,
which makes me quite happy!

However, I still have many questions about how to set things up.

First, I get the impression that openmpi will need to be compiled with
each compiler that will use it.  If this is true, I'm not quite sure
how to go about it.  I could install in different directories for the
user commands, but what about the libraries?  I don't think I have a
feasible way of selecting which library to use on the fly on the
entire cluster for each user, so it seems like it would be better to
have all the libraries available.  In addition, I will need RPMs to
deploy efficiently on the cluster.  I suspect I can change the
versioning info and build with each compiler, but at this point, I
don't even know how to reliably select what compiler rpmbuild will use
(I've only succeeded in using gcc).

Finally, using modules, how do I set it up so that if a user changes
compilers, but stays with openmpi, it will load the correct openmpi
paths?  I know I can set up the openmpi module file to load after the
compiler module and based on that select different paths based on the
currently-loaded compiler module.  If the user changes the compiler
module, will that cause the mpi module to also be reloaded so the new
settings will be loaded?  Or do I need this at all?

Thanks for all your help!
--Jim
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users






Re: [OMPI users] Tmpdir work for first process only

2007-11-15 Thread Clement Kam Man Chu

Jeff Squyres wrote:

Thanks for your reply.  I am using the PBS job scheduler and I requested 16 
CPUs to run 400 processes, but I don't know how many processes are allocated 
on each CPU.  Do you think that is a problem?


Clement
Are you running all of these processes on the same machine, or  
multiple different machines?


If you're running 400 processes on the same machine, it may well be  
that you are simply running out of memory or other OS resources.  In  
particular, I've never seen iof fail that way before (iof is our I/O  
forwarding subsystem).


Looking at the iof code, the error you're seeing occurs when iof is  
trying to create a pipe between our OMPI "helper daemon" and the newly  
spawned user executable and fails.  The only reason that I can guess  
for why that would happen is if the maximum number of pipes has been created  
on a machine and the OS refuses to create any more...?




On Nov 14, 2007, at 9:36 PM, Clement Kam Man Chu wrote:

  

Hi,

I have figured out why the tmpdir parameter works only for the first
process. I hit another problem when I try to run 400 processes (there is
no problem with fewer than 400 processes). I got an error "ORTE_ERROR_LOG:
Out of resource in file base/iof_base_setup.c at line 106". I attached the
message below:

[ac27:12442] [0,0,0] setting up session dir with
[ac27:12442] tmpdir /jobfs/z07/247752.ac-pbs
[ac27:12442] universe default-universe-12442
[ac27:12442] user kxc565
[ac27:12442] host ac27
[ac27:12442] jobid 0
[ac27:12442] procid 0
[ac27:12442] procdir:
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default- 
universe-12442/0/0

[ac27:12442] jobdir:
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default- 
universe-12442/0

[ac27:12442] unidir:
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default- 
universe-12442

[ac27:12442] top: openmpi-sessions-kxc565@ac27_0
[ac27:12442] tmp: ??
[ac27:12442] [0,0,0] contact_file
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default- 
universe-12442/universe-setup.txt

[ac27:12442] [0,0,0] wrote setup file
[ac27:12447] [0,0,1] setting up session dir with
[ac27:12447] universe default-universe-12442
[ac27:12447] user kxc565
[ac27:12447] host ac27
[ac27:12447] jobid 0
[ac27:12447] procid 1
[ac27:12447] procdir:
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default- 
universe-12442/0/1

[ac27:12447] jobdir:
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default- 
universe-12442/0

[ac27:12447] unidir:
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default- 
universe-12442

[ac27:12447] top: openmpi-sessions-kxc565@ac27_0
[ac27:12447] tmp: /jobfs/z07/247752.ac-pbs
[ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file
base/iof_base_setup.c at line 106
[ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file
odls_default_module.c at line 663
[ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file
odls_default_module.c at line 1191
[ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file orted.c  
at

line 594
[ac27:12442] spawn: in job_state_callback(jobid = 1, state = 0x80)
mpirun noticed that job rank 0 with PID 0 on node ac27 exited on  
signal

15 (Terminated).
[ac27:12447] sess_dir_finalize: job session dir not empty - leaving
[ac27:12447] sess_dir_finalize: proc session dir not empty - leaving
[ac27:12442] sess_dir_finalize: proc session dir not empty - leaving


Thanks,
Clement

Clement Kam Man Chu wrote:


Hi,

I am using openmpi 1.2.3 on an ia64 machine. I typed "mpirun -d --tmpdir
/home/565/kxc565/tmpdir -mca btl sm -np 400 ./testprogram". I found that
only the first process uses my parameter setting to store its tmp files;
the second process used its default setting and stored its tmp files in
the /tmp directory. How can I make all processes store their files in the
directory I specified? I have attached the message from openmpi for more
details.

Thanks for any help.

Cheers,
Clement


[ac27:27928] [0,0,0] setting up session dir with
[ac27:27928] tmpdir /home/565/kxc565/tmpdir
[ac27:27928] universe default-universe-27928
[ac27:27928] user kxc565
[ac27:27928] host ac27
[ac27:27928] jobid 0
[ac27:27928] procid 0
[ac27:27928] procdir:
/home/565/kxc565/tmpdir/openmpi-sessions-kxc565@ac27_0/default- 
universe-27928/0/0

[ac27:27928] jobdir:
/home/565/kxc565/tmpdir/openmpi-sessions-kxc565@ac27_0/default- 
universe-27928/0

[ac27:27928] unidir:
/home/565/kxc565/tmpdir/openmpi-sessions-kxc565@ac27_0/default- 
universe-27928

[ac27:27928] top: openmpi-sessions-kxc565@ac27_0
[ac27:27928] tmp: ?
[ac27:27928] [0,0,0] contact_file
/home/565/kxc565/tmpdir/openmpi-sessions-kxc565@ac27_0/default- 
universe-27928/universe-setup.txt

[ac27:27928] [0,0,0] wrote setup file
[ac27:27932] [0,0,1] setting up session dir with
[ac27:27932] universe default-universe-27928
[ac27:27932] user kxc565
[ac27:27932] host ac27
[ac27:27932] jobid 0
[ac27:27932] procid 1
[ac27:27932] procdir:
/tmp/openmpi-sessions-kxc565@ac27_0/default-universe-27928/0/1
[ac27:27932] jobdir:

Re: [OMPI users] Error on running large number of processes

2007-11-15 Thread Jeff Squyres
My guess is that this is similar to the last post: you are  
oversubscribing the nodes so heavily that the OS is running out of  
some resources (perhaps regular or registered memory?) such that Open  
MPI is unable to set up its network transport layers properly.
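
A couple of quick things you could look at on the node while the job is
starting, just to see whether memory is the limiting factor:

  free -m       # overall memory use on the node
  ulimit -l     # max locked (registrable) memory per process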



On Nov 15, 2007, at 6:35 AM, Clement Kam Man Chu wrote:


Hi,

   I am using openmpi 1.2.3 on an ia64 machine with the PBS job
scheduler.  I can successfully run 100 processes on 16 CPUs, but I get
an error if I run 200 processes on the same number of CPUs.  The error
is:


 PML add procs failed
 --> Returned "Temporarily out of resource" (-3) instead of  
"Success" (0)



Please help.

Regards,
Clement

--
Clement Kam Man Chu
Research Assistant
Faculty of Information Technology
Monash University, Caulfield Campus
Ph: 61 3 9903 2355

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Tmpdir work for first process only

2007-11-15 Thread Jeff Squyres
Are you running all of these processes on the same machine, or  
multiple different machines?


If you're running 400 processes on the same machine, it may well be  
that you are simply running out of memory or other OS resources.  In  
particular, I've never seen iof fail that way before (iof is our I/O  
forwarding subsystem).


Looking at the iof code, the error you're seeing occurs when iof is  
trying to create a pipe between our OMPI "helper daemon" and the newly  
spawned user executable and fails.  The only reason that I can guess  
for why that would happen is if the maximum number of pipes has been created  
on a machine and the OS refuses to create any more...?




On Nov 14, 2007, at 9:36 PM, Clement Kam Man Chu wrote:


Hi,

I have figured out why the tmpdir parameter works only for the first
process. I hit another problem when I try to run 400 processes (there is
no problem with fewer than 400 processes). I got an error "ORTE_ERROR_LOG:
Out of resource in file base/iof_base_setup.c at line 106". I attached the
message below:

[ac27:12442] [0,0,0] setting up session dir with
[ac27:12442] tmpdir /jobfs/z07/247752.ac-pbs
[ac27:12442] universe default-universe-12442
[ac27:12442] user kxc565
[ac27:12442] host ac27
[ac27:12442] jobid 0
[ac27:12442] procid 0
[ac27:12442] procdir:
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default- 
universe-12442/0/0

[ac27:12442] jobdir:
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default- 
universe-12442/0

[ac27:12442] unidir:
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default- 
universe-12442

[ac27:12442] top: openmpi-sessions-kxc565@ac27_0
[ac27:12442] tmp: ??
[ac27:12442] [0,0,0] contact_file
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default- 
universe-12442/universe-setup.txt

[ac27:12442] [0,0,0] wrote setup file
[ac27:12447] [0,0,1] setting up session dir with
[ac27:12447] universe default-universe-12442
[ac27:12447] user kxc565
[ac27:12447] host ac27
[ac27:12447] jobid 0
[ac27:12447] procid 1
[ac27:12447] procdir:
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default- 
universe-12442/0/1

[ac27:12447] jobdir:
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default- 
universe-12442/0

[ac27:12447] unidir:
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default- 
universe-12442

[ac27:12447] top: openmpi-sessions-kxc565@ac27_0
[ac27:12447] tmp: /jobfs/z07/247752.ac-pbs
[ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file
base/iof_base_setup.c at line 106
[ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file
odls_default_module.c at line 663
[ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file
odls_default_module.c at line 1191
[ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file orted.c  
at

line 594
[ac27:12442] spawn: in job_state_callback(jobid = 1, state = 0x80)
mpirun noticed that job rank 0 with PID 0 on node ac27 exited on  
signal

15 (Terminated).
[ac27:12447] sess_dir_finalize: job session dir not empty - leaving
[ac27:12447] sess_dir_finalize: proc session dir not empty - leaving
[ac27:12442] sess_dir_finalize: proc session dir not empty - leaving


Thanks,
Clement

Clement Kam Man Chu wrote:

Hi,

I am using openmpi 1.2.3 on an ia64 machine. I typed "mpirun -d --tmpdir
/home/565/kxc565/tmpdir -mca btl sm -np 400 ./testprogram". I found that
only the first process uses my parameter setting to store its tmp files;
the second process used its default setting and stored its tmp files in
the /tmp directory. How can I make all processes store their files in the
directory I specified? I have attached the message from openmpi for more
details.

Thanks for any help.

Cheers,
Clement


[ac27:27928] [0,0,0] setting up session dir with
[ac27:27928] tmpdir /home/565/kxc565/tmpdir
[ac27:27928] universe default-universe-27928
[ac27:27928] user kxc565
[ac27:27928] host ac27
[ac27:27928] jobid 0
[ac27:27928] procid 0
[ac27:27928] procdir:
/home/565/kxc565/tmpdir/openmpi-sessions-kxc565@ac27_0/default- 
universe-27928/0/0

[ac27:27928] jobdir:
/home/565/kxc565/tmpdir/openmpi-sessions-kxc565@ac27_0/default- 
universe-27928/0

[ac27:27928] unidir:
/home/565/kxc565/tmpdir/openmpi-sessions-kxc565@ac27_0/default- 
universe-27928

[ac27:27928] top: openmpi-sessions-kxc565@ac27_0
[ac27:27928] tmp: ?
[ac27:27928] [0,0,0] contact_file
/home/565/kxc565/tmpdir/openmpi-sessions-kxc565@ac27_0/default- 
universe-27928/universe-setup.txt

[ac27:27928] [0,0,0] wrote setup file
[ac27:27932] [0,0,1] setting up session dir with
[ac27:27932] universe default-universe-27928
[ac27:27932] user kxc565
[ac27:27932] host ac27
[ac27:27932] jobid 0
[ac27:27932] procid 1
[ac27:27932] procdir:
/tmp/openmpi-sessions-kxc565@ac27_0/default-universe-27928/0/1
[ac27:27932] jobdir:
/tmp/openmpi-sessions-kxc565@ac27_0/default-universe-27928/0
[ac27:27932] unidir:
/tmp/openmpi-sessions-kxc565@ac27_0/default-universe-27928
[ac27:27932] top: openmpi-sessions-kxc565@ac27_0
[ac27:27932] tmp: /tmp
[ac27:27932] [0,0,1] 

[OMPI users] Error on running large number of processes

2007-11-15 Thread Clement Kam Man Chu

Hi,

   I am using openmpi 1.2.3 on an ia64 machine with the PBS job 
scheduler.  I can successfully run 100 processes on 16 CPUs, but I get 
an error if I run 200 processes on the same number of CPUs.  The error is:


 PML add procs failed
 --> Returned "Temporarily out of resource" (-3) instead of "Success" (0)


Please help.

Regards,
Clement

--
Clement Kam Man Chu
Research Assistant
Faculty of Information Technology
Monash University, Caulfield Campus
Ph: 61 3 9903 2355



Re: [OMPI users] OpenMPI - compilation

2007-11-15 Thread Jeff Squyres

On Nov 14, 2007, at 10:48 PM, Sajjad wrote:


No, I didn't find any executable after I issued the command "mpicc
mpitest1.c -o mpitest1"


If you're not finding the executable at all, then something else is  
very wrong.  The "mpicc" command is just a "wrapper" compiler, meaning  
that it takes your command line, adds some more flags, and then  
invokes the underlying compiler (e.g., gcc).  You can use the  
"--showme" flag to see exactly what command mpicc actually invokes:


mpicc mpitest1.c -o mpitest1 --showme

Then try manually running the command that it shows and see what  
happens.  If your compiler is not producing executables at all, then  
you have some other (non-MPI) system-level issue.
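
A quick sanity check along those lines (conftest.c is just a throwaway
file name):

  mpicc mpitest1.c -o mpitest1 --showme          # print the underlying command
  echo 'int main(void){return 0;}' > conftest.c
  gcc conftest.c -o conftest && ./conftest && echo "compiler OK"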


And sorry for dumping such an irrelevant chunk of data to  the  
mailing list.



We actually request the output from ompi_info from users who are  
having run-time problems.  See the "getting help" page on the OMPI web  
site.


Also, I still strongly recommend upgrading to the latest stable  
version of Open MPI if possible.  The version you have (v1.1) should  
not be responsible for you not being able to create executables,  
though -- you may need to fix that independently of upgrading Open MPI.


--
Jeff Squyres
Cisco Systems



[OMPI users] OpenMPI - compilation

2007-11-15 Thread Sajjad
Hello Jeff,

No, I didn't find any executable after I issued the command "mpicc
mpitest1.c -o mpitest1"


And sorry for dumping such an irrelevant chunk of data to the mailing list.


Sajjad


[OMPI users] OpenMPI - compilation

2007-11-15 Thread Sajjad
Hello Jeff,

I thought the following information would be helpful in tracking down the issue.


***


sajjad@sajjad:~$ ompi_info
Open MPI: 1.1
   Open MPI SVN revision: r10477
Open RTE: 1.1
   Open RTE SVN revision: r10477
OPAL: 1.1
   OPAL SVN revision: r10477
  Prefix: /usr
 Configured architecture: x86_64-pc-linux-gnu
   Configured by: buildd
   Configured on: Tue May  1 15:04:40 GMT 2007
  Configure host: yellow
Built by: buildd
Built on: Tue May  1 15:19:17 GMT 2007
  Built host: yellow
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (single underscore)
  Fortran90 bindings: yes
 Fortran90 bindings size: small
  C compiler: gcc
 C compiler absolute: /usr/bin/gcc
C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
  Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/bin/gfortran
  Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/bin/gfortran
 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: yes
 Fortran90 profiling: yes
  C++ exceptions: no
  Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
 MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
 libltdl support: yes
  MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1)
   MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1)
   MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1)
   MCA timer: linux (MCA v1.0, API v1.0, Component v1.1)
   MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
   MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)
MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1)
MCA coll: self (MCA v1.0, API v1.0, Component v1.1)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1)
  MCA io: romio (MCA v1.0, API v1.0, Component v1.1)
   MCA mpool: openib (MCA v1.0, API v1.0, Component v1.1)
   MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)
 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1)
  MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1)
 MCA btl: openib (MCA v1.0, API v1.0, Component v1.1)
 MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
 MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)
 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
 MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)
 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)
 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)
 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)
 MCA iof: svc (MCA v1.0, API v1.0, Component v1.1)
  MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1)
  MCA ns: replica (MCA v1.0, API v1.0, Component v1.1)
 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
 MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1)
 MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1)
 MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1)
 MCA ras: slurm (MCA v1.0, API v1.0, Component v1.1)
 MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1)
 MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1)
   MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1)
MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1)
MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1)
 MCA rml: oob (MCA v1.0, API v1.0, Component v1.1)
 MCA pls: fork (MCA v1.0, API v1.0, Component v1.1)
 MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1)
 MCA pls: slurm (MCA v1.0, API v1.0, Component v1.1)
 MCA sds: env (MCA v1.0, API v1.0, Component v1.1)
 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1)
 MCA sds: seed (MCA v1.0, API v1.0, Component v1.1)
 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1)
 MCA sds: slurm (MCA v1.0, API v1.0, Component v1.1)


Re: [OMPI users] Tmpdir work for first process only

2007-11-15 Thread Clement Kam Man Chu

Hi,

I have figured out why the tmpdir parameter works only for the first 
process. I hit another problem when I try to run 400 processes (there is 
no problem with fewer than 400 processes). I got an error "ORTE_ERROR_LOG: 
Out of resource in file base/iof_base_setup.c at line 106". I attached the 
message below:


[ac27:12442] [0,0,0] setting up session dir with
[ac27:12442] tmpdir /jobfs/z07/247752.ac-pbs
[ac27:12442] universe default-universe-12442
[ac27:12442] user kxc565
[ac27:12442] host ac27
[ac27:12442] jobid 0
[ac27:12442] procid 0
[ac27:12442] procdir: 
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default-universe-12442/0/0
[ac27:12442] jobdir: 
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default-universe-12442/0
[ac27:12442] unidir: 
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default-universe-12442

[ac27:12442] top: openmpi-sessions-kxc565@ac27_0
[ac27:12442] tmp: ??
[ac27:12442] [0,0,0] contact_file 
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default-universe-12442/universe-setup.txt

[ac27:12442] [0,0,0] wrote setup file
[ac27:12447] [0,0,1] setting up session dir with
[ac27:12447] universe default-universe-12442
[ac27:12447] user kxc565
[ac27:12447] host ac27
[ac27:12447] jobid 0
[ac27:12447] procid 1
[ac27:12447] procdir: 
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default-universe-12442/0/1
[ac27:12447] jobdir: 
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default-universe-12442/0
[ac27:12447] unidir: 
/jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565@ac27_0/default-universe-12442

[ac27:12447] top: openmpi-sessions-kxc565@ac27_0
[ac27:12447] tmp: /jobfs/z07/247752.ac-pbs
[ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file 
base/iof_base_setup.c at line 106
[ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file 
odls_default_module.c at line 663
[ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file 
odls_default_module.c at line 1191
[ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file orted.c at 
line 594

[ac27:12442] spawn: in job_state_callback(jobid = 1, state = 0x80)
mpirun noticed that job rank 0 with PID 0 on node ac27 exited on signal 
15 (Terminated).

[ac27:12447] sess_dir_finalize: job session dir not empty - leaving
[ac27:12447] sess_dir_finalize: proc session dir not empty - leaving
[ac27:12442] sess_dir_finalize: proc session dir not empty - leaving


Thanks,
Clement

Clement Kam Man Chu wrote:

Hi,

I am using openmpi 1.2.3 on an ia64 machine. I typed "mpirun -d --tmpdir 
/home/565/kxc565/tmpdir -mca btl sm -np 400 ./testprogram". I found that only 
the first process uses my parameter setting to store its tmp files; 
the second process used its default setting and stored its tmp files in the 
/tmp directory. How can I make all processes store their files in the 
directory I specified? I have attached the message from openmpi for more details. 
Thanks for any help.


Cheers,
Clement


[ac27:27928] [0,0,0] setting up session dir with
[ac27:27928] tmpdir /home/565/kxc565/tmpdir
[ac27:27928] universe default-universe-27928
[ac27:27928] user kxc565
[ac27:27928] host ac27
[ac27:27928] jobid 0
[ac27:27928] procid 0
[ac27:27928] procdir: 
/home/565/kxc565/tmpdir/openmpi-sessions-kxc565@ac27_0/default-universe-27928/0/0
[ac27:27928] jobdir: 
/home/565/kxc565/tmpdir/openmpi-sessions-kxc565@ac27_0/default-universe-27928/0
[ac27:27928] unidir: 
/home/565/kxc565/tmpdir/openmpi-sessions-kxc565@ac27_0/default-universe-27928

[ac27:27928] top: openmpi-sessions-kxc565@ac27_0
[ac27:27928] tmp: ?
[ac27:27928] [0,0,0] contact_file 
/home/565/kxc565/tmpdir/openmpi-sessions-kxc565@ac27_0/default-universe-27928/universe-setup.txt

[ac27:27928] [0,0,0] wrote setup file
[ac27:27932] [0,0,1] setting up session dir with
[ac27:27932] universe default-universe-27928
[ac27:27932] user kxc565
[ac27:27932] host ac27
[ac27:27932] jobid 0
[ac27:27932] procid 1
[ac27:27932] procdir: 
/tmp/openmpi-sessions-kxc565@ac27_0/default-universe-27928/0/1
[ac27:27932] jobdir: 
/tmp/openmpi-sessions-kxc565@ac27_0/default-universe-27928/0
[ac27:27932] unidir: 
/tmp/openmpi-sessions-kxc565@ac27_0/default-universe-27928

[ac27:27932] top: openmpi-sessions-kxc565@ac27_0
[ac27:27932] tmp: /tmp
[ac27:27932] [0,0,1] ORTE_ERROR_LOG: Out of resource in file 
base/iof_base_setup.c at line 106
[ac27:27932] [0,0,1] ORTE_ERROR_LOG: Out of resource in file 
odls_default_module.c at line 663
[ac27:27932] [0,0,1] ORTE_ERROR_LOG: Out of resource in file 
odls_default_module.c at line 1191
[ac27:27932] [0,0,1] ORTE_ERROR_LOG: Out of resource in file orted.c at 
line 594

[ac27:27928] spawn: in job_state_callback(jobid = 1, state = 0x80)
mpirun noticed that job rank 0 with PID 0 on node ac27 exited on signal 
15 (Terminated).

[ac27:27932] sess_dir_finalize: job session dir not empty - leaving
[ac27:27932] sess_dir_finalize: proc session dir not empty - leaving
[ac27:27928] sess_dir_finalize: proc session dir not empty - leaving

  



--
Clement Kam Man Chu
Research