Re: [OMPI users] running external program on same processor (Fortran)

2010-03-05 Thread Jeff Squyres
On Mar 5, 2010, at 2:38 PM, Ralph Castain wrote:

>> CALL SYSTEM("cd " // TRIM(dir) // " ; mpirun -machinefile ./machinefile -np 
>> 1 /home01/group/Execute/DLPOLY.X > job.out 2> job.err ; cd - > /dev/null")
> 
> That is guaranteed not to work. The problem is that mpirun sets environmental 
> variables for the original launch. Your system call carries over those 
> envars, causing mpirun to become confused.

You should be able to use MPI_COMM_SPAWN to launch this MPI job.  Check the man 
page for MPI_COMM_SPAWN; I believe we have info keys to specify things like 
what hosts to launch on, etc.

>> Do you think MPI_COMM_SPAWN can help?
> 
> It's the only method supported by the MPI standard. If you need it to block 
> until this new executable completes, you could use a barrier or other MPI 
> method to determine it.

I believe that the user said they wanted to use the same cores as their 
original MPI job occupies for the new job -- they basically want the old job to 
block until the new job completes.  Keep in mind that OMPI busy-polls waiting 
for progress, so you might actually get hosed here (two procs competing for 
time on the same core).

I'm not immediately thinking of a good way to avoid this issue -- perhaps you 
could kludge something up such that the parent job polls on sleep() and 
checking to see if a message has arrived from the child (i.e., the last thing 
the child does before it calls MPI_FINALIZE is to send a message to its parents 
and then MPI_COMM_DISCONNECT from its parents).  If the parent finds that it 
has a message from the child(ren), it can MPI_COMM_DISCONNECT and continue 
processing.

Kinda hackey, but it might work...?
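For illustration, a hedged Fortran sketch of this parent-side kludge (the DLPOLY.X path is taken from the thread; the "host" info key and the sleep() call are assumptions to verify against your Open MPI version's MPI_COMM_SPAWN man page and compiler):

```fortran
! Sketch only: spawn the child on this same node, then sleep-and-probe
! instead of blocking in MPI_RECV, so the parent does not busy-poll
! against the child on the shared core.  The child's last act before
! MPI_FINALIZE would be an empty MPI_SEND to rank 0 of the parent
! intercommunicator, followed by MPI_COMM_DISCONNECT.
program spawn_and_poll
  use mpi
  implicit none
  integer :: ierr, info, child, namelen, dummy
  integer :: errcode(1), status(MPI_STATUS_SIZE)
  logical :: done
  character(len=MPI_MAX_PROCESSOR_NAME) :: myhost

  call MPI_INIT(ierr)
  call MPI_GET_PROCESSOR_NAME(myhost, namelen, ierr)

  ! Ask for the child to be placed on this node ("host" key: check man page)
  call MPI_INFO_CREATE(info, ierr)
  call MPI_INFO_SET(info, "host", myhost(1:namelen), ierr)

  call MPI_COMM_SPAWN("/home01/group/Execute/DLPOLY.X", MPI_ARGV_NULL, 1, &
                      info, 0, MPI_COMM_SELF, child, errcode, ierr)
  call MPI_INFO_FREE(info, ierr)

  ! Poll gently for the child's "I'm done" message
  done = .false.
  do while (.not. done)
     call MPI_IPROBE(0, 0, child, done, status, ierr)
     if (.not. done) call sleep(1)   ! non-standard extension, widely available
  end do
  call MPI_RECV(dummy, 0, MPI_INTEGER, 0, 0, child, status, ierr)
  call MPI_COMM_DISCONNECT(child, ierr)

  ! ... continue processing ...
  call MPI_FINALIZE(ierr)
end program spawn_and_poll
```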

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] running external program on same processor (Fortran)

2010-03-05 Thread Ralph Castain

On Mar 5, 2010, at 8:52 AM, abc def wrote:

> Hello,
> From within the MPI fortran program I run the following command:
> 
> CALL SYSTEM("cd " // TRIM(dir) // " ; mpirun -machinefile ./machinefile -np 1 
> /home01/group/Execute/DLPOLY.X > job.out 2> job.err ; cd - > /dev/null")

That is guaranteed not to work. The problem is that mpirun sets environmental 
variables for the original launch. Your system call carries over those envars, 
causing mpirun to become confused.

> 
> where "dir" is a process-number-dependent directory, to ensure the processes 
> don't over-write each other, and machinefile is written earlier by using 
> hostname to obtain the node of the current process, so this new program 
> launches using the same node as the one that launches it.
> 
> In fortran, the program automatically waits until the system call is complete.
> 
> Since you mentioned MPI_COMM_SPAWN, I looked into this. I read that it's 
> non-blocking, so somehow I'd have to prevent the program from moving forwards 
> until it was complete, and secondly, I need to "cd" into the directory I 
> mentioned above, before running the external command, and I don't know how 
> one would achieve this with this command.
> 
> Do you think MPI_COMM_SPAWN can help?

It's the only method supported by the MPI standard. If you need it to block 
until this new executable completes, you could use a barrier or other MPI 
method to determine it.


> I appreciate your time.
> 
> From: r...@open-mpi.org
> Date: Fri, 5 Mar 2010 07:55:59 -0700
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] running external program on same processor (Fortran)
> 
> How are you trying to start this external program? With an MPI_Comm_spawn? Or 
> are you just fork/exec'ing it?
> 
> How are you waiting for this external program to finish?
> 
> On Mar 5, 2010, at 7:52 AM, abc def wrote:
> 
> Hello,
> 
> Thanks for the comments. Indeed, until yesterday, I didn't realise the 
> difference between MVAPICH, MVAPICH2 and Open-MPI.
> 
> This problem has moved from mvapich2 to open-mpi now however, because I now 
> realise that the production environment uses Open-MPI, which means my 
> solution for mvapich2 doesn't work now. So if I may ask your kind assistance:
> 
> Just to re-cap, I have an MPI fortran program, which runs on N nodes, and 
> each node needs to run an external program. This external program was 
> written for MPI, although I want to run it in serial as part of the process 
> on each node.
> 
> Is there any way at all to launch this external MPI program so it's treated 
> simply as a (serial) extension of the current node's processes? As I say, the 
> MPI originating program simply waits for the external program to finish 
> before continuing, so it's essentially a bit like a "subroutine", in that 
> sense.
> 
> I'm having problems launching this external program from within my MPI 
> program, under the open-mpi system, even without worrying about node 
> assignment, and I think this might be because the system detects that I'm 
> trying to launch another process from one of the nodes, and stops it. I'm 
> guessing here, but it simply stops with an error saying the MPI process was 
> stopped.
> 
> Any help is very much appreciated; I have been going at this for more than a 
> day now and don't seem to be getting anywhere.
> 
> Thank you!
> 
> From: r...@open-mpi.org
> Date: Wed, 3 Mar 2010 07:24:32 -0700
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] running external program on same processor (Fortran)
> 
> It also would have been really helpful to know that you were using MVAPICH 
> and -not- Open MPI as this mailing list is for the latter. We could have 
> directed you to the appropriate place if we had known.
> 
> 
> On Mar 3, 2010, at 5:17 AM, abc def wrote:
> 
> I don't know (I'm a little new to this area), but I figured out how to get 
> around the problem:
> 
> Using SGE and MVAPICH2, the "-env MV2_CPU_MAPPING 0:1." option in mpiexec 
> seems to do the trick.
> 
> So when calling the external program with mpiexec, I map the called process 
> to the current core rank, and it seems to stay distributed and separated as I 
> want.
> 
> Hope someone else finds this useful in the future.
> 
> > Date: Wed, 3 Mar 2010 13:13:01 +1100
> > Subject: Re: [OMPI users] running external program on same processor (Fortran)
> > 
> > Surely this is the problem of the scheduler that your system uses,
> > rather than MPI?
> > 
> > 
> > On Wed, 2010-03-03 at 00:48 +, abc def wrote:
> > > Hello,
> > > 
> > > I wonder if someone can help.
> > > 
> > > The situation is that I have an MPI-parallel fortran program. I run it
> > > and it's distributed on N cores, and each of these processes must call
> > > an external program.
> > > 
> > > This external program is also an MPI program, however I want to run it
> > > in serial, on the core that is calling it, as if it were part of the
> > > fortran program. The 

Re: [MTT users] [MTT bugs] [MTT] #212: Generic network locking server *REVIEW NEEDED*

2010-03-05 Thread Ethan Mallove
On Fri, Feb/19/2010 12:00:55PM, Ethan Mallove wrote:
> On Thu, Feb/18/2010 04:13:15PM, Jeff Squyres wrote:
> > On Feb 18, 2010, at 10:48 AM, Ethan Mallove wrote:
> > 
> > > To ensure there is never a collision between $a->{k} and $b->{k}, the
> > > user can have two MTT clients share a $scratch, but they cannot both
> > > run the same INI section simultaneously.  I set up my scheduler to run
> > > batches of MPI get, MPI install, Test get, Test build, and Test run
> > > sections in parallel with successor INI sections dependent on their
> > > predecessor INI sections (e.g., [Test run: foo] only runs after [Test
> > > build: foo] completes).  The limitation stinks, but the current
> > > limitation is much worse: two MTT clients can't even run the same
> > > *phase* out of one $scratch.
> > 
> > Maybe it might be a little nicer just to protect the user from
> > themselves -- if we ever detect a case where $a->{k} and $b->{k}
> > both exist and are not the same value, dump out everything to a file
> > and abort with an error message.  This is clearly an erroneous
> > situation, but running MTT in big parallel batches like this is a
> > worthwhile-but-complicated endeavor, and some people are likely to
> > get it wrong.  So we should at least detect the situation and fail
> > gracefully, rather than losing or corrupting results.
> > 
> > Make sense?
> 
> Yes.  I'll add this.

The check is there now.  Ready for review.

-Ethan

> 
> -Ethan
> 
> > 
> > > I originally wanted the .dump files to be completely safe, but MTT
> > > clients were getting locked out of the .dump files for way too long.
> > > E.g., MTT::MPI::LoadInstalls happens very early in client/mtt, and an
> > > hour could elapse before MTT::MPI::SaveInstalls is called in
> > > Install.pm.
> > 
> > Yep, if you lock from load->save, then that can definitely happen...
> > 
> > -- 
> > Jeff Squyres
> > jsquy...@cisco.com
> > 
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> > 
> > 
> > ___
> > mtt-users mailing list
> > mtt-us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
--- client/mtt  Mon Nov 09 14:38:09 2009 -0500
+++ client/mtt  Fri Mar 05 14:02:39 2010 -0500
@@ -498,6 +498,15 @@
 # execute on_start callback if exists
_do_step($ini, "mtt", "before_mtt_start_exec");

+# Process setenv, unsetenv, prepend_path, and append_path
+
+my $config;
+$config->{setenv} = Value($ini, "mtt", "setenv");
+$config->{unsetenv} = Value($ini, "mtt", "unsetenv");
+$config->{prepend_path} = Value($ini, "mtt", "prepend_path");
+$config->{append_path} = Value($ini, "mtt", "append_path");
+my @save_env;
+ProcessEnvKeys($config, \@save_env);

 # Set the logfile, if specified

--- lib/MTT/Defaults.pm Mon Nov 09 14:38:09 2009 -0500
+++ lib/MTT/Defaults.pm Fri Mar 05 14:02:39 2010 -0500
@@ -42,7 +42,7 @@

 known_compiler_names => [ "gnu", "pgi", "ibm", "intel", "kai", "absoft",
   "pathscale", "sun", "microsoft", "none", 
"unknown" ],
-known_resource_manager_names => [ "slurm", "tm", "loadleveler", "n1ge",
+known_resource_manager_names => [ "slurm", "tm", "loadleveler", "sge",
   "alps", "none", "unknown" ],
 known_network_names => [ "tcp", "udp", "ethernet", "gm", "mx", "verbs",
  "udapl", "psm", "elan", "portals", "shmem",
--- lib/MTT/MPI.pm  Mon Nov 09 14:38:09 2009 -0500
+++ lib/MTT/MPI.pm  Fri Mar 05 14:02:39 2010 -0500
@@ -16,6 +16,8 @@

 use strict;
 use MTT::Files;
+use MTT::Messages;
+use MTT::Util;

 #--

@@ -28,10 +30,13 @@
 #--

 # Filename where list of MPI sources is kept
-my $sources_data_filename = "mpi_sources.dump";
+my $sources_data_filename = "mpi_sources";

 # Filename where list of MPI installs is kept
-my $installs_data_filename = "mpi_installs.dump";
+my $installs_data_filename = "mpi_installs";
+
+# Filename extension for all the Dumper data files
+my $data_filename_extension = "dump";

 #--

@@ -42,10 +47,15 @@
 # Explicitly delete anything that was there
 $MTT::MPI::sources = undef;

-# If the file exists, read it in
-my $data;
-MTT::Files::load_dumpfile("$dir/$sources_data_filename", \$data);
-$MTT::MPI::sources = $data->{VAR1};
+my @dumpfiles = glob("$dir/$sources_data_filename-*.$data_filename_extension");
+foreach my $dumpfile (@dumpfiles) {
+
+# If the file exists, read it in
+my $data;
+MTT::Files::load_dumpfile($dumpfile, \$data);
+  

Re: [OMPI users] running external program on same processor (Fortran)

2010-03-05 Thread abc def

Hello,
From within the MPI fortran program I run the following command:

CALL SYSTEM("cd " // TRIM(dir) // " ; mpirun -machinefile ./machinefile -np 1 
/home01/group/Execute/DLPOLY.X > job.out 2> job.err ; cd - > /dev/null")

where "dir" is a process-number-dependent directory, to ensure the processes 
don't over-write each other, and machinefile is written earlier by using 
hostname to obtain the node of the current process, so this new program 
launches using the same node as the one that launches it.

In fortran, the program automatically waits until the system call is complete.

Since you mentioned MPI_COMM_SPAWN, I looked into this. I read that it's 
non-blocking, so somehow I'd have to prevent the program from moving forwards 
until it was complete, and secondly, I need to "cd" into the directory I 
mentioned above, before running the external command, and I don't know how one 
would achieve this with this command.
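For what it's worth, MPI_COMM_SPAWN info keys may address the "cd" concern without a shell: a hedged fragment below, where the "wdir" and "host" keys are assumptions to verify against your Open MPI version's MPI_Comm_spawn man page, and "dir"/"myhost" are the variables already used in the thread:

```fortran
! Sketch: per-process working directory via the "wdir" info key,
! replacing the shell "cd"; "host" pins the child to the calling node.
integer :: info, child, errcode(1), ierr
call MPI_INFO_CREATE(info, ierr)
call MPI_INFO_SET(info, "wdir", TRIM(dir), ierr)    ! dir as in the CALL SYSTEM version
call MPI_INFO_SET(info, "host", TRIM(myhost), ierr) ! from MPI_GET_PROCESSOR_NAME
call MPI_COMM_SPAWN("/home01/group/Execute/DLPOLY.X", MPI_ARGV_NULL, 1, &
                    info, 0, MPI_COMM_SELF, child, errcode, ierr)
call MPI_INFO_FREE(info, ierr)
! Note: MPI_COMM_SPAWN returns once the child is launched, not when it
! completes; to block, wait for a message from the child before continuing.
```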

Do you think MPI_COMM_SPAWN can help?
I appreciate your time.

From: r...@open-mpi.org
Date: Fri, 5 Mar 2010 07:55:59 -0700
To: us...@open-mpi.org
Subject: Re: [OMPI users] running external program on same processor (Fortran)



How are you trying to start this external program? With an MPI_Comm_spawn? Or 
are you just fork/exec'ing it?
How are you waiting for this external program to finish?
On Mar 5, 2010, at 7:52 AM, abc def wrote:

Hello,

Thanks for the comments. Indeed, until yesterday, I didn't realise the 
difference between MVAPICH, MVAPICH2 and Open-MPI.

This problem has moved from mvapich2 to open-mpi now however, because I now 
realise that the production environment uses Open-MPI, which means my solution 
for mvapich2 doesn't work now. So if I may ask your kind assistance:

Just to re-cap, I have an MPI fortran program, which runs on N nodes, and each 
node needs to run an external program. This external program was written for 
MPI, although I want to run it in serial as part of the process on each node.

Is there any way at all to launch this external MPI program so it's treated 
simply as a (serial) extension of the current node's processes? As I say, the 
MPI originating program simply waits for the external program to finish before 
continuing, so it's essentially a bit like a "subroutine", in that sense.

I'm having problems launching this external program from within my MPI program, 
under the open-mpi system, even without worrying about node assignment, and I 
think this might be because the system detects that I'm trying to launch 
another process from one of the nodes, and stops it. I'm guessing here, but it 
simply stops with an error saying the MPI process was stopped.

Any help is very much appreciated; I have been going at this for more than a 
day now and don't seem to be getting anywhere.

Thank you!

From: r...@open-mpi.org
Date: Wed, 3 Mar 2010 07:24:32 -0700
To: us...@open-mpi.org
Subject: Re: [OMPI users] running external program on same processor (Fortran)

It also would have been really helpful to know that you were using MVAPICH and 
-not- Open MPI as this mailing list is for the latter. We could have directed 
you to the appropriate place if we had known.

On Mar 3, 2010, at 5:17 AM, abc def wrote:

I don't know (I'm a little new to this area), but I figured out how to get 
around the problem:

Using SGE and MVAPICH2, the "-env MV2_CPU_MAPPING 0:1." option in mpiexec 
seems to do the trick.

So when calling the external program with mpiexec, I map the called process to 
the current core rank, and it seems to stay distributed and separated as I want.

Hope someone else finds this useful in the future.

> Date: Wed, 3 Mar 2010 13:13:01 +1100
> Subject: Re: [OMPI users] running external program on same processor (Fortran)
> 
> Surely this is the problem of the scheduler that your system uses,
> rather than MPI?
> 
> 
> On Wed, 2010-03-03 at 00:48 +, abc def wrote:
> > Hello,
> > 
> > I wonder if someone can help.
> > 
> > The situation is that I have an MPI-parallel fortran program. I run it
> > and it's distributed on N cores, and each of these processes must call
> > an external program.
> > 
> > This external program is also an MPI program, however I want to run it
> > in serial, on the core that is calling it, as if it were part of the
> > fortran program. The fortran program waits until the external program
> > has completed, and then continues.
> > 
> > The problem is that this external program seems to run on any core,
> > and not necessarily the (now idle) core that called it. This slows
> > things down a lot as you get one core doing multiple tasks.
> > 
> > Can anyone tell me how I can call the program and ensure it runs only
> > on the core that's calling it? Note that there are several cores per
> > node. I can ID the node by running the hostname command (I don't know
> > a way to do this for individual cores).
> > 

Re: [OMPI users] change hosts to restart the checkpoint

2010-03-05 Thread Josh Hursey
This type of failure is usually due to prelink'ing being left enabled  
on one or more of the systems. This has come up multiple times on the  
Open MPI list, but is actually a problem between BLCR and the Linux  
kernel. BLCR has a FAQ entry on this that you will want to check out:

  https://upc-bugs.lbl.gov//blcr/doc/html/FAQ.html#prelink

If that does not work, then we can look into other causes.

-- Josh

On Mar 5, 2010, at 3:06 AM, 马少杰 wrote:





2010-03-05
马少杰
Dear Sir:
   I want to use openmpi and blcr to checkpoint. However, I want to restart 
the checkpoint on other hosts. For example, I run an MPI program using openmpi 
on host1 and host2, and I save the checkpoint file at an NFS-shared path. 
Then I want to restart the job (ompi-restart -machinefile ma 
ompi_global_snapshot_15865.ckpt) on host3 and host4. The 4 hosts have the same 
hardware and software. If I change the hostnames (host3 and host4) in the 
machinefile, this error always occurs:

 [node182:27278] *** Process received signal ***
[node182:27278] Signal: Segmentation fault (11)
[node182:27278] Signal code: Address not mapped (1)
[node182:27278] Failing at address: 0x3b81009530
[node182:27275] *** Process received signal ***
[node182:27275] Signal: Segmentation fault (11)
[node182:27275] Signal code: Address not mapped (1)
[node182:27275] Failing at address: 0x3b81009530
[node182:27274] *** Process received signal ***
[node182:27274] Signal: Segmentation fault (11)
[node182:27274] Signal code: Address not mapped (1)
[node182:27274] Failing at address: 0x3b81009530
[node182:27276] *** Process received signal ***
[node182:27276] Signal: Segmentation fault (11)
[node182:27276] Signal code: Address not mapped (1)
[node182:27276] Failing at address: 0x3b81009530
--
mpirun noticed that process rank 9 with PID 27973 on node node183  
exited on signal 11 (Segmentation fault).


  If I change the hostnames back to host1 and host2, it restarts 
successfully.


 my openmpi version is 1.3.4
 ./configure --with-ft=cr --enable-mpi-threads --enable-ft-thread \
   --with-blcr=$dir --with-blcr-libdir=/$dir/lib --prefix=$dir_ompi \
   --enable-mpirun-prefix-by-default


 the command to run the mpi program is:
mpirun -np 8 --am ft-enable-cr --mca opal_cr_use_thread 0 \
  -machinefile ma ./cpi


vim $HOME/.openmpi/mca-params.conf
crs_base_snapshot_dir=/tmp/cr
snapc_base_global_snapshot_dir=/disk/cr


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users





Re: [OMPI users] running external program on same processor (Fortran)

2010-03-05 Thread Ralph Castain
How are you trying to start this external program? With an MPI_Comm_spawn? Or 
are you just fork/exec'ing it?

How are you waiting for this external program to finish?

On Mar 5, 2010, at 7:52 AM, abc def wrote:

> Hello,
> 
> Thanks for the comments. Indeed, until yesterday, I didn't realise the 
> difference between MVAPICH, MVAPICH2 and Open-MPI.
> 
> This problem has moved from mvapich2 to open-mpi now however, because I now 
> realise that the production environment uses Open-MPI, which means my 
> solution for mvapich2 doesn't work now. So if I may ask your kind assistance:
> 
> Just to re-cap, I have an MPI fortran program, which runs on N nodes, and 
> each node needs to run an external program. This external program was 
> written for MPI, although I want to run it in serial as part of the process 
> on each node.
> 
> Is there any way at all to launch this external MPI program so it's treated 
> simply as a (serial) extension of the current node's processes? As I say, the 
> MPI originating program simply waits for the external program to finish 
> before continuing, so it's essentially a bit like a "subroutine", in that 
> sense.
> 
> I'm having problems launching this external program from within my MPI 
> program, under the open-mpi system, even without worrying about node 
> assignment, and I think this might be because the system detects that I'm 
> trying to launch another process from one of the nodes, and stops it. I'm 
> guessing here, but it simply stops with an error saying the MPI process was 
> stopped.
> 
> Any help is very much appreciated; I have been going at this for more than a 
> day now and don't seem to be getting anywhere.
> 
> Thank you!
> 
> From: r...@open-mpi.org
> Date: Wed, 3 Mar 2010 07:24:32 -0700
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] running external program on same processor (Fortran)
> 
> It also would have been really helpful to know that you were using MVAPICH 
> and -not- Open MPI as this mailing list is for the latter. We could have 
> directed you to the appropriate place if we had known.
> 
> 
> On Mar 3, 2010, at 5:17 AM, abc def wrote:
> 
> I don't know (I'm a little new to this area), but I figured out how to get 
> around the problem:
> 
> Using SGE and MVAPICH2, the "-env MV2_CPU_MAPPING 0:1." option in mpiexec 
> seems to do the trick.
> 
> So when calling the external program with mpiexec, I map the called process 
> to the current core rank, and it seems to stay distributed and separated as I 
> want.
> 
> Hope someone else finds this useful in the future.
> 
> > Date: Wed, 3 Mar 2010 13:13:01 +1100
> > Subject: Re: [OMPI users] running external program on same processor (Fortran)
> > 
> > Surely this is the problem of the scheduler that your system uses,
> > rather than MPI?
> > 
> > 
> > On Wed, 2010-03-03 at 00:48 +, abc def wrote:
> > > Hello,
> > > 
> > > I wonder if someone can help.
> > > 
> > > The situation is that I have an MPI-parallel fortran program. I run it
> > > and it's distributed on N cores, and each of these processes must call
> > > an external program.
> > > 
> > > This external program is also an MPI program, however I want to run it
> > > in serial, on the core that is calling it, as if it were part of the
> > > fortran program. The fortran program waits until the external program
> > > has completed, and then continues.
> > > 
> > > The problem is that this external program seems to run on any core,
> > > and not necessarily the (now idle) core that called it. This slows
> > > things down a lot as you get one core doing multiple tasks.
> > > 
> > > Can anyone tell me how I can call the program and ensure it runs only
> > > on the core that's calling it? Note that there are several cores per
> > > node. I can ID the node by running the hostname command (I don't know
> > > a way to do this for individual cores).
> > > 
> > > Thanks!
> > > 
> > > 
> > > 
> > > Extra information that might be helpful:
> > > 
> > > If I simply run the external program from the command line (ie, type
> > > "/path/myprogram.ex "), it runs fine. If I run it within the
> > > fortran program by calling it via
> > > 
> > > CALL SYSTEM("/path/myprogram.ex")
> > > 
> > > it doesn't run at all (doesn't even start) and everything crashes. I
> > > don't know why this is.
> > > 
> > > If I call it using mpiexec:
> > > 
> > > CALL SYSTEM("mpiexec -n 1 /path/myprogram.ex")
> > > 
> > > then it does work, but I get the problem that it can go on any core. 
> > > 

Re: [OMPI users] running external program on same processor (Fortran)

2010-03-05 Thread abc def

Hello,

Thanks for the comments. Indeed, until yesterday, I didn't realise the 
difference between MVAPICH, MVAPICH2 and Open-MPI.

This problem has moved from mvapich2 to open-mpi now however, because I now 
realise that the production environment uses Open-MPI, which means my solution 
for mvapich2 doesn't work now. So if I may ask your kind assistance:

Just to re-cap, I have an MPI fortran program, which runs on N nodes, and each 
node needs to run an external program. This external program was written for 
MPI, although I want to run it in serial as part of the process on each node.

Is there any way at all to launch this external MPI program so it's treated 
simply as a (serial) extension of the current node's processes? As I say, the 
MPI originating program simply waits for the external program to finish before 
continuing, so it it's essentially a bit like a "subroutine", in that sense.

I'm having problems launching this external program from within my MPI program, 
under the open-mpi system, even without worrying about node assignment, and I 
think this might be because the system detects that I'm trying to launch 
another process from one of the nodes, and stops it. I'm guessing here, but it 
simply stops with an error saying the MPI process was stopped.

Any help is very much appreciated; I have been going at this for more than a 
day now and don't seem to be getting anywhere.

Thank you!

From: r...@open-mpi.org
Date: Wed, 3 Mar 2010 07:24:32 -0700
To: us...@open-mpi.org
Subject: Re: [OMPI users] running external program on same processor (Fortran)



It also would have been really helpful to know that you were using MVAPICH and 
-not- Open MPI as this mailing list is for the latter. We could have directed 
you to the appropriate place if we had known.

On Mar 3, 2010, at 5:17 AM, abc def wrote:

I don't know (I'm a little new to this area), but I figured out how to get 
around the problem:

Using SGE and MVAPICH2, the "-env MV2_CPU_MAPPING 0:1." option in mpiexec 
seems to do the trick.

So when calling the external program with mpiexec, I map the called process to 
the current core rank, and it seems to stay distributed and separated as I want.

Hope someone else finds this useful in the future.

> Date: Wed, 3 Mar 2010 13:13:01 +1100
> Subject: Re: [OMPI users] running external program on same processor (Fortran)
> 
> Surely this is the problem of the scheduler that your system uses,
> rather than MPI?
> 
> 
> On Wed, 2010-03-03 at 00:48 +, abc def wrote:
> > Hello,
> > 
> > I wonder if someone can help.
> > 
> > The situation is that I have an MPI-parallel fortran program. I run it
> > and it's distributed on N cores, and each of these processes must call
> > an external program.
> > 
> > This external program is also an MPI program, however I want to run it
> > in serial, on the core that is calling it, as if it were part of the
> > fortran program. The fortran program waits until the external program
> > has completed, and then continues.
> > 
> > The problem is that this external program seems to run on any core,
> > and not necessarily the (now idle) core that called it. This slows
> > things down a lot as you get one core doing multiple tasks.
> > 
> > Can anyone tell me how I can call the program and ensure it runs only
> > on the core that's calling it? Note that there are several cores per
> > node. I can ID the node by running the hostname command (I don't know
> > a way to do this for individual cores).
> > 
> > Thanks!
> > 
> > 
> > 
> > Extra information that might be helpful:
> > 
> > If I simply run the external program from the command line (ie, type
> > "/path/myprogram.ex "), it runs fine. If I run it within the
> > fortran program by calling it via
> > 
> > CALL SYSTEM("/path/myprogram.ex")
> > 
> > it doesn't run at all (doesn't even start) and everything crashes. I
> > don't know why this is.
> > 
> > If I call it using mpiexec:
> > 
> > CALL SYSTEM("mpiexec -n 1 /path/myprogram.ex")
> > 
> > then it does work, but I get the problem that it can go on any core. 
> > 


Re: [OMPI users] low efficiency when we use --am ft-enable-cr to checkpoint

2010-03-05 Thread Joshua Hursey

On Mar 5, 2010, at 3:15 AM, 马少杰 wrote:

> Dear Sir:
> - What version of Open MPI are you using?
> my version is 1.3.4
>  - What configure options are you using?
> ./configure --with-ft=cr --enable-mpi-threads --enable-ft-thread 
> --with-blcr=$dir --with-blcr-libdir=/$dir/lib 
> --prefix=/public/mpi/openmpi134-gnu-cr --enable-mpirun-prefix-by-default
> make
> make install
>  - What MCA parameters are you using?
> mpirun -np 8 --am ft-enable-cr  -machinefile ma  xhpl
> vim $HOME/.openmpi/mca-params.conf
> # Local snapshot directory (not used in this scenario)
> crs_base_snapshot_dir=/home/me/tmp
> # Remote snapshot directory (globally mounted file system))
> snapc_base_global_snapshot_dir=/home/me/checkpoints
>  
>  
>  - Are you building from a release tarball or a SVN checkout?
> building from openmpi-1.3.4.tar.gz
>  
>  
> Now, I solve the problem successfully.
> I found that the mpirun command as
>  
> mpirun -np 8 --am ft-enable-cr  --mca opal_cr_use_thread 0  -machinefile ma  
> ./xhpl
>  
> the time cost is almost equal to the time cost by the command: mpirun -np 8  
> -machinefile ma  ./xhpl
>  
> I think it should be  a bug.

Since you have configured Open MPI to use the C/R thread (--enable-ft-thread) 
then Open MPI will start the concurrent C/R thread when you ask for C/R to be 
enabled. By default the thread polls very aggressively (waiting only 0 
microseconds, or the same as calling sched_yield() on most systems). By turning 
it off you eliminate the contention the thread is causing on the system. There 
are two MCA parameters that control this behavior, links below:
  http://osl.iu.edu/research/ft/ompi-cr/api.php#mca-opal_cr_thread_sleep_check
  http://osl.iu.edu/research/ft/ompi-cr/api.php#mca-opal_cr_thread_sleep_wait

I agree that the default behavior is probably too aggressive for most 
applications. However by increasing these values the user is also increasing 
the amount of time before a checkpoint can begin. In my setup I usually set:
  opal_cr_thread_sleep_wait=1000
Which will throttle down the thread when the application is in the MPI library.

You might want to play around with these MCA parameters to tune the 
aggressiveness of the C/R thread to your performance needs. In the meantime I 
will look into finding better default parameters for these options.
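As an illustration, the setting above could go in the same mca-params.conf file mentioned earlier in the thread (the values here are only a starting point, not recommendations):

```ini
# $HOME/.openmpi/mca-params.conf
# Microseconds the C/R thread waits between polls while the process is
# inside the MPI library (throttles the thread, per the suggestion above)
opal_cr_thread_sleep_wait = 1000
# Companion parameter for polling outside the library; see the
# opal_cr_thread_sleep_check link above before changing it
# opal_cr_thread_sleep_check = 0
```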

Cheers,
Josh


>  
>  
> 2010-03-05
> 马少杰
> From: Joshua Hursey
> Sent: 2010-03-05  00:07:19
> To: Open MPI Users
> Cc:
> Subject: Re: [OMPI users] low efficiency when we use --am ft-enable-cr to checkpoint
> There is some overhead involved when activating the current C/R functionality 
> in Open MPI due to the wrapping of the internal point-to-point stack. The 
> wrapper (CRCP framework) tracks the signature of each message (not the 
> buffer, so constant time for any size MPI message) so that when we need to 
> quiesce the network we know of all the outstanding messages that need to be 
> drained.
>  
> So there is an overhead, but it should not be as significant as you have 
> mentioned. I looked at some of the performance aspects in the paper at the 
> link below:
>   http://www.open-mpi.org/papers/hpdc-2009/
> Though I did not look at HPL explicitly in this paper (just NPB, GROMACS, and 
> NetPipe), I have in my own testing, and the time difference was definitely not 
> 2x (I cannot recall the exact differences at the moment).
>  
> Can you tell me a bit about your setup:
>  - What version of Open MPI are you using?
>  - What configure options are you using?
>  - What MCA parameters are you using?
>  - Are you building from a release tarball or a SVN checkout?
>  
> -- Josh
>  
>  
> On Mar 3, 2010, at 10:07 PM, 马少杰 wrote:
>  
> >  
> >  
> > 2010-03-04
> > 马少杰
> > Dear Sir:
> > I want to use BLCR and Open MPI to checkpoint; I can now save a checkpoint 
> > and restart my work successfully. However, I find that the option "--am 
> > ft-enable-cr" causes a large cost. For example, when I run my HPL job 
> > without and with the option "--am ft-enable-cr" on 4 hosts (32 processes, 
> > IB network), the times are 8m21.180s and 16m37.732s respectively. It should 
> > be noted that I did not save a checkpoint when I ran the job; the 
> > additional cost is caused by "--am ft-enable-cr" alone. Why does the option 
> > "--am ft-enable-cr" cause so much system cost? Is it normal? How can I 
> > solve the problem?
> >   I also tested other MPI applications; the problem still exists.
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>  
>  
>  




Re: [OMPI users] Segmentation fault when Send/Recv on heterogeneous cluster (32/64 bit machines)

2010-03-05 Thread TRINH Minh Hieu
Hi,

Thank you for that information.
For the moment, I have not encountered those problems yet, maybe because my
program doesn't use much memory (100MB) and the master machine has huge RAM
(8GB).
So meanwhile, the solution seems to be the parameter "btl_tcp_eager_limit",
but a cleaner solution is very welcome :-)

   TMHieu

2010/3/5 Aurélien Bouteiller :
> Hi,
>
> setting the eager limit to such a drastically high value will have the
> effect of generating gigantic memory consumption for unexpected messages.
> Any message you send which does not have a preposted ready recv will
> allocate 150 MB of temporary storage, and will be memcopied from that
> internal buffer to the recv buffer when the recv is posted. You should
> expect very poor bandwidth and probably crashes/aborts due to memory
> exhaustion on the receivers.
>
> Aurelien
> --
> Dr. Aurelien Bouteiller
> Innovative Computing Laboratory
> University of Tennessee
> Knoxville, TN, USA
>
>
> Le 4 mars 2010 à 09:02, TRINH Minh Hieu a écrit :
>
>> Hi,
>>
>> I have some new discovery about this problem :
>>
>> It seems that the array size sendable from a 32bit to a 64bit machine
>> is proportional to the parameter "btl_tcp_eager_limit".
>> When I set it to 200 000 000 (2e08 bytes, about 190MB), I can send an
>> array of up to 2e07 doubles (152MB).
>>
>> I didn't find much information about btl_tcp_eager_limit other than
>> in the "ompi_info --all" command. If I leave it at 2e08, will it impact
>> the performance of Open MPI?
>>
>> It may also be noteworthy that if the master (rank 0) is a 32bit
>> machine, I don't get a segfault. I can send a big array with a small
>> "btl_tcp_eager_limit" from a 64bit machine to a 32bit one.
>>
>> Do I have to move this thread to devel mailing list ?
>>
>> Regards,
>>
>>   TMHieu
>>
>> On Tue, Mar 2, 2010 at 2:54 PM, TRINH Minh Hieu wrote:
>>> Hello,
>>>
>>> Yes, I compiled OpenMPI with --enable-heterogeneous. More precisely I
>>> compiled with :
>>> $ ./configure --prefix=/tmp/openmpi --enable-heterogeneous
>>> --enable-cxx-exceptions --enable-shared
>>> --enable-orterun-prefix-by-default
>>> $ make all install
>>>
>>> I attach the output of ompi_info of my 2 machines.
>>>
>>>TMHieu
>>>
>>> On Tue, Mar 2, 2010 at 1:57 PM, Jeff Squyres  wrote:
 Did you configure Open MPI with --enable-heterogeneous?

 On Feb 28, 2010, at 1:22 PM, TRINH Minh Hieu wrote:

> Hello,
>
> I have some problems running MPI on my heterogeneous cluster. More
> precisely I got a segmentation fault when sending a large array (about
> 1) of doubles from an i686 machine to an x86_64 machine. It does not
> happen with small arrays. Here is the send/recv source code (complete
> source is in the attached file):
> code 
> if (me == 0) {
>    for (int pe = 1; pe < nprocs; pe++) {
>       printf("Receiving from proc %d : ", pe); fflush(stdout);
>       d = (double *)malloc(sizeof(double)*n);
>       MPI_Recv(d, n, MPI_DOUBLE, pe, 999, MPI_COMM_WORLD, &status);
>       printf("OK\n"); fflush(stdout);
>    }
>    printf("All done.\n");
> }
> else {
>    d = (double *)malloc(sizeof(double)*n);
>    MPI_Send(d, n, MPI_DOUBLE, 0, 999, MPI_COMM_WORLD);
> }
>  code 
>
> I got segmentation fault with n=1 but no error with n=1000
> I have 2 machines :
> sbtn155 : Intel Xeon, x86_64
> sbtn211 : Intel Pentium 4, i686
>
> The code is compiled in x86_64 and i686 machine, using OpenMPI 1.4.1,
> installed in /tmp/openmpi :
> [mhtrinh@sbtn211 heterogenous]$ make hetero
> gcc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include -c hetero.c -o
hetero.i686.o
> /tmp/openmpi/bin/mpicc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include
> hetero.i686.o -o hetero.i686 -lm
>
> [mhtrinh@sbtn155 heterogenous]$ make hetero
> gcc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include -c hetero.c -o
hetero.x86_64.o
> /tmp/openmpi/bin/mpicc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include
> hetero.x86_64.o -o hetero.x86_64 -lm
>
> I ran the code using an appfile and got these errors:
> $ cat appfile
> --host sbtn155 -np 1 hetero.x86_64
> --host sbtn155 -np 1 hetero.x86_64
> --host sbtn211 -np 1 hetero.i686
>
> $ mpirun -hetero --app appfile
> Input array length :
> 1
> Receiving from proc 1 : OK
> Receiving from proc 2 : [sbtn155:26386] *** Process received signal
***
> [sbtn155:26386] Signal: Segmentation fault (11)
> [sbtn155:26386] Signal code: Address not mapped (1)
> [sbtn155:26386] Failing at address: 0x200627bd8
> [sbtn155:26386] [ 0] /lib64/libpthread.so.0 [0x3fa4e0e540]
> [sbtn155:26386] [ 1] /tmp/openmpi/lib/openmpi/mca_pml_ob1.so
[0x2d8d7908]
> [sbtn155:26386] [ 2] 

Re: [OMPI users] low efficiency when we use --am ft-enable-cr to checkpoint

2010-03-05 Thread 马少杰
Dear Sir:
- What version of Open MPI are you using?
my version is 1.3.4
 - What configure options are you using?
./configure --with-ft=cr --enable-mpi-threads --enable-ft-thread 
--with-blcr=$dir --with-blcr-libdir=/$dir/lib 
--prefix=/public/mpi/openmpi134-gnu-cr --enable-mpirun-prefix-by-default
make 
make install
 - What MCA parameters are you using?
mpirun -np 8 --am ft-enable-cr  -machinefile ma  xhpl
vim $HOME/.openmpi/mca-params.conf
# Local snapshot directory (not used in this scenario)
crs_base_snapshot_dir=/home/me/tmp
# Remote snapshot directory (globally mounted file system))
snapc_base_global_snapshot_dir=/home/me/checkpoints


 - Are you building from a release tarball or a SVN checkout?
building from openmpi-1.3.4.tar.gz


Now I have solved the problem.
I found that with the mpirun command

mpirun -np 8 --am ft-enable-cr --mca opal_cr_use_thread 0 -machinefile ma 
./xhpl

the time cost is almost equal to the time cost of the command: mpirun -np 8 
-machinefile ma ./xhpl

I think this may be a bug.


2010-03-05 



马少杰 



From: Joshua Hursey 
Sent: 2010-03-05  00:07:19 
To: Open MPI Users 
Cc: 
Subject: Re: [OMPI users] low efficiency when we use --am ft-enable-cr to checkpoint 
 
There is some overhead involved when activating the current C/R functionality 
in Open MPI due to the wrapping of the internal point-to-point stack. The 
wrapper (CRCP framework) tracks the signature of each message (not the buffer, 
so constant time for any size MPI message) so that when we need to quiesce the 
network we know of all the outstanding messages that need to be drained.

So there is an overhead, but it should not be as significant as you have 
mentioned. I looked at some of the performance aspects in the paper at the link 
below:
  http://www.open-mpi.org/papers/hpdc-2009/
Though I did not look at HPL explicitly in this paper (just NPB, GROMACS, and 
NetPipe), I have in my own testing, and the time difference was definitely not 
2x (I cannot recall the exact differences at the moment).

Can you tell me a bit about your setup:
 - What version of Open MPI are you using?
 - What configure options are you using?
 - What MCA parameters are you using?
 - Are you building from a release tarball or a SVN checkout?

-- Josh


On Mar 3, 2010, at 10:07 PM, 马少杰 wrote:

>  
>  
> 2010-03-04
> 马少杰
> Dear Sir:
> I want to use BLCR and Open MPI to checkpoint; I can now save a checkpoint 
> and restart my work successfully. However, I find that the option "--am 
> ft-enable-cr" causes a large cost. For example, when I run my HPL job 
> without and with the option "--am ft-enable-cr" on 4 hosts (32 processes, IB 
> network), the times are 8m21.180s and 16m37.732s respectively. It should be 
> noted that I did not save a checkpoint when I ran the job; the additional 
> cost is caused by "--am ft-enable-cr" alone. Why does the option "--am 
> ft-enable-cr" cause so much system cost? Is it normal? How can I solve the 
> problem?
>   I also tested other MPI applications; the problem still exists.





[OMPI users] change hosts to restart the checkpoint

2010-03-05 Thread 马少杰



2010-03-05 



马少杰 



Dear Sir:
   I want to use Open MPI and BLCR to checkpoint. However, I want to restart 
the checkpoint on other hosts. For example, I run an MPI program using Open MPI 
on host1 and host2, and I save the checkpoint files at an NFS-shared path. Then 
I want to restart the job (ompi-restart -machinefile ma 
ompi_global_snapshot_15865.ckpt) on host3 and host4. The 4 hosts have the same 
hardware and software. If I change the hostnames (host3 and host4) in the 
machinefile, the following error always occurs:
 [node182:27278] *** Process received signal ***
[node182:27278] Signal: Segmentation fault (11)
[node182:27278] Signal code: Address not mapped (1)
[node182:27278] Failing at address: 0x3b81009530
[node182:27275] *** Process received signal ***
[node182:27275] Signal: Segmentation fault (11)
[node182:27275] Signal code: Address not mapped (1)
[node182:27275] Failing at address: 0x3b81009530
[node182:27274] *** Process received signal ***
[node182:27274] Signal: Segmentation fault (11)
[node182:27274] Signal code: Address not mapped (1)
[node182:27274] Failing at address: 0x3b81009530
[node182:27276] *** Process received signal ***
[node182:27276] Signal: Segmentation fault (11)
[node182:27276] Signal code: Address not mapped (1)
[node182:27276] Failing at address: 0x3b81009530
--
mpirun noticed that process rank 9 with PID 27973 on node node183 exited on 
signal 11 (Segmentation fault).

  If I change the hostnames back to host1 and host2, it restarts successfully.

 my openmpi version is 1.3.4
 ./configure  --with-ft=cr --enable-mpi-threads --enable-ft-thread 
--with-blcr=$dir --with-blcr-libdir=/$dir/lib --prefix=$dir_ompi 
--enable-mpirun-prefix-by-default

The command to run the MPI program is:
mpirun -np 8 --am ft-enable-cr --mca opal_cr_use_thread 0  -machinefile ma ./cpi

vim $HOME/.openmpi/mca-params.conf
crs_base_snapshot_dir=/tmp/cr
snapc_base_global_snapshot_dir=/disk/cr


Re: [OMPI users] Segmentation fault when Send/Recv on heterogeneous cluster (32/64 bit machines)

2010-03-05 Thread Aurélien Bouteiller
Hi, 

setting the eager limit to such a drastically high value will have the effect 
of generating gigantic memory consumption for unexpected messages. Any message 
you send which does not have a preposted ready recv will allocate 150 MB of 
temporary storage, and will be memcopied from that internal buffer to the recv 
buffer when the recv is posted. You should expect very poor bandwidth and 
probably crashes/aborts due to memory exhaustion on the receivers.

Aurelien
--
Dr. Aurelien Bouteiller
Innovative Computing Laboratory
University of Tennessee
Knoxville, TN, USA


Le 4 mars 2010 à 09:02, TRINH Minh Hieu a écrit :

> Hi,
> 
> I have some new discovery about this problem :
> 
> It seems that the array size sendable from a 32bit to a 64bit machine
> is proportional to the parameter "btl_tcp_eager_limit".
> When I set it to 200 000 000 (2e08 bytes, about 190MB), I can send an
> array of up to 2e07 doubles (152MB).
> 
> I didn't find much information about btl_tcp_eager_limit other than
> in the "ompi_info --all" command. If I leave it at 2e08, will it impact
> the performance of Open MPI?
> 
> It may also be noteworthy that if the master (rank 0) is a 32bit
> machine, I don't get a segfault. I can send a big array with a small
> "btl_tcp_eager_limit" from a 64bit machine to a 32bit one.
> 
> Do I have to move this thread to devel mailing list ?
> 
> Regards,
> 
>   TMHieu
> 
> On Tue, Mar 2, 2010 at 2:54 PM, TRINH Minh Hieu  wrote:
>> Hello,
>> 
>> Yes, I compiled OpenMPI with --enable-heterogeneous. More precisely I
>> compiled with :
>> $ ./configure --prefix=/tmp/openmpi --enable-heterogeneous
>> --enable-cxx-exceptions --enable-shared
>> --enable-orterun-prefix-by-default
>> $ make all install
>> 
>> I attach the output of ompi_info of my 2 machines.
>> 
>>TMHieu
>> 
>> On Tue, Mar 2, 2010 at 1:57 PM, Jeff Squyres  wrote:
>>> Did you configure Open MPI with --enable-heterogeneous?
>>> 
>>> On Feb 28, 2010, at 1:22 PM, TRINH Minh Hieu wrote:
>>> 
 Hello,
 
 I have some problems running MPI on my heterogeneous cluster. More
 precisely I got a segmentation fault when sending a large array (about
 1) of doubles from an i686 machine to an x86_64 machine. It does not
 happen with small arrays. Here is the send/recv source code (complete
 source is in the attached file):
 code 
 if (me == 0 ) {
 for (int pe=1; pe