Re: [OMPI users] running external program on same processor (Fortran)
On Mar 5, 2010, at 2:38 PM, Ralph Castain wrote:

>> CALL SYSTEM("cd " // TRIM(dir) // " ; mpirun -machinefile ./machinefile -np
>> 1 /home01/group/Execute/DLPOLY.X > job.out 2> job.err ; cd - > /dev/null")
>
> That is guaranteed not to work. The problem is that mpirun sets environment
> variables for the original launch. Your system call carries over those
> envars, causing mpirun to become confused. You should be able to use
> MPI_COMM_SPAWN to launch this MPI job. Check the man page for
> MPI_COMM_SPAWN; I believe we have info keys to specify things like what
> hosts to launch on, etc.
>
>> Do you think MPI_COMM_SPAWN can help?
>
> It's the only method supported by the MPI standard. If you need it to block
> until this new executable completes, you could use a barrier or other MPI
> method to determine it.

I believe that the user said they wanted to use the same cores as their original MPI job occupies for the new job -- they basically want the old job to block until the new job completes.

Keep in mind that OMPI busy-polls waiting for progress, so you might actually get hosed here (two procs competing for time on the same core). I'm not immediately thinking of a good way to avoid this issue -- perhaps you could kludge something up such that the parent job polls on sleep() and checks to see if a message has arrived from the child (i.e., the last thing the child does before it calls MPI_FINALIZE is to send a message to its parents and then MPI_COMM_DISCONNECT from its parents). If the parent finds that it has a message from the child(ren), it can MPI_COMM_DISCONNECT and continue processing.

Kinda hackey, but it might work...?

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
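A rough Fortran sketch of the spawn-then-poll scheme described above. This is untested and illustrative only: the "host" and "wdir" info keys are reserved by the MPI standard and I believe Open MPI honors them, but check the MPI_COMM_SPAWN man page for your version; SLEEP is a common compiler extension, not standard Fortran.

```fortran
! Hypothetical parent-side sketch: spawn the child on this same node,
! then sleep-poll for the child's "I'm done" message instead of
! busy-waiting inside MPI_RECV (which would steal cycles from the child).
PROGRAM parent_sketch
  USE mpi
  IMPLICIT NONE
  INTEGER :: ierr, info, child_comm, dummy, namelen
  INTEGER :: status(MPI_STATUS_SIZE)
  LOGICAL :: done
  CHARACTER(LEN=MPI_MAX_PROCESSOR_NAME) :: host

  CALL MPI_INIT(ierr)
  CALL MPI_GET_PROCESSOR_NAME(host, namelen, ierr)

  ! Ask for the child to be placed on this node, in a per-rank directory.
  ! ("/path/to/per-rank/dir" is a placeholder.)
  CALL MPI_INFO_CREATE(info, ierr)
  CALL MPI_INFO_SET(info, 'host', host(1:namelen), ierr)
  CALL MPI_INFO_SET(info, 'wdir', '/path/to/per-rank/dir', ierr)

  CALL MPI_COMM_SPAWN('/home01/group/Execute/DLPOLY.X', MPI_ARGV_NULL, 1, &
                      info, 0, MPI_COMM_SELF, child_comm, &
                      MPI_ERRCODES_IGNORE, ierr)
  CALL MPI_INFO_FREE(info, ierr)

  ! Poll gently: sleep between probes so the parent yields the core.
  done = .FALSE.
  DO WHILE (.NOT. done)
    CALL MPI_IPROBE(MPI_ANY_SOURCE, MPI_ANY_TAG, child_comm, done, &
                    status, ierr)
    IF (.NOT. done) CALL SLEEP(1)   ! GNU extension; not standard Fortran
  END DO

  ! Drain the child's zero-length "done" message, then disconnect.
  ! (The child sends this and calls MPI_COMM_DISCONNECT before finalizing.)
  CALL MPI_RECV(dummy, 0, MPI_INTEGER, status(MPI_SOURCE), &
                status(MPI_TAG), child_comm, MPI_STATUS_IGNORE, ierr)
  CALL MPI_COMM_DISCONNECT(child_comm, ierr)
  CALL MPI_FINALIZE(ierr)
END PROGRAM parent_sketch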
Re: [OMPI users] running external program on same processor (Fortran)
On Mar 5, 2010, at 8:52 AM, abc def wrote:
> Hello,
> From within the MPI fortran program I run the following command:
>
> CALL SYSTEM("cd " // TRIM(dir) // " ; mpirun -machinefile ./machinefile -np 1
> /home01/group/Execute/DLPOLY.X > job.out 2> job.err ; cd - > /dev/null")

That is guaranteed not to work. The problem is that mpirun sets environment variables for the original launch. Your system call carries over those envars, causing mpirun to become confused.

> where "dir" is a process-number-dependent directory, to ensure the processes
> don't over-write each other, and machinefile is written earlier by using
> hostname to obtain the node of the current process, so this new program
> launches using the same node as the one that launches it.
>
> In fortran, the program automatically waits until the system call is
> complete.
>
> Since you mentioned MPI_COMM_SPAWN, I looked into this. I read that it's
> non-blocking, so somehow I'd have to prevent the program from moving
> forwards until it was complete, and secondly, I need to "cd" into the
> directory I mentioned above before running the external command, and I
> don't know how one would achieve this with this command.
>
> Do you think MPI_COMM_SPAWN can help?

It's the only method supported by the MPI standard. If you need it to block until this new executable completes, you could use a barrier or other MPI method to determine it.

> I appreciate your time.
>
> From: r...@open-mpi.org
> Date: Fri, 5 Mar 2010 07:55:59 -0700
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] running external program on same processor (Fortran)
>
> How are you trying to start this external program? With an MPI_Comm_spawn?
> Or are you just fork/exec'ing it?
>
> How are you waiting for this external program to finish?
>
> On Mar 5, 2010, at 7:52 AM, abc def wrote:
>> Hello,
>>
>> Thanks for the comments. Indeed, until yesterday, I didn't realise the
>> difference between MVAPICH, MVAPICH2 and Open-MPI.
>>
>> This problem has moved from mvapich2 to open-mpi now however, because I
>> now realise that the production environment uses Open-MPI, which means my
>> solution for mvapich2 doesn't work now. So if I may ask your kind
>> assistance:
>>
>> Just to re-cap, I have an MPI fortran program, which runs on N nodes, and
>> each node needs to run an external program. This external program was
>> written for MPI, although I want to run it in serial as part of the
>> process on each node.
>>
>> Is there any way at all to launch this external MPI program so it's
>> treated simply as a (serial) extension of the current node's processes?
>> As I say, the MPI originating program simply waits for the external
>> program to finish before continuing, so it's essentially a bit like a
>> "subroutine", in that sense.
>>
>> I'm having problems launching this external program from within my MPI
>> program, under the open-mpi system, even without worrying about node
>> assignment, and I think this might be because the system detects that I'm
>> trying to launch another process from one of the nodes, and stops it. I'm
>> guessing here, but it simply stops with an error saying the MPI process
>> was stopped.
>>
>> Any help is very much appreciated; I have been going at this for more
>> than a day now and don't seem to be getting anywhere.
>>
>> Thank you!
>>
>> From: r...@open-mpi.org
>> Date: Wed, 3 Mar 2010 07:24:32 -0700
>> To: us...@open-mpi.org
>> Subject: Re: [OMPI users] running external program on same processor (Fortran)
>>
>> It also would have been really helpful to know that you were using
>> MVAPICH and -not- Open MPI as this mailing list is for the latter. We
>> could have directed you to the appropriate place if we had known.
>>
>> On Mar 3, 2010, at 5:17 AM, abc def wrote:
>>> I don't know (I'm a little new to this area), but I figured out how to
>>> get around the problem:
>>>
>>> Using SGE and MVAPICH2, the "-env MV2_CPU_MAPPING 0:1." option in
>>> mpiexec seems to do the trick.
>>>
>>> So when calling the external program with mpiexec, I map the called
>>> process to the current core rank, and it seems to stay distributed and
>>> separated as I want.
>>>
>>> Hope someone else finds this useful in the future.
>>>
>>>> Date: Wed, 3 Mar 2010 13:13:01 +1100
>>>> Subject: Re: [OMPI users] running external program on same processor (Fortran)
>>>>
>>>> Surely this is the problem of the scheduler that your system uses,
>>>> rather than MPI?
>>>>
>>>> On Wed, 2010-03-03 at 00:48 +, abc def wrote:
>>>>> Hello,
>>>>>
>>>>> I wonder if someone can help.
>>>>>
>>>>> The situation is that I have an MPI-parallel fortran program. I run it
>>>>> and it's distributed on N cores, and each of these processes must call
>>>>> an external program.
>>>>>
>>>>> This external program is also an MPI program, however I want to run it
>>>>> in serial, on the core that is calling it, as if it were part of the
>>>>> fortran program. The
Re: [MTT users] [MTT bugs] [MTT] #212: Generic network locking server *REVIEW NEEDED*
On Fri, Feb/19/2010 12:00:55PM, Ethan Mallove wrote:
> On Thu, Feb/18/2010 04:13:15PM, Jeff Squyres wrote:
>> On Feb 18, 2010, at 10:48 AM, Ethan Mallove wrote:
>>> To ensure there is never a collision between $a->{k} and $b->{k}, the
>>> user can have two MTT clients share a $scratch, but they cannot both
>>> run the same INI section simultaneously. I setup my scheduler to run
>>> batches of MPI get, MPI install, Test get, Test build, and Test run
>>> sections in parallel with successor INI sections dependent on their
>>> predecessor INI sections (e.g., [Test run: foo] only runs after [Test
>>> build: foo] completes). The limitation stinks, but the current
>>> limitation is much worse: two MTT clients can't even run the same
>>> *phase* out of one $scratch.
>>
>> Maybe it might be a little nicer just to protect the user from
>> themselves -- if we ever detect a case where $a->{k} and $b->{k}
>> both exist and are not the same value, dump out everything to a file
>> and abort with an error message. This is clearly an erroneous
>> situation, but running MTT in big parallel batches like this is a
>> worthwhile-but-complicated endeavor, and some people are likely to
>> get it wrong. So we should at least detect the situation and fail
>> gracefully, rather than losing or corrupting results.
>>
>> Make sense?
>
> Yes. I'll add this.

The check is there now. Ready for review.

-Ethan

> -Ethan
>
>>> I originally wanted the .dump files to be completely safe, but MTT
>>> clients were getting locked out of the .dump files for way too long.
>>> E.g., MTT::MPI::LoadInstalls happens very early in client/mtt, and an
>>> hour could elapse before MTT::MPI::SaveInstalls is called in
>>> Install.pm.
>>
>> Yep, if you lock from load->save, then that can definitely happen...
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> mtt-users mailing list
> mtt-us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users

--- client/mtt  Mon Nov 09 14:38:09 2009 -0500
+++ client/mtt  Fri Mar 05 14:02:39 2010 -0500
@@ -498,6 +498,15 @@
     # execute on_start callback if exists
     _do_step($ini, "mtt", "before_mtt_start_exec");

+    # Process setenv, unsetenv, prepend_path, and append_path
+    my $config;
+    $config->{setenv} = Value($ini, "mtt", "setenv");
+    $config->{unsetenv} = Value($ini, "mtt", "unsetenv");
+    $config->{prepend_path} = Value($ini, "mtt", "prepend_path");
+    $config->{append_path} = Value($ini, "mtt", "append_path");
+    my @save_env;
+    ProcessEnvKeys($config, \@save_env);

     # Set the logfile, if specified

--- lib/MTT/Defaults.pm  Mon Nov 09 14:38:09 2009 -0500
+++ lib/MTT/Defaults.pm  Fri Mar 05 14:02:39 2010 -0500
@@ -42,7 +42,7 @@
     known_compiler_names => [ "gnu", "pgi", "ibm", "intel", "kai", "absoft",
                               "pathscale", "sun", "microsoft", "none", "unknown" ],
-    known_resource_manager_names => [ "slurm", "tm", "loadleveler", "n1ge",
+    known_resource_manager_names => [ "slurm", "tm", "loadleveler", "sge",
                                       "alps", "none", "unknown" ],
     known_network_names => [ "tcp", "udp", "ethernet", "gm", "mx", "verbs",
                              "udapl", "psm", "elan", "portals", "shmem",

--- lib/MTT/MPI.pm  Mon Nov 09 14:38:09 2009 -0500
+++ lib/MTT/MPI.pm  Fri Mar 05 14:02:39 2010 -0500
@@ -16,6 +16,8 @@
 use strict;
 use MTT::Files;
+use MTT::Messages;
+use MTT::Util;

 #--
@@ -28,10 +30,13 @@
 #--
 # Filename where list of MPI sources is kept
-my $sources_data_filename = "mpi_sources.dump";
+my $sources_data_filename = "mpi_sources";

 # Filename where list of MPI installs is kept
-my $installs_data_filename = "mpi_installs.dump";
+my $installs_data_filename = "mpi_installs";
+
+# Filename extension for all the Dumper data files
+my $data_filename_extension = "dump";

 #--
@@ -42,10 +47,15 @@
 # Explicitly delete anything that was there
 $MTT::MPI::sources = undef;

-# If the file exists, read it in
-my $data;
-MTT::Files::load_dumpfile("$dir/$sources_data_filename", \$data);
-$MTT::MPI::sources = $data->{VAR1};
+my @dumpfiles = glob("$dir/$sources_data_filename-*.$data_filename_extension");
+foreach my $dumpfile (@dumpfiles) {
+
+    # If the file exists, read it in
+    my $data;
+    MTT::Files::load_dumpfile($dumpfile, \$data);
+
Re: [OMPI users] running external program on same processor (Fortran)
Hello,

From within the MPI fortran program I run the following command:

CALL SYSTEM("cd " // TRIM(dir) // " ; mpirun -machinefile ./machinefile -np 1
/home01/group/Execute/DLPOLY.X > job.out 2> job.err ; cd - > /dev/null")

where "dir" is a process-number-dependent directory, to ensure the processes don't over-write each other, and machinefile is written earlier by using hostname to obtain the node of the current process, so this new program launches using the same node as the one that launches it.

In fortran, the program automatically waits until the system call is complete.

Since you mentioned MPI_COMM_SPAWN, I looked into this. I read that it's non-blocking, so somehow I'd have to prevent the program from moving forwards until it was complete, and secondly, I need to "cd" into the directory I mentioned above before running the external command, and I don't know how one would achieve this with this command.

Do you think MPI_COMM_SPAWN can help?

I appreciate your time.

From: r...@open-mpi.org
Date: Fri, 5 Mar 2010 07:55:59 -0700
To: us...@open-mpi.org
Subject: Re: [OMPI users] running external program on same processor (Fortran)

How are you trying to start this external program? With an MPI_Comm_spawn? Or are you just fork/exec'ing it?

How are you waiting for this external program to finish?

On Mar 5, 2010, at 7:52 AM, abc def wrote:
> Hello,
>
> Thanks for the comments. Indeed, until yesterday, I didn't realise the
> difference between MVAPICH, MVAPICH2 and Open-MPI.
>
> This problem has moved from mvapich2 to open-mpi now however, because I now
> realise that the production environment uses Open-MPI, which means my
> solution for mvapich2 doesn't work now. So if I may ask your kind
> assistance:
>
> Just to re-cap, I have an MPI fortran program, which runs on N nodes, and
> each node needs to run an external program. This external program was
> written for MPI, although I want to run it in serial as part of the process
> on each node.
>
> Is there any way at all to launch this external MPI program so it's treated
> simply as a (serial) extension of the current node's processes? As I say,
> the MPI originating program simply waits for the external program to finish
> before continuing, so it's essentially a bit like a "subroutine", in that
> sense.
>
> I'm having problems launching this external program from within my MPI
> program, under the open-mpi system, even without worrying about node
> assignment, and I think this might be because the system detects that I'm
> trying to launch another process from one of the nodes, and stops it. I'm
> guessing here, but it simply stops with an error saying the MPI process was
> stopped.
>
> Any help is very much appreciated; I have been going at this for more than
> a day now and don't seem to be getting anywhere.
>
> Thank you!
>
> From: r...@open-mpi.org
> Date: Wed, 3 Mar 2010 07:24:32 -0700
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] running external program on same processor (Fortran)
>
> It also would have been really helpful to know that you were using MVAPICH
> and -not- Open MPI as this mailing list is for the latter. We could have
> directed you to the appropriate place if we had known.
>
> On Mar 3, 2010, at 5:17 AM, abc def wrote:
>> I don't know (I'm a little new to this area), but I figured out how to get
>> around the problem:
>>
>> Using SGE and MVAPICH2, the "-env MV2_CPU_MAPPING 0:1." option in mpiexec
>> seems to do the trick.
>>
>> So when calling the external program with mpiexec, I map the called
>> process to the current core rank, and it seems to stay distributed and
>> separated as I want.
>>
>> Hope someone else finds this useful in the future.
>>
>>> Date: Wed, 3 Mar 2010 13:13:01 +1100
>>> Subject: Re: [OMPI users] running external program on same processor (Fortran)
>>>
>>> Surely this is the problem of the scheduler that your system uses,
>>> rather than MPI?
>>>
>>> On Wed, 2010-03-03 at 00:48 +, abc def wrote:
>>>> Hello,
>>>>
>>>> I wonder if someone can help.
>>>>
>>>> The situation is that I have an MPI-parallel fortran program. I run it
>>>> and it's distributed on N cores, and each of these processes must call
>>>> an external program.
>>>>
>>>> This external program is also an MPI program, however I want to run it
>>>> in serial, on the core that is calling it, as if it were part of the
>>>> fortran program. The fortran program waits until the external program
>>>> has completed, and then continues.
>>>>
>>>> The problem is that this external program seems to run on any core,
>>>> and not necessarily the (now idle) core that called it. This slows
>>>> things down a lot as you get one core doing multiple tasks.
>>>>
>>>> Can anyone tell me how I can call the program and ensure it runs only
>>>> on the core that's calling it? Note that there are several cores per
>>>> node. I can ID the node by running the hostname command (I don't know
>>>> a way to do this for individual cores).
Re: [OMPI users] change hosts to restart the checkpoint
This type of failure is usually due to prelink'ing being left enabled on one or more of the systems. This has come up multiple times on the Open MPI list, but is actually a problem between BLCR and the Linux kernel. BLCR has a FAQ entry on this that you will want to check out:

https://upc-bugs.lbl.gov//blcr/doc/html/FAQ.html#prelink

If that does not work, then we can look into other causes.

-- Josh

On Mar 5, 2010, at 3:06 AM, 马少杰 wrote:
> Dear Sir:
>
> I want to use openmpi and blcr to checkpoint. However, I want to restart
> the checkpoint on other hosts. For example, I run an mpi program using
> openmpi on host1 and host2, and I save the checkpoint file at an NFS-shared
> path. Then I want to restart the job (ompi-restart -machinefile ma
> ompi_global_snapshot_15865.ckpt) on host3 and host4. The 4 hosts have the
> same hardware and software. If I change the hostnames (host3 and host4) in
> the machinefile, the error always occurs:
>
> [node182:27278] *** Process received signal ***
> [node182:27278] Signal: Segmentation fault (11)
> [node182:27278] Signal code: Address not mapped (1)
> [node182:27278] Failing at address: 0x3b81009530
> [node182:27275] *** Process received signal ***
> [node182:27275] Signal: Segmentation fault (11)
> [node182:27275] Signal code: Address not mapped (1)
> [node182:27275] Failing at address: 0x3b81009530
> [node182:27274] *** Process received signal ***
> [node182:27274] Signal: Segmentation fault (11)
> [node182:27274] Signal code: Address not mapped (1)
> [node182:27274] Failing at address: 0x3b81009530
> [node182:27276] *** Process received signal ***
> [node182:27276] Signal: Segmentation fault (11)
> [node182:27276] Signal code: Address not mapped (1)
> [node182:27276] Failing at address: 0x3b81009530
> --------------------------------------------------------------------------
> mpirun noticed that process rank 9 with PID 27973 on node node183 exited
> on signal 11 (Segmentation fault).
>
> If I change the hostnames back to host1 and host2, it can restart
> successfully.
>
> My openmpi version is 1.3.4:
>
> ./configure --with-ft=cr --enable-mpi-threads --enable-ft-thread
>     --with-blcr=$dir --with-blcr-libdir=/$dir/lib --prefix=$dir_ompi
>     --enable-mpirun-prefix-by-default
>
> The command to run the mpi program was:
>
> mpirun -np 8 --am ft-enable-cr --mca opal_cr_use_thread 0 -machinefile ma ./cpi
>
> vim $HOME/.openmpi/mca-params.conf
> crs_base_snapshot_dir=/tmp/cr
> snapc_base_global_snapshot_dir=/disk/cr

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
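The fix the BLCR FAQ describes amounts to turning prelinking off and undoing any existing prelinking on every node. A rough sketch for a Red Hat-style system follows; the real steps need root and touch /etc/sysconfig/prelink, so this illustration operates on a temporary copy of the config file instead.

```shell
# Sketch: disable prelinking so BLCR checkpoints can restart on other hosts.
# On a real node you would edit /etc/sysconfig/prelink and then run
# "prelink -ua" (undo prelinking on all binaries); both steps need root.
# To keep this self-contained we edit a temp stand-in for the config file.
conf=$(mktemp)
echo 'PRELINKING=yes' > "$conf"               # stand-in for /etc/sysconfig/prelink

sed -i 's/^PRELINKING=yes/PRELINKING=no/' "$conf"
grep '^PRELINKING' "$conf"

# On the real system, finish with:
#   prelink -ua
rm -f "$conf"
```

Remember this has to happen on every host you might restart on, not just the ones that took the checkpoint.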
Re: [OMPI users] running external program on same processor (Fortran)
How are you trying to start this external program? With an MPI_Comm_spawn? Or are you just fork/exec'ing it?

How are you waiting for this external program to finish?

On Mar 5, 2010, at 7:52 AM, abc def wrote:
> Hello,
>
> Thanks for the comments. Indeed, until yesterday, I didn't realise the
> difference between MVAPICH, MVAPICH2 and Open-MPI.
>
> This problem has moved from mvapich2 to open-mpi now however, because I now
> realise that the production environment uses Open-MPI, which means my
> solution for mvapich2 doesn't work now. So if I may ask your kind
> assistance:
>
> Just to re-cap, I have an MPI fortran program, which runs on N nodes, and
> each node needs to run an external program. This external program was
> written for MPI, although I want to run it in serial as part of the process
> on each node.
>
> Is there any way at all to launch this external MPI program so it's treated
> simply as a (serial) extension of the current node's processes? As I say,
> the MPI originating program simply waits for the external program to finish
> before continuing, so it's essentially a bit like a "subroutine", in that
> sense.
>
> I'm having problems launching this external program from within my MPI
> program, under the open-mpi system, even without worrying about node
> assignment, and I think this might be because the system detects that I'm
> trying to launch another process from one of the nodes, and stops it. I'm
> guessing here, but it simply stops with an error saying the MPI process was
> stopped.
>
> Any help is very much appreciated; I have been going at this for more than
> a day now and don't seem to be getting anywhere.
>
> Thank you!
>
> From: r...@open-mpi.org
> Date: Wed, 3 Mar 2010 07:24:32 -0700
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] running external program on same processor (Fortran)
>
> It also would have been really helpful to know that you were using MVAPICH
> and -not- Open MPI as this mailing list is for the latter. We could have
> directed you to the appropriate place if we had known.
>
> On Mar 3, 2010, at 5:17 AM, abc def wrote:
>> I don't know (I'm a little new to this area), but I figured out how to get
>> around the problem:
>>
>> Using SGE and MVAPICH2, the "-env MV2_CPU_MAPPING 0:1." option in mpiexec
>> seems to do the trick.
>>
>> So when calling the external program with mpiexec, I map the called
>> process to the current core rank, and it seems to stay distributed and
>> separated as I want.
>>
>> Hope someone else finds this useful in the future.
>>
>>> Date: Wed, 3 Mar 2010 13:13:01 +1100
>>> Subject: Re: [OMPI users] running external program on same processor (Fortran)
>>>
>>> Surely this is the problem of the scheduler that your system uses,
>>> rather than MPI?
>>>
>>> On Wed, 2010-03-03 at 00:48 +, abc def wrote:
>>>> Hello,
>>>>
>>>> I wonder if someone can help.
>>>>
>>>> The situation is that I have an MPI-parallel fortran program. I run it
>>>> and it's distributed on N cores, and each of these processes must call
>>>> an external program.
>>>>
>>>> This external program is also an MPI program, however I want to run it
>>>> in serial, on the core that is calling it, as if it were part of the
>>>> fortran program. The fortran program waits until the external program
>>>> has completed, and then continues.
>>>>
>>>> The problem is that this external program seems to run on any core,
>>>> and not necessarily the (now idle) core that called it. This slows
>>>> things down a lot as you get one core doing multiple tasks.
>>>>
>>>> Can anyone tell me how I can call the program and ensure it runs only
>>>> on the core that's calling it? Note that there are several cores per
>>>> node. I can ID the node by running the hostname command (I don't know
>>>> a way to do this for individual cores).
>>>>
>>>> Thanks!
>>>>
>>>> Extra information that might be helpful:
>>>>
>>>> If I simply run the external program from the command line (ie, type
>>>> "/path/myprogram.ex "), it runs fine. If I run it within the
>>>> fortran program by calling it via
>>>>
>>>> CALL SYSTEM("/path/myprogram.ex")
>>>>
>>>> it doesn't run at all (doesn't even start) and everything crashes. I
>>>> don't know why this is.
>>>>
>>>> If I call it using mpiexec:
>>>>
>>>> CALL SYSTEM("mpiexec -n 1 /path/myprogram.ex")
>>>>
>>>> then it does work, but I get the problem that it can go on any core.
Re: [OMPI users] running external program on same processor (Fortran)
Hello,

Thanks for the comments. Indeed, until yesterday, I didn't realise the difference between MVAPICH, MVAPICH2 and Open-MPI.

This problem has moved from mvapich2 to open-mpi now however, because I now realise that the production environment uses Open-MPI, which means my solution for mvapich2 doesn't work now. So if I may ask your kind assistance:

Just to re-cap, I have an MPI fortran program, which runs on N nodes, and each node needs to run an external program. This external program was written for MPI, although I want to run it in serial as part of the process on each node.

Is there any way at all to launch this external MPI program so it's treated simply as a (serial) extension of the current node's processes? As I say, the MPI originating program simply waits for the external program to finish before continuing, so it's essentially a bit like a "subroutine", in that sense.

I'm having problems launching this external program from within my MPI program, under the open-mpi system, even without worrying about node assignment, and I think this might be because the system detects that I'm trying to launch another process from one of the nodes, and stops it. I'm guessing here, but it simply stops with an error saying the MPI process was stopped.

Any help is very much appreciated; I have been going at this for more than a day now and don't seem to be getting anywhere.

Thank you!

From: r...@open-mpi.org
Date: Wed, 3 Mar 2010 07:24:32 -0700
To: us...@open-mpi.org
Subject: Re: [OMPI users] running external program on same processor (Fortran)

It also would have been really helpful to know that you were using MVAPICH and -not- Open MPI as this mailing list is for the latter. We could have directed you to the appropriate place if we had known.

On Mar 3, 2010, at 5:17 AM, abc def wrote:
> I don't know (I'm a little new to this area), but I figured out how to get
> around the problem:
>
> Using SGE and MVAPICH2, the "-env MV2_CPU_MAPPING 0:1." option in mpiexec
> seems to do the trick.
>
> So when calling the external program with mpiexec, I map the called process
> to the current core rank, and it seems to stay distributed and separated as
> I want.
>
> Hope someone else finds this useful in the future.
>
>> Date: Wed, 3 Mar 2010 13:13:01 +1100
>> Subject: Re: [OMPI users] running external program on same processor (Fortran)
>>
>> Surely this is the problem of the scheduler that your system uses,
>> rather than MPI?
>>
>> On Wed, 2010-03-03 at 00:48 +, abc def wrote:
>>> Hello,
>>>
>>> I wonder if someone can help.
>>>
>>> The situation is that I have an MPI-parallel fortran program. I run it
>>> and it's distributed on N cores, and each of these processes must call
>>> an external program.
>>>
>>> This external program is also an MPI program, however I want to run it
>>> in serial, on the core that is calling it, as if it were part of the
>>> fortran program. The fortran program waits until the external program
>>> has completed, and then continues.
>>>
>>> The problem is that this external program seems to run on any core,
>>> and not necessarily the (now idle) core that called it. This slows
>>> things down a lot as you get one core doing multiple tasks.
>>>
>>> Can anyone tell me how I can call the program and ensure it runs only
>>> on the core that's calling it? Note that there are several cores per
>>> node. I can ID the node by running the hostname command (I don't know
>>> a way to do this for individual cores).
>>>
>>> Thanks!
>>>
>>> Extra information that might be helpful:
>>>
>>> If I simply run the external program from the command line (ie, type
>>> "/path/myprogram.ex "), it runs fine. If I run it within the
>>> fortran program by calling it via
>>>
>>> CALL SYSTEM("/path/myprogram.ex")
>>>
>>> it doesn't run at all (doesn't even start) and everything crashes. I
>>> don't know why this is.
>>>
>>> If I call it using mpiexec:
>>>
>>> CALL SYSTEM("mpiexec -n 1 /path/myprogram.ex")
>>>
>>> then it does work, but I get the problem that it can go on any core.
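For what it's worth, a workaround people sometimes use for the CALL SYSTEM crash above is to strip the OMPI_* environment variables the outer mpirun exported before the nested mpiexec runs, so the inner launcher does not think it is part of the outer job. The sketch below is hypothetical and untested: it assumes a POSIX shell behind CALL SYSTEM and that the relevant variables all carry the OMPI_ prefix.

```fortran
! Hypothetical workaround sketch: launch the nested mpiexec under a shell
! that first unsets every OMPI_* variable inherited from the outer mpirun.
! Untested; assumes /bin/sh semantics and the OMPI_ variable prefix.
PROGRAM system_workaround
  IMPLICIT NONE
  CALL SYSTEM('for v in $(env | sed -n "s/^\(OMPI_[^=]*\)=.*/\1/p"); do ' // &
              'unset $v; done; mpiexec -n 1 /path/myprogram.ex')
END PROGRAM system_workaround
```

The unsets and the mpiexec run in the same child shell, so the parent MPI process keeps its own environment intact.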
Re: [OMPI users] low efficiency when we use --am ft-enable-cr to checkpoint
On Mar 5, 2010, at 3:15 AM, 马少杰 wrote: > Dear Sir: > - What version of Open MPI are you using? > my version is 1.3.4 > - What configure options are you using? > ./configure --with-ft=cr --enable-mpi-threads --enable-ft-thread > --with-blcr=$dir --with-blcr-libdir=/$dir/lib > --prefix=/public/mpi/openmpi134-gnu-cr --enable-mpirun-prefix-by-default > make > make install > - What MCA parameters are you using? > mpirun -np 8 --am ft-enable-cr -machinefile ma xhpl > vim $HOME/.openmpi/mca-params.conf > # Local snapshot directory (not used in this scenario) > crs_base_snapshot_dir=/home/me/tmp > # Remote snapshot directory (globally mounted file system)) > snapc_base_global_snapshot_dir=/home/me/checkpoints > > > - Are you building from a release tarball or a SVN checkout? > building from openmpi-1.3.4.tar.gz > > > Now, I solve the problem successfully. > I found that the mpirun command as > > mpirun -np 8 --am ft-enable-cr --mca opal_cr_use_thread 0 -machinefile ma > ./xhpl > > the time cost is almost equal to the time cost by the command: mpirun -np 8 > -machinefile ma ./xhpl > > I think it should be a bug. Since you have configured Open MPI to use the C/R thread (--enable-ft-thread) then Open MPI will start the concurrent C/R thread when you ask for C/R to be enabled. By default the thread polls very aggressively (waiting only 0 microseconds, or the same as calling sched_yeild() on most systems). By turning it off you eliminate the contention the thread is causing on the system. There are two MCA parameters that control this behavior, links below: http://osl.iu.edu/research/ft/ompi-cr/api.php#mca-opal_cr_thread_sleep_check http://osl.iu.edu/research/ft/ompi-cr/api.php#mca-opal_cr_thread_sleep_wait I agree that the default behavior is probably too aggressive for most applications. However by increasing these values the user is also increasing the amount of time before a checkpoint can begin. 
In my setup I usually set: opal_cr_thread_sleep_wait=1000 Which will throttle down the thread when the application is in the MPI library. You might want to play around with these MCA parameters to tune the aggressiveness of the C/R thread to your performance needs. In the mean time I will look into finding better default parameters for these options. Cheers, Josh > > > 2010-03-05 > 马少杰 > 发件人: Joshua Hursey > 发送时间: 2010-03-05 00:07:19 > 收件人: Open MPI Users > 抄送: > 主题: Re: [OMPI users] low efficiency when we use --am ft-enable-cr tocheckpoint > There is some overhead involved when activating the current C/R functionality > in Open MPI due to the wrapping of the internal point-to-point stack. The > wrapper (CRCP framework) tracks the signature of each message (not the > buffer, so constant time for any size MPI message) so that when we need to > quiesce the network we know of all the outstanding messages that need to be > drained. > > So there is an overhead, but it should not be as significant as you have > mentioned. I looked at some of the performance aspects in the paper at the > link below: > http://www.open-mpi.org/papers/hpdc-2009/ > Though I did not look at HPL explicitly in this paper (just NPB, GROMACS, and > NetPipe), I have in testing and the time difference was definitely not 2x > (cannot recall the exact differences at the moment). > > Can you tell me a bit about your setup: > - What version of Open MPI are you using? > - What configure options are you using? > - What MCA parameters are you using? > - Are you building from a release tarball or a SVN checkout? > > -- Josh > > > On Mar 3, 2010, at 10:07 PM, 马少杰 wrote: > > > > > > > 2010-03-04 > > 马少杰 > > Dear Sir: > >I want to use blcr and openmpi to checkpoint, now I can save check > > point and restart my work successfully. How erver I find the option "--am > > ft-enable-cr" will case large cost . 
For example, when I run my HPL job > > without and with the option "--am ft-enable-cr" on 4 hosts (32 processes, IB > > network), the times are 8m21.180s and 16m37.732s > > respectively. It should be noted that I did not save a checkpoint when > > I ran the job; the additional cost is caused by "--am ft-enable-cr" > > alone. Why does the option "--am ft-enable-cr" cause so much system > > cost? Is it normal? How can I solve the problem? > > I also tested other MPI applications, and the problem still exists. > > ___ > > users mailing list > > us...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/users
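To make Josh's suggestion concrete, the throttle values can be placed in the per-user MCA parameter file that the original poster already uses, so they apply to every C/R-enabled run without changing the mpirun command line. This is only a sketch: the value 1000 for opal_cr_thread_sleep_wait is the one Josh reports using, and the value for opal_cr_thread_sleep_check is purely illustrative — as Josh notes, larger values also delay how quickly a checkpoint request can begin.

```
# $HOME/.openmpi/mca-params.conf
# Existing snapshot settings (from the original poster's file)
crs_base_snapshot_dir=/home/me/tmp
snapc_base_global_snapshot_dir=/home/me/checkpoints
# Throttle the C/R thread: sleep this many microseconds between polls
# while the application is inside the MPI library (Josh's setting).
opal_cr_thread_sleep_wait=1000
# Companion parameter for polling outside the library; the value here
# is an illustrative guess, not a recommendation.
opal_cr_thread_sleep_check=1000
```

The same parameters can instead be passed per-run, e.g. `mpirun --mca opal_cr_thread_sleep_wait 1000 --am ft-enable-cr ...`.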
Re: [OMPI users] Segmentation fault when Send/Recv on heterogeneous cluster (32/64 bit machines)
Hi, Thank you for that information. For the moment I haven't encountered those problems. Maybe because my program doesn't use much memory (100 MB) and the master machine has plenty of RAM (8 GB). So meanwhile, the solution seems to be the parameter "btl_tcp_eager_limit", but a cleaner solution is very welcome :-) TMHieu 2010/3/5 Aurélien Bouteiller: > Hi, > > Setting the eager limit to such a drastically high value will have the effect of generating gigantic memory consumption for unexpected messages. Any message you send which does not have a preposted ready recv will allocate 150 MB of temporary storage, and will be copied from that internal buffer to the recv buffer when the recv is posted. You should expect very poor bandwidth and probably a crash/abort due to memory exhaustion on the receivers. > > Aurelien > -- > Dr. Aurelien Bouteiller > Innovative Computing Laboratory > University of Tennessee > Knoxville, TN, USA > > > On Mar 4, 2010, at 09:02, TRINH Minh Hieu wrote: > >> Hi, >> >> I have a new discovery about this problem: >> >> It seems that the array size sendable from a 32-bit to a 64-bit machine >> is proportional to the parameter "btl_tcp_eager_limit". >> When I set it to 200 000 000 (2e08 bytes, about 190 MB), I can send an >> array of up to 2e07 doubles (152 MB). >> >> I didn't find much information about btl_tcp_eager_limit other than >> in the "ompi_info --all" command. If I leave it at 2e08, will it impact >> the performance of Open MPI? >> >> It may also be noteworthy that if the master (rank 0) is a 32-bit >> machine, I don't get a segfault. I can send a big array with a small >> "btl_tcp_eager_limit" from a 64-bit machine to a 32-bit one. >> >> Do I have to move this thread to the devel mailing list? >> >> Regards, >> >> TMHieu >> >> On Tue, Mar 2, 2010 at 2:54 PM, TRINH Minh Hieu wrote: >>> Hello, >>> >>> Yes, I compiled Open MPI with --enable-heterogeneous.
More precisely I >>> compiled with: >>> $ ./configure --prefix=/tmp/openmpi --enable-heterogeneous >>> --enable-cxx-exceptions --enable-shared >>> --enable-orterun-prefix-by-default >>> $ make all install >>> >>> I attach the output of ompi_info from my 2 machines. >>> >>> TMHieu >>> >>> On Tue, Mar 2, 2010 at 1:57 PM, Jeff Squyres wrote: Did you configure Open MPI with --enable-heterogeneous? On Feb 28, 2010, at 1:22 PM, TRINH Minh Hieu wrote: > Hello, > > I have some problems running MPI on my heterogeneous cluster. More > precisely I get a segmentation fault when sending a large array (about > 1) of doubles from an i686 machine to an x86_64 machine. It does not > happen with a small array. Here is the send/recv source code (the complete > source is in the attached file): > code > if (me == 0 ) { > for (int pe=1; pe<nproc; pe++) > { > printf("Receiving from proc %d : ",pe); fflush(stdout); > d=(double *)malloc(sizeof(double)*n); > MPI_Recv(d,n,MPI_DOUBLE,pe,999,MPI_COMM_WORLD,&status); > printf("OK\n"); fflush(stdout); > } > printf("All done.\n"); > } > else { > d=(double *)malloc(sizeof(double)*n); > MPI_Send(d,n,MPI_DOUBLE,0,999,MPI_COMM_WORLD); > } > code > > I got a segmentation fault with n=1 but no error with n=1000 > I have 2 machines: > sbtn155 : Intel Xeon, x86_64 > sbtn211 : Intel Pentium 4, i686 > > The code is compiled on the x86_64 and i686 machines, using Open MPI 1.4.1, > installed in /tmp/openmpi : > [mhtrinh@sbtn211 heterogenous]$ make hetero > gcc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include -c hetero.c -o hetero.i686.o > /tmp/openmpi/bin/mpicc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include > hetero.i686.o -o hetero.i686 -lm > > [mhtrinh@sbtn155 heterogenous]$ make hetero > gcc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include -c hetero.c -o hetero.x86_64.o > /tmp/openmpi/bin/mpicc -Wall -I.
-std=c99 -O3 -I/tmp/openmpi/include > hetero.x86_64.o -o hetero.x86_64 -lm > > I ran the code using an appfile and got these errors: > $ cat appfile > --host sbtn155 -np 1 hetero.x86_64 > --host sbtn155 -np 1 hetero.x86_64 > --host sbtn211 -np 1 hetero.i686 > > $ mpirun -hetero --app appfile > Input array length : > 1 > Receiving from proc 1 : OK > Receiving from proc 2 : [sbtn155:26386] *** Process received signal *** > [sbtn155:26386] Signal: Segmentation fault (11) > [sbtn155:26386] Signal code: Address not mapped (1) > [sbtn155:26386] Failing at address: 0x200627bd8 > [sbtn155:26386] [ 0] /lib64/libpthread.so.0 [0x3fa4e0e540] > [sbtn155:26386] [ 1] /tmp/openmpi/lib/openmpi/mca_pml_ob1.so [0x2d8d7908] > [sbtn155:26386] [ 2]
Re: [OMPI users] low efficiency when we use --am ft-enable-cr to checkpoint
Dear Sir: - What version of Open MPI are you using? My version is 1.3.4. - What configure options are you using? ./configure --with-ft=cr --enable-mpi-threads --enable-ft-thread --with-blcr=$dir --with-blcr-libdir=/$dir/lib --prefix=/public/mpi/openmpi134-gnu-cr --enable-mpirun-prefix-by-default make make install - What MCA parameters are you using? mpirun -np 8 --am ft-enable-cr -machinefile ma xhpl vim $HOME/.openmpi/mca-params.conf # Local snapshot directory (not used in this scenario) crs_base_snapshot_dir=/home/me/tmp # Remote snapshot directory (globally mounted file system) snapc_base_global_snapshot_dir=/home/me/checkpoints - Are you building from a release tarball or an SVN checkout? Building from openmpi-1.3.4.tar.gz. Now I have solved the problem. I found that with the mpirun command mpirun -np 8 --am ft-enable-cr --mca opal_cr_use_thread 0 -machinefile ma ./xhpl the time cost is almost equal to the time cost of the command: mpirun -np 8 -machinefile ma ./xhpl. I think this may be a bug. 2010-03-05 马少杰 From: Joshua Hursey Sent: 2010-03-05 00:07:19 To: Open MPI Users Cc: Subject: Re: [OMPI users] low efficiency when we use --am ft-enable-cr to checkpoint There is some overhead involved when activating the current C/R functionality in Open MPI due to the wrapping of the internal point-to-point stack. The wrapper (CRCP framework) tracks the signature of each message (not the buffer, so constant time for any size of MPI message) so that when we need to quiesce the network we know of all the outstanding messages that need to be drained. So there is an overhead, but it should not be as significant as you have mentioned. I looked at some of the performance aspects in the paper at the link below: http://www.open-mpi.org/papers/hpdc-2009/ Though I did not look at HPL explicitly in this paper (just NPB, GROMACS, and NetPipe), I have in my own testing, and the time difference was definitely not 2x (I cannot recall the exact differences at the moment).
Can you tell me a bit about your setup: - What version of Open MPI are you using? - What configure options are you using? - What MCA parameters are you using? - Are you building from a release tarball or an SVN checkout? -- Josh On Mar 3, 2010, at 10:07 PM, 马少杰 wrote: > > > 2010-03-04 > 马少杰 > Dear Sir: > I want to use BLCR and Open MPI to checkpoint; I can now save a checkpoint > and restart my work successfully. However, I find that the option "--am > ft-enable-cr" causes a large cost. For example, when I run my HPL job > without and with the option "--am ft-enable-cr" on 4 hosts (32 processes, IB > network), the times are 8m21.180s and 16m37.732s > respectively. It should be noted that I did not save a checkpoint when I > ran the job; the additional cost is caused by "--am ft-enable-cr" > alone. Why does the option "--am ft-enable-cr" cause so much system > cost? Is it normal? How can I solve the problem? > I also tested other MPI applications, and the problem still exists.
[OMPI users] change hosts to restart the checkpoint
2010-03-05 马少杰 Dear Sir: I want to use Open MPI and BLCR to checkpoint. However, I want to restart the checkpoint on other hosts. For example, I run an MPI program using Open MPI on host1 and host2, and I save the checkpoint file on an NFS-shared path. Then I want to restart the job (ompi-restart -machinefile ma ompi_global_snapshot_15865.ckpt) on host3 and host4. The 4 hosts have the same hardware and software. If I change the hostnames (host3 and host4) in the machinefile, the following errors always occur: [node182:27278] *** Process received signal *** [node182:27278] Signal: Segmentation fault (11) [node182:27278] Signal code: Address not mapped (1) [node182:27278] Failing at address: 0x3b81009530 [node182:27275] *** Process received signal *** [node182:27275] Signal: Segmentation fault (11) [node182:27275] Signal code: Address not mapped (1) [node182:27275] Failing at address: 0x3b81009530 [node182:27274] *** Process received signal *** [node182:27274] Signal: Segmentation fault (11) [node182:27274] Signal code: Address not mapped (1) [node182:27274] Failing at address: 0x3b81009530 [node182:27276] *** Process received signal *** [node182:27276] Signal: Segmentation fault (11) [node182:27276] Signal code: Address not mapped (1) [node182:27276] Failing at address: 0x3b81009530 -- mpirun noticed that process rank 9 with PID 27973 on node node183 exited on signal 11 (Segmentation fault). If I change the hostnames back to host1 and host2, it restarts successfully. My Open MPI version is 1.3.4: ./configure --with-ft=cr --enable-mpi-threads --enable-ft-thread --with-blcr=$dir --with-blcr-libdir=/$dir/lib --prefix=$dir_ompi --enable-mpirun-prefix-by-default The command to run the MPI program is: mpirun -np 8 --am ft-enable-cr --mca opal_cr_use_thread 0 -machinefile ma ./cpi vim $HOME/.openmpi/mca-params.conf crs_base_snapshot_dir=/tmp/cr snapc_base_global_snapshot_dir=/disk/cr
Re: [OMPI users] Segmentation fault when Send/Recv on heterogeneous cluster (32/64 bit machines)
Hi, Setting the eager limit to such a drastically high value will have the effect of generating gigantic memory consumption for unexpected messages. Any message you send which does not have a preposted ready recv will allocate 150 MB of temporary storage, and will be copied from that internal buffer to the recv buffer when the recv is posted. You should expect very poor bandwidth and probably a crash/abort due to memory exhaustion on the receivers. Aurelien -- Dr. Aurelien Bouteiller Innovative Computing Laboratory University of Tennessee Knoxville, TN, USA On Mar 4, 2010, at 09:02, TRINH Minh Hieu wrote: > Hi, > > I have a new discovery about this problem: > > It seems that the array size sendable from a 32-bit to a 64-bit machine > is proportional to the parameter "btl_tcp_eager_limit". > When I set it to 200 000 000 (2e08 bytes, about 190 MB), I can send an > array of up to 2e07 doubles (152 MB). > > I didn't find much information about btl_tcp_eager_limit other than > in the "ompi_info --all" command. If I leave it at 2e08, will it impact > the performance of Open MPI? > > It may also be noteworthy that if the master (rank 0) is a 32-bit > machine, I don't get a segfault. I can send a big array with a small > "btl_tcp_eager_limit" from a 64-bit machine to a 32-bit one. > > Do I have to move this thread to the devel mailing list? > > Regards, > > TMHieu > > On Tue, Mar 2, 2010 at 2:54 PM, TRINH Minh Hieu wrote: >> Hello, >> >> Yes, I compiled Open MPI with --enable-heterogeneous. More precisely I >> compiled with: >> $ ./configure --prefix=/tmp/openmpi --enable-heterogeneous >> --enable-cxx-exceptions --enable-shared >> --enable-orterun-prefix-by-default >> $ make all install >> >> I attach the output of ompi_info from my 2 machines. >> >> TMHieu >> >> On Tue, Mar 2, 2010 at 1:57 PM, Jeff Squyres wrote: >>> Did you configure Open MPI with --enable-heterogeneous?
>>> >>> On Feb 28, 2010, at 1:22 PM, TRINH Minh Hieu wrote: >>> Hello, I have some problems running MPI on my heterogeneous cluster. More precisely I get a segmentation fault when sending a large array (about 1) of doubles from an i686 machine to an x86_64 machine. It does not happen with a small array. Here is the send/recv source code (complete source is in the attached file): code if (me == 0 ) { for (int pe=1; pe