[OMPI users] low efficiency when we use --am ft-enable-cr to checkpoint

2010-03-03 Thread 马少杰


Dear Sir:
   I want to use BLCR and Open MPI to checkpoint. I can now save a checkpoint
and restart my work successfully. However, I find that the option "--am
ft-enable-cr" introduces a large overhead. For example, when I run my HPL job
without and with the option "--am ft-enable-cr" on 4 hosts (32 processes, IB
network), the run times are 8m21.180s and 16m37.732s respectively. It should be
noted that I did not actually take a checkpoint during the run; the additional
cost comes from "--am ft-enable-cr" alone. Why does the option "--am
ft-enable-cr" cause so much overhead? Is it normal? How can I solve this problem?
  I have also tested other MPI applications, and the problem still exists.


Re: [OMPI users] running external program on same processor (Fortran)

2010-03-03 Thread Terry Frankcombe
On Wed, 2010-03-03 at 12:57 -0500, Prentice Bisbal wrote:
> Reuti wrote:
> > Are you speaking of the same?
> 
> Good point, Reuti. I was thinking of a cluster scheduler like SGE or
> Torque.


Yeah, I meant the scheduler in the CPU time slice sense.

http://en.wikipedia.org/wiki/Scheduling_(computing)
vs.
http://en.wikipedia.org/wiki/Job_scheduler




> > Am 03.03.2010 um 17:32 schrieb Prentice Bisbal:
> > 
> >> Terry Frankcombe wrote:
> >>> Surely this is the problem of the scheduler that your system uses,
> > 
> > This I would also state.
> > 
> > 
> >>> rather than MPI?
> > 
> > Scheduler in the Linux kernel?
> > 
> > 
> >> That's not true. The scheduler only assigns the initial processes to
> >> nodes
> > 
> > Scheduler in MPI?
> > 
> > 
> >> and starts them. It can kill the processes it starts if they use
> >> too much memory or run too long, but doesn't prevent them from spawning
> >> more processes, and once spawned,
> > 
> > When the processes are bound to one and the same core, these additional
> > processes won't interfere with other jobs' processes on the same node
> > which run on the other cores.
> > 
> > -- Reuti
> > 
> > 
> >> unless they are spawned through the
> >> scheduler, it has no control over them.
> >>>
> >>>
> >>> On Wed, 2010-03-03 at 00:48 +, abc def wrote:
>  Hello,
> 
>  I wonder if someone can help.
> 
>  The situation is that I have an MPI-parallel fortran program. I run it
>  and it's distributed on N cores, and each of these processes must call
>  an external program.
> 
>  This external program is also an MPI program, however I want to run it
>  in serial, on the core that is calling it, as if it were part of the
>  fortran program. The fortran program waits until the external program
>  has completed, and then continues.
> 
>  The problem is that this external program seems to run on any core,
>  and not necessarily the (now idle) core that called it. This slows
>  things down a lot as you get one core doing multiple tasks.
> 
>  Can anyone tell me how I can call the program and ensure it runs only
>  on the core that's calling it? Note that there are several cores per
>  node. I can ID the node by running the hostname command (I don't know
>  a way to do this for individual cores).
> 
>  Thanks!
> 
>  
> 
>  Extra information that might be helpful:
> 
>  If I simply run the external program from the command line (ie, type
>  "/path/myprogram.ex "), it runs fine. If I run it within the
>  fortran program by calling it via
> 
>  CALL SYSTEM("/path/myprogram.ex")
> 
>  it doesn't run at all (doesn't even start) and everything crashes. I
>  don't know why this is.
> 
>  If I call it using mpiexec:
> 
>  CALL SYSTEM("mpiexec -n 1 /path/myprogram.ex")
> 
>  then it does work, but I get the problem that it can go on any core.
> 
>  ___
>  users mailing list
>  us...@open-mpi.org
>  http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>
> >>> ___
> >>> users mailing list
> >>> us...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>
> >>
> >> -- 
> >> Prentice Bisbal
> >> Linux Software Support Specialist/System Administrator
> >> School of Natural Sciences
> >> Institute for Advanced Study
> >> Princeton, NJ
> >> ___
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> > 
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > 
> 



[OMPI users] checkpointing multi node and multi process applications

2010-03-03 Thread Fernando Lemos
Hi,


First, I'm hoping setting the subject of this e-mail will get it
attached to the thread that starts with this e-mail:

http://www.open-mpi.org/community/lists/users/2009/12/11608.php

The reason I'm not replying to that thread is that I wasn't subscribed
to the list at the time.


My environment is detailed in another thread, not related at all to this issue:

http://www.open-mpi.org/community/lists/users/2010/03/12199.php


I'm running into the same problem Jean described (though I'm running
1.4.1). Note that taking and restarting from checkpoints works fine
now when I'm using only a single node.

This is what I get by running the job on two nodes, also showing the
output after the checkpoint is taken:

root@debian1# mpirun -am ft-enable-cr -mca btl_tcp_if_include eth1 -np
2 --host debian1,debian2 ring

>>> Process 1 sending 2460 to 0
>>> Process 1 received 2459
>>> Process 1 sending 2459 to 0
[debian1:01817] Error: expected_component: PID information unavailable!
[debian1:01817] Error: expected_component: Component Name information
unavailable!
--
mpirun noticed that process rank 0 with PID 1819 on node debian1
exited on signal 0 (Unknown signal 0).
--

Now taking the checkpoint:

root@debian1# ompi-checkpoint --term `ps ax | grep mpirun | grep -v
grep | awk '{print $1}'`
Snapshot Ref.:   0 ompi_global_snapshot_1817.ckpt

Restarting from the checkpoint:

root@debian1:~# ompi-restart ompi_global_snapshot_1817.ckpt
[debian1:01832] Error: Unable to access the path
[/root/ompi_global_snapshot_1817.ckpt/0/opal_snapshot_1.ckpt]!
--
Error: The filename (opal_snapshot_1.ckpt) is invalid because either
you have not provided a filename
   or provided an invalid filename.
   Please see --help for usage.

--

After spitting that error message, ompi-restart just hangs forever.


Here's something that may or may not matter. debian1 and debian2 are
two virtual machines. They have two network interfaces each:

- eth0: Connected through NAT so that the machine can access the
internet. It gets an address by DHCP (VirtualBox magic), which is
always 10.0.2.15/24 (for both machines). They have no connection to
each other through this interface, they can only access the outside.

- eth1: Connected to an internal VirtualBox interface. Only debian1
and debian2 are members of that internal network (more VirtualBox
magic). The IPs are statically configured, 192.168.200.1/24 for
debian1, 192.168.200.2/24 for debian2.

Since both machines have an IP in the same subnet on eth0 (actually
the same IP), OpenMPI thinks they're in the same network connected
through eth0 too. That's why I need to specify btl_tcp_if_include
eth1, otherwise running jobs across the two nodes will not work
properly (sends and recvs time out).


Is there anything I can do to provide more information about this bug?
E.g., should I try to compile the code from the SVN trunk? I have also kept the
snapshots intact; I can tar them up and upload them somewhere in case
you need them. I can also provide the source code to the ring
program, but it's really the canonical ring MPI example.
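
Roughly this shape (a sketch of the idea, not my exact source; it needs at
least two processes and the loop count is arbitrary):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, msg, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (i = 0; i < 3000; i++) {            /* trips around the ring */
        if (rank == 0) {
            msg = i;
            printf("Process 0 sending %d to 1\n", msg);
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&msg, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {
            /* receive from the left neighbour, pass to the right */
            MPI_Recv(&msg, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("Process %d received %d\n", rank, msg);
            MPI_Send(&msg, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}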

As usual, any info you might need, just ask and I'll provide.


Thanks in advance,


Re: [OMPI users] MPI_Init() and MPI_Init_thread()

2010-03-03 Thread Yuanyuan ZHANG
Hi guys,

Thanks for your help, but unfortunately I am still not clear.

> You are right Dave, FUNNELED allows the application to have multiple
> threads but only the main thread calls MPI.
My understanding is that even if I use SINGLE or MPI_Init, I can still
have multiple threads if I use the OpenMP PARALLEL directive, and only
the main thread makes MPI calls. Am I correct?

> An OpenMP/MPI hybrid program that makes MPI calls only in between parallel
> sections is usually a FUNNELED user of MPI
For an OpenMP/MPI hybrid program, if I only want to make MPI calls using
the main thread, i.e., only in between parallel sections, can I just use
SINGLE or MPI_Init? What's the benefit of FUNNELED?

Thanks.




Re: [OMPI users] Limit to number of processes on one node?

2010-03-03 Thread Ralph Castain

On Mar 3, 2010, at 12:16 PM, Prentice Bisbal wrote:

> Eugene Loh wrote:
>> Prentice Bisbal wrote:
>>> Eugene Loh wrote:
>>> 
 Prentice Bisbal wrote:
 
> Is there a limit on how many MPI processes can run on a single host?
> 
>> Depending on which OMPI release you're using, I think you need something
>> like 4*np up to 7*np (plus a few) descriptors.  So, with 256, you need
>> 1000+ descriptors.  You're quite possibly up against your limit, though
>> I don't know for sure that that's the problem here.
>> 
>> You say you're running 1.2.8.  That's "a while ago", so would you
>> consider updating as a first step?  Among other things, newer OMPIs will
>> generate a much clearer error message if the descriptor limit is the
>> problem.
> 
> While 1.2.8 might be "a while ago", upgrading software just because it's
> "old" is not a valid argument.
> 
> I can install the latest version of OpenMPI, but it will take a little
> while.

Maybe not because it is "old", but Eugene is correct. The old versions of OMPI 
required more file descriptors than the newer versions.

That said, you'll still need a minimum of 4x the number of procs on the node 
even with the latest release. I suggest talking to your sys admin about getting 
the limit increased. It sounds like it has been set unrealistically low.
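
For example (just a sketch -- the exact file and values depend on your distro
and site policy), the per-user open-file limit can usually be raised in
/etc/security/limits.conf and then checked from a shell:

  # /etc/security/limits.conf -- example values only
  *    soft    nofile    4096
  *    hard    nofile    8192

  $ ulimit -n          # show the current soft limit
  $ ulimit -n 4096     # raise it (up to the hard limit) for this shell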


> 
> 
> I have a user trying to test his code on the command-line on a single
> host before running it on our cluster like so:
> 
> mpirun -np X foo
> 
> When he tries to run it on a large number of processes (X = 256, 512), the
> program fails, and I can reproduce this with a simple "Hello, World"
> program:
> 
> $ mpirun -np 256 mpihello
> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
> exited on signal 15 (Terminated).
> 252 additional processes aborted (not shown)
> 
> I've done some testing and found that X must be < 155 for this program to work.
> Is this a bug, part of the standard, or design/implementation decision?
> 
> 
> 
 One possible issue is the limit on the number of descriptors.  The error
 message should be pretty helpful and descriptive, but perhaps you're
 using an older version of OMPI.  If this is your problem, one workaround
 is something like this:
 
 unlimit descriptors
 mpirun -np 256 mpihello
 
>>> 
>>> Looks like I'm not allowed to set that as a regular user:
>>> 
>>> $ ulimit -n 2048
>>> -bash: ulimit: open files: cannot modify limit: Operation not permitted
>>> 
>>> Since I am the admin, I could change that elsewhere, but I'd rather not
>>> do that system-wide unless absolutely necessary.
>>> 
 though I guess the syntax depends on what shell you're running.  Another
 is to set the MCA parameter opal_set_max_sys_limits to 1.
 
>>> That didn't work either:
>>> 
>>> $ mpirun -mca opal_set_max_sys_limits 1 -np 256 mpihello
>>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
>>> exited on signal 15 (Terminated).
>>> 252 additional processes aborted (not shown)
>>> 
>>> 
>> 
>> 
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> -- 
> Prentice Bisbal
> Linux Software Support Specialist/System Administrator
> School of Natural Sciences
> Institute for Advanced Study
> Princeton, NJ
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Segfault in ompi-restart (ft-enable-cr)

2010-03-03 Thread Joshua Hursey

On Mar 3, 2010, at 3:42 PM, Fernando Lemos wrote:

> On Wed, Mar 3, 2010 at 5:31 PM, Joshua Hursey  wrote:
> 
>> 
>> Yes, ompi-restart should be printing a helpful message and exiting normally. 
>> Thanks for the bug report. I believe that I have seen and fixed this on a 
>> development branch making its way to the trunk. I'll make sure to move the 
>> fix to the 1.4 series once it has been applied to the trunk.
>> 
>> I filed a ticket on this if you wanted to track the issue.
>>  https://svn.open-mpi.org/trac/ompi/ticket/2329
> 
> Ah, that's great. Just wondering, do you have any idea why blcr-util
> is required? That package only contains the cr_* binaries (cr_restart,
> cr_checkpoint, cr_run) and some docs (manpages, changelog, etc.). I've
> filed a Debian bug (#572229) about making openmpi-checkpoint depend
> on blcr-util, but the package maintainer told me he found it unusual
> that ompi-restart would depend on the cr_* binaries since libcr
> supposedly provides all the functionality ompi-restart needs.
> 
> I'm about to compile OpenMPI in debug mode and take a look at the
> backtrace to see if I can understand what's going on.
> 
> Btw, this is the list of files in the blcr-util package:
> http://packages.debian.org/sid/amd64/blcr-util/filelist . As you can
> see, only cr_* binaries and docs.

Open MPI currently calls 'cr_restart' for each process it restarts, exec'ed 
from the 'opal-restart' binary (LAM/MPI also used cr_restart directly, in case 
anyone is interested). We use the internal library interface for checkpoint, 
but not restarting at this time.
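
In rough illustrative C (not the actual opal-restart source), the restart path
per process amounts to exec'ing cr_restart on that process's context file:

#include <stdio.h>
#include <unistd.h>

/* Illustrative only: restart one process by exec'ing the cr_restart binary
 * on its BLCR context file. This is why the cr_* command-line tools must be
 * installed even though checkpoints are taken through libcr. */
static int restart_one(const char *context_file)
{
    execlp("cr_restart", "cr_restart", context_file, (char *) NULL);
    perror("exec of cr_restart failed");  /* reached only if the exec fails */
    return -1;
}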

If I recall correctly, it wasn't until relatively recently that BLCR added the 
ability to restart a process from a library call. We have not put in the code 
to use this functionality (though all of the framework interfaces are in place 
to do so). On my development branch I will add the ability to use the BLCR 
library interface if available. That functionality will not likely make it to 
the v1.4 release series since it is not really a bug fix, but I will plan on 
including it in the v1.5 and later releases. And just so I don't lose track of 
it, I created an enhancement ticket for this:
  https://svn.open-mpi.org/trac/ompi/ticket/2330

Cheers,
Josh

> 
>> 
>> Thanks again,
>> Josh
> 
> Thank you!
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] MPI_Init() and MPI_Init_thread()

2010-03-03 Thread Richard Treumann

You are right Dave, FUNNELED allows the application to have multiple
threads but only the main thread calls MPI.

An OpenMP/MPI hybrid program that makes MPI calls only in between parallel
sections is usually a FUNNELED user of MPI

Thanks

Dick Treumann  -  MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363



   
  From:    Dave Goodell
  To:      Open MPI Users
  Date:    03/03/2010 03:08 PM
  Subject: Re: [OMPI users] MPI_Init() and MPI_Init_thread()
  Sent by: users-boun...@open-mpi.org
   





On Mar 3, 2010, at 11:35 AM, Richard Treumann wrote:
> If the application will make MPI calls from multiple threads and
> MPI_INIT_THREAD has returned FUNNELED, the application must be
> willing to take the steps that ensure there will never be concurrent
> calls to MPI from the threads. The threads will take turns - without
> fail.
>
Minor nitpick: if the implementation returns FUNNELED, only the main
thread (basically the thread that called MPI_INIT_THREAD, see MPI-2.2
pg 386 for def'n) may make MPI calls.  Dick's paragraph above is
correct if you replace FUNNELED with SERIALIZED.

-Dave

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Segfault in ompi-restart (ft-enable-cr)

2010-03-03 Thread Fernando Lemos
On Wed, Mar 3, 2010 at 5:31 PM, Joshua Hursey  wrote:

>
> Yes, ompi-restart should be printing a helpful message and exiting normally. 
> Thanks for the bug report. I believe that I have seen and fixed this on a 
> development branch making its way to the trunk. I'll make sure to move the 
> fix to the 1.4 series once it has been applied to the trunk.
>
> I filed a ticket on this if you wanted to track the issue.
>  https://svn.open-mpi.org/trac/ompi/ticket/2329

Ah, that's great. Just wondering, do you have any idea why blcr-util
is required? That package only contains the cr_* binaries (cr_restart,
cr_checkpoint, cr_run) and some docs (manpages, changelog, etc.). I've
filed a Debian bug (#572229) about making openmpi-checkpoint depend
on blcr-util, but the package maintainer told me he found it unusual
that ompi-restart would depend on the cr_* binaries since libcr
supposedly provides all the functionality ompi-restart needs.

I'm about to compile OpenMPI in debug mode and take a look at the
backtrace to see if I can understand what's going on.

Btw, this is the list of files in the blcr-util package:
http://packages.debian.org/sid/amd64/blcr-util/filelist . As you can
see, only cr_* binaries and docs.

>
> Thanks again,
> Josh

Thank you!



Re: [OMPI users] Segfault in ompi-restart (ft-enable-cr)

2010-03-03 Thread Joshua Hursey

On Mar 2, 2010, at 9:17 AM, Fernando Lemos wrote:

> On Sun, Feb 28, 2010 at 11:11 PM, Fernando Lemos  
> wrote:
>> Hello,
>> 
>> 
>> I'm trying to come up with a fault tolerant OpenMPI setup for research
>> purposes. I'm doing some tests now, but I'm stuck with a segfault when
>> I try to restart my test program from a checkpoint.
>> 
>> My test program is the "ring" program, where messages are sent to the
>> next node in the ring N times. It's pretty simple, I can supply the
>> source code if needed. I'm running it like this:
>> 
>> # mpirun -np 4 -am ft-enable-cr ring
>> ...
> Process 1 sending 703 to 2
> Process 3 received 704
> Process 3 sending 704 to 0
> Process 3 received 703
> Process 3 sending 703 to 0
>> --
>> mpirun noticed that process rank 0 with PID 18358 on node debian1
>> exited on signal 0 (Unknown signal 0).
>> --
>> 4 total processes killed (some possibly by mpirun during cleanup)
>> 
>> That's the output when I ompi-checkpoint the mpirun PID from another 
>> terminal.
>> 
>> The checkpoint is taken just fine in maybe 1.5 seconds. I can see the
>> checkpoint directory has been created in $HOME.
>> 
>> This is what I get when I try to run ompi-restart
>> 
>> root@debian1:~# ps ax | grep mpirun
>> 18357 pts/0R+ 0:01 mpirun -np 4 -am ft-enable-cr ring
>> 18378 pts/5S+ 0:00 grep mpirun
>> root@debian1:~# ompi-checkpoint 18357
>> Snapshot Ref.:   0 ompi_global_snapshot_18357.ckpt
>> root@debian1:~# ompi-checkpoint --term 18357
>> Snapshot Ref.:   1 ompi_global_snapshot_18357.ckpt
>> root@debian1:~# ompi-restart ompi_global_snapshot_18357.ckpt
>> --
>> Error: Unable to obtain the proper restart command to restart from the
>>   checkpoint file (opal_snapshot_2.ckpt). Returned -1.
>> 
>> --
>> [debian1:18384] *** Process received signal ***
>> [debian1:18384] Signal: Segmentation fault (11)
>> [debian1:18384] Signal code: Address not mapped (1)
>> [debian1:18384] Failing at address: 0x725f725f
>> [debian1:18384] [ 0] [0xb775f40c]
>> [debian1:18384] [ 1]
>> /usr/local/lib/libopen-pal.so.0(opal_argv_free+0x33) [0xb771ea63]
>> [debian1:18384] [ 2]
>> /usr/local/lib/libopen-pal.so.0(opal_event_fini+0x30) [0xb77150a0]
>> [debian1:18384] [ 3]
>> /usr/local/lib/libopen-pal.so.0(opal_finalize+0x35) [0xb7708fa5]
>> [debian1:18384] [ 4] opal-restart [0x804908e]
>> [debian1:18384] [ 5] /lib/i686/cmov/libc.so.6(__libc_start_main+0xe5)
>> [0xb7568b55]
>> [debian1:18384] [ 6] opal-restart [0x8048fc1]
>> [debian1:18384] *** End of error message ***
>> --
>> mpirun noticed that process rank 2 with PID 18384 on node debian1
>> exited on signal 11 (Segmentation fault).
>> --
>> 
>> I used a clean install of Debian Squeeze (testing) to make sure my
>> environment was ok. Those are the steps I took:
>> 
>> - Installed Debian Squeeze, only base packages
>> - Installed build-essential, libcr0, libcr-dev, blcr-dkms (build
>> tools, BLCR dev and run-time environment)
>> - Compiled openmpi-1.4.1
>> 
>> Note that I did compile openmpi-1.4.1 because the Debian package
>> (openmpi-checkpoint) doesn't seem to be usable at the moment. There
>> are no leftovers from any previous install of Debian packages
>> supplying OpenMPI because this is a fresh install, no openmpi package
>> had been installed before.
>> 
>> I used the following configure options:
>> 
>> # ./configure --with-ft=cr --enable-ft-thread --enable-mpi-threads
>> 
>> I also tried to add the option --with-memory-manager=none because I
>> saw an e-mail on the mailing list that described this as a possible
>> solution to an (apparently) not related problem, but the problem
>> remains the same.
>> 
>> I don't have config.log (I rm'ed the build dir), but if you think it's
>> necessary I can recompile OpenMPI and provide it.
>> 
>> Some information about the system (VirtualBox virtual machine, single
>> processor, btw):
>> 
>> Kernel version 2.6.32-trunk-686
>> 
>> root@debian1:~# lsmod | grep blcr
>> blcr   79084  0
>> blcr_imports2077  1 blcr
>> 
>> libcr (BLCR) is version 0.8.2-9.
>> 
>> gcc is version 4.4.3.
>> 
>> 
>> Please let me know of any other information you might need.
>> 
>> 
>> Thanks in advance,
>> 
> 
> Hello,
> 
> I figured it out. The problem is that the Debian package blcr-util,
> which contains the BLCR binaries (cr_restart, cr_checkpoint, etc.)
> wasn't installed. I believe OpenMPI could perhaps show a more
> descriptive message instead of segfaulting, though? Also, you might
> want to add that information to the FAQ.
> 
> Anyways, 

Re: [OMPI users] MPI_Init() and MPI_Init_thread()

2010-03-03 Thread Dave Goodell

On Mar 3, 2010, at 11:35 AM, Richard Treumann wrote:
If the application will make MPI calls from multiple threads and  
MPI_INIT_THREAD has returned FUNNELED, the application must be  
willing to take the steps that ensure there will never be concurrent  
calls to MPI from the threads. The threads will take turns - without  
fail.


Minor nitpick: if the implementation returns FUNNELED, only the main  
thread (basically the thread that called MPI_INIT_THREAD, see MPI-2.2  
pg 386 for def'n) may make MPI calls.  Dick's paragraph above is  
correct if you replace FUNNELED with SERIALIZED.
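
To make that concrete, a small OpenMP-flavored sketch (hypothetical code, not
from any real application) of the two calling patterns:

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int provided, rank;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_SERIALIZED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        /* FUNNELED permits only this: the main thread makes the MPI call. */
        #pragma omp master
        {
            int r;
            MPI_Comm_rank(MPI_COMM_WORLD, &r);
        }

        /* SERIALIZED also permits this: any thread may call MPI, as long as
         * no two calls are ever concurrent, which the critical section ensures. */
        #pragma omp critical
        {
            printf("rank %d, thread %d, t = %f\n",
                   rank, omp_get_thread_num(), MPI_Wtime());
        }
    }

    MPI_Finalize();
    return 0;
}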


-Dave



Re: [OMPI users] Limit to number of processes on one node?

2010-03-03 Thread Prentice Bisbal
Eugene Loh wrote:
> Prentice Bisbal wrote:
>> Eugene Loh wrote:
>>   
>>> Prentice Bisbal wrote:
>>> 
 Is there a limit on how many MPI processes can run on a single host?
   
> Depending on which OMPI release you're using, I think you need something
> like 4*np up to 7*np (plus a few) descriptors.  So, with 256, you need
> 1000+ descriptors.  You're quite possibly up against your limit, though
> I don't know for sure that that's the problem here.
> 
> You say you're running 1.2.8.  That's "a while ago", so would you
> consider updating as a first step?  Among other things, newer OMPIs will
> generate a much clearer error message if the descriptor limit is the
> problem.

While 1.2.8 might be "a while ago", upgrading software just because it's
"old" is not a valid argument.

I can install the latest version of OpenMPI, but it will take a little
while.


 I have a user trying to test his code on the command-line on a single
 host before running it on our cluster like so:

 mpirun -np X foo

 When he tries to run it on a large number of processes (X = 256, 512), the
 program fails, and I can reproduce this with a simple "Hello, World"
 program:

 $ mpirun -np 256 mpihello
 mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
 exited on signal 15 (Terminated).
 252 additional processes aborted (not shown)

 I've done some testing and found that X must be < 155 for this program to work.
 Is this a bug, part of the standard, or design/implementation decision?
  

   
>>> One possible issue is the limit on the number of descriptors.  The error
>>> message should be pretty helpful and descriptive, but perhaps you're
>>> using an older version of OMPI.  If this is your problem, one workaround
>>> is something like this:
>>>
>>> unlimit descriptors
>>> mpirun -np 256 mpihello
>>> 
>>
>> Looks like I'm not allowed to set that as a regular user:
>>
>> $ ulimit -n 2048
>> -bash: ulimit: open files: cannot modify limit: Operation not permitted
>>
>> Since I am the admin, I could change that elsewhere, but I'd rather not
>> do that system-wide unless absolutely necessary.
>>   
>>> though I guess the syntax depends on what shell you're running.  Another
>>> is to set the MCA parameter opal_set_max_sys_limits to 1.
>>> 
>> That didn't work either:
>>
>> $ mpirun -mca opal_set_max_sys_limits 1 -np 256 mpihello
>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
>> exited on signal 15 (Terminated).
>> 252 additional processes aborted (not shown)
>>
>>   
> 
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] Limit to number of processes on one node?

2010-03-03 Thread Eugene Loh




Prentice Bisbal wrote:

  Eugene Loh wrote:
  
  
Prentice Bisbal wrote:


  Is there a limit on how many MPI processes can run on a single host?
  

  

Depending on which OMPI release you're using, I think you need
something like 4*np up to 7*np (plus a few) descriptors.  So, with 256,
you need 1000+ descriptors.  You're quite possibly up against your
limit, though I don't know for sure that that's the problem here.

You say you're running 1.2.8.  That's "a while ago", so would you
consider updating as a first step?  Among other things, newer OMPIs
will generate a much clearer error message if the descriptor limit is
the problem.

  

  
I have a user trying to test his code on the command-line on a single
host before running it on our cluster like so:

mpirun -np X foo

When he tries to run it on a large number of processes (X = 256, 512), the
program fails, and I can reproduce this with a simple "Hello, World"
program:

$ mpirun -np 256 mpihello
mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
exited on signal 15 (Terminated).
252 additional processes aborted (not shown)

I've done some testing and found that X must be < 155 for this program to work.
Is this a bug, part of the standard, or design/implementation decision?


  

One possible issue is the limit on the number of descriptors.  The error
message should be pretty helpful and descriptive, but perhaps you're
using an older version of OMPI.  If this is your problem, one workaround
is something like this:

unlimit descriptors
mpirun -np 256 mpihello

  
  
Looks like I'm not allowed to set that as a regular user:

$ ulimit -n 2048
-bash: ulimit: open files: cannot modify limit: Operation not permitted

Since I am the admin, I could change that elsewhere, but I'd rather not
do that system-wide unless absolutely necessary.
  
  
though I guess the syntax depends on what shell you're running.  Another
is to set the MCA parameter opal_set_max_sys_limits to 1.

  
  That didn't work either:

$ mpirun -mca opal_set_max_sys_limits 1 -np 256 mpihello
mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
exited on signal 15 (Terminated).
252 additional processes aborted (not shown)

  






Re: [OMPI users] running external program on same processor (Fortran)

2010-03-03 Thread Prentice Bisbal
Reuti wrote:
> Are you speaking of the same?

Good point, Reuti. I was thinking of a cluster scheduler like SGE or
Torque.
> 
> Am 03.03.2010 um 17:32 schrieb Prentice Bisbal:
> 
>> Terry Frankcombe wrote:
>>> Surely this is the problem of the scheduler that your system uses,
> 
> This I would also state.
> 
> 
>>> rather than MPI?
> 
> Scheduler in the Linux kernel?
> 
> 
>> That's not true. The scheduler only assigns the initial processes to
>> nodes
> 
> Scheduler in MPI?
> 
> 
>> and starts them. It can kill the processes it starts if they use
>> too much memory or run too long, but doesn't prevent them from spawning
>> more processes, and once spawned,
> 
> When the processes are bound to one and the same core, these additional
> processes won't interfere with other jobs' processes on the same node
> which run on the other cores.
> 
> -- Reuti
> 
> 
>> unless they are spawned through the
>> scheduler, it has no control over them.
>>>
>>>
>>> On Wed, 2010-03-03 at 00:48 +, abc def wrote:
 Hello,

 I wonder if someone can help.

 The situation is that I have an MPI-parallel fortran program. I run it
 and it's distributed on N cores, and each of these processes must call
 an external program.

 This external program is also an MPI program, however I want to run it
 in serial, on the core that is calling it, as if it were part of the
 fortran program. The fortran program waits until the external program
 has completed, and then continues.

 The problem is that this external program seems to run on any core,
 and not necessarily the (now idle) core that called it. This slows
 things down a lot as you get one core doing multiple tasks.

 Can anyone tell me how I can call the program and ensure it runs only
 on the core that's calling it? Note that there are several cores per
 node. I can ID the node by running the hostname command (I don't know
 a way to do this for individual cores).

 Thanks!

 

 Extra information that might be helpful:

 If I simply run the external program from the command line (ie, type
 "/path/myprogram.ex "), it runs fine. If I run it within the
 fortran program by calling it via

 CALL SYSTEM("/path/myprogram.ex")

 it doesn't run at all (doesn't even start) and everything crashes. I
 don't know why this is.

 If I call it using mpiexec:

 CALL SYSTEM("mpiexec -n 1 /path/myprogram.ex")

 then it does work, but I get the problem that it can go on any core.

 ___
 users mailing list
 us...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>> -- 
>> Prentice Bisbal
>> Linux Software Support Specialist/System Administrator
>> School of Natural Sciences
>> Institute for Advanced Study
>> Princeton, NJ
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] Limit to number of processes on one node?

2010-03-03 Thread Prentice Bisbal
Sorry. I meant to include that. I'm using version 1.2.8.

Ralph Castain wrote:
> It helps to have some idea what version you are talking about...
> 
> On Mar 3, 2010, at 9:51 AM, Prentice Bisbal wrote:
> 
>> Is there a limit on how many MPI processes can run on a single host?
>>
>> I have a user trying to test his code on the command-line on a single
>> host before running it on our cluster like so:
>>
>> mpirun -np X foo
>>
>> When he tries to run it on a large number of processes (X = 256, 512), the
>> program fails, and I can reproduce this with a simple "Hello, World"
>> program:
>>
>> $ mpirun -np 256 mpihello
>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
>> exited on signal 15 (Terminated).
>> 252 additional processes aborted (not shown)
>>
>> I've done some testing and found that X must be < 155 for this program to work.
>> Is this a bug, part of the standard, or design/implementation decision?
>>
>>
>> -- 
>> Prentice Bisbal
>> Linux Software Support Specialist/System Administrator
>> School of Natural Sciences
>> Institute for Advanced Study
>> Princeton, NJ
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] MPI_Init() and MPI_Init_thread()

2010-03-03 Thread Richard Treumann

The caller of MPI_INIT_THREAD says what level of thread safety he would
like to get from the MPI implementation. The implementation says what level
of thread safety it provides.

The implementation is free to provide more or less thread safety than
requested.  The caller of MPI_INIT_THREAD should look at the level the
implementation says it is providing and act accordingly. If the application
needs and  asks for  THREAD_MULTIPLE and gets less than THREAD_MULTIPLE, it
must terminate itself.  If the application has a preferred mode that uses
THREAD_MULTIPLE and a mode that uses FUNNELED then when the MPI_INIT_THREAD
call returns FUNNELED, the application must adopt the FUNNELED mode.

An application that asks for THREAD_SINGLE may hope there is a
THREAD_SINGLE mode that gives better performance but nothing in a
THREAD_SINGLE application can be made incorrect by an implementation
providing THREAD_MULTIPLE.

If the application will make MPI calls from multiple threads and
MPI_INIT_THREAD has returned FUNNELED, the application must be willing to
take the steps that ensure there will never be concurrent calls to MPI from
the threads. The threads will take turns - without fail.
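
In code, that pattern looks roughly like this (a generic sketch, not specific
to any particular MPI library):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int provided;

    /* Ask for the level the application would like, then act on what the
     * implementation says it actually provides. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided >= MPI_THREAD_MULTIPLE) {
        /* preferred mode: threads may make concurrent MPI calls */
    } else if (provided >= MPI_THREAD_FUNNELED) {
        /* fall back to the mode where only the main thread calls MPI */
    } else {
        fprintf(stderr, "Insufficient thread support (%d); terminating.\n",
                provided);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Finalize();
    return 0;
}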


Dick Treumann  -  MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363


users-boun...@open-mpi.org wrote on 03/03/2010 11:59:45 AM:

>
> Re: [OMPI users] MPI_Init() and MPI_Init_thread()
>
> Brian Budge
>
> to:
>
> Open MPI Users
>
> 03/03/2010 12:04 PM
>
> Sent by:
>
> users-boun...@open-mpi.org
>
> Please respond to Open MPI Users
>
> I believe that it specifies the *minimum* threading model
> supported.  If I recall, OMPI is already FUNNELED-safe even in single
> mode.  However, if MPI calls are made from outside the main thread,
> you should specify FUNNELED for portability.
>   Brian
> On Mar 2, 2010 11:59 PM, "Terry Frankcombe"  wrote:
>
> I can't speak for the developers.  However, I think it's to do with the
> library internals.
>
>
>
> >From here: http://www.mpi-forum.org/docs/mpi-20-html/node165.htm
>
> "Advice to implementors.
>
> "If provided is not MPI_THREAD_SINGLE then the MPI library should not
> invoke C/ C++/Fortran library calls that are not thread safe, e.g., in
> an environment where malloc is not thread safe, then malloc should not
> be used by the MPI library.
>
> "Some implementors may want to use different MPI libraries for different
> levels of thread support. They can do so using dynamic linking and
> selecting which library will be linked when MPI_INIT_THREAD is invoked.
> If this is not possible, then optimizations for lower levels of thread
> support will occur only when the level of thread support required is
> specified at link time. ( End of advice to implementors.)"
>
>
>
> On Wed, 2010-03-03 at 16:33 +0900, Yuanyuan ZHANG wrote:
> > Hi all,
> >
> > I am a beginner of MPI an...
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

Re: [OMPI users] running external program on same processor (Fortran)

2010-03-03 Thread Reuti

Are you speaking of the same?

Am 03.03.2010 um 17:32 schrieb Prentice Bisbal:


Terry Frankcombe wrote:

Surely this is the problem of the scheduler that your system uses,


This I would also state.



rather than MPI?


Scheduler in the Linux kernel?



That's not true. The scheduler only assigns the initial processes to
nodes


Scheduler in MPI?



and starts them. It can kill the processes it starts if they use
too much memory or run too long, but doesn't prevent them from  
spawning

more processes, and once spawned,


When the processes are bound to one and the same core, these
additional processes won't interfere with other jobs' processes on
the same node which run on the other cores.


-- Reuti



unless they are spawned through the
scheduler, it has no control over them.



On Wed, 2010-03-03 at 00:48 +, abc def wrote:

Hello,

I wonder if someone can help.

The situation is that I have an MPI-parallel fortran program. I  
run it
and it's distributed on N cores, and each of these processes must  
call

an external program.

This external program is also an MPI program, however I want to  
run it

in serial, on the core that is calling it, as if it were part of the
fortran program. The fortran program waits until the external  
program

has completed, and then continues.

The problem is that this external program seems to run on any core,
and not necessarily the (now idle) core that called it. This slows
things down a lot as you get one core doing multiple tasks.

Can anyone tell me how I can call the program and ensure it runs  
only

on the core that's calling it? Note that there are several cores per
node. I can ID the node by running the hostname command (I don't  
know

a way to do this for individual cores).

Thanks!



Extra information that might be helpful:

If I simply run the external program from the command line (ie, type
"/path/myprogram.ex "), it runs fine. If I run it within the
fortran program by calling it via

CALL SYSTEM("/path/myprogram.ex")

it doesn't run at all (doesn't even start) and everything crashes. I
don't know why this is.

If I call it using mpiexec:

CALL SYSTEM("mpiexec -n 1 /path/myprogram.ex")

then it does work, but I get the problem that it can go on any core.

 
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Limit to number of processes on one node?

2010-03-03 Thread Eugene Loh

Prentice Bisbal wrote:


Is there a limit on how many MPI processes can run on a single host?

I have a user trying to test his code on the command-line on a single
host before running it on our cluster like so:

mpirun -np X foo

When he tries to run it on a large number of processes (X = 256, 512), the
program fails, and I can reproduce this with a simple "Hello, World"
program:

$ mpirun -np 256 mpihello
mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
exited on signal 15 (Terminated).
252 additional processes aborted (not shown)

I've done some testing and found that X must be < 155 for this program to work.
Is this a bug, part of the standard, or design/implementation decision?
 

One possible issue is the limit on the number of descriptors.  The error 
message should be pretty helpful and descriptive, but perhaps you're 
using an older version of OMPI.  If this is your problem, one workaround 
is something like this:


unlimit descriptors
mpirun -np 256 mpihello

though I guess the syntax depends on what shell you're running.  Another 
is to set the MCA parameter opal_set_max_sys_limits to 1.


Re: [OMPI users] MPI_Init() and MPI_Init_thread()

2010-03-03 Thread Brian Budge
I believe that it specifies the *minimum* threading model supported.  If I
recall, OMPI is already FUNNELED-safe even in single mode.  However, if MPI
calls are made from outside the main thread, you should specify FUNNELED for
portability.

  Brian

On Mar 2, 2010 11:59 PM, "Terry Frankcombe"  wrote:

I can't speak for the developers.  However, I think it's to do with the
library internals.



>From here: http://www.mpi-forum.org/docs/mpi-20-html/node165.htm

"Advice to implementors.

"If provided is not MPI_THREAD_SINGLE then the MPI library should not
invoke C/ C++/Fortran library calls that are not thread safe, e.g., in
an environment where malloc is not thread safe, then malloc should not
be used by the MPI library.

"Some implementors may want to use different MPI libraries for different
levels of thread support. They can do so using dynamic linking and
selecting which library will be linked when MPI_INIT_THREAD is invoked.
If this is not possible, then optimizations for lower levels of thread
support will occur only when the level of thread support required is
specified at link time. ( End of advice to implementors.)"




On Wed, 2010-03-03 at 16:33 +0900, Yuanyuan ZHANG wrote:
> Hi all,
>
> I am a beginner of MPI an...


Re: [OMPI users] Limit to number of processes on one node?

2010-03-03 Thread Ralph Castain
It helps to have some idea what version you are talking about...

On Mar 3, 2010, at 9:51 AM, Prentice Bisbal wrote:

> Is there a limit on how many MPI processes can run on a single host?
> 
> I have a user trying to test his code on the command-line on a single
> host before running it on our cluster like so:
> 
> mpirun -np X foo
> 
> When he tries to run it on a large number of processes (X = 256, 512), the
> program fails, and I can reproduce this with a simple "Hello, World"
> program:
> 
> $ mpirun -np 256 mpihello
> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
> exited on signal 15 (Terminated).
> 252 additional processes aborted (not shown)
> 
> I've done some testing and found that X must be < 155 for this program to work.
> Is this a bug, part of the standard, or design/implementation decision?
> 
> 
> -- 
> Prentice Bisbal
> Linux Software Support Specialist/System Administrator
> School of Natural Sciences
> Institute for Advanced Study
> Princeton, NJ
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



[OMPI users] Limit to number of processes on one node?

2010-03-03 Thread Prentice Bisbal
Is there a limit on how many MPI processes can run on a single host?

I have a user trying to test his code on the command-line on a single
host before running it on our cluster like so:

mpirun -np X foo

When he tries to run it on a large number of processes (X = 256, 512), the
program fails, and I can reproduce this with a simple "Hello, World"
program:

$ mpirun -np 256 mpihello
mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
exited on signal 15 (Terminated).
252 additional processes aborted (not shown)

I've done some testing and found that X must be < 155 for this program to work.
Is this a bug, part of the standard, or design/implementation decision?


-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] running external program on same processor (Fortran)

2010-03-03 Thread Prentice Bisbal
Terry Frankcombe wrote:
> Surely this is the problem of the scheduler that your system uses,
> rather than MPI?

That's not true. The scheduler only assigns the initial processes to
nodes and starts them. It can kill the processes it starts if they use
too much memory or run too long, but doesn't prevent them from spawning
more processes, and once spawned, unless they are spawned through the
scheduler, it has no control over them.
> 
> 
> On Wed, 2010-03-03 at 00:48 +, abc def wrote:
>> Hello,
>>
>> I wonder if someone can help.
>>
>> The situation is that I have an MPI-parallel fortran program. I run it
>> and it's distributed on N cores, and each of these processes must call
>> an external program.
>>
>> This external program is also an MPI program, however I want to run it
>> in serial, on the core that is calling it, as if it were part of the
>> fortran program. The fortran program waits until the external program
>> has completed, and then continues.
>>
>> The problem is that this external program seems to run on any core,
>> and not necessarily the (now idle) core that called it. This slows
>> things down a lot as you get one core doing multiple tasks.
>>
>> Can anyone tell me how I can call the program and ensure it runs only
>> on the core that's calling it? Note that there are several cores per
>> node. I can ID the node by running the hostname command (I don't know
>> a way to do this for individual cores).
>>
>> Thanks!
>>
>> 
>>
>> Extra information that might be helpful:
>>
>> If I simply run the external program from the command line (ie, type
>> "/path/myprogram.ex "), it runs fine. If I run it within the
>> fortran program by calling it via
>>
>> CALL SYSTEM("/path/myprogram.ex")
>>
>> it doesn't run at all (doesn't even start) and everything crashes. I
>> don't know why this is.
>>
>> If I call it using mpiexec:
>>
>> CALL SYSTEM("mpiexec -n 1 /path/myprogram.ex")
>>
>> then it does work, but I get the problem that it can go on any core. 
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] running external program on same processor (Fortran)

2010-03-03 Thread abc def

I don't know (I'm a little new to this area), but I figured out how to get 
around the problem:

Using SGE and MVAPICH2, the "-env MV2_CPU_MAPPING 0:1." option in mpiexec 
seems to do the trick.



So when calling the external program with mpiexec, I map the called
process to the current core rank, and it seems to stay distributed and
separated as I want.

Hope someone else finds this useful in the future.
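
In case it helps, here is the same idea in rough C form (the mapping value and
program path are placeholders; the exact MV2_CPU_MAPPING syntax is
MVAPICH2-specific, so check its documentation):

#include <stdio.h>
#include <stdlib.h>

/* Sketch: launch the external program as a 1-process MPI job pinned near the
 * caller.  "my_core" would come from the calling rank's own mapping; the
 * MV2_CPU_MAPPING value and /path/myprogram.ex are placeholders only. */
static void run_external_on_core(int my_core)
{
    char cmd[256];

    snprintf(cmd, sizeof(cmd),
             "mpiexec -n 1 -env MV2_CPU_MAPPING %d /path/myprogram.ex",
             my_core);
    system(cmd);  /* blocks until the external program has finished */
}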

> Date: Wed, 3 Mar 2010 13:13:01 +1100
> Subject: Re: [OMPI users] running external program on same processor
> (Fortran)
> 
> Surely this is the problem of the scheduler that your system uses,
> rather than MPI?
> 
> 
> On Wed, 2010-03-03 at 00:48 +, abc def wrote:
> > Hello,
> > 
> > I wonder if someone can help.
> > 
> > The situation is that I have an MPI-parallel fortran program. I run it
> > and it's distributed on N cores, and each of these processes must call
> > an external program.
> > 
> > This external program is also an MPI program, however I want to run it
> > in serial, on the core that is calling it, as if it were part of the
> > fortran program. The fortran program waits until the external program
> > has completed, and then continues.
> > 
> > The problem is that this external program seems to run on any core,
> > and not necessarily the (now idle) core that called it. This slows
> > things down a lot as you get one core doing multiple tasks.
> > 
> > Can anyone tell me how I can call the program and ensure it runs only
> > on the core that's calling it? Note that there are several cores per
> > node. I can ID the node by running the hostname command (I don't know
> > a way to do this for individual cores).
> > 
> > Thanks!
> > 
> > 
> > 
> > Extra information that might be helpful:
> > 
> > If I simply run the external program from the command line (ie, type
> > "/path/myprogram.ex "), it runs fine. If I run it within the
> > fortran program by calling it via
> > 
> > CALL SYSTEM("/path/myprogram.ex")
> > 
> > it doesn't run at all (doesn't even start) and everything crashes. I
> > don't know why this is.
> > 
> > If I call it using mpiexec:
> > 
> > CALL SYSTEM("mpiexec -n 1 /path/myprogram.ex")
> > 
> > then it does work, but I get the problem that it can go on any core. 
> > 
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
  
_
We want to hear all your funny, exciting and crazy Hotmail stories. Tell us now
http://clk.atdmt.com/UKM/go/195013117/direct/01/

Re: [OMPI users] MPI_Init() and MPI_Init_thread()

2010-03-03 Thread Terry Frankcombe
I can't speak for the developers.  However, I think it's to do with the
library internals.



>From here: http://www.mpi-forum.org/docs/mpi-20-html/node165.htm

"Advice to implementors. 

"If provided is not MPI_THREAD_SINGLE then the MPI library should not
invoke C/ C++/Fortran library calls that are not thread safe, e.g., in
an environment where malloc is not thread safe, then malloc should not
be used by the MPI library. 

"Some implementors may want to use different MPI libraries for different
levels of thread support. They can do so using dynamic linking and
selecting which library will be linked when MPI_INIT_THREAD is invoked.
If this is not possible, then optimizations for lower levels of thread
support will occur only when the level of thread support required is
specified at link time. ( End of advice to implementors.)"



On Wed, 2010-03-03 at 16:33 +0900, Yuanyuan ZHANG wrote:
> Hi all,
> 
> I am a beginner of MPI and a little confused with
> MPI_Init_thread() function:
> 
> If we use MPI_Init() or MPI_Init_thread(MPI_THREAD_SINGLE, provided)
> to initialize MPI environment, when we use OpenMP
> PARALLEL directive each process is forked to multiple
> threads and when an MPI function is called, one thread
> is used to execute the call. It seems that this
> has same effect when we use MPI_Init_Thread(MPI_THREAD_FUNNELED,
> provided). So what's the difference between MPI_Init() and
> MPI_Init_thread(MPI_THREAD_FUNNELED, provided)?
> 
> Thanks in advance,
> 
> Yuanyuan
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



[OMPI users] MPI_Init() and MPI_Init_thread()

2010-03-03 Thread Yuanyuan ZHANG
Hi all,

I am a beginner of MPI and a little confused with
MPI_Init_thread() function:

If we use MPI_Init() or MPI_Init_thread(MPI_THREAD_SINGLE, provided)
to initialize MPI environment, when we use OpenMP
PARALLEL directive each process is forked to multiple
threads and when an MPI function is called, one thread
is used to execute the call. It seems that this
has same effect when we use MPI_Init_Thread(MPI_THREAD_FUNNELED,
provided). So what's the difference between MPI_Init() and
MPI_Init_thread(MPI_THREAD_FUNNELED, provided)?

Thanks in advance,

Yuanyuan




[OMPI users] noob warning - problems testing MPI_Comm_spawn

2010-03-03 Thread Damien Hocking

Hi all,

I'm playing around with MPI_Comm_spawn, trying to do something simple 
with a master-slave example.  I get a LOCAL DAEMON SPAWN IS CURRENTLY 
UNSUPPORTED error when it tries to spawn the slave.  This is on Windows, 
OpenMPI version 1.4.1, r22421.


Here's the master code:

#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[])
{
  int myid, ierr;
  MPI_Comm maincomm;
  ierr = MPI_Init(&argc, &argv);
  ierr = MPI_Comm_rank(MPI_COMM_WORLD, &myid);

  if (myid == 0)
  {
     std::cout << "\n Hello from the boss " << myid;
     std::cout.flush();
  }

  MPI_Info spawninfo;
  MPI_Info_create(&spawninfo);
  MPI_Info_set(spawninfo, "add-host", "127.0.0.1");

  if (myid == 0)
  {
     std::cout << "\n About to MPI_Comm_spawn." << myid;
     std::cout.flush();
  }
  // Spawn one instance of the slave and get an intercommunicator back.
  MPI_Comm_spawn("slave.exe", MPI_ARGV_NULL, 1, spawninfo, 0,
                 MPI_COMM_SELF, &maincomm, MPI_ERRCODES_IGNORE);

  if (myid == 0)
  {
     std::cout << "\n MPI_Comm_spawn successful." << myid;
     std::cout.flush();
  }
  ierr = MPI_Finalize();
  return 0;
}

Here's the slave code:

#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[])
{
  int myid, ierr;

  MPI_Comm parent;

  ierr = MPI_Init(&argc, &argv);
  // Retrieve the intercommunicator to the parent (MPI_COMM_NULL if not spawned).
  MPI_Comm_get_parent(&parent);

  if (parent == MPI_COMM_NULL)
  {
     std::cout << "\n No parent.";
  }
  ierr = MPI_Comm_rank(MPI_COMM_WORLD, &myid);

  std::cout << "\n Hello from a worker " << myid;
  std::cout.flush();

  ierr = MPI_Finalize();

  return 0;
}

Also, this only starts up correctly if I kick it off with orterun.  
Ideally I'd like to run it as "master.exe" and have it initialise the 
MPI environment from there.  Can anyone tell me what setup I need to do 
that? 


Thanks in advance,

Damien