Re: [OMPI users] Limit to number of processes on one node?

2010-03-04 Thread Prentice Bisbal


Ralph Castain wrote:
> On Mar 4, 2010, at 7:51 AM, Prentice Bisbal wrote:
> 
>>
>> Ralph Castain wrote:
>>> On Mar 4, 2010, at 7:27 AM, Prentice Bisbal wrote:
>>>
 Ralph Castain wrote:
> On Mar 3, 2010, at 12:16 PM, Prentice Bisbal wrote:
>
>> Eugene Loh wrote:
>>> Prentice Bisbal wrote:
 Eugene Loh wrote:

> Prentice Bisbal wrote:
>
>> Is there a limit on how many MPI processes can run on a single host?
>>
>>> Depending on which OMPI release you're using, I think you need something
>>> like 4*np up to 7*np (plus a few) descriptors.  So, with 256, you need
>>> 1000+ descriptors.  You're quite possibly up against your limit, though
>>> I don't know for sure that that's the problem here.
>>>
>>> You say you're running 1.2.8.  That's "a while ago", so would you
>>> consider updating as a first step?  Among other things, newer OMPIs will
>>> generate a much clearer error message if the descriptor limit is the
>>> problem.
>> While 1.2.8 might be "a while ago", upgrading software just because it's
>> "old" is not a valid argument.
>>
>> I can install the latest version of OpenMPI, but it will take a little
>> while.
> Maybe not because it is "old", but Eugene is correct. The old versions of 
> OMPI required more file descriptors than the newer versions.
>
> That said, you'll still need a minimum of 4x the number of procs on the 
> node even with the latest release. I suggest talking to your sys admin 
> about getting the limit increased. It sounds like it has been set 
> unrealistically low.
>
>
 I *am* the system admin! ;)

 The file descriptor limit is the default for RHEL,  1024, so I would not
 characterize it as "unrealistically low".  I assume someone with much
 more knowledge of OS design and administration than me came up with this
 default, so I'm hesitant to change it without good reason. If there was
 good reason, I'd have no problem changing it. I have read that setting
 it to more than 8192 can lead to system instability.
>>> Never heard that, and most HPC systems have it set a great deal higher 
>>> without trouble.
>> I just read that the other day. Not sure where, though. Probably a forum
>> posting somewhere. I'll take your word for it that it's safe to increase
>> if necessary.
>>> However, the choice is yours. If you have a large SMP system, you'll 
>>> eventually be forced to change it or severely limit its usefulness for MPI. 
>>> RHEL sets it that low arbitrarily as a way of saving memory by keeping the 
>>> fd table small, not because the OS can't handle it.
>>>
>>> Anyway, that is the problem. Nothing we (or any MPI) can do about it as the 
>>> fd's are required for socket-based communications and to forward I/O.
>> Thanks, Ralph, that's exactly the answer I was looking for - where this
>> limit was coming from.
>>
>> I can see how on a large SMP system the fd limit would have to be
>> increased. In normal circumstances, my cluster nodes should never have
>> more than 8 MPI processes running at once (per node), so I shouldn't be
>> hitting that limit on my cluster.
> 
> Ah, okay! That helps a great deal in figuring out what to advise you. In your 
> earlier note, it sounded like you were running all 512 procs on one node, so 
> I assumed you had a large single-node SMP.
> 
> In this case, though, the problem is solely that you are using the 1.2 
> series. In that series, mpirun and each process opened many more sockets to 
> all processes in the job. That's why you are overrunning your limit.
> 
> Starting with 1.3, the number of sockets being opened on each is only 3 times 
> the number of procs on the node, plus a couple for the daemon. If you are 
> using TCP for MPI communications, then each MPI connection will open another 
> socket as these messages are direct and not routed.
> 
> Upgrading to the 1.4 series should resolve the problem you saw.

After upgrading to 1.4.1, I can start up to 253 processes on one host:

mpirun -np 253 mpihello

This is an increase of ~100 over 1.2.8. When it does fail, it gives a more
useful error message:

$ mpirun -np 254 mpihello
[juno.sns.ias.edu:22862] [[6399,0],0] ORTE_ERROR_LOG: The system limit
on number of network connections a process can open was reached in file
../../../../../orte/mca/oob/tcp/oob_tcp.c at line 447
--
Error: system limit exceeded on number of network connections that can
be open

This can be resolved by setting the mca parameter
opal_set_max_sys_limits to 1,
increasing your limit descriptor setting (using limit or ulimit commands),
or asking the system administrator to increase the system limit.
--
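
That ceiling of 253 is roughly what the 1024-descriptor default allows at
about four descriptors per local process plus a handful for mpirun itself.
For the record, if the limit ever does need to go up for a particular user,
the usual RHEL route is a pair of entries in /etc/security/limits.conf
(the user name and values below are only illustrative):

  prentice    soft    nofile    4096
  prentice    hard    nofile    8192

followed by a fresh login so the new limits take effect.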


Case closed, court adjourned. Thanks for all the help and explanations.

Prentice


> 
> HTH
> R

Re: [OMPI users] Option to use only 7 cores out of 8 on each node

2010-03-04 Thread Dave Love
"Addepalli, Srirangam V"  writes:

> It works after creating a new pe and even from the command prompt
> without using SGE.

You shouldn't need anything special -- I don't.  (It's common to run,
say, one process per core for benchmarking.)  Running

  mpirun -tag-output -np 14 -npernode 7 hostname

in 16 slots in the following PE (which confines the run to our 8-core
nodes) shows 7 processes on each of 2 nodes.  The nodes have
exclusive access for MPI jobs, but I don't think that's relevant.

  pe_nameopenmpi-8
  slots  999
  user_lists NONE
  xuser_listsNONE
  start_proc_args/bin/true
  stop_proc_args /bin/true
  allocation_rule8
  control_slaves TRUE
  job_is_first_task  FALSE
  urgency_slots  min
  accounting_summary FALSE

In this case, you'd get the same effect allocating processes -bynode as
using -npernode.
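
For instance, presumably something like

  mpirun -tag-output -np 14 -bynode hostname

would give the same 7-per-node placement across two of the 8-core nodes.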



Re: [OMPI users] MPI_Init() and MPI_Init_thread()

2010-03-04 Thread Dave Goodell

On Mar 4, 2010, at 10:52 AM, Anthony Chan wrote:


- "Yuanyuan ZHANG"  wrote:


For an OpenMP/MPI hybrid program, if I only want to make MPI calls
using the main thread, i.e., only in between parallel sections, can I just
use SINGLE or MPI_Init?


If your MPI calls are NOT within OpenMP directives, MPI does not even
know you are using threads.  So calling MPI_Init is good enough.


This is *not true*.  Please read Dick's previous post for a good  
example of why this is not the case.


In practice, on most platforms, implementation support for SINGLE and
FUNNELED is identical (true for stock MPICH2, for example).  However
Dick's example of thread-safe versus non-thread-safe malloc options
clearly shows why programs need to request (and check "provided" for)
>=FUNNELED in this scenario if they wish to be truly portable.
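
A minimal sketch of that request-and-check, for a code where only the main
thread will ever call MPI (the abort-on-shortfall policy is just one
illustrative choice):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int provided;

    /* Ask for FUNNELED; the library answers with what it can guarantee. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    if (provided < MPI_THREAD_FUNNELED) {
        /* The library cannot promise FUNNELED, so running OpenMP threads
           anyway would rely on unspecified behavior. */
        fprintf(stderr, "Need MPI_THREAD_FUNNELED, got %d\n", provided);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* ... OpenMP parallel regions here; MPI calls only from the main thread ... */

    MPI_Finalize();
    return 0;
}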


-Dave



Re: [OMPI users] MPI_Init() and MPI_Init_thread()

2010-03-04 Thread Anthony Chan

- "Yuanyuan ZHANG"  wrote:


> For an OpenMP/MPI hybrid program, if I only want to make MPI calls
> using the main thread, i.e., only in between parallel sections, can I just
> use SINGLE or MPI_Init? 

If your MPI calls are NOT within OpenMP directives, MPI does not even
know you are using threads.  So calling MPI_Init is good enough.

A.Chan


Re: [OMPI users] MPI_Init() and MPI_Init_thread()

2010-03-04 Thread Dave Goodell

On Mar 4, 2010, at 7:36 AM, Richard Treumann wrote:
A call to MPI_Init allows the MPI library to return any level of  
thread support it chooses.


This is correct, insofar as the MPI implementation can always choose  
any level of thread support.
This MPI 1.1 call does not let the application say what it wants and  
does not let the implementation reply with what it can guarantee.



Well, sort of.  MPI-2.2, sec 12.4.3, page 385, lines 24-25:

--8<--
24|  A call to MPI_INIT has the same effect as a call to MPI_INIT_THREAD with a required
25|  = MPI_THREAD_SINGLE.
--8<--

So even though there is no explicit request and response for thread  
level support, it is implicitly asking for MPI_THREAD_SINGLE.  Since  
all implementations must be able to support at least SINGLE (0 threads  
running doesn't really make sense), SINGLE will be provided at a  
minimum.  Callers to plain-old "MPI_Init" should not expect any higher  
level of thread support if they wish to maintain portability.


[...snip...]

Consider a made up example:

Imagine some system supports Mutex lock/unlock but with terrible  
performance. As a work around, it offers a non-standard substitute  
for malloc called st_malloc (single thread malloc) that does not do  
locking.



[...snip...]

Dick's example is a great illustration of why FUNNELED might be  
necessary.  The moral of the story is "don't lie to the MPI  
implementation" :)


-Dave



Re: [OMPI users] low efficiency when we use --am ft-enable-cr to checkpoint

2010-03-04 Thread Joshua Hursey
There is some overhead involved when activating the current C/R functionality 
in Open MPI due to the wrapping of the internal point-to-point stack. The 
wrapper (CRCP framework) tracks the signature of each message (not the buffer, 
so constant time for any size MPI message) so that when we need to quiesce the 
network we know of all the outstanding messages that need to be drained.

So there is an overhead, but it should not be as significant as you have 
mentioned. I looked at some of the performance aspects in the paper at the link 
below:
  http://www.open-mpi.org/papers/hpdc-2009/
Though I did not look at HPL explicitly in this paper (just NPB, GROMACS, and 
NetPipe), I have looked at it in testing, and the time difference was definitely 
not 2x (I cannot recall the exact differences at the moment).

Can you tell me a bit about your setup:
 - What version of Open MPI are you using?
 - What configure options are you using?
 - What MCA parameters are you using?
 - Are you building from a release tarball or an SVN checkout?

-- Josh


On Mar 3, 2010, at 10:07 PM, 马少杰 wrote:

>  
>  
> 2010-03-04
> 马少杰
> Dear Sir:
> I want to use BLCR and Open MPI to checkpoint. I can now save a
> checkpoint and restart my work successfully. However, I find that the option "--am
> ft-enable-cr" causes a large cost. For example, when I run my HPL job
> without and with the option "--am ft-enable-cr" on 4 hosts (32 processes, IB
> network) respectively, the times are 8m21.180s and 16m37.732s
> respectively. It should be noted that I did not save a checkpoint when I
> ran the job; the additional cost is caused by "--am ft-enable-cr"
> alone. Why does the option "--am ft-enable-cr" cause so much overhead?
> Is it normal? How can I solve the problem?
>   I also tested other MPI applications, and the problem still exists.




Re: [OMPI users] checkpointing multi node and multi process applications

2010-03-04 Thread Joshua Hursey

On Mar 4, 2010, at 8:17 AM, Fernando Lemos wrote:

> On Wed, Mar 3, 2010 at 10:24 PM, Fernando Lemos  wrote:
> 
>> Is there anything I can do to provide more information about this bug?
>> E.g. try to compile the code in the SVN trunk? I also have kept the
>> snapshots intact, I can tar them up and upload them somewhere in case
>> you guys need it. I can also provide the source code to the ring
>> program, but it's really the canonical ring MPI example.
>> 
> 
> I tried 1.5 (1.5a1r22754 nightly snapshot, same compilation flags).
> This time taking the checkpoint didn't generate any error message:
> 
> root@debian1:~# mpirun -am ft-enable-cr -mca btl_tcp_if_include eth1
> -np 2 --host debian1,debian2 ring
> 
 Process 1 sending 2761 to 0
 Process 1 received 2760
 Process 1 sending 2760 to 0
> root@debian1:~#
> 
> But restoring it did:
> 
> root@debian1:~# ompi-restart ompi_global_snapshot_23071.ckpt
> [debian1:23129] Error: Unable to access the path
> [/root/ompi_global_snapshot_23071.ckpt/0/opal_snapshot_1.ckpt]!
> --
> Error: The filename (opal_snapshot_1.ckpt) is invalid because either
> you have not provided a filename
>   or provided an invalid filename.
>   Please see --help for usage.
> 
> --
> --
> mpirun has exited due to process rank 1 with PID 23129 on
> node debian1 exiting improperly. There are two reasons this could occur:
> 
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
> 
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
> 
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --
> root@debian1:~#
> 
> Indeed, opal_snapshot_1.ckpt does not exist:
> 
> root@debian1:~# find ompi_global_snapshot_23071.ckpt/
> ompi_global_snapshot_23071.ckpt/
> ompi_global_snapshot_23071.ckpt/global_snapshot_meta.data
> ompi_global_snapshot_23071.ckpt/restart-appfile
> ompi_global_snapshot_23071.ckpt/0
> ompi_global_snapshot_23071.ckpt/0/opal_snapshot_0.ckpt
> ompi_global_snapshot_23071.ckpt/0/opal_snapshot_0.ckpt/ompi_blcr_context.23073
> ompi_global_snapshot_23071.ckpt/0/opal_snapshot_0.ckpt/snapshot_meta.data
> root@debian1:~#
> 
> It can be found in debian2:
> 
> root@debian2:~# find ompi_global_snapshot_23071.ckpt/
> ompi_global_snapshot_23071.ckpt/
> ompi_global_snapshot_23071.ckpt/0
> ompi_global_snapshot_23071.ckpt/0/opal_snapshot_1.ckpt
> ompi_global_snapshot_23071.ckpt/0/opal_snapshot_1.ckpt/snapshot_meta.data
> ompi_global_snapshot_23071.ckpt/0/opal_snapshot_1.ckpt/ompi_blcr_context.6501
> root@debian2:~#

By default, Open MPI requires a shared file system to save checkpoint files. So 
by default the local snapshot is not moved, since the system assumes that it is 
writing to the same directory on a shared file system. If you want to use the 
local disk staging functionality (which is known to be broken in the 1.4 
series), check out the example on the webpage below:
  http://osl.iu.edu/research/ft/ompi-cr/examples.php#uc-ckpt-local

> 
> Then I tried supplying a hostfile for ompi-restart and it worked just
> fine! I thought the checkpoint included the hosts information?

We intentionally do not save the hostfile as part of the checkpoint. Typically 
folks will want to restart on different nodes than those they checkpointed on 
(such as in a batch scheduling environment). If we saved the hostfile then it 
could lead to unexpected user behavior on restart if the machines that they 
wish to restart on change.

If you need to pass a hostfile, then you can pass one to ompi-restart just as 
you would to mpirun.

> 
> So I think it's fixed in 1.5. Should I try the 1.4 branch in SVN?

The file staging functionality is known to be broken in the 1.4 series at this 
time, per the ticket below:
  https://svn.open-mpi.org/trac/ompi/ticket/2139

Unfortunately the fix is likely to be both custom for the branch (since we 
redesigned the functionality for the trunk and v1.5) and fairly involved. I 
don't have the time at the moment to work on a fix, but hopefully in the coming 
months I will be able to look into this issue. In the mean time, patches are 
always welcome :)

Hope that helps,
Josh


> 
> 
> Thanks a bunch,




Re: [OMPI users] noob warning - problems testing MPI_Comm_spawn

2010-03-04 Thread Damien Hocking

Thanks Shiqing.  I'll checkout a trunk copy and try that.

Damien

On 04/03/2010 7:29 AM, Shiqing Fan wrote:


Hi Damien,

Sorry for the late reply, I was trying to dig inside the code and got some 
information.


First of all, in your example, it's not correct to define the MPI_Info 
as a pointer; it will cause an initialization violation at run time. 
The message "LOCAL DAEMON SPAWN IS CURRENTLY UNSUPPORTED" is just a 
warning which won't block the execution. In order to make the 
master-slave example work, you have to disable the CCP support; it seems to 
have conflicts with the comm_spawn operation. I'm still checking it.


To disable CCP in Open MPI 1.4.1, you have to exclude the source files 
manually, i.e. exclude the ccp files in orte/mca/plm/ccp and 
orte/mca/ras/ccp, then also remove the CCP-related lines in 
orte/mca/plm/base/static-components.h and 
orte/mca/ras/base/static-components.h. There is an option to do so in 
the trunk version, but not for 1.4.1. Sorry for the inconvenience.


For the "singleton" run with master.exe, it's still not working under 
Windows.



Best Regards,
Shiqing


Damien Hocking wrote:

Hi all,

I'm playing around with MPI_Comm_spawn, trying to do something simple 
with a master-slave example.  I get a LOCAL DAEMON SPAWN IS CURRENTLY 
UNSUPPORTED error when it tries to spawn the slave.  This is on 
Windows, OpenMPI version 1.4.1, r22421.


Here's the master code:

int main(int argc, char* argv[])
{
  int myid, ierr;
  MPI_Comm maincomm;
  ierr = MPI_Init(&argc, &argv);
  ierr = MPI_Comm_rank(MPI_COMM_WORLD, &myid);

  if (myid == 0)
  {
 std::cout << "\n Hello from the boss " << myid;
 std::cout.flush();
  }

  MPI_Info* spawninfo;
  MPI_Info_create(spawninfo);
  MPI_Info_set(*spawninfo, "add-host", "127.0.0.1");

  if (myid == 0)
  {
 std::cout << "\n About to MPI_Comm_spawn." << myid;
 std::cout.flush();
  }
  MPI_Comm_spawn("slave.exe", MPI_ARGV_NULL, 1, *spawninfo, 0, 
MPI_COMM_SELF, &maincomm, MPI_ERRCODES_IGNORE);

  if (myid == 0)
  {
 std::cout << "\n MPI_Comm_spawn successful." << myid;
 std::cout.flush();
  }
  ierr = MPI_Finalize();
  return 0;
}

Here's the slave code:

int main(int argc, char* argv[])
{
  int myid, ierr;

  MPI_Comm parent;

  ierr = MPI_Init(&argc, &argv);
  MPI_Comm_get_parent(&parent);

  if (parent == MPI_COMM_NULL)
  {
 std::cout << "\n No parent.";
  }
  ierr = MPI_Comm_rank(MPI_COMM_WORLD, &myid);

  std::cout << "\n Hello from a worker " << myid;
  std::cout.flush();  ierr = MPI_Finalize();

  return 0;
}

Also, this only starts up correctly if I kick it off with orterun.  
Ideally I'd like to run it as "master.exe" and have it initialise the 
MPI environment from there.  Can anyone tell me what setup I need to 
do that?

Thanks in advance,

Damien






Re: [OMPI users] MPI_Init() and MPI_Init_thread()

2010-03-04 Thread Richard Treumann

A call to MPI_Init allows the MPI library to return any level of thread
support it chooses. This MPI 1.1 call does not let the application say what
it wants and does not let the implementation reply with what it can
guarantee.

If you are using only one MPI implementation and your code will never be
run on another implementation you can check the user doc for that MPI
implementation and see if it says that you can do what you want to do.  If
your application is single threaded you can use MPI_Init and be portable.

If your application has threads, you should use MPI_Init_thread and check
the response to be portable or to make sure you get thread safety from an
MPI implementation that has both a thread safe and a thread unsafe mode.  A
thread unsafe mode can skip some locking and maybe give better performance.

The MPI standard wanted to allow for the possibility that some
implementation of MPI would not be able to tolerate threads in the
application at all so SINGLE was one answer that could be returned.  For
what you want to do, you need a library that can return FUNNELED or better.
If you ask for FUNNELED and the MPI implementation you are using returns
SINGLE it is telling you that using OpenMP threads is not allowed. It does
not say why.  It may work fine but the implementation is telling you not to
do it and you cannot know if there are good reasons.

Consider a made up example:

Imagine some system supports Mutex lock/unlock but with terrible
performance. As a work around, it offers a non-standard substitute for
malloc called st_malloc (single thread malloc) that does not do locking.
Normal malloc is also available and it does use locking.  The documentation
for this system says that it is safe to use both st_malloc and malloc in a
single threaded application because the only difference is that st_malloc
skips the locking. It  also warns that if one thread calls st_malloc while
another calls malloc, heap corruption is likely.

Next imagine that the MPI implementation uses st_malloc to provide best
performance and the OpenMP threads use malloc.  When this particular MPI
implementation returns SINGLE, it really means SINGLE. If there is only one
application thread using regular malloc it is safe but if there is a malloc
call on one thread while the main thread is in an MPI call, heap corruption
is likely.




Dick Treumann  -  MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363



   
  From: Yuanyuan ZHANG
  To: Open MPI Users
  Date: 03/03/2010 07:34 PM
  Subject: Re: [OMPI users] MPI_Init() and MPI_Init_thread()
  Sent by: users-boun...@open-mpi.org

Hi guys,

Thanks for your help, but unfortunately I am still not clear.

> You are right Dave, FUNNELED allows the application to have multiple
> threads but only the main thread calls MPI.
My understanding is that even if I use SINGLE or MPI_Init, I can still
have multiple threads if I use OpenMP PARALLEL directive, and only
the main thread makes MPI calls. Am I correct?

> An OpenMP/MPI hybrid program that makes MPI calls only in between parallel
> sections is usually a FUNNELED user of MPI
For an OpenMP/MPI hybrid program, if I only want to make MPI calls using
the main thread, i.e., only in between parallel sections, can I just use
SINGLE or MPI_Init? What's the benefit of FUNNELED?

Thanks.





Re: [OMPI users] Limit to number of processes on one node?

2010-03-04 Thread Ralph Castain

On Mar 4, 2010, at 7:51 AM, Prentice Bisbal wrote:

> 
> 
> Ralph Castain wrote:
>> On Mar 4, 2010, at 7:27 AM, Prentice Bisbal wrote:
>> 
>>> 
>>> Ralph Castain wrote:
 On Mar 3, 2010, at 12:16 PM, Prentice Bisbal wrote:
 
> Eugene Loh wrote:
>> Prentice Bisbal wrote:
>>> Eugene Loh wrote:
>>> 
 Prentice Bisbal wrote:
 
> Is there a limit on how many MPI processes can run on a single host?
> 
>> Depending on which OMPI release you're using, I think you need something
>> like 4*np up to 7*np (plus a few) descriptors.  So, with 256, you need
>> 1000+ descriptors.  You're quite possibly up against your limit, though
>> I don't know for sure that that's the problem here.
>> 
>> You say you're running 1.2.8.  That's "a while ago", so would you
>> consider updating as a first step?  Among other things, newer OMPIs will
>> generate a much clearer error message if the descriptor limit is the
>> problem.
> While 1.2.8 might be "a while ago", upgrading software just because it's
> "old" is not a valid argument.
> 
> I can install the latest version of OpenMPI, but it will take a little
> while.
 Maybe not because it is "old", but Eugene is correct. The old versions of 
 OMPI required more file descriptors than the newer versions.
 
 That said, you'll still need a minimum of 4x the number of procs on the 
 node even with the latest release. I suggest talking to your sys admin 
 about getting the limit increased. It sounds like it has been set 
 unrealistically low.
 
 
>>> I *am* the system admin! ;)
>>> 
>>> The file descriptor limit is the default for RHEL,  1024, so I would not
>>> characterize it as "unrealistically low".  I assume someone with much
>>> more knowledge of OS design and administration than me came up with this
>>> default, so I'm hesitant to change it without good reason. If there was
>>> good reason, I'd have no problem changing it. I have read that setting
>>> it to more than 8192 can lead to system instability.
>> 
>> Never heard that, and most HPC systems have it set a great deal higher 
>> without trouble.
> 
> I just read that the other day. Not sure where, though. Probably a forum
> posting somewhere. I'll take your word for it that it's safe to increase
> if necessary.
>> 
>> However, the choice is yours. If you have a large SMP system, you'll 
>> eventually be forced to change it or severely limit its usefulness for MPI. 
>> RHEL sets it that low arbitrarily as a way of saving memory by keeping the 
>> fd table small, not because the OS can't handle it.
>> 
>> Anyway, that is the problem. Nothing we (or any MPI) can do about it as the 
>> fd's are required for socket-based communications and to forward I/O.
> 
> Thanks, Ralph, that's exactly the answer I was looking for - where this
> limit was coming from.
> 
> I can see how on a large SMP system the fd limit would have to be
> increased. In normal circumstances, my cluster nodes should never have
> more than 8 MPI processes running at once (per node), so I shouldn't be
> hitting that limit on my cluster.

Ah, okay! That helps a great deal in figuring out what to advise you. In your 
earlier note, it sounded like you were running all 512 procs on one node, so I 
assumed you had a large single-node SMP.

In this case, though, the problem is solely that you are using the 1.2 series. 
In that series, mpirun and each process opened many more sockets to all 
processes in the job. That's why you are overrunning your limit.

Starting with 1.3, the number of sockets being opened on each is only 3 times 
the number of procs on the node, plus a couple for the daemon. If you are using 
TCP for MPI communications, then each MPI connection will open another socket 
as these messages are direct and not routed.

Upgrading to the 1.4 series should resolve the problem you saw.

HTH
Ralph

> 
>> 
>> 
>>> This is an admittedly unusual situation - in normal use, no one would ever
>>> want to run that many processes on a single system - so I don't see any
>>> justification for modifying that setting.
>>> 
>>> Yesterday I spoke to the researcher who originally asked me about this limit -
>>> he just wanted to know what the limit was, and doesn't actually plan to
>>> do any "real" work with that many processes on a single node, rendering
>>> this whole discussion academic.
>>> 
>>> I did install OpenMPI 1.4.1 yesterday, but I haven't had a chance to
>>> test it yet. I'll post the results of testing here.
>>> 
> I have a user trying to test his code on the command-line on a single
> host before running it on our cluster like so:
> 
> mpirun -np X foo
> 
> When he tries to run it with a large number of processes (X = 256, 512), the
> program fails, and I can reproduce this with a simple "Hello, World"
> program:
> 
> $ mpirun -np 256 mpihello

Re: [OMPI users] Limit to number of processes on one node?

2010-03-04 Thread Prentice Bisbal


Ralph Castain wrote:
> On Mar 4, 2010, at 7:27 AM, Prentice Bisbal wrote:
> 
>>
>> Ralph Castain wrote:
>>> On Mar 3, 2010, at 12:16 PM, Prentice Bisbal wrote:
>>>
 Eugene Loh wrote:
> Prentice Bisbal wrote:
>> Eugene Loh wrote:
>>
>>> Prentice Bisbal wrote:
>>>
 Is there a limit on how many MPI processes can run on a single host?

> Depending on which OMPI release you're using, I think you need something
> like 4*np up to 7*np (plus a few) descriptors.  So, with 256, you need
> 1000+ descriptors.  You're quite possibly up against your limit, though
> I don't know for sure that that's the problem here.
>
> You say you're running 1.2.8.  That's "a while ago", so would you
> consider updating as a first step?  Among other things, newer OMPIs will
> generate a much clearer error message if the descriptor limit is the
> problem.
 While 1.2.8 might be "a while ago", upgrading software just because it's
 "old" is not a valid argument.

 I can install the latest version of OpenMPI, but it will take a little
 while.
>>> Maybe not because it is "old", but Eugene is correct. The old versions of 
>>> OMPI required more file descriptors than the newer versions.
>>>
>>> That said, you'll still need a minimum of 4x the number of procs on the 
>>> node even with the latest release. I suggest talking to your sys admin 
>>> about getting the limit increased. It sounds like it has been set 
>>> unrealistically low.
>>>
>>>
>> I *am* the system admin! ;)
>>
>> The file descriptor limit is the default for RHEL,  1024, so I would not
>> characterize it as "unrealistically low".  I assume someone with much
>> more knowledge of OS design and administration than me came up with this
>> default, so I'm hesitant to change it without good reason. If there was
>> good reason, I'd have no problem changing it. I have read that setting
>> it to more than 8192 can lead to system instability.
> 
> Never heard that, and most HPC systems have it set a great deal higher 
> without trouble.

I just read that the other day. Not sure where, though. Probably a forum
posting somewhere. I'll take your word for it that it's safe to increase
if necessary.
> 
> However, the choice is yours. If you have a large SMP system, you'll 
> eventually be forced to change it or severely limit its usefulness for MPI. 
> RHEL sets it that low arbitrarily as a way of saving memory by keeping the fd 
> table small, not because the OS can't handle it.
> 
> Anyway, that is the problem. Nothing we (or any MPI) can do about it as the 
> fd's are required for socket-based communications and to forward I/O.

Thanks, Ralph, that's exactly the answer I was looking for - where this
limit was coming from.

I can see how on a large SMP system the fd limit would have to be
increased. In normal circumstances, my cluster nodes should never have
more than 8 MPI processes running at once (per node), so I shouldn't be
hitting that limit on my cluster.

> 
> 
>> This is an admittedly unusual situation - in normal use, no one would ever
>> want to run that many processes on a single system - so I don't see any
>> justification for modifying that setting.
>>
>> Yesterday I spoke to the researcher who originally asked me about this limit -
>> he just wanted to know what the limit was, and doesn't actually plan to
>> do any "real" work with that many processes on a single node, rendering
>> this whole discussion academic.
>>
>> I did install OpenMPI 1.4.1 yesterday, but I haven't had a chance to
>> test it yet. I'll post the results of testing here.
>>
 I have a user trying to test his code on the command-line on a single
 host before running it on our cluster like so:

 mpirun -np X foo

 When he tries to run it with a large number of processes (X = 256, 512), the
 program fails, and I can reproduce this with a simple "Hello, World"
 program:

 $ mpirun -np 256 mpihello
 mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
 exited on signal 15 (Terminated).
 252 additional processes aborted (not shown)

 I've done some testing and found that X must be < 155 for this program to work.
 Is this a bug, part of the standard, or design/implementation decision?



>>> One possible issue is the limit on the number of descriptors.  The error
>>> message should be pretty helpful and descriptive, but perhaps you're
>>> using an older version of OMPI.  If this is your problem, one workaround
>>> is something like this:
>>>
>>> unlimit descriptors
>>> mpirun -np 256 mpihello
>>>
>> Looks like I'm not allowed to set that as a regular user:
>>
>> $ ulimit -n 2048
>> -bash: ulimit: open files: cannot modify limit: Operation not permitted
>>
>> Since I am the admin, I could change that elsewhere, but I'd rather not
>> do that system-wide unless absolutely necessary.

Re: [OMPI users] Limit to number of processes on one node?

2010-03-04 Thread Ralph Castain

On Mar 4, 2010, at 7:27 AM, Prentice Bisbal wrote:

> 
> 
> Ralph Castain wrote:
>> On Mar 3, 2010, at 12:16 PM, Prentice Bisbal wrote:
>> 
>>> Eugene Loh wrote:
 Prentice Bisbal wrote:
> Eugene Loh wrote:
> 
>> Prentice Bisbal wrote:
>> 
>>> Is there a limit on how many MPI processes can run on a single host?
>>> 
 Depending on which OMPI release you're using, I think you need something
 like 4*np up to 7*np (plus a few) descriptors.  So, with 256, you need
 1000+ descriptors.  You're quite possibly up against your limit, though
 I don't know for sure that that's the problem here.
 
 You say you're running 1.2.8.  That's "a while ago", so would you
 consider updating as a first step?  Among other things, newer OMPIs will
 generate a much clearer error message if the descriptor limit is the
 problem.
>>> While 1.2.8 might be "a while ago", upgrading software just because it's
>>> "old" is not a valid argument.
>>> 
>>> I can install the latest version of OpenMPI, but it will take a little
>>> while.
>> 
>> Maybe not because it is "old", but Eugene is correct. The old versions of 
>> OMPI required more file descriptors than the newer versions.
>> 
>> That said, you'll still need a minimum of 4x the number of procs on the node 
>> even with the latest release. I suggest talking to your sys admin about 
>> getting the limit increased. It sounds like it has been set unrealistically 
>> low.
>> 
>> 
> I *am* the system admin! ;)
> 
> The file descriptor limit is the default for RHEL,  1024, so I would not
> characterize it as "unrealistically low".  I assume someone with much
> more knowledge of OS design and administration than me came up with this
> default, so I'm hesitant to change it without good reason. If there was
> good reason, I'd have no problem changing it. I have read that setting
> it to more than 8192 can lead to system instability.

Never heard that, and most HPC systems have it set a great deal higher without 
trouble.

However, the choice is yours. If you have a large SMP system, you'll eventually 
be forced to change it or severely limit its usefulness for MPI. RHEL sets it 
that low arbitrarily as a way of saving memory by keeping the fd table small, 
not because the OS can't handle it.

Anyway, that is the problem. Nothing we (or any MPI) can do about it as the 
fd's are required for socket-based communications and to forward I/O.


> 
> This is an admittedly unusual situation - in normal use, no one would ever
> want to run that many processes on a single system - so I don't see any
> justification for modifying that setting.
> 
> Yesterday I spoke to the researcher who originally asked me about this limit -
> he just wanted to know what the limit was, and doesn't actually plan to
> do any "real" work with that many processes on a single node, rendering
> this whole discussion academic.
> 
> I did install OpenMPI 1.4.1 yesterday, but I haven't had a chance to
> test it yet. I'll post the results of testing here.
> 
>>> 
>>> I have a user trying to test his code on the command-line on a single
>>> host before running it on our cluster like so:
>>> 
>>> mpirun -np X foo
>>> 
>>> When he tries to run it with a large number of processes (X = 256, 512), the
>>> program fails, and I can reproduce this with a simple "Hello, World"
>>> program:
>>> 
>>> $ mpirun -np 256 mpihello
>>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
>>> exited on signal 15 (Terminated).
>>> 252 additional processes aborted (not shown)
>>> 
>>> I've done some testing and found that X must be < 155 for this program to work.
>>> Is this a bug, part of the standard, or design/implementation decision?
>>> 
>>> 
>>> 
>> One possible issue is the limit on the number of descriptors.  The error
>> message should be pretty helpful and descriptive, but perhaps you're
>> using an older version of OMPI.  If this is your problem, one workaround
>> is something like this:
>> 
>> unlimit descriptors
>> mpirun -np 256 mpihello
>> 
> Looks like I'm not allowed to set that as a regular user:
> 
> $ ulimit -n 2048
> -bash: ulimit: open files: cannot modify limit: Operation not permitted
> 
> Since I am the admin, I could change that elsewhere, but I'd rather not
> do that system-wide unless absolutely necessary.
> 
>> though I guess the syntax depends on what shell you're running.  Another
>> is to set the MCA parameter opal_set_max_sys_limits to 1.
>> 
> That didn't work either:
> 
> $ mpirun -mca opal_set_max_sys_limits 1 -np 256 mpihello
> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
> exited on signal 15 (Terminated).
> 252 additional processes aborted (not shown)
> 
> 
> -- 
> Prentice Bisbal
> Linux Software Support Specialist/System Administrator
> School of Natural Sciences

Re: [OMPI users] MPI_Init() and MPI_Init_thread()

2010-03-04 Thread François Trahay
On Thursday 04 March 2010 01:32:39 Yuanyuan ZHANG wrote:
> Hi guys,
> 
> Thanks for your help, but unfortunately I am still not clear.
> 
> > You are right Dave, FUNNELED allows the application to have multiple
> > threads but only the main thread calls MPI.
> 
> My understanding is that even if I use SINGLE or MPI_Init, I can still
> have multiple threads if I use OpenMP PARALLEL directive, and only
> the main thread makes MPI calls. Am I correct?

Actually, if your application asks for MPI_THREAD_SINGLE, it specifies that it 
won't use any threads, so you shouldn't use OpenMP threads. If you ask for SINGLE 
and use threads (even if only the main thread calls MPI), the behavior is 
unspecified (we can imagine that some MPI implementation cannot support any 
thread within the process for some reason). 

> 
> > An OpenMP/MPI hybrid program that makes MPI calls only in between
> > parallel sections is usually a FUNNELED user of MPI
> 
> For an OpenMP/MPI hybrid program, if I only want to make MPI calls using
> the main thread, i.e., only in between parallel sections, can I just use
> SINGLE or MPI_Init? What's the benefit of FUNNELED?

Asking for the FUNNELED thread-safety level informs the MPI library that your 
application is going to run threads. If you ask for SINGLE, then you say "I 
promise, I won't use any thread". The difference may be unclear from an 
application developer's point of view, but it is important from the MPI library 
point of view.

However, I guess most modern MPI libraries support FUNNELED by default, and the 
performance should be the same for FUNNELED and SINGLE.
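
As a concrete illustration of the pattern being discussed (MPI touched only in
between OpenMP parallel sections), a FUNNELED-style compute step might look
like the sketch below; the reduction and the data are placeholders, and MPI is
assumed to have been initialized with MPI_Init_thread requesting
MPI_THREAD_FUNNELED:

#include <mpi.h>

/* Sum a local array with OpenMP threads, then combine the per-rank
   results with a single MPI call made by the main thread only. */
double global_sum(const double *local, int n)
{
    double sum = 0.0, global = 0.0;
    int i;

    /* Threads exist only inside this region; no MPI calls here. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += local[i];

    /* Back in between parallel sections: the main thread funnels
       all MPI traffic, which is exactly FUNNELED usage. */
    MPI_Allreduce(&sum, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    return global;
}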


Francois Trahay


Re: [OMPI users] noob warning - problems testing MPI_Comm_spawn

2010-03-04 Thread Shiqing Fan


Hi Damien,

Sorry for the late reply, I was trying to dig inside the code and got some 
information.


First of all, in your example, it's not correct to define the MPI_Info 
as a pointer; it will cause an initialization violation at run time. 
The message "LOCAL DAEMON SPAWN IS CURRENTLY UNSUPPORTED" is just a 
warning which won't block the execution. In order to make the 
master-slave example work, you have to disable the CCP support; it seems to have 
conflicts with the comm_spawn operation. I'm still checking it.
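
For reference, a corrected fragment of the master would declare the handle by 
value and pass its address to MPI_Info_create, along these lines (same key and 
value as in the original example):

  MPI_Info spawninfo;
  MPI_Info_create(&spawninfo);
  MPI_Info_set(spawninfo, "add-host", "127.0.0.1");

  MPI_Comm_spawn("slave.exe", MPI_ARGV_NULL, 1, spawninfo, 0,
                 MPI_COMM_SELF, &maincomm, MPI_ERRCODES_IGNORE);

  MPI_Info_free(&spawninfo);   /* release the info object when done */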


To disable CCP in Open MPI 1.4.1, you have to exclude the source files 
manually, i.e. exclude the ccp files in orte/mca/plm/ccp and 
orte/mca/ras/ccp, then also remove the CCP-related lines in 
orte/mca/plm/base/static-components.h and 
orte/mca/ras/base/static-components.h. There is an option to do so in 
the trunk version, but not for 1.4.1. Sorry for the inconvenience.


For the "singleton" run with master.exe, it's still not working under 
Windows.



Best Regards,
Shiqing


Damien Hocking wrote:

Hi all,

I'm playing around with MPI_Comm_spawn, trying to do something simple 
with a master-slave example.  I get a LOCAL DAEMON SPAWN IS CURRENTLY 
UNSUPPORTED error when it tries to spawn the slave.  This is on 
Windows, OpenMPI version 1.4.1, r22421.


Here's the master code:

int main(int argc, char* argv[])
{
  int myid, ierr;
  MPI_Comm maincomm;
  ierr = MPI_Init(&argc, &argv);
  ierr = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
 
  if (myid == 0)

  {
 std::cout << "\n Hello from the boss " << myid;
 std::cout.flush();
  }
 
  MPI_Info* spawninfo;

  MPI_Info_create(spawninfo);
  MPI_Info_set(*spawninfo, "add-host", "127.0.0.1");
 
  if (myid == 0)

  {
 std::cout << "\n About to MPI_Comm_spawn." << myid;
 std::cout.flush();
  }
  MPI_Comm_spawn("slave.exe", MPI_ARGV_NULL, 1, *spawninfo, 0, 
MPI_COMM_SELF, &maincomm, MPI_ERRCODES_IGNORE);

  if (myid == 0)
  {
 std::cout << "\n MPI_Comm_spawn successful." << myid;
 std::cout.flush();
  }
  ierr = MPI_Finalize();
  return 0;
}

Here's the slave code:

int main(int argc, char* argv[])
{
  int myid, ierr;
 
  MPI_Comm parent;
 
  ierr = MPI_Init(&argc, &argv);

  MPI_Comm_get_parent(&parent);
 
  if (parent == MPI_COMM_NULL)

  {
 std::cout << "\n No parent.";
  }
  ierr = MPI_Comm_rank(MPI_COMM_WORLD, &myid);
 
  std::cout << "\n Hello from a worker " << myid;
  std::cout.flush();
  ierr = MPI_Finalize();
 
  return 0;

}

Also, this only starts up correctly if I kick it off with orterun.  
Ideally I'd like to run it as "master.exe" and have it initialise the 
MPI environment from there.  Can anyone tell me what setup I need to 
do that?

Thanks in advance,

Damien




--
--
Shiqing Fan  http://www.hlrs.de/people/fan
High Performance Computing   Tel.: +49 711 685 87234
 Center Stuttgart (HLRS)Fax.: +49 711 685 65832
Address:Allmandring 30   email: f...@hlrs.de
70569 Stuttgart




Re: [OMPI users] Limit to number of processes on one node?

2010-03-04 Thread Prentice Bisbal


Ralph Castain wrote:
> On Mar 3, 2010, at 12:16 PM, Prentice Bisbal wrote:
> 
>> Eugene Loh wrote:
>>> Prentice Bisbal wrote:
 Eugene Loh wrote:

> Prentice Bisbal wrote:
>
>> Is there a limit on how many MPI processes can run on a single host?
>>
>>> Depending on which OMPI release you're using, I think you need something
>>> like 4*np up to 7*np (plus a few) descriptors.  So, with 256, you need
>>> 1000+ descriptors.  You're quite possibly up against your limit, though
>>> I don't know for sure that that's the problem here.
>>>
>>> You say you're running 1.2.8.  That's "a while ago", so would you
>>> consider updating as a first step?  Among other things, newer OMPIs will
>>> generate a much clearer error message if the descriptor limit is the
>>> problem.
>> While 1.2.8 might be "a while ago", upgrading software just because it's
>> "old" is not a valid argument.
>>
>> I can install the latest version of OpenMPI, but it will take a little
>> while.
> 
> Maybe not because it is "old", but Eugene is correct. The old versions of 
> OMPI required more file descriptors than the newer versions.
> 
> That said, you'll still need a minimum of 4x the number of procs on the node 
> even with the latest release. I suggest talking to your sys admin about 
> getting the limit increased. It sounds like it has been set unrealistically 
> low.
> 
> 
I *am* the system admin! ;)

The file descriptor limit is the default for RHEL,  1024, so I would not
characterize it as "unrealistically low".  I assume someone with much
more knowledge of OS design and administration than me came up with this
default, so I'm hesitant to change it without good reason. If there was
good reason, I'd have no problem changing it. I have read that setting
it to more than 8192 can lead to system instability.

This is an admittedly unusual situation - in normal use, no one would ever
want to run that many processes on a single system - so I don't see any
justification for modifying that setting.

Yesterday I spoke to the researcher who originally asked me about this limit -
he just wanted to know what the limit was, and doesn't actually plan to
do any "real" work with that many processes on a single node, rendering
this whole discussion academic.

I did install OpenMPI 1.4.1 yesterday, but I haven't had a chance to
test it yet. I'll post the results of testing here.

>>
>> I have a user trying to test his code on the command-line on a single
>> host before running it on our cluster like so:
>>
>> mpirun -np X foo
>>
>> When he tries to run it with a large number of processes (X = 256, 512), the
>> program fails, and I can reproduce this with a simple "Hello, World"
>> program:
>>
>> $ mpirun -np 256 mpihello
>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
>> exited on signal 15 (Terminated).
>> 252 additional processes aborted (not shown)
>>
>> I've done some testing and found that X must be < 155 for this program to work.
>> Is this a bug, part of the standard, or design/implementation decision?
>>
>>
>>
> One possible issue is the limit on the number of descriptors.  The error
> message should be pretty helpful and descriptive, but perhaps you're
> using an older version of OMPI.  If this is your problem, one workaround
> is something like this:
>
> unlimit descriptors
> mpirun -np 256 mpihello
>
 Looks like I'm not allowed to set that as a regular user:

 $ ulimit -n 2048
 -bash: ulimit: open files: cannot modify limit: Operation not permitted

 Since I am the admin, I could change that elsewhere, but I'd rather not
 do that system-wide unless absolutely necessary.

> though I guess the syntax depends on what shell you're running.  Another
> is to set the MCA parameter opal_set_max_sys_limits to 1.
>
 That didn't work either:

 $ mpirun -mca opal_set_max_sys_limits 1 -np 256 mpihello
 mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
 exited on signal 15 (Terminated).
 252 additional processes aborted (not shown)


-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ


Re: [OMPI users] Segmentation fault when Send/Recv on heterogeneous cluster (32/64 bit machines)

2010-03-04 Thread TRINH Minh Hieu
Hi,

I have made a new discovery about this problem:

It seems that the array size that can be sent from a 32-bit to a 64-bit
machine is proportional to the parameter "btl_tcp_eager_limit".
When I set it to 200 000 000 (2e08 bytes, about 190MB), I can send an
array of up to 2e07 doubles (152MB).

I didn't find much information about btl_tcp_eager_limit other than
in the "ompi_info --all" output. If I leave it at 2e08, will it impact
the performance of Open MPI?
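
(For what it's worth, while experimenting it can also be set per run rather
than globally, e.g.

  mpirun -mca btl_tcp_eager_limit 200000000 -hetero --app appfile

with the same appfile as before.)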

It may also be noteworthy that if the master (rank 0) is a 32-bit
machine, I don't get a segfault. I can send a big array with a small
"btl_tcp_eager_limit" from a 64-bit machine to a 32-bit one.

Do I have to move this thread to the devel mailing list?

Regards,

   TMHieu

On Tue, Mar 2, 2010 at 2:54 PM, TRINH Minh Hieu  wrote:
> Hello,
>
> Yes, I compiled OpenMPI with --enable-heterogeneous. More precisely, I
> compiled with:
> $ ./configure --prefix=/tmp/openmpi --enable-heterogeneous
> --enable-cxx-exceptions --enable-shared
> --enable-orterun-prefix-by-default
> $ make all install
>
> I attach the output of ompi_info of my 2 machines.
>
>    TMHieu
>
> On Tue, Mar 2, 2010 at 1:57 PM, Jeff Squyres  wrote:
>> Did you configure Open MPI with --enable-heterogeneous?
>>
>> On Feb 28, 2010, at 1:22 PM, TRINH Minh Hieu wrote:
>>
>>> Hello,
>>>
>>> I have some problems running MPI on my heterogeneous cluster. More
>>> precisely, I got a segmentation fault when sending a large array (about
>>> 1) of doubles from an i686 machine to an x86_64 machine. It does not
>>> happen with a small array. Here is the send/recv source code (the complete
>>> source is in the attached file):
>>> code 
>>>     if (me == 0 ) {
>>>         for (int pe=1; pe<...; pe++)
>>>         {
>>>                 printf("Receiving from proc %d : ",pe); fflush(stdout);
>>>             d=(double *)malloc(sizeof(double)*n);
>>>             MPI_Recv(d,n,MPI_DOUBLE,pe,999,MPI_COMM_WORLD,&status);
>>>             printf("OK\n"); fflush(stdout);
>>>         }
>>>         printf("All done.\n");
>>>     }
>>>     else {
>>>       d=(double *)malloc(sizeof(double)*n);
>>>       MPI_Send(d,n,MPI_DOUBLE,0,999,MPI_COMM_WORLD);
>>>     }
>>>  code 
>>>
>>> I got segmentation fault with n=1 but no error with n=1000
>>> I have 2 machines :
>>> sbtn155 : Intel Xeon,         x86_64
>>> sbtn211 : Intel Pentium 4, i686
>>>
>>> The code is compiled in x86_64 and i686 machine, using OpenMPI 1.4.1,
>>> installed in /tmp/openmpi :
>>> [mhtrinh@sbtn211 heterogenous]$ make hetero
>>> gcc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include -c hetero.c -o 
>>> hetero.i686.o
>>> /tmp/openmpi/bin/mpicc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include
>>> hetero.i686.o -o hetero.i686 -lm
>>>
>>> [mhtrinh@sbtn155 heterogenous]$ make hetero
>>> gcc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include -c hetero.c -o 
>>> hetero.x86_64.o
>>> /tmp/openmpi/bin/mpicc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include
>>> hetero.x86_64.o -o hetero.x86_64 -lm
>>>
>>> I run with the code using appfile and got thoses error :
>>> $ cat appfile
>>> --host sbtn155 -np 1 hetero.x86_64
>>> --host sbtn155 -np 1 hetero.x86_64
>>> --host sbtn211 -np 1 hetero.i686
>>>
>>> $ mpirun -hetero --app appfile
>>> Input array length :
>>> 1
>>> Receiving from proc 1 : OK
>>> Receiving from proc 2 : [sbtn155:26386] *** Process received signal ***
>>> [sbtn155:26386] Signal: Segmentation fault (11)
>>> [sbtn155:26386] Signal code: Address not mapped (1)
>>> [sbtn155:26386] Failing at address: 0x200627bd8
>>> [sbtn155:26386] [ 0] /lib64/libpthread.so.0 [0x3fa4e0e540]
>>> [sbtn155:26386] [ 1] /tmp/openmpi/lib/openmpi/mca_pml_ob1.so 
>>> [0x2d8d7908]
>>> [sbtn155:26386] [ 2] /tmp/openmpi/lib/openmpi/mca_btl_tcp.so 
>>> [0x2e2fc6e3]
>>> [sbtn155:26386] [ 3] /tmp/openmpi/lib/libopen-pal.so.0 [0x2afe39db]
>>> [sbtn155:26386] [ 4]
>>> /tmp/openmpi/lib/libopen-pal.so.0(opal_progress+0x9e) [0x2afd8b9e]
>>> [sbtn155:26386] [ 5] /tmp/openmpi/lib/openmpi/mca_pml_ob1.so 
>>> [0x2d8d4b25]
>>> [sbtn155:26386] [ 6] /tmp/openmpi/lib/libmpi.so.0(MPI_Recv+0x13b)
>>> [0x2ab30f9b]
>>> [sbtn155:26386] [ 7] hetero.x86_64(main+0xde) [0x400cbe]
>>> [sbtn155:26386] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3fa421e074]
>>> [sbtn155:26386] [ 9] hetero.x86_64 [0x400b29]
>>> [sbtn155:26386] *** End of error message ***
>>> --
>>> mpirun noticed that process rank 0 with PID 26386 on node sbtn155
>>> exited on signal 11 (Segmentation fault).
>>> --
>>>
>>> Am I missing an option in order to run on a heterogeneous cluster?
>>> Does MPI_Send/Recv have a limited array size when using a heterogeneous cluster?
>>> Thanks for your help. Regards
>>>
>>> --
>>> 
>>>    M. TRINH Minh Hieu
>>>    CEA, IBEB, SBTN/LIRM,
>>>    F-30207 Bagnols-sur-Cèze, FRANCE
>>> ==

Re: [OMPI users] checkpointing multi node and multi process applications

2010-03-04 Thread Fernando Lemos
On Wed, Mar 3, 2010 at 10:24 PM, Fernando Lemos  wrote:

> Is there anything I can do to provide more information about this bug?
> E.g. try to compile the code in the SVN trunk? I also have kept the
> snapshots intact, I can tar them up and upload them somewhere in case
> you guys need it. I can also provide the source code to the ring
> program, but it's really the canonical ring MPI example.
>

I tried 1.5 (1.5a1r22754 nightly snapshot, same compilation flags).
This time taking the checkpoint didn't generate any error message:

root@debian1:~# mpirun -am ft-enable-cr -mca btl_tcp_if_include eth1
-np 2 --host debian1,debian2 ring

>>> Process 1 sending 2761 to 0
>>> Process 1 received 2760
>>> Process 1 sending 2760 to 0
root@debian1:~#

But restoring it did:

root@debian1:~# ompi-restart ompi_global_snapshot_23071.ckpt
[debian1:23129] Error: Unable to access the path
[/root/ompi_global_snapshot_23071.ckpt/0/opal_snapshot_1.ckpt]!
--
Error: The filename (opal_snapshot_1.ckpt) is invalid because either
you have not provided a filename
   or provided an invalid filename.
   Please see --help for usage.

--
--
mpirun has exited due to process rank 1 with PID 23129 on
node debian1 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--
root@debian1:~#

Indeed, opal_snapshot_1.ckpt does not exist:

root@debian1:~# find ompi_global_snapshot_23071.ckpt/
ompi_global_snapshot_23071.ckpt/
ompi_global_snapshot_23071.ckpt/global_snapshot_meta.data
ompi_global_snapshot_23071.ckpt/restart-appfile
ompi_global_snapshot_23071.ckpt/0
ompi_global_snapshot_23071.ckpt/0/opal_snapshot_0.ckpt
ompi_global_snapshot_23071.ckpt/0/opal_snapshot_0.ckpt/ompi_blcr_context.23073
ompi_global_snapshot_23071.ckpt/0/opal_snapshot_0.ckpt/snapshot_meta.data
root@debian1:~#

It can be found in debian2:

root@debian2:~# find ompi_global_snapshot_23071.ckpt/
ompi_global_snapshot_23071.ckpt/
ompi_global_snapshot_23071.ckpt/0
ompi_global_snapshot_23071.ckpt/0/opal_snapshot_1.ckpt
ompi_global_snapshot_23071.ckpt/0/opal_snapshot_1.ckpt/snapshot_meta.data
ompi_global_snapshot_23071.ckpt/0/opal_snapshot_1.ckpt/ompi_blcr_context.6501
root@debian2:~#

Then I tried supplying a hostfile for ompi-restart and it worked just
fine! I thought the checkpoint included the hosts information?

So I think it's fixed in 1.5. Should I try the 1.4 branch in SVN?


Thanks a bunch,